Cléa Siguret

Charbonnières-Les-Vieilles, Puy-de-Dôme, France · clea.siguret@gmail.com

I am a bioinformatics research engineer with a strong background in data science, comparative genomics, and evolutionary biology. With a Ph.D. in Bioinformatics and several years of experience in both academic and industrial environments, I specialise in developing scalable, reproducible workflows (e.g., Intergalactic Workflow Commission workflows with DOIs), integrating and deploying analysis tools, and delivering user-friendly solutions on platforms such as Galaxy. Working at the interface between biology and computing, I combine scientific expertise with operational skills in Linux systems (administration, troubleshooting), workflow automation and containerisation (Conda, Docker/Singularity), and technical documentation for end-users and developers.

I also contribute to the reliable operation of research computing environments by supporting deployment and maintenance activities (Galaxy servers, HPC/Slurm environments, data processing pipelines), implementing reproducibility and quality assurance practices (CI testing, code versioning, FAIRification), and providing training to research teams (Galaxy Training Academy, FAIR-Bioinfo, GTN tutorials). My involvement in user support, training, and workflow curation helps laboratories adopt best practices in data and software management. Curious, autonomous, and quick to learn new technologies, I thrive in collaborative environments dedicated to delivering robust, well-documented, and sustainable research services.



Professional Experience

Bioinformatics Research Engineer (Fixed-term Contract)

Institut Français de Bioinformatique (CNRS), Plateforme AuBi (UCA), Aubière

At the IFB, I develop, integrate and maintain genomic workflows and tools for microbial genomics on Galaxy for the ABRomics project. At the AuBi platform, I help administer the Galaxy services, provide user support, and coordinate the deployment of R-Shiny applications for local research teams.

  • Development, documentation and restructuring of ABRomics genomic workflows on Galaxy, including modular pipelines for QC, genome assembly, annotation and AMR detection; publication of validated workflows via the Intergalactic Workflow Commission (IWC).
  • Integration and enhancement of ABRomics tools in Galaxy, such as ToolDistillator and CoreProfiler (wrappers, Data Managers, CI testing), and adaptation of community tools (StarAMR, CheckM2) to improve reproducibility and pipeline reliability.
  • Technical support for the AuBi Galaxy service: assistance with tool installation and configuration, troubleshooting, and documentation for the local research teams using the platform.
  • Initiation and coordination of the R-Shiny project at AuBi: survey of user needs, organisation of collaborative workshops, and support for hosting and improving laboratory applications on the mesocentre infrastructure.
Supervisor: Nadia GOUÉ

Since April 2024

Bioinformatics Research Engineer (Postdoctoral)

PaleoEVO team, GDEC - UMR 1095 INRAE/UCA, Clermont-Ferrand

During this postdoctoral position, I contributed to several large-scale comparative genomics and evolutionary biology projects while also strengthening the team’s computational workflows and analysis capacity.

  • Design and optimisation of large-scale comparative genomics pipelines, including automated analyses and reproducible workflows for evolutionary studies.
  • Management and processing of high-volume datasets, notably the comparative analysis of 271 vertebrate genomes, with a focus on genome evolution and functional divergence.
  • Development of visualisation tools for analysing gene flow and orthologous SNPs across populations (Whealbi Project), facilitating interpretation of complex evolutionary patterns.
  • Study of gymnosperm evolutionary history, integrating phylogenetic, synteny and genome structure analyses.
  • Development of a method for selecting EMS mutants to support translational research into agronomically relevant traits.
  • Collaboration with multidisciplinary teams (genomics, evolutionary biology, bioinformatics) and contribution to computational research infrastructure, including workflow reliability and data-processing strategies.
Supervisor: Jérôme SALSE

October 2023 – March 2024

Ph.D. in Bioinformatics - Specialization in Data Science, Genomics, Evolution

PaleoEVO team, GDEC - UMR 1095 INRAE/UCA, Clermont-Ferrand

During my Ph.D., I carried out large-scale comparative genomics projects on flowering plants and gymnosperms, while developing reproducible computational workflows and robust analytical pipelines to support long-term evolutionary research.

  • Development and execution of scalable comparative genomics workflows for large datasets, including 84 angiosperm genomes and multiple gymnosperm species.
  • Study of genome evolution in flowering plants: gene conservation, genome structure, functional divergence, and translational research applications.
  • Comparative genomics of gymnosperms, integrating phylogenetic, synteny, and structural genome analyses.
  • Identification and analysis of orthologous SNPs in Triticeae (Whealbi Project) to characterise population structure and evolutionary signals.
  • Data organisation, metadata tracking and implementation of reproducible analysis practices for multi-year research projects.
Supervisor: Jérôme SALSE

June 2020 – September 2023

Bioinformatician (Fixed-term Contract)

Integration of NGS Analysis Pipelines on a Galaxy Interface

BIOGEMMA/LIMAGRAIN, Centre de Recherche de Chappes

During this position, I contributed to the deployment, optimisation, and operational maintenance of the company’s production bioinformatics pipelines within the Limagrain Galaxy platform. I worked closely with researchers, engineers, and IT teams to improve performance, reliability, and usability of analysis workflows.

  • Integration and maintenance of NGS pipelines on the Limagrain Galaxy platform, ensuring workflow reliability and reproducibility.
  • Expansion and optimisation of the tool portfolio to improve overall pipeline efficiency and analysis performance.
  • Galaxy administration: deployment of new features, configuration updates, performance monitoring, and troubleshooting.
  • User support: training sessions, issue resolution, and technical assistance for researchers and technicians.
  • Writing and updating documentation for tools, workflows, and platform usage.
Supervisors: Magalie LEVEUGLE, Frédéric SAPET

October 2018 – March 2020

Bioinformatician (Fixed-term Contract)

NGS Data Analysis (Sequencing Genotyping, RNA-seq, Whole Genome Shotgun, TILLING)

BIOGEMMA, Centre de Recherche de Chappes

In this role, I was responsible for running, monitoring, and optimising production NGS pipelines to support internal research and breeding programmes.

  • Execution and monitoring of production pipelines (RNA-seq, WGS, genotyping, TILLING) on plant and bacterial datasets.
  • Preparation of analysis reports and quality control at each stage of the workflow, ensuring reliable and traceable deliverables.
  • Pipeline adjustment and optimisation according to project needs and technological updates to improve speed and analysis quality.
  • Handling of technical incidents and coordination with laboratory and bioinformatics teams to ensure operational continuity.
Supervisors: Magalie LEVEUGLE, Frédéric SAPET, Jorge DUARTE

September 2017 – June 2018

Research internship (Master 2)

Construction of networks to analyse gene flow between ecosystems, with a focus on antibiotic resistance genes.

LMGE – UMR CNRS 6023, Aubière

During this internship, I conducted a large-scale meta-analysis of metagenomic datasets (viruses and prophages) and genomic sequences (Bacteria, Archaea, plasmids) to investigate the role of viruses in the dissemination of antibiotic resistance genes.

  • Construction of gene flow networks to analyse the transfer of antibiotic resistance genes across biological entities and ecosystems (marine, freshwater, soil, human and animal microbiomes).
  • Identification and clustering of genes using sequence assembly and annotation pipelines; detection of viral and bacterial signatures.
  • Execution of phylogenetic and functional analyses to characterise evolutionary relationships and gene transfer events.
  • Integration and visualisation of interaction networks using Cytoscape to highlight potential transmission routes of resistance genes.
  • Development and automation of analysis scripts in Perl, Bash and R, ensuring reproducibility.
Supervisor: Didier DEBROAS

January 2017 – June 2017

Research internship (Master 1)

Identification of CRISPR sequences in bacterial and archaeal metagenomes from a lacustrine environment.

LMGE – UMR CNRS 6023, Aubière

This internship aimed to study virus–host interactions in a lacustrine ecosystem through the detection and analysis of CRISPR elements in microbial metagenomes.

  • Detection and extraction of CRISPR arrays from bacterial and archaeal metagenomes using automated tools.
  • Establishment of virus–host pairs by comparing CRISPR spacer sequences with viral metagenomic data.
  • Use of sequence similarity searches (BLAST+) to link microbial spacers with viral genomes and identify potential infection relationships.
  • Functional and taxonomic analysis of identified CRISPR loci and associated viral matches.
  • Development of Perl and Bash scripts, and use of R, for data filtering, automation and result visualisation.
Supervisor: Viviane RAVET

May 2016 – July 2016

Research internship (Undergraduate – Organismal Biology)

Statistical analysis of habitats in the CASDAR ATOUS agro-ecology project

CPIE Pays Basque, Saint-Étienne-de-Baïgorry

This internship focused on the ecological characterisation of grassland habitats within the CASDAR ATOUS project, centred on the agro-ecology of forage meadows. As part of my internship work, I contributed the statistical analyses included in the following technical report:

J.M. Arranz, C. Artano-Garmendia, N. Bernos, M. Charbonneau, P. Gascouat, et al. Les prairies permanentes basco-béarnaises. Caractériser la diversité, les modes d'utilisation et les services écosystémiques. SET, UPPA, 2015. HAL: hal-02181104

  • Use of R to perform multivariate analyses, including Correspondence Analysis (CA, "AFC" in French) and matrix diagonalisation, to identify habitat typologies.
  • Integration and interpretation of ecological descriptors using reference databases such as CORINE Biotopes, Baseflor and the Natura 2000 habitat guidelines.
  • Construction of habitat classification schemes based on species composition and environmental variables.
  • Contribution to the ecological assessment of agricultural grasslands within an applied agro-ecology framework.
Supervisor: Philippe INARRA

March 2015 – May 2015

Education

Ph.D. in Bioinformatics - Specialization in Data Science, Genomics, Evolution

Understanding the Paleo-evolution of Flowering Plants for Modern Agriculture Challenges

Université Clermont-Auvergne, Clermont-Ferrand, France

Methodology: Comparative genomics of 84 species of flowering plants

  • Understanding the Evolution of Angiosperm Genomes:
    • Identification of gene functions conserved among Angiosperms and within botanical families.
    • Characterization of key drivers of gene sequence evolution in plants.
    • Dating the emergence of, and polyploidy events within, Angiosperms.
    • Analysis of the impact of polyploidy events on genes and associated functions.
    • Study of life history traits in Angiosperms: identification of genes specific to species groups.
  • Translational Research: Transferring Knowledge from a Model or Key Plant to an Agronomically Important Plant:
    • Cataloging functionally validated genes in model plants related to agronomic traits of interest.
    • Characterization of potentially transferable and exploitable conserved traits among Angiosperms.
    • Selection and validation strategy for genes potentially involved in desirable traits in common wheat.
    • Identification of genes and agronomic traits impacted by life history traits.

2020–2023

Master in Genetics and Physiology, Bioinformatics - Specialization in Data Analysis and Modeling

Université Clermont-Auvergne, Clermont-Ferrand, France

This Master’s program offers an integrated approach to understanding the functioning of multicellular organisms, combining expertise in genetics, bioinformatics, endocrinology, oncology-related signaling, and neurophysio-pharmacology.

The Data Analysis and Modeling track strengthens interdisciplinary skills at the interface of biology, computer science, and mathematics. It provides solid foundations in genetics and physiology, information systems, high-performance and distributed computing, data mining, and statistics. Students are trained in key bioinformatics methods such as biological network analysis, molecular modeling, and large-scale data processing.

The program develops advanced competencies in analytical methods and in the design and implementation of bioinformatics tools, with applications spanning major genomics and post-genomics projects.

Key skills: Data Analysis, Bioinformatics, Evolutionary Biology, Biostatistics, Comparative Genomics

2015–2017

Licence (Bachelor's Degree) in Organismal Biology

Université de Pau et des Pays de l'Adour, Pau/Anglet, France

This Licence program is designed for students interested in organismal biology, population biology, ecosystem functioning, and, more broadly, environmental sciences. It provides fundamental knowledge in biochemistry, cell and molecular biology, genetics, microbiology, and animal and plant physiology. Statistical training reinforces quantitative skills for processing and analysing biological data.

Throughout the three-year curriculum, students develop core competencies in evolutionary biology, population genetics, physiology, microbiology, and biostatistics, forming a solid foundation for advanced studies in life sciences and bioinformatics.

Key skills: Evolutionary Biology, Animal/Plant Physiology, Biostatistics, Population Genetics, Microbiology

2012–2015

Additional Training

  • FAIR Practices in Bioinformatics (30h) – AuBi Platform, UCA (July 2023)
    Versioning and code sustainability (Git), environment management (Conda, Docker/Singularity), workflow management (Snakemake), code documentation (R Markdown, Jupyter). (Training)
  • PSC1 – Civil Security Certificate (7h) – INRAE / UCA (June 2022) (Prevention)
  • Skills specific to the researcher profession (56h) – UCA (April 2022)
    Quality approach in research activities (14h); Mastering strategic information: a challenge for research (14h); Research funding: calls for projects (28h). (Training)
  • English Certificate (26h) – UCA (June 2021)
    Level B2. (Certification)
  • Management and entrepreneurship: Introduction to project management (14h) – UCA (May 2021) (Training)
  • Python 3: From Fundamentals to Advanced Concepts in the Language (81h) – MOOC (Dec. 2020) (Certification)
  • Statistics with R (25h) – MOOC (Oct. 2020) (Certification)

Skills

Programming Languages & Tools
Bioinformatics
  • Genome annotation, sequence alignment
  • Comparative genomics, phylogeny, clustering
  • Statistical analysis, data manipulation
  • Environment & workflow managers
Platforms
  • GNU/Linux, Windows
  • Cluster computing (PBS, Slurm)
Soft Skills
  • Teamwork, Planning, Troubleshooting
  • Decision-making, Persuasion, Conflict resolution
Languages
  • French (Native)
  • English (Proficient)

Interests

Sewing

  • Accessories: buttons, patches, bags
  • Garment creation from scratch
Relaxing and expressive hobby

Sports

  • Former rugby player
  • Now exploring:
    • Gym & Pilates
    • Yoga
    • Badminton
Focus on fun & wellbeing

Reading

  • Manga, comics, and novels
  • Genres:
    • Fantasy
    • Romance
    • Science Fiction
    • Adventure
Imagination & storytelling

Cooking

  • Sweet and savory dishes
  • Creative cooking & baking
Sharing food and joy

Supervision & Training Activities

  • Galaxy Training Network – Tutorials Authored & Edited
    • Cléa Siguret, Nadia Goué, Bérénice Batut. Annotate, prepare tests and publish on workflow registries Galaxy workflows. (Tutorial) (Author)
    • Bérénice Batut. Checking expected species and contamination in bacterial isolate. (Tutorial) (Editor)
    • Bérénice Batut. Building an amplicon sequence variant (ASV) table from 16S data using DADA2. (Tutorial) (Editor)
    • Bérénice Batut et al. Reference-based RNA-Seq data analysis. (Tutorial) (Editor)
    • Bérénice Batut, Cléa Siguret. Quality and contamination control in bacterial isolate using Illumina MiSeq data. (Tutorial) (Author)
  • Courses Delivered
    • June 2025 – FAIR-Bioinfo (1 week): Introduction to FAIR principles and reproducibility in bioinformatics, covering code versioning (Git), environment management (conda, Docker/Singularity), workflow languages (Snakemake, Galaxy), and literate programming tools (Markdown, Quarto, Jupyter). (Program)
    • May 2025 – Galaxy Training Academy (1 week): Trainer for the Bacterial genomics & AMR detection track. (Overview)
    • May 2025 – Printemps de la donnée (half day): Open science & FAIRification of Galaxy workflows. (Program)
    • Oct. 2024 – Galaxy Training Academy (1 week): Trainer for the Bacterial genomics & AMR detection track. (Overview)
    • Sep. 2024 – Galaxy Metagenomics (half day): Introduction to taxonomic profiling & microbial community visualization. (Program)
    • July 2024 – Galaxy Transcriptomics (1 day): Introduction to RNA-seq analysis. (Program)
    • June 2024 – Galaxy Metabarcoding (half day): 16S metabarcoding analysis. (Program)
    • May 2024 – Genome Annotation (half day): Bacterial genome annotation with Galaxy. (Program)
    • April 2024 – Galaxy QC & Mapping (half day): Sequencing QC & alignment to reference genome. (Program)
  • Event Organization
    • Dec. 2024 – AuBi platform annual community day (Journée d’animation): The 2024 edition featured seminars and discussions focused on national and European bioinformatics infrastructures, while showcasing the tools and research developed by the AuBi platform. Co-organized with first-year Master’s students in Bioinformatics, the event fostered interaction between students, researchers, and academic partners. (Program)
    • April 2023 – Young Researchers' Days of the INRAE Plant Biology and Breeding Department: For three days, INRAE PhD students had the opportunity to present the progress of their thesis work through oral presentations or posters.
    • May 2021 – Scientific Days of the Doctoral School of Life Sciences, Health, Agronomy, and the Environment: The 2021 JED SVSAE allowed PhD students to present their research progress and concluded with a keynote by Prof. Eric Delaporte on “The One Health Approach in the Fight Against Emerging Diseases”. (Advertisement) (Closing conference)
  • Supervision & Jury Duties
    • May–July 2025: Supervision of Master 1 Bioinformatics students during their internship on "Benchmarking and Validation of Galaxy Genomic Workflows" as part of the ABRomics project.
    • Since 2021: Member of the Bioinformatics Master internship jury.
  • Community Engagement
    • Since 2024 – Co-coordinator of JeBiF PUB Clermont node: Organization of 3 community events (Oct 2024, Dec 2024, May 2025) that brought together students and professionals in bioinformatics.
    • Since 2024 – Galaxy Training Network Contributor: Actively involved in the Galaxy AuBi and France platforms, and closely following developments on Galaxy Europe, I contribute by developing workflows for the Intergalactic Workflow Commission (IWC), tools for the Intergalactic Utilities Commission (IUC), and authoring training tutorials. (Hall of Fame)

Software & Workflow Contributions


Publications

  • Siguret C., Olivier M., Huneau C., Sow M. D., Klopp C., Martin M-L., Tamby J-P., Civan P., Pont C., Mathieu O., Salse J. Plant Ancestral Genomes for Translational Research of key Traits and Processes in Modern Crops.
    Submission ongoing (Nature Communications)
  • Sow M. D., Forestan C., Pont C., Civan P., Battaglia R., Seidel M., Siguret C., Luca Curci P., Tondelli A., Bustos Korts D., Mazzucotelli E., Leroy T., Huneau C., Delayhe M., ... Salse J. Striking convergent selection history of wheat and barley and its potential for breeding.
    Nature Plants, 11, 2268–2285 (2025) – doi:10.1038/s41477-025-02128-0
  • Nasr E., Amato P., Bhardwaj A., Blankenberg D., Brites D., Cumbo F., Do K., Ferrari E., Griffin T., Grüning B., Hiltemann S., Jagtap P., Mehta S., Métris K., Momin S., Oba A., Pavloudi C., Pechlivanis N., Péguilhan R., Psomopoulos F., Rosic N., Schatz M., Schiml V., Siguret C., Soranzo N., Stubbs A., van Heusden P., Vohra M., Zierep P., Batut B. microGalaxy: A gateway to tools, workflows, and training for reproducible and FAIR analysis of microbial data
    bioRxiv (2025) – doi:10.1101/2024.12.23.629682
    Submission ongoing (Communications Biology)
  • Debroas D., Siguret C. Viruses as key reservoirs of antibiotic resistance genes in the environment.
    The ISME Journal, 13(11), 2019 – doi:10.1038/s41396-019-0478-9

Talks & Posters

  • Batut B., Siguret C., Serville H., Piot G., Wawrzyniak I., El Alaoui H., Delbac F., Goué N. Building a standardized database for honey bee microbiome: Addressing metadata and data comparability gaps.
    JOBIM 2025, Bordeaux, France ⟨hal-05132669⟩
  • Siguret C., Batut B., Goué N. ATELIER - Dépôt et curation de données - Communauté Biologie et Santé (plateforme de bioanalyses Galaxy)
    Printemps de la donnée 2025 de l'Université Clermont-Auvergne, Clermont-Ferrand, France ⟨hal-05095680⟩
  • Siguret C. ABRomics : Une plateforme basée sur Galaxy pour la recherche sur la résistance aux antibiotiques et la santé publique.
    Journée d’animation annuelle de la plateforme AuBi 2024, Clermont-Ferrand, France ⟨hal-05084079⟩
  • Ogereau F., Hiriart M., Marin P., Batut B., Siguret C., Ruiz P., Desset S., Pouchin P., Souc F., Paulhe N., Giacomoni F., Grimbichler D, Bellembois T., Legué V., Peyret P., Mahul A., Goué N. FAIRing Research Data to Live @AuBi Platform.
    JOBIM 2024, Toulouse, France ⟨hal-04643973⟩
  • Lao J., Tackx R., Marin P., Dieuaide A., Mignon T., Batut B., Siguret C., Dallet R., Hillion K.-H., Goué N., Ruppé E., Corguillé G. L., Glaser P., Mareuil F., Médigue C. The ABRomics platform — a One Health Antimicrobial resistance analysis service.
    JOBIM 2024, Toulouse, France ⟨hal-04618645⟩
  • Siguret C., Huneau C., Sow M. D., Salse J. Consequences of polyploidy events on angiosperm genome evolution.
    Journées Jeunes Chercheurs 2023 du département de Biologie et Amélioration des Plantes de l'INRAE, Clermont-Ferrand, France ⟨hal-04620735⟩
  • Siguret C., Salse J. Recurrences and consequences of polyploidy events on gene evolution in angiosperms: Comparative analysis of 80 genomes.
    Journées scientifiques de l'École Doctorale des Sciences de la Vie, de la Santé, de l’Agronomie, et de l’Environnement 2022, Clermont-Ferrand, France ⟨hal-04622228⟩
  • Siguret C., Huneau C., Salse J. L’origine des plantes à fleurs (angiospermes).
    Journées Portes Ouvertes du Centre INRAE Clermont-Auvergne-Rhône-Alpes, May 2022, Clermont-Ferrand, France ⟨hal-05411005⟩
  • Siguret C. Les réseaux pour analyser les flux de gènes au sein des écosystèmes : application à l’étude des gènes de résistance aux antibiotiques.
    Journées Scientifiques 2017 – Colloque 14 : Journée Réseaux du GDR Génomique Environnementale, Nantes, France