ERGA

Welcome to the ERGA Space!

Here we collect, curate and develop pipelines to assemble and annotate reference-quality genomes for all eukaryotic life.

For Genome Assembly pipelines head to our Assembly Team

For Genome Annotation pipelines head to our Annotation Team

Development, discussions and issue tracking take place in the ERGA GitHub repo

If you would like to join our journey, why not become an ERGA member, join one of our committees and be kept up to date with our developments?

We aim to democratise the task of reference genome generation for all life.

Web page: https://www.erga-biodiversity.eu/

Funding details:

No funding details specified

Discussion Channel

ERGA Pipeline Discussions Channel

Related items

People

Mahesh Binzer-Panchal

Teams: NBIS, ERGA Assembly

Organizations: NBIS – National Bioinformatics Infrastructure Sweden

https://orcid.org/0000-0003-1675-0677

Expertise: Bioinformatics, Genomics, Scientific workflow development, Workflows

Tools: Nextflow, nf-core

I'm a bioinformatician for the National Bioinformatics Infrastructure Sweden. I specialise in de novo genome assembly and workflow development with Nextflow. I'm also a Nextflow ambassador and nf-core maintainer.

Tom Brown

Teams: ERGA Assembly, ERGA Annotation, Biodiversity Genomics Europe (general), ERGA Analysis

Organizations: Leibniz Institute for Zoo and Wildlife Research

https://orcid.org/0000-0001-8293-4816

Solenne Correard

Teams: ERGA Assembly, Galaxy Training Network

Organizations: Inria, IRISA

https://orcid.org/0000-0002-0554-5443

Diego De Panis

Teams: ERGA Assembly

Organizations: IZW

https://orcid.org/0000-0002-3679-9585

Phuong Doan

Teams: ERGA Annotation, Bioinformatics Laboratory for Genomics and Biodiversity (LBGB)

Organizations: Genoscope

https://orcid.org/0000-0002-6621-9908

Expertise: Bioinformatics

Tools: Nextflow, Python, R, Genetic analysis, Single Cell analysis

Valentina Galeone

Teams: ERGA Assembly

Organizations: IZW

Jessica Gomez-Garrido

Teams: ERGA Assembly, ERGA Annotation

Organizations: CNAG

https://orcid.org/0000-0001-6409-8009

Benjamin Istace

Teams: Bioinformatics Laboratory for Genomics and Biodiversity (LBGB), ERGA Assembly

Organizations: Genoscope

https://orcid.org/0000-0003-1042-4803

Sagane Joye-Dind

Teams: ERGA Assembly, ERGA Annotation

Organizations: University of Lausanne

https://orcid.org/0000-0003-4771-6113

Rafał Wóycicki

Teams: ERGA Assembly, ERGA Annotation

Organizations: Applied Omics Wóycicki

https://orcid.org/0000-0001-7991-7150

Teams

ERGA Analysis

A collection of workflows developed by members of the ERGA community for data analysis based on reference genomes.

Space: ERGA

Public web page: https://www.erga-biodiversity.eu/team-1/dac---data-analysis-committee

Organisms: Not specified

ERGA Assembly

A collection of workflows and pipelines developed as part of the ERGA consortium

Space: ERGA

Public web page: https://www.erga-biodiversity.eu/

Organisms: Not specified

ERGA Annotation

A collection of workflows designed to annotate elements of the genome. These include repeat regions, protein-coding genes, ncRNA and miRNA.

Space: ERGA

Public web page: Not specified

Organisms: Not specified

Organizations

Applied Omics Wóycicki

ROR ID: Not specified

Department: Not specified

Country: Poland

City: Kraków

Web page: https://www.appliedomics.com

CNAG

ROR ID: Not specified

Department: Not specified

Country: Spain

City: Barcelona

Web page: Not specified

Genoscope

ROR ID: Not specified

Department: Not specified

Country: France

City: Paris

Web page: https://jacob.cea.fr/drf/ifrancoisjacob/Pages/Departements/Genoscope.aspx

Inria

ROR ID: Not specified

Department: Not specified

Country: France

City: Not specified

Web page: https://inria.fr

IZW

ROR ID: Not specified

Department: Not specified

Country: Not specified

City: Not specified

Web page: Not specified

Leibniz Institute for Zoo and Wildlife Research

ROR ID: Not specified

Department: Not specified

Country: Germany

City: Berlin

Web page: https://www.izw-berlin.de/en/home.html

NBIS – National Bioinformatics Infrastructure Sweden

ROR ID: Not specified

Department: Not specified

Country: Sweden

City: Not specified

Web page: https://nbis.se

University of Lausanne

ROR ID: Not specified

Department: Not specified

Country: Switzerland

City: Lausanne (VD)

Web page: https://unil.ch/index.html

Workflows (showing 20 of 26)

ERGA Profiling Long Reads v2505 (WF1)

ERGA Assembly

Stable

The workflow takes a (trimmed) Long reads collection, runs Meryl to create a K-mer database, Genomescope2 to estimate genome properties and Smudgeplot to estimate ploidy (optional). The main results are K-mer database and genome profiling plots, tables, and values useful for downstream analysis. Default K-mer length and ploidy for Genomescope are 31 and 2, respectively.
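The profiling step above rests on a k-mer multiplicity histogram: how many distinct k-mers are seen once, twice, and so on. The following is a minimal pure-Python sketch of that idea, not the Meryl or Genomescope2 implementation. It assumes canonical counting (a k-mer and its reverse complement are treated as the same k-mer), and the function names `canonical` and `kmer_histogram` are hypothetical.

```python
from collections import Counter

# Translation table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_histogram(reads, k=31):
    """Count canonical k-mers over all reads and tabulate their multiplicities.

    Returns a dict mapping multiplicity -> number of distinct k-mers
    observed that many times, sorted by multiplicity.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for start in range(len(read) - k + 1):
            kmer = read[start:start + k]
            # Skip k-mers containing N or any other non-ACGT symbol.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    histogram = Counter(counts.values())
    return dict(sorted(histogram.items()))


if __name__ == "__main__":
    # Toy input with k=3; real profiling would use k=31 on the full read set.
    print(kmer_histogram(["ACGTAC", "ACGTAC"], k=3))
```

This in-memory approach only works for toy inputs; dedicated k-mer counters exist precisely because real read sets do not fit in a Python dictionary.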

Type: Galaxy

Creators: Diego De Panis, ERGA

Submitter: Diego De Panis

DOI: 10.48546/workflowhub.workflow.603.1

Created: 6th Oct 2023 at 14:25, Last updated: 24th Jun 2025 at 16:41

ERGA DataQC Illumina v2505 (WF0)

ERGA Assembly

Stable

The workflow takes a paired-reads collection (like Illumina WGS or HiC), runs FastQC and SeqKit, trims with Fastp, and creates a MultiQC report. The main outputs are a paired collection of trimmed reads, a report with raw and trimmed reads stats, and a table with raw reads stats.

Type: Galaxy

Creators: Diego De Panis, ERGA

Submitter: Diego De Panis

DOI: 10.48546/workflowhub.workflow.601.1

Created: 6th Oct 2023 at 14:03, Last updated: 1st Jun 2025 at 13:15

ERGA DataQC ONT v2505 (WF0)

ERGA Assembly

Stable

The workflow takes an ONT reads collection and runs SeqKit and NanoPlot. The main outputs are a table and plots of raw reads stats.

Type: Galaxy

Creators: Diego De Panis, ERGA

Submitter: Diego De Panis

Created: 8th Jan 2024 at 15:25, Last updated: 1st Jun 2025 at 13:08

ERGA DataQC HiFi v2505 (WF0)

ERGA Assembly

Stable

The workflow takes a HiFi reads collection, runs FastQC and SeqKit, filters with Cutadapt, and creates a MultiQC report. The main outputs are a collection of filtered reads, a report with raw and filtered reads stats, and a table with raw reads stats.

Type: Galaxy

Creators: Diego De Panis, ERGA

Submitter: Diego De Panis

DOI: 10.48546/workflowhub.workflow.602.1

Created: 6th Oct 2023 at 14:17, Last updated: 1st Jun 2025 at 13:03

ERGA Long Reads PriAlt Purge+QC v2505 (WF3)

ERGA Assembly

Stable

The workflow takes a Long Reads collection, Pri/Alt contigs, and the values for transition parameter and max coverage depth (calculated from WF1) to run Purge_Dups. It produces purged Pri and Alt contigs assemblies, and runs all the QC analyses (gfastats, BUSCO, and Merqury).

Type: Galaxy

Creators: Diego De Panis, ERGA

Submitter: Diego De Panis

DOI: 10.48546/workflowhub.workflow.1163.1

Created: 24th Sep 2024 at 22:44, Last updated: 1st Jun 2025 at 12:25

ERGA HiC Pri Scaffolding+QC YaHS v2505 (WF4)

ERGA Assembly

Stable

The workflow takes a trimmed HiC paired-end reads collection and Pri/Alt assemblies to produce a scaffolded primary assembly (and alternate contigs) using YaHS. It also runs Pretext and all the QC analyses (gfastats, BUSCO, and Merqury).

Type: Galaxy

Creators: Diego De Panis, ERGA

Submitter: Diego De Panis

Created: 24th Sep 2024 at 22:54, Last updated: 1st Jun 2025 at 12:05

ERGA Long reads-only Assembly+QC Hifiasm v2505 (WF2)

ERGA Assembly

Stable

The workflow takes a long reads collection (HiFi, or ONT also possible now), and max coverage depth (calculated from WF1) to run Hifiasm in solo mode. It produces a Pri/Alt assembly, Bandage plots, and runs all the QC analyses (gfastats, BUSCO, and Merqury).
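Contiguity summaries such as N50 are among the statistics usually reported at the QC stage. The sketch below shows the standard N50 definition on a plain list of contig lengths. It is an illustration only and not how gfastats computes its report; the function name `assembly_stats` and the returned keys are hypothetical.

```python
def assembly_stats(lengths):
    """Summarise contig lengths: count, total size, largest contig and N50.

    N50 is the length of the contig at which the running sum of lengths,
    taken from longest to shortest, first reaches half the assembly size.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "n_contigs": len(ordered),
        "total_bp": total,
        "largest": ordered[0],
        "n50": n50,
    }


if __name__ == "__main__":
    print(assembly_stats([10, 20, 30, 40]))
```

Comparing these values before and after purging or scaffolding is a quick sanity check, but it says nothing about correctness, which is why the workflows also run BUSCO and Merqury.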

Type: Galaxy

Creators: Diego De Panis, ERGA

Submitter: Diego De Panis

DOI: 10.48546/workflowhub.workflow.1162.1

Created: 24th Sep 2024 at 22:29, Last updated: 30th May 2025 at 13:55

ERGA-BGE Genome Report ANNOT analyses

ERGA Annotation

Stable

The workflow requires the user to provide:

ENSEMBL link address of the annotation GFF3 file
ENSEMBL link address of the assembly FASTA file
NCBI taxonomy ID
BUSCO lineage
OMArk database

The workflow will produce statistics of the annotation based on AGAT, BUSCO and OMArk.

Type: Galaxy

Creators: Diego De Panis, ERGA

Submitter: Diego De Panis

DOI: 10.48546/workflowhub.workflow.1096.1

Created: 9th Aug 2024 at 15:14, Last updated: 24th Feb 2025 at 15:24

ERGA-BGE Genome Report ASM analyses (one-asm WGS Illumina PE + HiC)

ERGA Assembly

Stable

Assembly Evaluation for ERGA-BGE Reports

One Assembly, Illumina WGS reads + HiC reads

The workflow requires the following:

Species Taxonomy ID number
NCBI Genome assembly accession code
BUSCO Lineage
WGS accurate reads accession code
NCBI HiC reads accession code

The workflow will get the data and process it to generate genome profiling (genomescope, smudgeplot -optional-), assembly stats (gfastats), merqury stats (QV, completeness), BUSCO, snailplot, contamination blobplot, and ...
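The QV listed among the Merqury stats is a k-mer-based estimate of consensus accuracy. The sketch below assumes the k-mer survival model described for Merqury: a k-mer found in the assembly but absent from the reads is taken as evidence of at least one base error. The function name and parameter names are hypothetical, and the code is not taken from Merqury itself.

```python
import math


def kmer_based_qv(asm_only_kmers, total_asm_kmers, k=31):
    """Estimate a Phred-scaled consensus quality value from k-mer counts.

    asm_only_kmers:  k-mers present in the assembly but not in the reads
    total_asm_kmers: all k-mers present in the assembly
    """
    if total_asm_kmers <= 0 or not 0 <= asm_only_kmers <= total_asm_kmers:
        raise ValueError("counts must satisfy 0 <= asm_only <= total and total > 0")
    if asm_only_kmers == 0:
        # No unsupported k-mers: the estimate is unbounded.
        return math.inf
    # Probability that a whole k-mer is error-free.
    p_kmer_correct = 1 - asm_only_kmers / total_asm_kmers
    # Per-base error rate, assuming independent errors across the k positions.
    error_rate = 1 - p_kmer_correct ** (1 / k)
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    print(round(kmer_based_qv(1_000, 1_000_000_000), 1))
```

Because the estimate depends on the read set used to build the k-mer database, values are only comparable between assemblies evaluated against the same reads and the same k.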

Type: Galaxy

Creators: Diego De Panis, ERGA

Submitter: Diego De Panis

DOI: 10.48546/workflowhub.workflow.1103.2

Created: 19th Aug 2024 at 10:38, Last updated: 5th Dec 2024 at 16:48

ERGA-BGE Genome Report ASM analyses (one-asm HiFi + HiC)

ERGA Assembly

Stable

Assembly Evaluation for ERGA-BGE Reports

One Assembly, HiFi WGS reads + HiC reads

The workflow requires the following:

Species Taxonomy ID number
NCBI Genome assembly accession code
BUSCO Lineage
WGS accurate reads accession code
NCBI HiC reads accession code

Type: Galaxy

Creators: Diego De Panis, ERGA

Submitter: Diego De Panis

DOI: 10.48546/workflowhub.workflow.1104.1

Created: 20th Aug 2024 at 14:19, Last updated: 5th Dec 2024 at 16:47

GALOP - Genome Assembly using Long reads Pipeline

Bioinformatics Laboratory for Genomics and Biodiversity (LBGB), ERGA Assembly

Work-in-progress

This repository contains an exact copy of the standard Genoscope long reads assembly pipeline.

At the moment, this is not intended for users to download as it uses grid submission commands that will only work at Genoscope. As time goes on, we intend to make this pipeline available to a broader audience. However, genome assembly and polishing commands are accessible in the lib/assembly.py and lib/polishing.py files.

galop.py -h 
Mandatory
...

Type: Python

Creators: Benjamin Istace, Jean-Marc Aury, Caroline Belser

Submitter: Benjamin Istace

DOI: 10.48546/workflowhub.workflow.1200.2

Created: 12th Nov 2024 at 07:37, Last updated: 14th Nov 2024 at 06:55

Swedish Earth Biogenome Project Genome Assembly Workflow

NBIS, ERGA Assembly

Work-in-progress

Swedish Earth Biogenome Project - Genome Assembly Workflow

The primary genome assembly workflow for the Earth Biogenome Project at NBIS.

Workflow overview

General aim:

flowchart LR 
hifi[/ HiFi reads /] --> data_inspection 
ont[/ ONT reads /] --> data_inspection 
hic[/ Hi-C reads /] --> data_inspection 
data_inspection[[ Data inspection ]] --> preprocessing 
preprocessing[[ Preprocessing ]] --> assemble 
assemble[[ Assemble ]] --> validation 
validation[[ Assembly
...

Type: Nextflow

Creators: Mahesh Binzer-Panchal, Martin Pippel

Submitter: Mahesh Binzer-Panchal

Created: 23rd Aug 2024 at 14:16

HiC scaffolding pipeline

ERGA Assembly, Biodiversity Genomics Europe (general)

Stable

Snakemake pipeline for scaffolding a genome with HiC reads using YaHS.

Prerequisites

This pipeline has been tested using Snakemake v7.32.4 and requires conda to install the required tools. To run the pipeline, use the command:

snakemake --use-conda --cores N

where N is the number of cores to use. A set of configuration and run scripts is provided for execution on a Slurm queueing system. After configuring the cluster.json file, run:

./run_cluster ...

Type: Snakemake

Creator: Tom Brown

Submitter: Tom Brown

DOI: 10.48546/workflowhub.workflow.796.2

Created: 16th Mar 2024 at 09:01

Purge retained haplotypes using Purge-Dups

ERGA Assembly, Biodiversity Genomics Europe (general)

Purge dups

This Snakemake pipeline takes a contig-level genome assembly and PacBio reads as input. It has been tested with Snakemake v7.32.4. Raw long-read sequencing files and the input contig genome assembly must be given in the config.yaml file. To execute the workflow run:

snakemake --use-conda --cores N

Or configure the cluster.json and run using the ./run_cluster command
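For scripted local runs, the command above can be assembled programmatically. This is a minimal sketch that only builds the argument list shown in the text; the helper name `snakemake_command` is hypothetical, and falling back to the machine's CPU count when no core count is given is an assumption, not pipeline behaviour.

```python
import os
import shlex


def snakemake_command(cores=None):
    """Build the argument list for a local Snakemake run with conda environments.

    If cores is None, fall back to the number of CPUs reported by the OS.
    """
    n_cores = cores if cores is not None else (os.cpu_count() or 1)
    if n_cores < 1:
        raise ValueError("cores must be a positive integer")
    return ["snakemake", "--use-conda", "--cores", str(n_cores)]


if __name__ == "__main__":
    # Print the command rather than executing it, so the sketch is safe to run anywhere.
    print(shlex.join(snakemake_command(4)))
```

Passing the returned list to `subprocess.run(..., check=True)` from the pipeline directory would execute it, provided Snakemake and conda are installed and config.yaml has been filled in.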

Type: Snakemake

Creator: Tom Brown

Submitter: Tom Brown

DOI: 10.48546/workflowhub.workflow.506.2

Created: 16th Jun 2023 at 14:56, Last updated: 16th Mar 2024 at 07:49

HiC contact map generation

ERGA Assembly, Biodiversity Genomics Europe (general)

Stable

Snakemake pipeline for the generation of .pretext and .mcool files for visualisation of HiC contact maps with PretextView and HiGlass, respectively.

Prerequisites

This pipeline has been tested using Snakemake v7.32.4 and requires conda to install the required tools. To run the pipeline, use the command:

snakemake --use-conda

A set of configuration and run scripts is provided for execution on a Slurm queueing system. After configuring ...

Type: Snakemake

Creator: Tom Brown

Submitter: Tom Brown

DOI: 10.48546/workflowhub.workflow.795.2

Created: 14th Mar 2024 at 09:50, Last updated: 14th Mar 2024 at 09:52

ERGA ONT+Illumina Assembly+QC NextDenovo+HyPo v2403 (WF2)

ERGA Assembly

Work-in-progress

The workflow takes raw ONT reads and trimmed Illumina WGS paired reads collections, the ONT raw stats table (calculated from WF1) and the estimated genome size (calculated from WF1) to run NextDenovo and subsequently polish the assembly with HyPo. It produces collapsed assemblies (unpolished and polished) and runs all the QC analyses (gfastats, BUSCO, and Merqury).

Type: Galaxy

Creators: Diego De Panis, ERGA

Submitter: Diego De Panis

Created: 11th Mar 2024 at 14:45

ERGA ONT+Illumina Assembly+QC Flye+HyPo v2403 (WF2)

ERGA Assembly

Stable

The workflow takes raw ONT reads and trimmed Illumina WGS paired reads collections, and the estimated genome size and Max depth (both calculated from WF1) to run Flye and subsequently polish the assembly with HyPo. It produces collapsed assemblies (unpolished and polished) and runs all the QC analyses (gfastats, BUSCO, and Merqury).

Type: Galaxy

Creators: Diego De Panis, ERGA

Submitter: Diego De Panis

Created: 11th Mar 2024 at 12:41

CLAWS (CNAG's long-read assembly workflow in Snakemake)

ERGA Assembly

Stable

CLAWS (CNAG's Long-read Assembly Workflow in Snakemake)

Snakemake Pipeline used for de novo genome assembly @CNAG. It has been developed for Snakemake v6.0.5.

It accepts Oxford Nanopore Technologies (ONT) reads, PacBio HiFi reads, Illumina paired-end data, Illumina 10X data and Hi-C reads. It does the preprocessing of the reads, assembly, polishing, purge_dups, scaffolding and different evaluation steps. By default it will preprocess the reads, run Flye + Hypo + purge_dups + yahs and evaluate ...

Type: Snakemake

Creators: Jessica Gomez-Garrido, Fernando Cruz (CNAG), Francisco Camara (CNAG), Tyler Alioto (CNAG)

Submitter: Jessica Gomez-Garrido

DOI: 10.48546/workflowhub.workflow.567.2

Created: 12th Sep 2023 at 14:23, Last updated: 2nd Feb 2024 at 12:24

ERGA HiC Collapsed Scaffolding+QC YaHS v2311 (WF4)

ERGA Assembly

Work-in-progress

The workflow takes trimmed HiC forward and reverse reads, and one assembly (e.g.: Hap1 or Pri or Collapsed) to produce a scaffolded assembly using YaHS. It also runs all the QC analyses (gfastats, BUSCO, and Merqury).

Type: Galaxy

Creators: Diego De Panis, ERGA

Submitter: Diego De Panis

Created: 9th Jan 2024 at 11:00

ERGA ONT+Illumina Collapsed Purge+QC v2311 (WF3)

ERGA Assembly

Work-in-progress

The workflow takes a trimmed Illumina WGS paired-end reads collection, Collapsed contigs, and the values for transition parameter and max coverage depth (calculated from WF1) to run Purge_Dups. It produces purged Collapsed contigs assemblies, and runs all the QC analyses (gfastats, BUSCO, and Merqury).

Type: Galaxy

Creators: Diego De Panis, ERGA

Submitter: Diego De Panis

Created: 9th Jan 2024 at 10:40, Last updated: 9th Jan 2024 at 10:44

Collections

ERGA Assembly Galaxy Long Reads & Hi-C Pipelines (Hifiasm-solo + Purge_Dups + YaHS)

Collection of de-novo genome assembly workflows written for implementation in Galaxy

Input data should be PacBio HiFi or ONT reads and Illumina 3-dimensional Chromatin Conformation Capture (Hi-C) reads

Executing the workflows collection will output a scaffolded primary assembly and alternate contigs, with the complete QC analyses

Please run the workflows in order: WF0', WF1, WF2, WF3, WF4

'Notice there is one for HiFi, one for ONT, and one for Illumina (WGS or Hi-C). Run according to your data. ...
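The run order described above can be sketched as a small planner: pick the WF0 variant matching each data type, then continue with WF1 to WF4. The workflow titles come from this listing (version suffixes omitted); the dictionary keys, the constant names and the function name `run_order` are hypothetical.

```python
# WF0 variants by data type, as named in the workflow listing.
WF0_BY_DATA = {
    "hifi": "ERGA DataQC HiFi (WF0)",
    "ont": "ERGA DataQC ONT (WF0)",
    "illumina": "ERGA DataQC Illumina (WF0)",  # used for WGS or Hi-C paired reads
}

# Stages that follow data QC, in the order given by the collection.
LATER_STAGES = ["WF1", "WF2", "WF3", "WF4"]


def run_order(data_types):
    """Return the ordered list of workflows to run for the given data types."""
    steps = []
    for data_type in data_types:
        key = data_type.lower()
        if key not in WF0_BY_DATA:
            raise ValueError(f"no WF0 variant known for data type: {data_type!r}")
        workflow = WF0_BY_DATA[key]
        if workflow not in steps:  # run each WF0 variant only once
            steps.append(workflow)
    if not steps:
        raise ValueError("at least one data type is required")
    return steps + LATER_STAGES


if __name__ == "__main__":
    for step in run_order(["HiFi", "Illumina"]):
        print(step)
```

The planner only encodes ordering; values passed between stages (for example the max coverage depth calculated in WF1) still have to be carried over by hand.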

Maintainers: Diego De Panis

Number of items: 7

Tags: Genome assembly, Biodiversity

Created: 24th Sep 2024 at 22:32, Last updated: 1st Jun 2025 at 13:17

Genome Assembly Workflows for ERGA-BGE genomes

Pipelines used by the genome assembly teams that are part of the Biodiversity Genomics Europe project

https://biodiversitygenomics.eu/

Maintainers: Tom Brown

Number of items: 3

Tags: Assembly, Genomics, Biodiversity

Created: 4th Sep 2024 at 09:54, Last updated: 14th Nov 2024 at 08:35

Genome Evaluation for ERGA-BGE Reports

Collection of Galaxy workflows for generating results used for creating ERGA-BGE Reports

For a given genome, two workflows should be run: the assembly evaluation (ASM analyses), and the annotation evaluation (ANNOT analyses)

Depending on the kind of data used for the genome assembly, you should choose HiFi or ONT (Illumina) workflows for ASM analyses

Maintainers: Diego De Panis

Number of items: 3

Tags: Genomics, QC, Genome assembly

Created: 20th Aug 2024 at 14:44, Last updated: 26th Aug 2024 at 13:03

ERGA Assembly Snakemake HiFi & HiC Pipelines

Collection of workflows designed to assemble a set of PacBio HiFi and Illumina HiC reads into a chromosome-scale de-novo assembly.

Development versions of these pipelines can be found in the ERGA GitHub, and any questions or queries can be raised on the ERGA Discussions Channel

Want to find out more about the work done by ERGA? Become a member ...

Maintainers: Tom Brown, Diego De Panis, ERGA

Number of items: 3

Tags: Genome assembly

Created: 16th Mar 2024 at 09:08, Last updated: 16th Mar 2024 at 09:10

ERGA Assembly Galaxy ONT+Illumina & HiC Pipelines (NextDenovo-HyPo + Purge_Dups + YaHS)

Collection of de-novo genome assembly workflows written for implementation in Galaxy

Input data should be Oxford Nanopore raw reads plus Illumina WGS reads and Illumina 3-dimensional Chromatin Conformation Capture (HiC) reads

Executing all workflows will output one scaffolded collapsed assembly and the complete QC analyses

Please run the workflows in order: WF0 (there are two, one for ONT, and another one for Illumina that can be used independently for the WGS and HiC reads), WF1, WF2, WF3, WF4

Maintainers: Diego De Panis

Number of items: 6

Tags: Assembly, Bioinformatics, Galaxy, Genomics, Genome assembly, ONT, illumina, Hi-C

Created: 8th Jan 2024 at 09:51, Last updated: 11th Mar 2024 at 14:45

ERGA Assembly Galaxy ONT+Illumina & HiC Pipelines (Flye-HyPo + Purge_Dups + YaHS)

Collection of de-novo genome assembly workflows written for implementation in Galaxy

Input data should be Oxford Nanopore raw reads plus Illumina WGS reads and Illumina 3-dimensional Chromatin Conformation Capture (HiC) reads

Executing all workflows will output one scaffolded collapsed assembly and the complete QC analyses

Please run the workflows in order: WF0 (there are two, one for ONT, and another one for Illumina that can be used independently for the WGS and HiC reads), WF1, WF2, WF3, WF4

Maintainers: Diego De Panis

Number of items: 6

Tags: Assembly, Bioinformatics, Galaxy, Genomics, Genome assembly, ONT, illumina, Hi-C

Created: 8th Jan 2024 at 09:54, Last updated: 11th Mar 2024 at 12:42

ERGA Assembly Galaxy HiFi & HiC Pipelines (Hifiasm-HiC + Purge_Dups + YaHS)

Collection of de-novo genome assembly workflows written for implementation in Galaxy

Input data should be PacBio HiFi reads and Illumina 3-dimensional Chromatin Conformation Capture (HiC) reads

Executing all workflows will output two scaffolded haplotype assemblies and the complete QC analyses

Please run the workflows in order: WF0 (there are two, one for HiFi and one for Illumina HiC), WF1, WF2, WF3, WF4

Maintainers: Tom Brown, Diego De Panis

Number of items: 6

Tags: Assembly, Bioinformatics, Galaxy, Genomics, Genome assembly, HiFi, Hi-C

Created: 16th Jun 2023 at 15:07, Last updated: 20th Nov 2023 at 16:20