Workflows
The workflow takes a paired-reads collection (such as Illumina WGS or HiC), runs FastQC and SeqKit, trims with Fastp, and creates a MultiQC report. The main outputs are a paired collection of trimmed reads, a report with raw and trimmed reads stats, and a table with raw reads stats.
The workflow takes an ONT reads collection and runs SeqKit and NanoPlot. The main outputs are a table and plots of raw reads stats.
The workflow takes a HiFi reads collection, runs FastQC and SeqKit, filters with Cutadapt, and creates a MultiQC report. The main outputs are a collection of filtered reads, a report with raw and filtered reads stats, and a table with raw reads stats.
The workflow takes a (trimmed) long reads collection and runs Meryl to create a k-mer database, Genomescope2 to estimate genome properties, and, optionally, Smudgeplot to estimate ploidy. The main results are a k-mer database and genome profiling plots, tables, and values useful for downstream analysis. The default k-mer length and ploidy for Genomescope are 31 and 2, respectively.
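As a toy illustration of what a k-mer database and its multiplicity histogram contain, the following Python sketch counts canonical k-mers in a few in-memory reads. It is not how Meryl is implemented and would be far too slow for real data; the function names (`canonical`, `count_kmers`, `kmer_histogram`) are hypothetical and exist only for this example. The default `k=31` mirrors the default k-mer length stated above.

```python
from collections import Counter

# Translation table used to build the reverse complement of a k-mer.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID_BASES = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k=31):
    """Count canonical k-mers across reads, skipping windows with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= _VALID_BASES:
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts):
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    return Counter(counts.values())
```

A histogram of this kind (multiplicity versus number of distinct k-mers) is the summary that genome profiling is computed from.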
The workflow takes a long reads collection, Pri/Alt contigs, and the values for the transition parameter and max coverage depth (calculated from WF1) to run Purge_Dups. It produces purged Pri and Alt contig assemblies and runs all the QC analyses (gfastats, BUSCO, and Merqury).
The workflow takes a trimmed HiC paired-end reads collection and Pri/Alt assemblies to produce a scaffolded primary assembly (and alternate contigs) using YaHS. It also runs Pretext and all the QC analyses (gfastats, BUSCO, and Merqury).
The workflow takes a long reads collection (HiFi; ONT is now also possible) and the max coverage depth (calculated from WF1) to run Hifiasm in solo mode. It produces a Pri/Alt assembly and Bandage plots, and runs all the QC analyses (gfastats, BUSCO, and Merqury).
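Several of the workflows above finish with assembly statistics from gfastats, and N50 is one of the contiguity values typically read from such reports. As a minimal sketch of what that number means, assuming only a plain list of contig or scaffold lengths, it can be computed as follows. The function name `n50` is hypothetical and this is not gfastats code.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
```

Comparing this value before and after purging or scaffolding gives a quick view of how contiguity changed between steps.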
Assembly Evaluation for ERGA-BGE Reports
One Assembly, Illumina WGS reads + HiC reads
The workflow requires the following:
- Species Taxonomy ID number
- NCBI Genome assembly accession code
- BUSCO Lineage
- WGS accurate reads accession code
- NCBI HiC reads accession code
The workflow retrieves the data and processes it to generate genome profiling (GenomeScope and, optionally, Smudgeplot), assembly stats (gfastats), Merqury stats (QV, completeness), BUSCO, snailplot, contamination blobplot, and ...
Assembly Evaluation for ERGA-BGE Reports
One Assembly, HiFi WGS reads + HiC reads
The workflow requires the following:
- Species Taxonomy ID number
- NCBI Genome assembly accession code
- BUSCO Lineage
- WGS accurate reads accession code
- NCBI HiC reads accession code
The workflow retrieves the data and processes it to generate genome profiling (GenomeScope and, optionally, Smudgeplot), assembly stats (gfastats), Merqury stats (QV, completeness), BUSCO, snailplot, contamination blobplot, and HiC ...
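Both evaluation workflows above list the same five inputs. One way to collect and sanity-check them before launching a run is sketched below in Python. The class `EvaluationInputs`, its field names, and the `problems` method are hypothetical and not part of the workflows. The reads pattern assumes that reads are referenced by INSDC run accessions (SRR/ERR/DRR), which the descriptions above do not state.

```python
import re
from dataclasses import dataclass

# NCBI assembly accessions: GCA_ or GCF_, nine digits, a dot, and a version.
ASSEMBLY_RE = re.compile(r"^GC[AF]_\d{9}\.\d+$")
# Assumption: reads are given as INSDC run accessions (SRR, ERR, or DRR).
RUN_RE = re.compile(r"^[SED]RR\d+$")


@dataclass(frozen=True)
class EvaluationInputs:
    """Hypothetical container for the five inputs listed above."""
    taxonomy_id: int
    assembly_accession: str
    busco_lineage: str
    wgs_reads_accession: str
    hic_reads_accession: str

    def problems(self):
        """Return a list of human-readable problems; empty if all checks pass."""
        issues = []
        if self.taxonomy_id <= 0:
            issues.append("taxonomy_id must be a positive integer")
        if not ASSEMBLY_RE.match(self.assembly_accession):
            issues.append("assembly_accession does not look like GCA_/GCF_")
        if not self.busco_lineage.strip():
            issues.append("busco_lineage must not be empty")
        for name in ("wgs_reads_accession", "hic_reads_accession"):
            if not RUN_RE.match(getattr(self, name)):
                issues.append(f"{name} does not look like a run accession")
        return issues


# Placeholder values only; none of these refer to real records.
example = EvaluationInputs(
    taxonomy_id=1,
    assembly_accession="GCA_000000000.0",
    busco_lineage="placeholder_lineage",
    wgs_reads_accession="ERR0000000",
    hic_reads_accession="ERR0000001",
)
```

Calling `example.problems()` returns an empty list when every field passes these format checks. The checks cover format only and do not confirm that an accession exists.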
GALOP - Genome Assembly using Long reads Pipeline
This repository contains an exact copy of the standard Genoscope long reads assembly pipeline.
At the moment, this is not intended for users to download as it uses grid submission commands that will only work at Genoscope. As time goes on, we intend to make this pipeline available to a broader audience. However, genome assembly and polishing commands are accessible in the lib/assembly.py and lib/polishing.py files.
galop.py -h
Mandatory
...