Workflows
Combined workflow for large genome assembly
The tutorial document for this workflow is here: https://doi.org/10.5281/zenodo.5655813
What it does: A workflow for genome assembly, containing subworkflows:
- Data QC
- Kmer counting
- Trim and filter reads
- Assembly with Flye
- Assembly polishing
- Assess genome quality
Inputs:
- long reads and short reads in fastq format
- reference genome for Quast
Outputs:
- Data information - QC, kmers
- Filtered, trimmed reads
- Genome assembly, assembly graph, ...
Assess genome quality; can run alone or as part of a combined workflow for large genome assembly.
- What it does: Assesses the quality of the genome assembly: generates statistics, determines whether expected genes are present, and aligns contigs to a reference genome.
- Inputs: polished assembly; reference_genome.fasta (e.g. of a closely-related species, if available).
- Outputs: Busco table of genes found; Quast HTML report, and link to Icarus contigs browser, showing contigs aligned to a reference ...
Assembly polishing subworkflow: Racon polishing with long reads
Inputs: long reads and assembly contigs
Workflow steps:
- minimap2 : long reads are mapped to assembly => overlaps.paf.
- overlaps + long reads + assembly => Racon => polished assembly 1
- using polished assembly 1 as input; repeat minimap2 + racon => polished assembly 2
- using polished assembly 2 as input, repeat minimap2 + racon => polished assembly 3
- using polished assembly 3 as input, repeat minimap2 + racon => ...
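The repeated minimap2 + Racon rounds above can be sketched as a small command generator. This is only an illustration of the looping pattern, not the Galaxy workflow itself: the output file names are hypothetical, the number of rounds is left to the caller, and the minimap2 preset `map-ont` is an assumption that the long reads are nanopore reads.

```python
from typing import List


def polishing_commands(reads: str, assembly: str, rounds: int,
                       preset: str = "map-ont") -> List[str]:
    """Return shell command strings for iterative minimap2 + Racon polishing.

    File names such as overlaps_1.paf are hypothetical; the preset is an
    assumption (nanopore reads) and should match the read type.
    """
    commands = []
    current = assembly
    for i in range(1, rounds + 1):
        paf = f"overlaps_{i}.paf"
        polished = f"polished_assembly_{i}.fasta"
        # minimap2 takes the target (assembly) first, then the query (reads);
        # PAF is its default output format.
        commands.append(f"minimap2 -x {preset} {current} {reads} > {paf}")
        # Racon takes sequences, overlaps, then target sequences, and writes
        # the polished contigs to stdout.
        commands.append(f"racon {reads} {paf} {current} > {polished}")
        # The polished output becomes the input of the next round.
        current = polished
    return commands


if __name__ == "__main__":
    for cmd in polishing_commands("long_reads.fastq.gz", "assembly.fasta", rounds=3):
        print(cmd)
```

Each round maps the reads against the most recent assembly, so the overlaps always refer to the sequence that Racon is about to correct.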
Assembly with Flye; can run alone or as part of a combined workflow for large genome assembly.
- What it does: Assembles long reads with the tool Flye
- Inputs: long reads (raw, filtered, and/or corrected) in fastq.gz format
- Outputs: Flye assembly fasta; Fasta stats on assembly.fasta; Assembly graph image from Bandage; Bar chart of contig sizes; Quast reports of genome assembly
- Tools used: Flye, Fasta statistics, Bandage, Bar chart, Quast
- Input parameters: None required, but recommend ...
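To give a sense of what statistics on assembly.fasta and a chart of contig sizes summarise, here is a minimal sketch that reads contig lengths from FASTA text and computes N50. The function names are hypothetical, and which statistics the Fasta statistics tool actually reports is not stated here; N50 is used only as a common example.

```python
from typing import List


def contig_lengths(fasta_text: str) -> List[int]:
    """Return the length of each sequence in a FASTA-formatted string."""
    lengths = []
    current = 0
    seen_header = False
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seen_header:
                lengths.append(current)
            seen_header = True
            current = 0
        elif line and seen_header:
            current += len(line)
    if seen_header:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


if __name__ == "__main__":
    example = ">c1\nACGTACGT\n>c2\nACGTAC\n>c3\nACGT\n>c4\nAC\n"
    sizes = contig_lengths(example)
    print(sizes, n50(sizes))
```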
Trim and filter reads; can run alone or as part of a combined workflow for large genome assembly.
- What it does: Trims and filters raw sequence reads according to specified settings.
- Inputs: Long reads (format fastq); Short reads R1 and R2 (format fastq)
- Outputs: Trimmed and filtered reads: fastp_filtered_long_reads.fastq.gz (note: no trimming or filtering is on by default), fastp_filtered_R1.fastq.gz, fastp_filtered_R2.fastq.gz
- Reports: fastp report on long reads, html; fastp report ...
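As a rough illustration of filtering reads "according to specified settings", the sketch below keeps FASTQ records that pass a minimum length and a minimum mean quality. It is not fastp's algorithm: the function names and thresholds are hypothetical, Phred+33 quality encoding is assumed, and both thresholds default to zero so that nothing is filtered unless a setting is given.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(text: str) -> Iterator[Record]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    lines = [ln for ln in text.splitlines() if ln]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i], lines[i + 1], lines[i + 3]


def mean_quality(qual: str) -> float:
    """Mean Phred score, assuming Phred+33 encoding."""
    if not qual:
        return 0.0
    return sum(ord(c) - 33 for c in qual) / len(qual)


def filter_reads(records: Iterable[Record], min_length: int = 0,
                 min_mean_quality: float = 0.0) -> Iterator[Record]:
    """Keep records that meet both thresholds; the defaults keep everything."""
    for header, seq, qual in records:
        if len(seq) >= min_length and mean_quality(qual) >= min_mean_quality:
            yield header, seq, qual


if __name__ == "__main__":
    fastq = "@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACG\n+\n!!!\n"
    kept = list(filter_reads(parse_fastq(fastq), min_length=5))
    print([header for header, _, _ in kept])
```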
Kmer counting step; can run alone or as part of a combined workflow for large genome assembly.
- What it does: Estimates genome size and heterozygosity based on counts of kmers
- Inputs: One set of short reads: e.g. R1.fq.gz
- Outputs: GenomeScope graphs
- Tools used: Meryl, GenomeScope
- Input parameters: None required
- Workflow steps: Meryl counts kmers in the input reads (k=21), then converts the counts into a histogram. GenomeScope runs a model on the histogram and reports estimates. k-mer ...
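The counting and histogram steps can be illustrated with a toy Python sketch. This is not how Meryl is implemented and would not scale to real read sets; the function names are hypothetical, and merging each kmer with its reverse complement (canonical counting) is an assumption of this sketch. The histogram maps each multiplicity to the number of distinct kmers seen that many times, which is the kind of table a model such as GenomeScope is fitted to.

```python
from collections import Counter
from typing import Dict, Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a kmer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads: Iterable[str], k: int = 21) -> Counter:
    """Count canonical kmers across all reads, skipping kmers with non-ACGT bases."""
    counts: Counter = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts: Counter) -> Dict[int, int]:
    """Map multiplicity -> number of distinct kmers with that multiplicity."""
    return dict(sorted(Counter(counts.values()).items()))


if __name__ == "__main__":
    # k=3 keeps the toy example readable; the workflow description uses k=21.
    toy_counts = count_kmers(["ACGTA"], k=3)
    print(dict(toy_counts), kmer_histogram(toy_counts))
```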
Data QC step; can run alone or as part of a combined workflow for large genome assembly.
- What it does: Reports statistics from sequencing reads.
- Inputs: long reads (fastq.gz format), short reads (R1 and R2) (fastq.gz format).
- Outputs: For long reads: a nanoplot report (the HTML report summarizes all the information). For short reads: a MultiQC report.
- Tools used: Nanoplot, FastQC, MultiQC.
- Input parameters: None required.
- Workflow steps: Long reads are analysed by Nanoplot; Short reads ...
Assembly polishing subworkflow: Racon polishing with short reads
Inputs: short reads and assembly (usually pre-polished with other tools first, e.g. Racon + long reads; Medaka)
Workflow steps:
- minimap2: short reads (R1 only) are mapped to the assembly => overlaps.paf. Minimap2 setting is for short reads.
- overlaps + short reads + assembly => Racon => polished assembly 1
- using polished assembly 1 as input; repeat minimap2 + racon => polished assembly 2
- Racon short-read polished ...
Assembly polishing; can run alone or as part of a combined workflow for large genome assembly.
- What it does: Polishes (corrects) an assembly, using long reads (with the tools Racon and Medaka) and short reads (with the tool Racon). Note: Medaka is only for nanopore reads, not PacBio reads.
- Inputs: assembly to be polished: assembly.fasta; long reads, the same set used in the assembly (e.g. raw or filtered), in fastq.gz format; short reads, R1 only, in fastq.gz format
- Outputs: ...
This is a Galaxy workflow that converts the 16S BIOM file to a table and figures. It is part of the MetaDEGalaxy workflow (MetaDEGalaxy: Galaxy workflow for differential abundance analysis of 16S metagenomic data).