Workflow (hybrid) metagenomic assembly and binning
- Workflow Illumina Quality:
  - Sequali (control)
  - hostile contamination filter
  - fastp (quality trimming)
- Workflow Longread Quality:
  - NanoPlot (control)
  - fastplong (quality trimming)
  - hostile contamination filter
- Kraken2 taxonomic classification of FASTQ reads
- SPAdes/Flye (Assembly)
- Medaka/PyPolCA (Assembly polishing)
- QUAST (Assembly quality report)
(optional)
- Workflow binning
  - Metabat2/MaxBin2/SemiBin
  - Binette
  - BUSCO
  - GTDB-Tk
(optional)
- Workflow Genome-scale metabolic models https://workflowhub.eu/workflows/372
  - CarveMe (GEM generation)
  - MEMOTE (GEM test suite)
  - SMETANA (Species METabolic interaction ANAlysis)
Other UNLOCK workflows on WorkflowHub: https://workflowhub.eu/projects/16/workflows?view=default
All tool CWL files and other workflows can be found here:
https://gitlab.com/m-unlock/cwl/
How to setup and use an UNLOCK workflow:
https://docs.m-unlock.nl/docs/workflows/setup.html
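As a rough illustration of what a run needs, the sketch below builds a job (input) object in Python and serializes it to JSON, which CWL runners accept alongside YAML. It assumes that the read inputs are arrays of CWL `File` objects (the descriptions say "file(s)"); the helper names `cwl_file` and `build_job` and all file paths are hypothetical and not part of the UNLOCK workflow.

```python
import json


def cwl_file(path):
    """Return a CWL File object for a local path."""
    return {"class": "File", "path": path}


def build_job(identifier, forward, reverse, nanopore=(), threads=2, run_flye=False):
    """Build a minimal job dict using input IDs from the Inputs table.

    Assumption: read inputs are lists of File objects.
    """
    job = {
        "identifier": identifier,
        "threads": threads,
        "illumina_forward_reads": [cwl_file(p) for p in forward],
        "illumina_reverse_reads": [cwl_file(p) for p in reverse],
        "run_spades": True,
        "run_flye": run_flye,
    }
    if nanopore:
        job["nanopore_reads"] = [cwl_file(p) for p in nanopore]
    return job


# Hypothetical sample name and file paths, for illustration only.
job = build_job(
    "sample1",
    forward=["sample1_R1.fastq.gz"],
    reverse=["sample1_R2.fastq.gz"],
    nanopore=["sample1_ont.fastq.gz"],
    run_flye=True,
)
job_json = json.dumps(job, indent=2)
print(job_json)
```

The printed JSON could be saved as a job file and passed to a CWL runner together with the workflow file; the setup guide linked above remains the authoritative reference for the exact invocation.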
Inputs
ID | Name | Description | Type |
---|---|---|---|
identifier | Identifier | Identifier for this dataset used in this workflow (required) | |
threads | Number of threads | Number of threads to use for each computational process (default 2) | |
memory | Memory usage (MB) | Maximum memory usage in megabytes. This is mostly important for SPAdes assembly. (default 8GB) | |
illumina_forward_reads | Forward reads | Illumina forward sequence file(s) | |
illumina_reverse_reads | Reverse reads | Illumina reverse sequence file(s) | |
pacbio_reads | PacBio reads | File(s) with PacBio reads in FASTQ format | |
nanopore_reads | Oxford Nanopore reads | File(s) with Oxford Nanopore reads in FASTQ format | |
fastq_rich | Fastq rich (ONT) | Input FASTQ is generated by Albacore, MinKNOW or Guppy with additional information concerning channel and time. Used to create more informative quality plots (default false) | |
longread_minimum_length | Minimum length required | Reads shorter than this length will be discarded. (default 100) | |
longread_length_limit | Maximum length limit | Reads longer than length_limit will be discarded. (default no limit) | |
longread_qualified_quality_phred | Qualified_quality_phred | The quality value at which a base counts as qualified. (default 9, meaning phred quality >=Q9 is qualified) | |
longread_mean_qual | Mean quality | Reads with a mean quality score below mean_qual are discarded. (default 10) | |
longread_trim_front | Trim_front | Number of bases to trim from the front of each read. (default 0) | |
longread_trim_tail | Trim_tail | Number of bases to trim from the tail of each read. (default 0) | |
longread_trim_poly_x | Trim_poly_x | Enable polyX trimming at 3' ends. (default false) | |
longread_poly_x_min_len | Poly_x_min_len | The minimum length to detect polyX in the read tail. (default 10 when trim_poly_x is true) | |
longread_start_adapter | Start adapter | The adapter sequence at read start (5'). (default auto-detect) | |
longread_end_adapter | End adapter | The adapter sequence at read end (3'). (default auto-detect) | |
longread_adapter_fasta | Adapter fasta | Specify a FASTA file to trim both read ends by all the sequences in this FASTA file. (default None) | |
longread_disable_adapter_trimming | Disable adapter trimming | Adapter trimming is enabled by default. If this option is specified, adapter trimming is disabled. (default false) | |
illumina_humandb | Filter human reads | Bowtie2 index folder. Provide the folder in which the index files are located. (optional) | |
longread_humandb | Filter human long reads | A FASTA file or a minimap2 index file (.mmi) needs to be provided. A pre-built index is much faster. (optional) | |
illumina_reference_filter_db | Illumina reference filter db | Custom reference database for filtering with Hostile. Provide the folder in which the Bowtie2 index files are located. (optional) | |
longread_reference_filter_db | Longread reference filter db | A FASTA file or a minimap2 index file (.mmi) needs to be provided. A pre-built index is much faster. (optional) | |
use_reference_mapped_reads | Keep mapped reads | Keep reads that map to the given reference and discard unmapped reads. (default false, i.e. mapped reads are discarded) | |
keep_filtered_reads | Keep filtered reads | Keep filtered reads in the final output (default false) | |
deduplicate_illumina_reads | Deduplicate illumina reads | Remove exact duplicate Illumina reads with fastp (default false) | |
run_kraken2_illumina | Run kraken2 on Illumina reads | Run Kraken2 on Illumina reads. A Kraken2 database needs to be provided using the input kraken2_database. (default false) | |
skip_bracken | Skip Bracken | Skip Bracken analysis (Illumina only). Running Bracken requires a Bracken-compatible Kraken2 database provided through the input kraken2_database. (default false) | |
bracken_levels | Bracken levels | Taxonomy levels for which Bracken estimates abundances. By default it runs through all of [P,C,O,F,G,S] | |
illumina_read_length | Read length | Read length; currently used by Bracken only. Usually 50, 75, 100, 150, 200, 250 or 300. (default 150) | |
kraken2_confidence | Kraken2 confidence threshold | Confidence score threshold must be in [0, 1] (default 0.0) | |
kraken2_database | Kraken2 database | Database location of Kraken2. (optional) | |
kraken2_standard_report | Kraken2 standard report | Also output the Kraken2 standard report with per-read classification. These can be large. (default false) | |
genome_size | Genome Size | Estimated genome size (for example, 5m or 2.6g). Used in Flye. (optional) | |
metagenome | When working with metagenomes | Metagenome option for assemblers (default true) | |
run_spades | Use SPAdes | Run with SPAdes assembler (default true) | |
only_assembler_mode_spades | Only spades assembler | Run SPAdes in assembler-only mode (without read error correction). (default false) | |
use_spades_scaffolds | Use SPAdes scaffolds | Use SPAdes scaffolds instead of contigs for post-processing (polishing/mapping/binning). (default false) | |
run_flye | Use Flye | Run with Flye assembler. Requires long reads (default false) | |
flye_deterministic | Deterministic Flye | Perform disjointig assembly single-threaded in Flye assembler (slower). (default false) | |
run_medaka | Use Medaka | Run with Medaka assembly polishing using nanopore (not PacBio) reads only. (default false) | |
run_pypolca | Use PyPolCA | Run with PyPolCA assembly polishing using Illumina reads only. (default false) | |
assembly_choice | Assembly choice | User's choice of assembly for post-assembly (binning) processes ('spades', 'flye', 'pypolca', 'medaka'). Optional. Only one choice allowed. When none is given, the first available assembly in this order is chosen: pypolca, medaka, flye, spades. | |
output_bam_file | Output BAM file | Output a BAM file of reads mapped to the assembly of choice. (default false) | |
ont_basecall_model | ONT Basecalling model used for MEDAKA | Basecalling model that was used with Guppy; passed to Medaka (default r941_min_high). Available: r103_fast_g507, r103_fast_snp_g507, r103_fast_variant_g507, r103_hac_g507, r103_hac_snp_g507, r103_hac_variant_g507, r103_min_high_g345, r103_min_high_g360, r103_prom_high_g360, r103_prom_snp_g3210, r103_prom_variant_g3210, r103_sup_g507, r103_sup_snp_g507, r103_sup_variant_g507, r1041_e82_400bps_fast_g615, r1041_e82_400bps_fast_variant_g615, r1041_e82_400bps_hac_g615, r1041_e82_400bps_hac_variant_g615, r1041_e82_400bps_sup_g615, r1041_e82_400bps_sup_variant_g615, r104_e81_fast_g5015, r104_e81_fast_variant_g5015, r104_e81_hac_g5015, r104_e81_hac_variant_g5015, r104_e81_sup_g5015, r104_e81_sup_g610, r104_e81_sup_variant_g610, r10_min_high_g303, r10_min_high_g340, r941_e81_fast_g514, r941_e81_fast_variant_g514, r941_e81_hac_g514, r941_e81_hac_variant_g514, r941_e81_sup_g514, r941_e81_sup_variant_g514, r941_min_fast_g303, r941_min_fast_g507, r941_min_fast_snp_g507, r941_min_fast_variant_g507, r941_min_hac_g507, r941_min_hac_snp_g507, r941_min_hac_variant_g507, r941_min_high_g303, r941_min_high_g330, r941_min_high_g340_rle, r941_min_high_g344, r941_min_high_g351, r941_min_high_g360, r941_min_sup_g507, r941_min_sup_snp_g507, r941_min_sup_variant_g507, r941_prom_fast_g303, r941_prom_fast_g507, r941_prom_fast_snp_g507, r941_prom_fast_variant_g507, r941_prom_hac_g507, r941_prom_hac_snp_g507, r941_prom_hac_variant_g507, r941_prom_high_g303, r941_prom_high_g330, r941_prom_high_g344, r941_prom_high_g360, r941_prom_high_g4011, r941_prom_snp_g303, r941_prom_snp_g322, r941_prom_snp_g360, r941_prom_sup_g507, r941_prom_sup_snp_g507, r941_prom_sup_variant_g507, r941_prom_variant_g303, r941_prom_variant_g322, r941_prom_variant_g360, r941_sup_plant_g610, r941_sup_plant_variant_g610 (required for Medaka) | |
binning | Run binning workflow | Run with contig binning workflow (default false) | |
run_maxbin2 | Run Maxbin2 | Run with MaxBin2 binner. (default true) | |
run_semibin2 | Run SemiBin | Run with SemiBin2 binner. (default true) | |
semibin2_environment | SemiBin Environment | SemiBin2 built-in models (none/global/human_gut/dog_gut/ocean/soil/cat_gut/human_oral/mouse_gut/pig_gut/built_environment/wastewater/chicken_caecum). Choosing a built-in model is generally faster. Otherwise it will do (single-sample) training on the data. Default global. Choose none if you want to do training on your own data. | |
gtdbtk_data | gtdbtk data directory | Directory containing the GTDB-Tk repository | |
busco_data | BUSCO dataset | Path to the downloaded BUSCO dataset. (optional) | |
annotate_bins | Annotate bins | Annotate bins. (default false) | |
annotate_unbinned | Annotate unbinned | Annotate unbinned contigs. Will be treated as metagenome. (default false) | |
bakta_db | Bakta DB | Bakta database directory. Default is the built-in bakta-light db. (optional) | |
skip_bakta_crispr | Skip bakta CRISPR | Skip Bakta CRISPR array prediction using PILER-CR. (default false) | |
interproscan_directory | InterProScan 5 directory | Directory of the (full) InterProScan 5 program. Used for annotating bins. (optional) | |
eggnog_dbs | n/a | n/a | |
run_kofamscan | Run kofamscan | Run with KEGG KO KoFamKOALA annotation. (default false) | |
kofamscan_limit_sapp | SAPP kofamscan limit | Limit the maximum number of kofamscan hits per locus in SAPP. (default 5) | |
run_eggnog | Run eggNOG-mapper | Run with eggNOG-mapper annotation. Requires eggNOG database files. (default false) | |
run_interproscan | Run InterProScan | Run with InterProScan annotation. Requires InterProScan v5 program files. (default false) | |
interproscan_applications | InterProScan applications | Comma-separated list of analyses: FunFam,SFLD,PANTHER,Gene3D,Hamap,PRINTS,ProSiteProfiles,Coils,SUPERFAMILY,SMART,CDD,PIRSR,ProSitePatterns,AntiFam,Pfam,MobiDBLite,PIRSF,NCBIfam. Default: Pfam,SFLD,SMART,AntiFam,NCBIfam | |
destination | Output Destination | Optional output destination only used for cwl-prov reporting. | |
source | Input URLs used for this run | A provenance element to capture the original source of the input data | |
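Several of the input descriptions above imply dependencies between options (for example, Flye requires long reads and Medaka needs nanopore reads plus a basecalling model). The following is a minimal pre-flight check over a job dict, written as a sketch under those stated constraints only. The function `validate_job` is hypothetical and not part of the workflow, which may enforce these rules differently or not at all.

```python
def validate_job(job):
    """Return a list of problems found in a job dict (empty list means OK).

    The rules are read off the input descriptions; this is an
    illustrative client-side check, not the workflow's own logic.
    """
    problems = []

    if not job.get("identifier"):
        problems.append("identifier is required")

    # kraken2_confidence must be in [0, 1] (default 0.0)
    confidence = job.get("kraken2_confidence", 0.0)
    if not 0.0 <= confidence <= 1.0:
        problems.append("kraken2_confidence must be in [0, 1]")

    has_illumina = bool(job.get("illumina_forward_reads"))
    has_nanopore = bool(job.get("nanopore_reads"))
    has_long_reads = has_nanopore or bool(job.get("pacbio_reads"))

    if job.get("run_flye", False) and not has_long_reads:
        problems.append("run_flye requires long reads")

    if job.get("run_medaka", False):
        if not has_nanopore:
            problems.append("run_medaka requires nanopore_reads")
        if not job.get("ont_basecall_model"):
            problems.append("run_medaka requires ont_basecall_model")

    if job.get("run_pypolca", False) and not has_illumina:
        problems.append("run_pypolca requires Illumina reads")

    if job.get("run_kraken2_illumina", False) and not job.get("kraken2_database"):
        problems.append("run_kraken2_illumina requires kraken2_database")

    return problems


# Example: Flye requested without any long reads.
for problem in validate_job({"identifier": "sample1", "run_flye": True}):
    print(problem)
```

Returning a list rather than raising on the first problem lets all issues be reported before a long-running job is submitted.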
Steps
ID | Name | Description |
---|---|---|
workflow_quality_illumina | Illumina quality workflow | Quality, filtering and taxonomic classification workflow for Illumina reads |
workflow_quality_nanopore | Oxford Nanopore quality workflow | Quality, filtering and taxonomic classification workflow for Oxford Nanopore reads |
workflow_quality_pacbio | PacBio quality and filtering workflow | Quality, filtering and taxonomic classification for PacBio reads |
workflow_kraken2_illumina | Kraken2 illumina | Taxonomic classification using kraken2 Illumina reads |
spades | SPAdes assembly | Genome assembly using SPAdes with Illumina and/or long reads |
spades_assembly | SPAdes contigs or scaffolds | Get chosen spades assembly. Contigs or scaffolds |
compress_spades | SPAdes compressed | Compress the large Spades assembly output files |
flye | Flye assembly | De novo assembly of single-molecule reads with Flye |
medaka | Medaka polishing of assembly | Medaka polishing of an assembled (Flye) genome using ONT reads |
workflow_pypolca | Run PyPolCA assembly polishing | PyPolCA polishing of long-read assembly with Illumina reads |
get_assembly_to_use | Assembly choice | Get assembly choice |
assembly_read_mapping_illumina | Minimap2 | Illumina read mapping using Minimap2 on assembled scaffolds |
contig_read_counts | Samtools idxstats | Reports alignment summary statistics |
workflow_binning | Binning workflow | Binning workflow to create bins |
keep_readfilter_files_to_folder | Read filtering output folder | Preparation of read filtering output files to a specific output folder |
readfilter_files_to_folder | Read filtering output folder | Preparation of read filtering reports to a specific output folder |
spades_files_to_folder | SPADES output to folder | Preparation of SPAdes output files to a specific output folder |
flye_files_to_folder | Flye output folder | Preparation of Flye output files to a specific output folder |
medaka_files_to_folder | Medaka output folder | Preparation of Medaka output files to a specific output folder |
pypolca_files_to_folder | PyPolca output folder | Preparation of PyPolCA output files to a specific output folder |
output_bamfile | Output bam file | Step that outputs the BAM file when the output_bam_file option is enabled. |
assembly_files_to_folder | Assembly output folder | Preparation of assembly output files to a specific output folder |
binning_files_to_folder | Binning output to folder | Preparation of binning output files and folders to a specific output folder |
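The selection made for post-assembly processing follows the rule stated for the `assembly_choice` input: an explicit choice wins, otherwise the first available assembly in the order pypolca, medaka, flye, spades is used. A minimal sketch of that rule is given below. The function name `choose_assembly` is hypothetical, and raising an error when the requested assembly is unavailable is an assumption, since the source does not say how the workflow handles that case.

```python
# Fallback order as given in the assembly_choice description.
FALLBACK_ORDER = ("pypolca", "medaka", "flye", "spades")


def choose_assembly(available, choice=None):
    """Pick the assembly used for mapping and binning.

    available: names of assemblies that were actually produced.
    choice: optional explicit user choice (only one allowed).
    """
    if choice is not None:
        if choice not in FALLBACK_ORDER:
            raise ValueError(f"unknown assembly choice: {choice}")
        if choice not in available:
            # Assumption: an unavailable explicit choice is an error.
            raise ValueError(f"assembly '{choice}' was not produced")
        return choice

    for name in FALLBACK_ORDER:
        if name in available:
            return name
    raise ValueError("no assembly available")


# With SPAdes and Flye assemblies present and no explicit choice,
# Flye comes first in the fallback order.
print(choose_assembly({"spades", "flye"}))
```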
Outputs
ID | Name | Description | Type |
---|---|---|---|
read_filtering_output_keep | Read filtering output | Read filtering stats + filtered reads | |
read_filtering_output | Read filtering output | Read filtering stats | |
assembly_output | Assembly output | Output from different assembly steps | |
binning_output | Binning output | Binning output folders | |
Version History
Version 3 (latest) Created 9th Sep 2025 at 13:28 by Bart Nijsse
Major changes: This version changes the way read filtering is performed and replaces DAStool with Binette.
Open
master
d1190f4
WFP Created 16th Dec 2024 at 07:46 by Bart Nijsse
Workflow version used in analysis: "A metadata managed FAIR end-to-end workflow for microbial community Omics data analysis"
Frozen
WFP
7c7adba
Version 1 (earliest) Created 14th Jun 2022 at 09:14 by Bart Nijsse
Initial commit
Frozen
Version-1
1e42c47

Views: 5463 Downloads: 896
Created: 14th Jun 2022 at 09:14
Last updated: 9th Sep 2025 at 14:47
