Workflow Type: Common Workflow Language
Work-in-progress

Workflow for Metagenomics binning from assembly

Minimal inputs are: Identifier, assembly (fasta) and an associated sorted BAM file

Summary

  • MetaBAT2 (binning)
  • MaxBin2 (binning)
  • SemiBin (binning)
  • DAS Tool (bin merging)
  • EukRep (eukaryotic classification)
  • CheckM (bin completeness and contamination)
  • BUSCO (bin completeness)
  • GTDB-Tk (bin taxonomic classification)

Other UNLOCK workflows on WorkflowHub: https://workflowhub.eu/projects/16/workflows?view=default

All tool CWL files and other workflows can be found here:
Tools: https://gitlab.com/m-unlock/cwl
Workflows: https://gitlab.com/m-unlock/cwl/workflows

How to setup and use an UNLOCK workflow:
https://m-unlock.gitlab.io/docs/setup/setup.html


Inputs

ID Name Description Type
identifier Identifier used Identifier for this dataset used in this workflow
  • string
assembly Assembly fasta Assembly in fasta format
  • File
bam_file Bam file Mapping file in sorted bam format containing reads mapped to the assembly
  • File
threads Threads Number of threads to use for computational processes
  • int?
memory memory usage (MB) Maximum memory usage in megabytes
  • int?
gtdbtk_data gtdbtk data directory Directory containing the GTDB database. When none is given GTDB-Tk will be skipped.
  • Directory?
busco_data BUSCO dataset Directory containing the BUSCO dataset.
  • Directory?
run_semibin Run SemiBin Run with SemiBin binner
  • boolean?
semibin_environment SemiBin Environment SemiBin built-in models (human_gut/dog_gut/ocean/soil/cat_gut/human_oral/mouse_gut/pig_gut/built_environment/wastewater/global/chicken_caecum)
  • string?
sub_workflow Sub workflow Run Use this when you need the output bins as File[] for subsequent analysis workflow steps in another workflow.
  • boolean
step CWL base step number Step number for order of steps
  • int?
destination Output destination (not used in the workflow itself) Optional output destination path for cwl-prov reporting.
  • string?
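
The inputs above map directly onto a CWL job file. As a minimal sketch, the Python below assembles such a job as JSON (CWL runners accept JSON as well as YAML job files). The identifier, the file paths, and the helper names `cwl_file`, `cwl_dir` and `build_job` are hypothetical; only the input IDs and the SemiBin environment names are taken from the table.

```python
import json

# SemiBin built-in environments as listed in the inputs table.
SEMIBIN_ENVIRONMENTS = {
    "human_gut", "dog_gut", "ocean", "soil", "cat_gut", "human_oral",
    "mouse_gut", "pig_gut", "built_environment", "wastewater", "global",
    "chicken_caecum",
}


def cwl_file(path):
    """Describe a File input the way a CWL job file expects it."""
    return {"class": "File", "path": path}


def cwl_dir(path):
    """Describe a Directory input the way a CWL job file expects it."""
    return {"class": "Directory", "path": path}


def build_job(identifier, assembly, bam_file, sub_workflow=False,
              threads=None, memory=None, gtdbtk_data=None, busco_data=None,
              run_semibin=None, semibin_environment=None):
    """Build a job dict; optional inputs left as None are omitted."""
    if (semibin_environment is not None
            and semibin_environment not in SEMIBIN_ENVIRONMENTS):
        raise ValueError(f"unknown SemiBin environment: {semibin_environment}")

    job = {
        "identifier": identifier,
        "assembly": cwl_file(assembly),
        "bam_file": cwl_file(bam_file),
        "sub_workflow": sub_workflow,
    }
    optional = {
        "threads": threads,
        "memory": memory,
        "gtdbtk_data": cwl_dir(gtdbtk_data) if gtdbtk_data else None,
        "busco_data": cwl_dir(busco_data) if busco_data else None,
        "run_semibin": run_semibin,
        "semibin_environment": semibin_environment,
    }
    for key, value in optional.items():
        if value is not None:
            job[key] = value
    return job


if __name__ == "__main__":
    # Hypothetical sample name and paths, for illustration only.
    job = build_job("sample1", "assembly.fasta", "sample1.sorted.bam",
                    threads=8, run_semibin=True, semibin_environment="global")
    print(json.dumps(job, indent=2))
```

Saved to a file, the printed JSON could be handed to a CWL runner together with the workflow file. Leaving out `gtdbtk_data` corresponds to the documented behaviour that GTDB-Tk is skipped when no database directory is given.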

Steps

ID Name Description
metabat2_contig_depths contig depths MetabatContigDepths to obtain the depth file used in the MetaBAT2 and SemiBin binning process
eukrep EukRep EukRep, eukaryotic sequence classification
eukrep_stats EukRep stats EukRep fasta statistics
metabat2 MetaBAT2 binning Binning procedure using MetaBAT2
metabat2_filter_bins Keep MetaBAT2 genome bins Only keep genome bin fasta files (exclude e.g. TooShort.fa)
metabat2_contig2bin MetaBAT2 to contig to bins List the contigs and their corresponding bin.
maxbin2 MaxBin2 binning Binning procedure using MaxBin2
maxbin2_to_folder MaxBin2 bins to folder Create folder with MaxBin2 bins
maxbin2_contig2bin MaxBin2 to contig to bins List the contigs and their corresponding bin.
semibin SemiBin binning Binning procedure using SemiBin
semibin_contig2bin SemiBin to contig to bins List the contigs and their corresponding bin.
das_tool DAS Tool integrate predictions from multiple binning tools DAS Tool
das_tool_bins Bin dir to files[] DAS Tool bins folder to File array for further analysis
remove_unbinned Remove unbinned Remove unbinned fasta from bin directory so that it is not analysed by subsequent tools.
checkm CheckM CheckM bin quality assessment
busco BUSCO BUSCO assembly completeness workflow
gtdbtk GTDBTK Taxonomic assignment of bins with GTDB-Tk
compress_gtdbtk Compress GTDB-Tk Compress GTDB-Tk output folder
aggregate_bin_depths Depths per bin Depths per bin
bins_summary Bins summary Table of all bins and their statistics like size, contigs, completeness etc
bin_readstats Bin and assembly read stats Table of general bin and assembly read mapping stats
metabat2_files_to_folder MetaBAT2 output folder Preparation of MetaBAT2 output files + unbinned contigs to a specific output folder
maxbin2_files_to_folder MaxBin2 output folder Preparation of maxbin2 output files to a specific output folder.
semibin_files_to_folder SemiBin output folder Preparation of SemiBin output files to a specific output folder.
das_tool_files_to_folder DAS Tool output folder Preparation of DAS Tool output files to a specific output folder.
checkm_files_to_folder CheckM output Preparation of CheckM output files to a specific output folder
busco_files_to_folder BUSCO output folder Preparation of BUSCO output files to a specific output folder
gtdbtk_files_to_folder GTDB-Tk output folder Preparation of GTDB-Tk output files to a specific output folder
output_bin_files Bin files Bin files for subsequent workflow runs when sub_workflow = true
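
The `metabat2_contig2bin`, `maxbin2_contig2bin` and `semibin_contig2bin` steps above each list contigs with their bin so that DAS Tool can compare the binners. The idea can be sketched as follows, assuming one FASTA file per bin and a tab-separated contig/bin table as the result. The function names, the `.fa` suffix default, and the use of the file stem as the bin name are illustrative assumptions, not the workflow's actual implementation.

```python
from pathlib import Path


def contig2bin(bin_dir, suffix=".fa"):
    """Return (contig_id, bin_name) pairs for every bin FASTA in bin_dir."""
    pairs = []
    for fasta in sorted(Path(bin_dir).glob(f"*{suffix}")):
        bin_name = fasta.stem  # e.g. "bin.1" for "bin.1.fa"
        with fasta.open() as handle:
            for line in handle:
                if not line.startswith(">"):
                    continue
                # The contig ID is the header text up to the first whitespace.
                fields = line[1:].split()
                if fields:
                    pairs.append((fields[0], bin_name))
    return pairs


def write_table(pairs, out_path):
    """Write the pairs as a two-column, tab-separated table."""
    with open(out_path, "w") as out:
        for contig_id, bin_name in pairs:
            out.write(f"{contig_id}\t{bin_name}\n")
```

Running one such table per binner keeps the binners independent until the merging step, which is why the workflow has a separate contig2bin step for each of them.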

Outputs

ID Name Description Type
bins Bin files Bin files in fasta format. To be used in other workflows.
  • File[]?
metabat2_output MetaBAT2 MetaBAT2 output directory
  • Directory
maxbin2_output MaxBin2 MaxBin2 output directory
  • Directory
semibin_output SemiBin SemiBin output directory
  • Directory?
das_tool_output DAS Tool DAS Tool output directory
  • Directory
checkm_output CheckM CheckM output directory
  • Directory
busco_output BUSCO BUSCO output directory
  • Directory
gtdbtk_output GTDB-Tk GTDB-Tk output directory
  • Directory?
bins_summary_table Bins summary Summary of info about the bins
  • File
bins_read_stats Assembly/Bin read stats General assembly and bin coverage
  • File
eukrep_fasta EukRep fasta EukRep eukaryotic classified contigs
  • File
eukrep_stats_file EukRep stats EukRep fasta statistics
  • File
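
Several of the outputs above are optional (`bins`, `semibin_output`, `gtdbtk_output`), so downstream code should tolerate their absence. A minimal sketch, assuming the runner's output object has been saved as JSON with one key per output ID and CWL File objects carrying a `path` or `location` field; the function name `bin_paths` and the example paths are hypothetical.

```python
import json


def bin_paths(outputs):
    """Return the paths of the bin FASTA files, or [] when `bins` is unset."""
    bins = outputs.get("bins") or []  # `bins` is File[]? and may be null
    return [f.get("path") or f.get("location") for f in bins]


# Hypothetical output object, trimmed to the relevant key.
example = json.loads("""
{
  "bins": [
    {"class": "File", "path": "/out/bin.1.fa"},
    {"class": "File", "location": "file:///out/bin.2.fa"}
  ]
}
""")

if __name__ == "__main__":
    for path in bin_paths(example):
        print(path)
```

When `sub_workflow` is false the `bins` output is expected to be empty, in which case the function simply returns an empty list.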

Version History

Version 11 (latest) Created 18th Oct 2021 at 10:49 by Jasper Koehorst

Added more binning and assembly reports


Open master d4c912c

Version 10 Created 7th Jun 2021 at 18:34 by Jasper Koehorst

No revision comments

Frozen master c2519b1

Version 9 Created 1st Jun 2021 at 11:43 by Jasper Koehorst

No revision comments

Frozen master d6fcbfa

Version 8 Created 6th May 2021 at 07:03 by Jasper Koehorst

No revision comments

Frozen master 0660405

Version 7 Created 8th Jan 2021 at 10:15 by Jasper Koehorst

No revision comments

Frozen master f3919f2

Created: 15th Oct 2020 at 14:55

Last updated: 2nd Nov 2022 at 15:29
