CWL-assembly
Description
This repository contains two workflows for metagenome and metatranscriptome assembly of short-read data. metaSPAdes is the default assembler for paired-end data and MEGAHIT for single-end data. If preferred, MEGAHIT can be selected as the assembler in the YAML job file. Steps include:
QC - removal of short reads, low-quality regions and adapters, and host decontamination
Assembly - with metaSPAdes or MEGAHIT
Post-assembly - host and PhiX decontamination, contig length filter (500 bp), and generation of assembly statistics
Multiple input read files can also be specified for co-assembly.
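The contig length filter in the post-assembly step can be illustrated with a short Python sketch. It is not the pipeline's own implementation. It assumes that contigs shorter than 500 bp are discarded and that contigs of exactly 500 bp are kept. The function names `read_fasta` and `filter_contigs` are hypothetical.

```python
"""Minimal sketch of a post-assembly contig length filter.

Assumption: contigs shorter than 500 bp are dropped and contigs of exactly
500 bp are kept; the pipeline's actual tool and boundary rule may differ.
"""
import io
from typing import Iterator, List, TextIO, Tuple

MIN_CONTIG_LEN = 500  # threshold named in the README


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(handle: TextIO, min_len: int = MIN_CONTIG_LEN) -> List[Tuple[str, str]]:
    """Keep only contigs whose sequence length is at least min_len."""
    return [(h, s) for h, s in read_fasta(handle) if len(s) >= min_len]


if __name__ == "__main__":
    # 120 bp contig is dropped; the two-line 600 bp contig is kept.
    demo = io.StringIO(">short\n" + "A" * 120 + "\n>long\n" + "ACGT" * 100 + "\n" + "ACGT" * 50 + "\n")
    print([h for h, _ in filter_contigs(demo)])
```

Running the script prints `['long']`. The 120 bp contig is removed. The 600 bp contig, which is split over two lines, is kept.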
Requirements
This pipeline requires an environment with cwltool, blastn, metaSPAdes and MEGAHIT installed.
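You can confirm the environment before a run with the sketch below. The executable names are assumptions. metaSPAdes is normally launched as `metaspades.py` from a SPAdes installation, and the other tools are assumed to be on PATH under the names listed above.

```python
"""Check that the tools listed under Requirements can be found on PATH.

Assumed executable names: cwltool, blastn, metaspades.py (the launcher
shipped with SPAdes) and megahit. Edit REQUIRED_TOOLS if yours differ.
"""
import shutil
from typing import List, Optional, Sequence

REQUIRED_TOOLS = ("cwltool", "blastn", "metaspades.py", "megahit")


def missing_tools(tools: Sequence[str] = REQUIRED_TOOLS, path: Optional[str] = None) -> List[str]:
    """Return the tools shutil.which cannot find; path=None searches $PATH."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools found.")
```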
Databases
Pre-download the FASTA files for host decontamination and generate:
- a bwa index folder
- a BLAST index folder
Specify these locations in the YAML job file when running the pipeline.
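The two index folders can be built with `bwa index` and BLAST+ `makeblastdb`, as in the sketch below. This is only an illustration. It assumes an uncompressed FASTA file, uses the FASTA basename as the index prefix, and writes each index to its own folder. The helper names and folder layout are hypothetical, so point the YAML job file at whatever locations you actually use.

```python
"""Sketch: build bwa and BLAST indexes for a host FASTA file.

Assumptions: bwa and makeblastdb (BLAST+) are installed, the FASTA is
uncompressed, and the basename without extension is used as index prefix.
"""
import os
import subprocess
from typing import List


def index_commands(fasta: str, bwa_dir: str, blast_dir: str) -> List[List[str]]:
    """Return the bwa index and makeblastdb command lines for one FASTA file."""
    prefix = os.path.splitext(os.path.basename(fasta))[0]
    return [
        # bwa writes <prefix>.amb/.ann/.bwt/.pac/.sa into bwa_dir
        ["bwa", "index", "-p", os.path.join(bwa_dir, prefix), fasta],
        # nucleotide database for blastn, written into blast_dir
        ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", os.path.join(blast_dir, prefix)],
    ]


def build_indexes(fasta: str, bwa_dir: str, blast_dir: str) -> None:
    """Create both output folders, then run the indexing commands in order."""
    os.makedirs(bwa_dir, exist_ok=True)
    os.makedirs(blast_dir, exist_ok=True)
    for cmd in index_commands(fasta, bwa_dir, blast_dir):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

For example, `build_indexes("host.fa", "db/bwa", "db/blast")` (hypothetical paths) produces `db/bwa/host.*` and `db/blast/host.*`.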
Main pipeline executables
src/workflows/metagenome_pipeline.cwl
src/workflows/metatranscriptome_pipeline.cwl
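cwltool takes a workflow file followed by a YAML job file. The sketch below only builds that command line and runs it from the repository root. The data-type keys and the job file name are hypothetical. The job file's input fields are not shown because the workflows themselves define them.

```python
"""Sketch: choose a workflow and build the cwltool command line.

Assumptions: run from the repository root; the YAML job file already
lists the reads, assembler choice and database locations.
"""
import subprocess
from typing import List, Optional

WORKFLOWS = {
    "metagenome": "src/workflows/metagenome_pipeline.cwl",
    "metatranscriptome": "src/workflows/metatranscriptome_pipeline.cwl",
}


def cwltool_command(data_type: str, job_file: str, outdir: Optional[str] = None) -> List[str]:
    """Return ['cwltool', ('--outdir', DIR,) WORKFLOW, JOB] for the data type."""
    if data_type not in WORKFLOWS:
        raise ValueError(f"unknown data type {data_type!r}; expected one of {sorted(WORKFLOWS)}")
    cmd = ["cwltool"]
    if outdir is not None:
        cmd += ["--outdir", outdir]
    return cmd + [WORKFLOWS[data_type], job_file]


def run(data_type: str, job_file: str, outdir: Optional[str] = None) -> None:
    """Launch the workflow; raises CalledProcessError if cwltool fails."""
    subprocess.run(cwltool_command(data_type, job_file, outdir), check=True)
```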
Example output directory structure
SRP0741
└── SRP074153                 Project directory containing all assemblies under that project
    ├── downloads.yml         Raw data download cache log, used to avoid duplicate downloads of raw data
    ├── SRR6257
    │   └── SRR6257420        Run directory
    │       ├── megahit
    │       │   └── 001       Assembly directory
    │       │       ├── SRR6257420.fasta          Trimmed assembly
    │       │       ├── SRR6257420.fasta.gz       Compressed trimmed assembly
    │       │       ├── SRR6257420.fasta.gz.md5   MD5 hash of the above archive
    │       │       ├── coverage.tab              Coverage file
    │       │       ├── final.contigs.fa          Raw assembly
    │       │       ├── job_config.yml            CWL job configuration
    │       │       ├── megahit.log               Assembler output log
    │       │       ├── output.json               Human-readable assembly statistics
    │       │       ├── sorted.bam                Sorted BAM file of reads mapped to the assembly
    │       │       ├── sorted.bam.bai            BAM index file
    │       │       └── toil.log                  Toil (cwltoil) output log
    │       └── metaspades    Assembly of the same data with another assembler (e.g. metaSPAdes, SPAdes...)
    │           └── ...
    │
    ├── raw                   Raw data directory
    │   └── SRR6257420.fastq.gz   Raw data files
    │
    └── tmp                   Temporary directory for assemblies
        └── SRR6257
            └── SRR6257420
                └── megahit
                    └── 001
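The sketch below locates an assembly directory in this layout and checks an archive against its `.md5` file. Both rules it relies on are assumptions. The 7-character grouping prefix (SRP074153 to SRP0741, SRR6257420 to SRR6257) is inferred from the example tree, not taken from the pipeline code. The `.md5` file is assumed to begin with the hex digest, as `md5sum` writes it. The function names are hypothetical.

```python
"""Sketch: locate an assembly directory and verify its MD5 file.

Assumptions: the parent-folder prefix is the first 7 characters of the
accession (inferred from the example tree), and <archive>.md5 starts with
the hex digest, as written by md5sum.
"""
import hashlib
from pathlib import Path

PREFIX_LEN = 7  # inferred from SRP0741/SRP074153 and SRR6257/SRR6257420


def assembly_dir(root: Path, project: str, run: str, assembler: str, version: str = "001") -> Path:
    """Build <root>/<project prefix>/<project>/<run prefix>/<run>/<assembler>/<version>."""
    return root / project[:PREFIX_LEN] / project / run[:PREFIX_LEN] / run / assembler / version


def md5_matches(archive: Path) -> bool:
    """Compare the archive's MD5 with the digest stored in <archive>.md5."""
    digest = hashlib.md5()
    with archive.open("rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):  # read in 1 MiB blocks
            digest.update(block)
    expected = archive.with_name(archive.name + ".md5").read_text().split()[0].lower()
    return digest.hexdigest() == expected
```

For the example above, `assembly_dir(Path("SRP0741").parent, "SRP074153", "SRR6257420", "megahit")` resolves to `SRP0741/SRP074153/SRR6257/SRR6257420/megahit/001`.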
Version History
master @ 39efebc (latest) Created 21st Jun 2023 at 11:41 by Germana Baldi
Merge pull request #8 from EBI-Metagenomics/readme_requirements
Update of README, examples, and installation requirements
master @ b269a55 (earliest) Created 19th May 2023 at 14:59 by Varsha Kale
Update README.md
Creators
Not specified
Created: 19th May 2023 at 14:59
Last updated: 21st Jun 2023 at 11:41