Workflow Type: Common Workflow Language
Work-in-progress

CWL-assembly


Description

This repository contains two workflows for metagenome and metatranscriptome assembly of short-read data. metaSPAdes is the default assembler for paired-end data, and MEGAHIT for single-end data. MEGAHIT can be set as the default assembler for all data in the yaml file if preferred. Steps include:

QC - removal of short reads, low-quality regions and adapters, plus host decontamination

Assembly - with metaSPAdes or MEGAHIT

Post-assembly - host and PhiX decontamination, contig length filter (500 bp), stats generation
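As an illustration of the contig length filter in the post-assembly step, here is a minimal Python sketch. It is not the pipeline's own code: the function names are hypothetical, and treating the 500 bp threshold as inclusive (contigs of exactly 500 bp are kept) is an assumption.

```python
import io

# Threshold taken from the README; inclusive comparison is an assumption.
MIN_CONTIG_LENGTH = 500


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records, min_length=MIN_CONTIG_LENGTH):
    """Keep only contigs whose sequence is at least min_length bases long."""
    for header, seq in records:
        if len(seq) >= min_length:
            yield header, seq


if __name__ == "__main__":
    demo = io.StringIO(">long\n" + "A" * 600 + "\n>short\nACGT\n")
    for name, seq in filter_contigs(read_fasta(demo)):
        print(name, len(seq))
```

The generator-based design keeps memory use low, since only one contig is held at a time.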

Multiple input read files can also be specified for co-assembly.

Requirements

This pipeline requires an environment with cwltool, blastn, metaspades and megahit.

Databases

Predownload fasta files for host decontamination and generate:

- bwa index folder
- blast index folder

Specify the locations in the yaml file when running the pipeline.
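One possible way to generate the two index folders is sketched below in Python. It assumes the standard `bwa index` and `makeblastdb` command-line tools are installed; the folder names, the index prefix convention and the helper names are hypothetical, and the layout the pipeline actually expects should be checked against the example yaml files in the repository.

```python
import subprocess
from pathlib import Path


def index_commands(fasta, bwa_dir, blast_dir):
    """Return the bwa and BLAST indexing commands for one host fasta file.

    The index prefix (fasta file stem inside each folder) is an assumed
    naming convention, not one documented by the pipeline.
    """
    fasta = Path(fasta)
    name = fasta.stem
    return [
        # bwa index -p <prefix> <fasta>
        ["bwa", "index", "-p", str(Path(bwa_dir) / name), str(fasta)],
        # makeblastdb -in <fasta> -dbtype nucl -out <prefix>
        ["makeblastdb", "-in", str(fasta), "-dbtype", "nucl",
         "-out", str(Path(blast_dir) / name)],
    ]


def build_indexes(fasta, bwa_dir="bwa_index", blast_dir="blast_index"):
    """Create both index folders and run the indexing tools."""
    for folder in (bwa_dir, blast_dir):
        Path(folder).mkdir(parents=True, exist_ok=True)
    for cmd in index_commands(fasta, bwa_dir, blast_dir):
        subprocess.run(cmd, check=True)
```

Calling `build_indexes("host.fasta")` would then write both indexes, after which the two folder paths can be entered in the yaml file.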

Main pipeline executables

src/workflows/metagenome_pipeline.cwl
src/workflows/metatranscriptome_pipeline.cwl
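Either workflow can be launched with cwltool, which takes a workflow file followed by a job file. The Python sketch below only assembles that command line; the helper name and the `job_config.yml` file name are illustrative, and the keys required inside the yaml file are not shown here.

```python
import subprocess

# Workflow paths as listed in this README.
WORKFLOWS = {
    "metagenome": "src/workflows/metagenome_pipeline.cwl",
    "metatranscriptome": "src/workflows/metatranscriptome_pipeline.cwl",
}


def cwltool_command(kind, job_yaml, outdir=None):
    """Build a cwltool command line for the chosen workflow.

    kind     -- "metagenome" or "metatranscriptome"
    job_yaml -- path to the yaml job file (inputs, database locations)
    outdir   -- optional output directory, passed via cwltool's --outdir
    """
    cmd = ["cwltool"]
    if outdir is not None:
        cmd += ["--outdir", outdir]
    cmd += [WORKFLOWS[kind], job_yaml]
    return cmd


if __name__ == "__main__":
    # Hypothetical job file name; prints the command instead of running it.
    print(" ".join(cwltool_command("metagenome", "job_config.yml")))
    # To execute: subprocess.run(cwltool_command(...), check=True)
```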

Example output directory structure

SRP0741
    └── SRP074153               Project directory containing all assemblies under that project
        ├── downloads.yml       Raw data download caching logfile, to avoid duplicate downloads of raw data
        ├── SRR6257
        │   └── SRR6257420      Run directory
        │       └── megahit
        │           ├── 001     Assembly directory
        │           │   ├── SRR6257420.fasta               Trimmed assembly
        │           │   ├── SRR6257420.fasta.gz            Archive trimmed assembly
        │           │   ├── SRR6257420.fasta.gz.md5        MD5 hash of above archive
        │           │   ├── coverage.tab                   Coverage file
        │           │   ├── final.contigs.fa               Raw assembly
        │           │   ├── job_config.yml                 CWL job configuration
        │           │   ├── megahit.log                    Assembler output log
        │           │   ├── output.json                    Human-readable Assembly stats file
        │           │   ├── sorted.bam                     BAM file of assembly
        │           │   ├── sorted.bam.bai                 Secondary BAM file
        │           │   └── toil.log                       cwlToil output log
        │           └── metaspades                         Assembly of equivalent data using another assembler (e.g. metaspades, spades...)
        │               └── ... 
        │ 
        ├── raw                 Raw data directory
        │   └── SRR6257420.fastq.gz                        Raw data files
        │
        └── tmp                 Temporary directory for assemblies
            └── SRR6257
                └── SRR6257420
                    └── megahit
                        └── 001
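The layout above appears to group each project and run under a folder named after the first seven characters of its accession. The following Python sketch derives an assembly directory path on that reading; the seven-character prefix rule is inferred from the example tree and is an assumption, as is the helper name.

```python
from pathlib import PurePosixPath


def assembly_dir(project, run, assembler, version="001", prefix_len=7):
    """Build the relative assembly directory for one run.

    prefix_len=7 is inferred from the example (SRP074153 -> SRP0741,
    SRR6257420 -> SRR6257) and is not a documented rule.
    """
    return PurePosixPath(
        project[:prefix_len],  # project prefix folder
        project,               # project directory
        run[:prefix_len],      # run prefix folder
        run,                   # run directory
        assembler,             # e.g. "megahit" or "metaspades"
        version,               # assembly directory, e.g. "001"
    )


if __name__ == "__main__":
    print(assembly_dir("SRP074153", "SRR6257420", "megahit"))
```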

Version History

master @ 39efebc (latest) Created 21st Jun 2023 at 11:41 by Germana Baldi

Merge pull request #8 from EBI-Metagenomics/readme_requirements

Update of README, examples, and installation requirements


Frozen master 39efebc

master @ b269a55 (earliest) Created 19th May 2023 at 14:59 by Varsha Kale

Update README.md


Frozen master b269a55

Created: 19th May 2023 at 14:59

Last updated: 21st Jun 2023 at 11:41
