Genome assembly workflow for nanopore reads, for TSI

Input:

Nanopore reads (can be in format: fastq, fastq.gz, fastqsanger, or fastqsanger.gz)

Optional settings to specify when the workflow is run:

[1] number of files to split the original input into (to speed up the workflow). default = 0. example: set to 2000 to split a 60 GB read file into 2000 files of ~30 MB.
[2] filtering: min average read quality score. default = 10
[3] filtering: min read length. default = 200
[4] trimming: trim this many nucleotides from start of read. default = 50
Note: these settings are suggestions and will depend on the characteristics of your raw reads and your downstream aims. If the filtering and trimming settings are too stringent, no reads may remain and the workflow will fail.
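
The arithmetic behind the split setting in [1] can be sketched as below. This is only an illustration: the function names are hypothetical, 1 GB is taken as 1000 MB, and how the workflow treats the default value of 0 is not modelled here.

```python
import math


def approx_chunk_size_mb(total_size_gb: float, n_files: int) -> float:
    """Approximate size of each split file in MB (assumes 1 GB = 1000 MB)."""
    if n_files < 1:
        raise ValueError("n_files must be at least 1")
    return total_size_gb * 1000 / n_files


def files_for_target_chunk(total_size_gb: float, target_mb: float) -> int:
    """Number of split files needed so that each file is at most target_mb."""
    if target_mb <= 0:
        raise ValueError("target_mb must be positive")
    return math.ceil(total_size_gb * 1000 / target_mb)


# The example from the settings list: a 60 GB read file split into 2000 files.
print(approx_chunk_size_mb(60, 2000))   # 30.0
print(files_for_target_chunk(60, 30))   # 2000
```

Working backwards from a target chunk size, as in the second function, may be the easier way to choose a value when the input file size differs from the 60 GB example.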

Workflow steps:

[1] runs FastQC on raw reads
[2] splits input reads file into separate files to speed up the next step of Porechop
[3] trims nanopore adapters using Porechop
[4] trims and filters nanopore reads by quality and length using NanoFilt
[5] collapses back into a single read file, fastqsanger format
[6] runs FastQC on trimmed/filtered reads
[7] assembles genome with Flye
[8] calculates statistics on genome assembly contigs with Fasta Statistics
[9] draws genome assembly graph with Bandage
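
To make the effect of the filtering and trimming settings in step [4] concrete, here is a minimal sketch in plain Python using the default values listed above. It is not NanoFilt's implementation: the `Read` type and function names are hypothetical, and both the averaging convention (mean of error probabilities) and the order of operations (quality checked on the full read, length checked after trimming) are assumptions.

```python
import math
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    name: str
    seq: str
    qual: str  # Phred+33 encoded quality string (fastqsanger)


def mean_quality(qual: str) -> float:
    """Mean Phred score, averaged over error probabilities (assumed convention)."""
    probs = [10 ** (-(ord(c) - 33) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def trim_and_filter(
    reads: Iterable[Read],
    min_quality: float = 10,
    min_length: int = 200,
    headcrop: int = 50,
) -> Iterator[Read]:
    """Yield reads that pass the quality and length thresholds, head-trimmed."""
    for read in reads:
        if not read.qual:
            continue  # skip empty reads
        if mean_quality(read.qual) < min_quality:
            continue  # average quality too low
        trimmed = Read(read.name, read.seq[headcrop:], read.qual[headcrop:])
        if len(trimmed.seq) < min_length:
            continue  # too short once the start has been trimmed
        yield trimmed


example = [
    Read("good", "A" * 300, "I" * 300),       # Q40, 300 nt
    Read("low_quality", "A" * 300, "#" * 300),  # Q2, 300 nt
    Read("too_short", "A" * 220, "I" * 220),  # Q40, 170 nt after trimming
]
for read in trim_and_filter(example):
    print(read.name, len(read.seq))
```

With these thresholds only the first example read survives, which shows how stringent settings can leave few or no reads for assembly.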

Main outputs:

[1] FastQC report on raw reads, html
[2] Adapter-chopped, trimmed, filtered reads in fastqsanger format
[3] FastQC report on filtered reads, html
[4] genome assembly contigs in fasta format (primary assembly)
[5] genome assembly statistics
[6] genome assembly graph in Bandage format

Note: You may wish to plot the raw reads first (e.g. using the tool NanoPlot) to get a better idea of read lengths and quality before deciding on filtering/trimming settings.
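
As a rough illustration of the kind of summary that helps when choosing these settings, the sketch below computes a few read-length statistics from an uncompressed FASTQ file. It is not a substitute for NanoPlot; the function names are hypothetical and it assumes every record occupies exactly four lines.

```python
import io
import statistics
from typing import List, TextIO


def read_lengths(handle: TextIO) -> List[int]:
    """Collect sequence lengths, assuming strict 4-line FASTQ records."""
    lengths = []
    for i, line in enumerate(handle):
        if i % 4 == 1:  # second line of each record holds the sequence
            lengths.append(len(line.strip()))
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no reads")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return min(lengths)  # only reached if every read has length 0


# Tiny in-memory example; a real file would be opened with open(path).
fastq_text = "".join(
    f"@read{n}\n{'A' * n}\n+\n{'I' * n}\n" for n in (2, 3, 4, 5, 6)
)
lengths = read_lengths(io.StringIO(fastq_text))
print("reads:", len(lengths))
print("median length:", statistics.median(lengths))
print("N50:", n50(lengths))
```

Comparing the median read length with a candidate minimum-length setting gives a quick sense of how many reads the filter is likely to discard.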

Inputs

ID	Name	Description	Type
How many new files to split into during read filtering stage?	How many new files to split into during read filtering stage?	Split input to speed up next step with Porechop. e.g. if input fastq is 60 GB, split into 2000 files of approx 30 MB.	int
Minimum average read quality score to filter on	Minimum average read quality score to filter on	n/a	int?
Minimum read length to filter on	Minimum read length to filter on	n/a	int?
Sequencing reads (in any of these formats: fastq, fastq.gz, fastqsanger, fastqsanger.gz)	Sequencing reads (in any of these formats: fastq, fastq.gz, fastqsanger, fastqsanger.gz)	n/a	File
Trim this many nucleotides from start of read	Trim this many nucleotides from start of read	n/a	int?

Steps

ID	Name	Description
5	Raw reads FastQC	toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.74+galaxy0
6	Split into separate files	toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.2
7	Porechop on each file	toolshed.g2.bx.psu.edu/repos/iuc/porechop/porechop/0.2.4+galaxy0
8	NanoFilt	toolshed.g2.bx.psu.edu/repos/leomrtns/nanofilt/nanofilt/0.1.0
9	Collapse Collection	toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0
10	Trimmed, filtered reads FastQC	toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.74+galaxy0
11	Flye	toolshed.g2.bx.psu.edu/repos/bgruening/flye/flye/2.9.3+galaxy0 (default setting changed: remove non-primary contigs from assembly = yes)
12	Primary assembly Fasta Statistics	toolshed.g2.bx.psu.edu/repos/iuc/fasta_stats/fasta-stats/1.0.3
13	Primary assembly Bandage info	toolshed.g2.bx.psu.edu/repos/iuc/bandage/bandage_info/0.8.1+galaxy1
14	Primary assembly Bandage image	toolshed.g2.bx.psu.edu/repos/iuc/bandage/bandage_image/0.8.1+galaxy3
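
For readers who want a rough picture of what the Porechop, NanoFilt and Flye steps correspond to outside Galaxy, the sketch below builds approximate command lines without running them. Everything here is an assumption rather than an export of the workflow: the file names and the `build_commands` function are hypothetical, the Flye read mode (`--nano-raw`) is a guess, and `--no-alt-contigs` is assumed to match the "remove non-primary contigs" setting noted in the table.

```python
from typing import Dict, List


def build_commands(
    raw_reads: str,
    chopped_reads: str,
    filtered_reads: str,
    out_dir: str,
    min_quality: int = 10,
    min_length: int = 200,
    headcrop: int = 50,
    threads: int = 4,
) -> Dict[str, List[str]]:
    """Return approximate argument lists for three of the workflow's tools."""
    return {
        # Adapter trimming
        "porechop": [
            "porechop", "-i", raw_reads, "-o", chopped_reads,
            "--threads", str(threads),
        ],
        # Quality/length filtering and head trimming; NanoFilt reads FASTQ
        # from standard input and writes to standard output.
        "nanofilt": [
            "NanoFilt", "-q", str(min_quality), "-l", str(min_length),
            "--headcrop", str(headcrop),
        ],
        # Assembly; the read mode is an assumption (see the text above).
        "flye": [
            "flye", "--nano-raw", filtered_reads, "--out-dir", out_dir,
            "--threads", str(threads), "--no-alt-contigs",
        ],
    }


commands = build_commands(
    "raw.fastq", "chopped.fastq", "filtered.fastq", "flye_out"
)
for tool, argv in commands.items():
    print(tool, "->", " ".join(argv))
```

The tool versions pinned in the table above remain the authoritative record of what the workflow runs; the parameters set inside Galaxy may differ from these command lines.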

Outputs

ID	Name	Description	Type
text_file	text_file	n/a	File
html_file	html_file	n/a	File
output	output	n/a	File
assembly_graph	assembly_graph	n/a	File
assembly_gfa	assembly_gfa	n/a	File
assembly_info	assembly_info	n/a	File
flye_log	flye_log	n/a	File
consensus	consensus	n/a	File
metrics	metrics	n/a	File
primary bandage info	primary bandage info	n/a	File
primary bandage image	primary bandage image	n/a	File

Version History

Version 1 (earliest) Created 3rd Sep 2024 at 02:07 by Anna Syme
