Genome assembly workflow for nanopore reads, for TSI
Version 1

Workflow Type: Galaxy

Input:

  • Nanopore reads (accepted formats: fastq, fastq.gz, fastqsanger, or fastqsanger.gz)

Optional settings to specify when the workflow is run:

  • [1] how many files to split the original input into (to speed up the workflow). default = 0. example: set to 2000 to split a 60 GB read file into 2000 files of ~30 MB.
  • [2] filtering: min average read quality score. default = 10
  • [3] filtering: min read length. default = 200
  • [4] trimming: trim this many nucleotides from start of read. default = 50
  • note: these values are suggestions; suitable settings depend on the characteristics of your raw reads and your downstream aims. If the filtering and trimming settings are too stringent, there may be no reads remaining and the workflow will fail.
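The example in setting [1] is simple division: input size divided by the chunk size you want gives the number of files. A minimal sketch, assuming decimal units (1 GB = 1000 MB); the helper names `split_count` and `chunk_size` are hypothetical and not part of the workflow. Real chunk sizes will only be approximate, because reads are not all the same length.

```python
import math


def split_count(input_size_mb: float, target_chunk_mb: float) -> int:
    """Hypothetical helper: number of files needed so each is at most target_chunk_mb."""
    if target_chunk_mb <= 0:
        raise ValueError("target chunk size must be positive")
    return math.ceil(input_size_mb / target_chunk_mb)


def chunk_size(input_size_mb: float, n_files: int) -> float:
    """Hypothetical helper: approximate size of each file after splitting into n_files."""
    return input_size_mb / n_files


# 60 GB input expressed in MB (decimal units assumed)
print(split_count(60_000, 30))   # 2000
print(chunk_size(60_000, 2000))  # 30.0
```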

Workflow steps:

  • [1] runs FastQC on raw reads
  • [2] splits the input reads file into separate files to speed up the next step, Porechop
  • [3] trims nanopore adapters using Porechop
  • [4] trims and filters nanopore reads by quality and length using NanoFilt
  • [5] collapses the separate files back into a single read file, in fastqsanger format
  • [6] runs FastQC on trimmed/filtered reads
  • [7] assembles genome with Flye
  • [8] calculates statistics on genome assembly contigs with Fasta Statistics
  • [9] draws genome assembly graph with Bandage
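Steps [2] to [5] follow a scatter/gather pattern: split the reads into many files, process each file independently, then concatenate the results. The sketch below shows that pattern on an in-memory FASTQ string. It assumes unwrapped 4-line FASTQ records and uses round-robin assignment, which is only one plausible strategy; the Galaxy split tool may distribute reads differently. All function names are hypothetical.

```python
from typing import List


def read_records(fastq_text: str) -> List[str]:
    """Group a FASTQ string into whole 4-line records (assumes unwrapped FASTQ)."""
    if not fastq_text.strip():
        return []
    lines = fastq_text.strip("\n").split("\n")
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ line count is not a multiple of 4")
    return ["\n".join(lines[i:i + 4]) + "\n" for i in range(0, len(lines), 4)]


def split_fastq(fastq_text: str, n_files: int) -> List[str]:
    """Scatter: assign whole records round-robin to n_files chunks."""
    if n_files < 1:
        raise ValueError("n_files must be at least 1")
    chunks: List[List[str]] = [[] for _ in range(n_files)]
    for i, record in enumerate(read_records(fastq_text)):
        chunks[i % n_files].append(record)
    return ["".join(chunk) for chunk in chunks]


def collapse(chunks: List[str]) -> str:
    """Gather: concatenate the per-chunk outputs back into one FASTQ string."""
    return "".join(chunks)


reads = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n"
chunks = split_fastq(reads, 2)  # each chunk would go to its own Porechop/NanoFilt job
merged = collapse(chunks)
print(len(chunks), merged.count("@r"))  # 2 3
```

Splitting only on record boundaries matters: a read cut in half would corrupt the FASTQ. Round-robin assignment changes read order, which does not matter for assembly.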

Main outputs:

  • [1] FastQC report on raw reads, html
  • [2] Adapter-chopped, trimmed, filtered reads in fastqsanger format
  • [3] FastQC report on filtered reads, html
  • [4] genome assembly contigs in fasta format (primary assembly)
  • [5] genome assembly statistics
  • [6] genome assembly graph in Bandage format

Note: You may wish to plot the raw reads first (e.g. using the tool NanoPlot) to get a better idea of read lengths and quality, and to decide on filtering/trimming settings.
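If NanoPlot is not at hand, a rough length summary can still guide the min read length setting. The sketch below is not NanoPlot. It computes read count, total bases, mean length, read N50 and the fraction of reads at or above a chosen cutoff, assuming unwrapped 4-line FASTQ. The function names are hypothetical.

```python
from typing import Dict, List


def read_lengths(fastq_text: str) -> List[int]:
    """Sequence lengths from a FASTQ string with 4 lines per record."""
    lines = fastq_text.strip("\n").split("\n")
    return [len(seq) for seq in lines[1::4]]


def n50(lengths: List[int]) -> int:
    """Length at which the running total, sorted longest first, reaches half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def length_summary(lengths: List[int], min_length: int) -> Dict[str, float]:
    """Basic length stats plus the fraction of reads that would pass a length cutoff."""
    if not lengths:
        return {"reads": 0, "total_bases": 0, "mean_length": 0.0, "n50": 0, "fraction_kept": 0.0}
    kept = [x for x in lengths if x >= min_length]
    return {
        "reads": len(lengths),
        "total_bases": sum(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "n50": n50(lengths),
        "fraction_kept": len(kept) / len(lengths),
    }


print(length_summary([2, 3, 4, 5, 6], min_length=4))
```

A low `fraction_kept` at your intended cutoff is an early warning that the filtering step may leave too few reads.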

Inputs

  • How many new files to split into during read filtering stage? Split input to speed up next step with Porechop, e.g. if input fastq is 60 GB, split into 2000 files of approx. 30 MB. Type: int
  • Minimum average read quality score to filter on. Type: int (optional)
  • Minimum read length to filter on. Type: int (optional)
  • Sequencing reads (in any of these formats: fastq, fastq.gz, fastqsanger, fastqsanger.gz). Type: File
  • Trim this many nucleotides from start of read. Type: int (optional)
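The three optional filtering/trimming inputs act on each read independently. The sketch below illustrates that per-read logic under stated assumptions; it is not NanoFilt's code. Two assumptions to check against the NanoFilt documentation: the mean quality is averaged via error probabilities (so a few very poor bases pull the score down more than a plain arithmetic mean would), and the length cutoff is applied to the read after cropping. `filter_and_trim` and `mean_read_quality` are hypothetical names, and Phred+33 encoding is assumed.

```python
import math
from typing import Optional, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def mean_read_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score, averaged in error-probability space (assumed convention)."""
    if not qual:
        return 0.0
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_and_trim(rec: Record, min_quality: float = 10, min_length: int = 200,
                    headcrop: int = 50) -> Optional[Record]:
    """Return the cropped record if it passes both filters, else None (hypothetical helper)."""
    header, seq, qual = rec
    if mean_read_quality(qual) < min_quality:
        return None
    if len(seq) - headcrop < min_length:  # assumption: length checked after cropping
        return None
    return header, seq[headcrop:], qual[headcrop:]


# '5' encodes Q20 and '%' encodes Q4 in Phred+33
kept = filter_and_trim(("@r1", "A" * 300, "5" * 300))
print(len(kept[1]))                                      # 250
print(filter_and_trim(("@r2", "A" * 240, "5" * 240)))    # None (too short after crop)
print(filter_and_trim(("@r3", "A" * 300, "%" * 300)))    # None (quality too low)
```

With the defaults, a read must therefore be at least 250 nt before cropping to survive, which is one way overly stringent settings can empty the read set.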

Steps

  • 5. Raw reads FastQC (toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.74+galaxy0)
  • 6. Split into separate files (toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.2)
  • 7. Porechop on each file (toolshed.g2.bx.psu.edu/repos/iuc/porechop/porechop/0.2.4+galaxy0)
  • 8. NanoFilt (toolshed.g2.bx.psu.edu/repos/leomrtns/nanofilt/nanofilt/0.1.0)
  • 9. Collapse Collection (toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0)
  • 10. Trimmed, filtered reads FastQC (toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.74+galaxy0)
  • 11. Flye, default setting changed: remove non-primary contigs from assembly = yes (toolshed.g2.bx.psu.edu/repos/bgruening/flye/flye/2.9.3+galaxy0)
  • 12. Primary assembly Fasta Statistics (toolshed.g2.bx.psu.edu/repos/iuc/fasta_stats/fasta-stats/1.0.3)
  • 13. Primary assembly Bandage info (toolshed.g2.bx.psu.edu/repos/iuc/bandage/bandage_info/0.8.1+galaxy1)
  • 14. Primary assembly Bandage image (toolshed.g2.bx.psu.edu/repos/iuc/bandage/bandage_image/0.8.1+galaxy3)
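Step 12 summarises the primary assembly contigs. As an illustration of what such statistics mean, the sketch below computes a few common ones (contig count, total length, largest contig, N50, GC %) from a FASTA string. It is not the Fasta Statistics tool's implementation; that tool reports more metrics and may define some slightly differently. Function names are hypothetical.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Map contig name -> sequence; sequence lines may be wrapped."""
    contigs: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            contigs[name] = []
        elif name is not None:
            contigs[name].append(line)
    return {k: "".join(v) for k, v in contigs.items()}


def n50(lengths: List[int]) -> int:
    """Contig length at which the running total, longest first, reaches half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def assembly_stats(text: str) -> Dict[str, float]:
    seqs = parse_fasta(text)
    lengths = [len(s) for s in seqs.values()]
    bases = "".join(seqs.values()).upper()
    gc = bases.count("G") + bases.count("C")
    return {
        "contigs": len(lengths),
        "total_length": sum(lengths),
        "largest_contig": max(lengths, default=0),
        "n50": n50(lengths),
        "gc_percent": 100 * gc / len(bases) if bases else 0.0,
    }


fasta = ">contig_1\nACGTACGTAC\nGG\n>contig_2 extra\nATATAT\n>contig_3\nGGCC\n"
print(assembly_stats(fasta))
```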

Outputs

  • text_file (File)
  • html_file (File)
  • output (File)
  • assembly_graph (File)
  • assembly_gfa (File)
  • assembly_info (File)
  • flye_log (File)
  • consensus (File)
  • metrics (File)
  • primary bandage info (File)
  • primary bandage image (File)
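The assembly_info output is, as far as we know, Flye's tab-separated per-contig table. A minimal sketch for reading it, assuming a first header line starting with "#" and column names such as `seq_name`, `length` and `circ.` (recalled from Flye's documentation; check them against your own file's header). The parser keys each row by whatever header it finds, and the example rows are invented for illustration.

```python
from typing import Dict, List


def read_assembly_info(text: str) -> List[Dict[str, str]]:
    """Parse a tab-separated table whose first line is a '#'-prefixed header."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].lstrip("#").split("\t")
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]


def circular_contigs(rows: List[Dict[str, str]]) -> List[str]:
    """Names of contigs flagged 'Y' in the (assumed) 'circ.' column."""
    return [row["seq_name"] for row in rows if row.get("circ.") == "Y"]


# Invented example rows, for illustration only
example = (
    "#seq_name\tlength\tcov.\tcirc.\trepeat\tmult.\talt_group\tgraph_path\n"
    "contig_1\t4600000\t45\tY\tN\t1\t*\t1\n"
    "contig_2\t12000\t90\tN\tY\t2\t*\t2\n"
)
rows = read_assembly_info(example)
print(circular_contigs(rows))               # ['contig_1']
print(sum(int(r["length"]) for r in rows))  # 4612000
```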

Version History

Version 1 (earliest) Created 3rd Sep 2024 at 02:07 by Anna Syme

Initial commit


Frozen Version-1 fc40931
Citation
Syme, A. (2024). Genome assembly workflow for nanopore reads, for TSI. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.1114.1
Activity

Created: 3rd Sep 2024 at 02:07

Last updated: 3rd Sep 2024 at 02:13

Total size: 972 KB