Workflow Type: Galaxy

Quality control workflow for a completed genome assembly, using Quast, BUSCO, Meryl, Merqury and Fasta Statistics. Updated November 2023.

Inputs: reads as fastqsanger.gz (not fastq.gz), and the assembly as assembly.fasta.

New default settings: for BUSCO, lineage = eukaryota; for Quast, lineage = eukaryotes and genome = large.

The workflow reports assembly statistics in a table called metrics.tsv, which includes selected metrics from Fasta Statistics and the read coverage. It also reports the BUSCO version and dependencies, and displays these tables in the workflow report.

Known bug: the workflow report text sometimes resets to the default text. To restore it, find an earlier workflow version with the correct report text, then copy and paste that text into the current version.
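The read coverage in metrics.tsv is described in step 31 of the workflow as total read length divided by assembly length. The following is a minimal Python sketch of that arithmetic, not the workflow's own implementation (the workflow uses Galaxy tools). The function names are hypothetical, and the sketch assumes four-line FASTQ records and input supplied as decoded text lines.

```python
from typing import Iterable


def total_fastq_bases(lines: Iterable[str]) -> int:
    """Sum sequence lengths, assuming strict four-line FASTQ records."""
    total = 0
    for index, line in enumerate(lines):
        if index % 4 == 1:  # second line of each record holds the bases
            total += len(line.strip())
    return total


def total_fasta_bases(lines: Iterable[str]) -> int:
    """Sum sequence lengths in a FASTA file, skipping header lines."""
    total = 0
    for line in lines:
        if not line.startswith(">"):
            total += len(line.strip())
    return total


def estimate_coverage(read_bases: int, assembly_bases: int) -> float:
    """Coverage = total read length / assembly length."""
    if assembly_bases <= 0:
        raise ValueError("assembly length must be positive")
    return read_bases / assembly_bases


# Tiny in-memory example (hypothetical data).
reads = ["@r1", "ACGTACGT", "+", "IIIIIIII", "@r2", "ACGT", "+", "IIII"]
assembly = [">contig1", "ACGT", "AC"]
coverage = estimate_coverage(total_fastq_bases(reads), total_fasta_bases(assembly))
```

For real files, the same functions could be fed from `gzip.open(path, "rt")` for the compressed reads and `open(path)` for the assembly.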

Inputs

ID Name Description Type
FASTA contigs - Primary Assembly #main/FASTA contigs - Primary Assembly n/a
  • File
Raw reads #main/Raw reads n/a
  • File

Steps

ID Name Description
2 FASTQ to FASTA toolshed.g2.bx.psu.edu/repos/devteam/fastqtofasta/fastq_to_fasta_python/1.1.5
3 Meryl toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy6
4 Fasta Statistics toolshed.g2.bx.psu.edu/repos/iuc/fasta_stats/fasta-stats/2.0
5 Quast toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/5.0.2+galaxy1
6 Busco toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.4.6+galaxy0
7 Fasta Statistics toolshed.g2.bx.psu.edu/repos/iuc/fasta_stats/fasta-stats/2.0
8 Merqury toolshed.g2.bx.psu.edu/repos/iuc/merqury/merqury/1.3
9 Search in textfiles toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1
10 Relabel some items in Fasta stats toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sed_tool/1.1.1
11 Get required Busco stats toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1
12 Get Busco version toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1
13 Get Busco dependencies toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1
14 Search in textfiles toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1
15 Cut Cut1
16 Filter out unneeded lines from fasta stats toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0
17 Rename some items and add in delimiters for later toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sed_tool/1.1.1
18 Reformat some text toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sed_tool/1.1.1
19 Cut Cut1
20 Extract assembly size toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0
21 Extract number of contigs toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0
22 Extract Contig N and L 50s and 90s toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0
23 Extract longest contig toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0
24 Extract GC content toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0
25 Convert commas to tabs Convert characters1
26 Collate Busco info cat1
27 Paste Paste1
28 Add blank header toolshed.g2.bx.psu.edu/repos/bgruening/add_line_to_file/add_line_to_file/0.1.0
29 Transpose cols to rows toolshed.g2.bx.psu.edu/repos/iuc/datamash_transpose/datamash_transpose/1.8+galaxy0
30 Convert to table Convert characters1
31 Compute coverage, total reads length divided by assembly length toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/2.0
32 Convert underscores to tabs Convert characters1
33 Keep two columns Cut1
34 Round the percentage to 2 decimal places toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/2.0
35 Label the column toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_column/1.1.3
36 Join info into one table cat1
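
Step 22 extracts the contig N50, L50, N90 and L90 values from the Fasta Statistics output. For readers unfamiliar with these metrics, the sketch below computes them under the commonly used definition: with contigs sorted from longest to shortest, Nx is the length of the contig at which the running total first reaches x% of the assembly size, and Lx is the number of contigs needed to get there. The function name `nx_lx` is hypothetical, and the Fasta Statistics tool may treat edge cases differently.

```python
from typing import List, Tuple


def nx_lx(lengths: List[int], percent: int) -> Tuple[int, int]:
    """Return (Nx, Lx) for contig lengths and a percentage such as 50 or 90."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length, count
    raise AssertionError("unreachable: the running total always reaches the total")


# Hypothetical contig lengths totalling 300 bases.
contigs = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = nx_lx(contigs, 50)
n90, l90 = nx_lx(contigs, 90)
```

A higher N50 with a lower L50 generally indicates a more contiguous assembly, which is why these values are collected alongside assembly size and contig count.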

Outputs

ID Name Description Type
Busco and dependencies version #main/Busco and dependencies version n/a
  • File
Busco on input dataset(s): full table #main/Busco on input dataset(s): full table n/a
  • File
Fasta Statistics on input dataset(s): summary stats #main/Fasta Statistics on input dataset(s): summary stats n/a
  • File
Genome assembly metrics #main/Genome assembly metrics n/a
  • File
Genome coverage #main/Genome coverage n/a
  • File
Merqury on input dataset(s): bed #main/Merqury on input dataset(s): bed n/a
  • File
Merqury on input dataset(s): png #main/Merqury on input dataset(s): png n/a
  • File
Merqury on input dataset(s): qv #main/Merqury on input dataset(s): qv n/a
  • File
Merqury on input dataset(s): size files #main/Merqury on input dataset(s): size files n/a
  • File
Merqury on input dataset(s): stats #main/Merqury on input dataset(s): stats n/a
  • File
Merqury on input dataset(s): wig #main/Merqury on input dataset(s): wig n/a
  • File
Meryl on input dataset(s): read-db.meryldb #main/Meryl on input dataset(s): read-db.meryldb n/a
  • File
Quast on input dataset(s): HTML report #main/Quast on input dataset(s): HTML report n/a
  • File
Quast on input dataset(s): PDF report #main/Quast on input dataset(s): PDF report n/a
  • File
Quast on input dataset(s): Log #main/Quast on input dataset(s): Log n/a
  • File
Quast on input dataset(s): tabular report #main/Quast on input dataset(s): tabular report n/a
  • File
_anonymous_output_1 #main/_anonymous_output_1 n/a
  • File
_anonymous_output_10 #main/_anonymous_output_10 n/a
  • File
_anonymous_output_11 #main/_anonymous_output_11 n/a
  • File
_anonymous_output_12 #main/_anonymous_output_12 n/a
  • File
_anonymous_output_13 #main/_anonymous_output_13 n/a
  • File
_anonymous_output_14 #main/_anonymous_output_14 n/a
  • File
_anonymous_output_15 #main/_anonymous_output_15 n/a
  • File
_anonymous_output_16 #main/_anonymous_output_16 n/a
  • File
_anonymous_output_17 #main/_anonymous_output_17 n/a
  • File
_anonymous_output_18 #main/_anonymous_output_18 n/a
  • File
_anonymous_output_19 #main/_anonymous_output_19 n/a
  • File
_anonymous_output_2 #main/_anonymous_output_2 n/a
  • File
_anonymous_output_20 #main/_anonymous_output_20 n/a
  • File
_anonymous_output_21 #main/_anonymous_output_21 n/a
  • File
_anonymous_output_22 #main/_anonymous_output_22 n/a
  • File
_anonymous_output_23 #main/_anonymous_output_23 n/a
  • File
_anonymous_output_24 #main/_anonymous_output_24 n/a
  • File
_anonymous_output_25 #main/_anonymous_output_25 n/a
  • File
_anonymous_output_26 #main/_anonymous_output_26 n/a
  • File
_anonymous_output_3 #main/_anonymous_output_3 n/a
  • File
_anonymous_output_4 #main/_anonymous_output_4 n/a
  • File
_anonymous_output_5 #main/_anonymous_output_5 n/a
  • File
_anonymous_output_6 #main/_anonymous_output_6 n/a
  • File
_anonymous_output_7 #main/_anonymous_output_7 n/a
  • File
_anonymous_output_8 #main/_anonymous_output_8 n/a
  • File
_anonymous_output_9 #main/_anonymous_output_9 n/a
  • File
out_file1 #main/out_file1 n/a
  • File
outfile #main/outfile n/a
  • File

Version History

Version 2 (latest) Created 6th Aug 2024 at 10:57 by Johan Gustafsson

Added/updated 9 files


Open master 7ca9943

Version 1 (earliest) Created 13th Mar 2024 at 23:32 by Johan Gustafsson

Added/updated 9 files


Frozen Version-1 178e9ce
Creators and Submitter
Creators
  • Gareth Price
  • Anna Syme
Submitter
Citation
Price, G., & Syme, A. (2023). Genome assessment post assembly. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.403.3
Activity


Created: 13th Mar 2024 at 23:32

Attributions

None

Total size: 1.02 MB