Assembly Evaluation for ERGA-BGE Reports
One Assembly, Illumina WGS reads + HiC reads
The workflow requires the following:
- Species Taxonomy ID number
- NCBI Genome assembly accession code
- BUSCO Lineage
- WGS accurate reads accession code
- NCBI HiC reads accession code

The workflow will get the data and process it to generate genome profiling (GenomeScope, and optionally Smudgeplot), assembly stats (gfastats), Merqury stats (QV, completeness), BUSCO, snailplot, contamination blobplot, and HiC heatmap.
Use this workflow for ONT-based assemblies where the WGS accurate reads are Illumina paired-end (PE).
Inputs
| ID | Name | Description | Type |
|---|---|---|---|
| BUSCO Lineage | BUSCO Lineage | Choose the (eukaryotic) BUSCO lineage that corresponds to the assembled species, e.g.: mammalia_odb10 | |
| Multiple HiC paired-end files? | Multiple HiC paired-end files? | IMPORTANT! If you entered more than one accession code, select Yes | |
| NCBI Genome assembly accession code | NCBI Genome assembly accession code | Should start with GCA or GCF, e.g.: GCA_963556495.2 | |
| NCBI HiC reads accession code | NCBI HiC reads accession code | Comma-separated accession codes of the reads. Each must start with SRR, DRR or ERR, e.g. SRR925743, ERR343809 | |
| NCBI Illumina WGS PE reads accession code | NCBI Illumina WGS PE reads accession code | Comma-separated accession codes of the reads. Each must start with SRR, DRR or ERR, e.g. SRR925743, ERR343809 | |
| Ploidy | Ploidy | Default value: 2 | |
| Run Smudgeplot? | Run Smudgeplot? | n/a | |
| Species Taxonomy ID number | Species Taxonomy ID number | Get the NCBI taxonomy number here: https://www.ncbi.nlm.nih.gov/taxonomy | |
| kmer length | kmer length | Default value: 21 | |
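The accession rules in the input table can be checked before launching a run. The following is a minimal sketch, not part of the workflow itself: the function names `split_accessions` and `check_inputs` are hypothetical, and the regular expressions only encode the prefixes stated above (GCA/GCF, and SRR/DRR/ERR), with the digit patterns being an assumption.

```python
import re

# Prefix rules come from the input descriptions; the exact digit patterns
# are an assumption made for illustration.
ASSEMBLY_RE = re.compile(r"^GC[AF]_\d+\.\d+$")
READS_RE = re.compile(r"^[SDE]RR\d+$")


def split_accessions(value):
    """Split a comma-separated accession string, dropping spaces and empties."""
    return [part.strip() for part in value.split(",") if part.strip()]


def check_inputs(assembly, hic_reads, wgs_reads):
    """Validate accession strings and report whether several HiC runs were given."""
    if not ASSEMBLY_RE.match(assembly):
        raise ValueError(f"assembly accession should start with GCA or GCF: {assembly!r}")

    hic = split_accessions(hic_reads)
    wgs = split_accessions(wgs_reads)
    if not hic or not wgs:
        raise ValueError("at least one HiC and one WGS read accession is required")

    for acc in hic + wgs:
        if not READS_RE.match(acc):
            raise ValueError(f"read accession should start with SRR, DRR or ERR: {acc!r}")

    return {
        "assembly": assembly,
        "hic": hic,
        "wgs": wgs,
        # Mirrors the "Multiple HiC paired-end files?" input: Yes when more
        # than one HiC accession code was entered.
        "multiple_hic": len(hic) > 1,
    }
```

For example, `check_inputs("GCA_963556495.2", "SRR925743, ERR343809", "SRR925743")` would report `multiple_hic` as true, which corresponds to selecting Yes for "Multiple HiC paired-end files?".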
Steps
| ID | Name | Description | 
|---|---|---|
| 1 | taxdump address | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_text_file_with_recurring_lines/9.3+galaxy1 | 
| 10 | downloads | lftp | 
| 11 | NCBI Datasets Genomes | toolshed.g2.bx.psu.edu/repos/iuc/ncbi_datasets/datasets_download_genome/16.20.0+galaxy0 | 
| 12 | Faster Download and Extract Reads in FASTQ | toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.1.1+galaxy0 | 
| 13 | Faster Download and Extract Reads in FASTQ | toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.1.1+galaxy0 | 
| 14 | Collapse Collection | toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0 | 
| 15 | Flatten collection | __FLATTEN__ | 
| 16 | fastp | toolshed.g2.bx.psu.edu/repos/iuc/fastp/fastp/0.23.4+galaxy0 | 
| 17 | fastp | toolshed.g2.bx.psu.edu/repos/iuc/fastp/fastp/0.23.4+galaxy0 | 
| 18 | Extract dataset | __EXTRACT_DATASET__ | 
| 19 | Flatten collection | __FLATTEN__ | 
| 20 | Create BlobtoolKit | toolshed.g2.bx.psu.edu/repos/bgruening/blobtoolkit/blobtoolkit/4.0.7+galaxy1 | 
| 21 | gfastats | toolshed.g2.bx.psu.edu/repos/bgruening/gfastats/gfastats/1.3.6+galaxy0 | 
| 22 | Diamond | toolshed.g2.bx.psu.edu/repos/bgruening/diamond/bg_diamond/2.0.15+galaxy0 | 
| 23 | BWA-MEM2 | toolshed.g2.bx.psu.edu/repos/iuc/bwa_mem2/bwa_mem2/2.2.1+galaxy1 | 
| 24 | Busco | toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.5.0+galaxy0 | 
| 25 | BWA-MEM2 | toolshed.g2.bx.psu.edu/repos/iuc/bwa_mem2/bwa_mem2/2.2.1+galaxy1 | 
| 26 | Convert FASTA to fai file | CONVERTER_fasta_to_fai | 
| 27 | Meryl | toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy6 | 
| 28 | Merge BAM Files | toolshed.g2.bx.psu.edu/repos/devteam/sam_merge/sam_merge2/1.2.0 | 
| 29 | Sambamba merge | toolshed.g2.bx.psu.edu/repos/bgruening/sambamba_merge/sambamba_merge/1.0.1+galaxy1 | 
| 30 | Extract dataset | __EXTRACT_DATASET__ | 
| 31 | Cut | Cut1 | 
| 32 | Meryl | toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy6 | 
| 33 | BlobToolKit | toolshed.g2.bx.psu.edu/repos/bgruening/blobtoolkit/blobtoolkit/4.0.7+galaxy1 | 
| 34 | BAM/SAM Mapping Stats | toolshed.g2.bx.psu.edu/repos/nilesh/rseqc/rseqc_bam_stat/5.0.3+galaxy0 | 
| 35 | Pick parameter value | toolshed.g2.bx.psu.edu/repos/iuc/pick_value/pick_value/0.2.0 | 
| 36 | bedtools MakeWindowsBed | toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_makewindowsbed/2.31.1 | 
| 37 | Merqury | toolshed.g2.bx.psu.edu/repos/iuc/merqury/merqury/1.3+galaxy3 | 
| 38 | Meryl | toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy6 | 
| 39 | BlobToolKit | toolshed.g2.bx.psu.edu/repos/bgruening/blobtoolkit/blobtoolkit/4.0.7+galaxy2 | 
| 40 | BlobToolKit | toolshed.g2.bx.psu.edu/repos/bgruening/blobtoolkit/blobtoolkit/4.0.7+galaxy2 | 
| 41 | Pairtools parse | toolshed.g2.bx.psu.edu/repos/iuc/pairtools_parse/pairtools_parse/1.1.0+galaxy1 | 
| 42 | Sambamba flagstat | toolshed.g2.bx.psu.edu/repos/bgruening/sambamba_flagstat/sambamba_flagstat/1.0.1+galaxy1 | 
| 43 | Smudgeplot | toolshed.g2.bx.psu.edu/repos/galaxy-australia/smudgeplot/smudgeplot/0.2.5+galaxy3 | 
| 44 | GenomeScope | toolshed.g2.bx.psu.edu/repos/iuc/genomescope/genomescope/2.0+galaxy2 | 
| 45 | Pairtools sort | toolshed.g2.bx.psu.edu/repos/iuc/pairtools_sort/pairtools_sort/1.1.0+galaxy1 | 
| 46 | Pairtools dedup | toolshed.g2.bx.psu.edu/repos/iuc/pairtools_dedup/pairtools_dedup/1.1.0+galaxy1 | 
| 47 | Pairtools split | toolshed.g2.bx.psu.edu/repos/iuc/pairtools_split/pairtools_split/1.1.0+galaxy1 | 
| 48 | cooler csort with tabix | toolshed.g2.bx.psu.edu/repos/lldelisle/cooler_csort_tabix/cooler_csort_tabix/0.8.11+galaxy1 | 
| 49 | cooler_cload_tabix | toolshed.g2.bx.psu.edu/repos/lldelisle/cooler_cload_tabix/cooler_cload_tabix/0.8.11+galaxy1 | 
| 50 | hicMergeMatrixBins | toolshed.g2.bx.psu.edu/repos/bgruening/hicexplorer_hicmergematrixbins/hicexplorer_hicmergematrixbins/3.7.2+galaxy0 | 
| 51 | hicMergeMatrixBins | toolshed.g2.bx.psu.edu/repos/bgruening/hicexplorer_hicmergematrixbins/hicexplorer_hicmergematrixbins/3.7.2+galaxy0 | 
| 52 | hicPlotMatrix | toolshed.g2.bx.psu.edu/repos/bgruening/hicexplorer_hicplotmatrix/hicexplorer_hicplotmatrix/3.7.2+galaxy0 | 
| 53 | hicPlotMatrix | toolshed.g2.bx.psu.edu/repos/bgruening/hicexplorer_hicplotmatrix/hicexplorer_hicplotmatrix/3.7.2+galaxy0 | 
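Most entries in the Description column above follow the pattern `host/repos/owner/repository/tool/version`, while built-in tools such as `__FLATTEN__` or `Cut1` are bare identifiers. As a rough sketch of how these strings could be taken apart for auditing tool versions (the `ToolRef` type and `parse_tool_id` function are hypothetical names, not part of Galaxy):

```python
from typing import NamedTuple, Optional


class ToolRef(NamedTuple):
    host: Optional[str]
    owner: Optional[str]
    repo: Optional[str]
    tool: str
    version: Optional[str]


def parse_tool_id(tool_id: str) -> ToolRef:
    """Split a tool identifier from the Steps table into its components."""
    text = tool_id.strip()
    parts = text.split("/")
    # Tool Shed identifiers have six slash-separated fields, the second
    # being the literal "repos".
    if len(parts) == 6 and parts[1] == "repos":
        host, _, owner, repo, tool, version = parts
        return ToolRef(host, owner, repo, tool, version)
    # Anything else (e.g. __FLATTEN__, Cut1, lftp) is kept as a bare tool name.
    return ToolRef(None, None, None, text, None)
```

Grouping the parsed entries by `tool` and comparing `version` is one way to notice, for instance, that the BlobToolKit steps listed above use both `4.0.7+galaxy1` and `4.0.7+galaxy2`.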
Outputs
| ID | Name | Description | Type |
|---|---|---|---|
| Busco on input dataset(s): short summary | Busco on input dataset(s): short summary | n/a | |
| Busco on input dataset(s): full table | Busco on input dataset(s): full table | n/a | |
Version History
- Version 1.2 (latest): created 4th Nov 2024 at 14:27 by Diego De Panis. Open; branch master, commit a3102c4.
- Version 1.1: created 19th Aug 2024 at 13:35 by Diego De Panis. Frozen; tag Version-1.1, commit 31ffcd1.
- Version 1 (earliest): created 19th Aug 2024 at 10:38 by Diego De Panis. Initial commit. Frozen; tag Version-1, commit f4998e3.
Creators and Submitter
Creator
Additional credit
ERGA
Submitter
Views: 4983 Downloads: 999 Runs: 50
Created: 19th Aug 2024 at 10:38
Last updated: 5th Dec 2024 at 16:48


 https://orcid.org/0000-0002-3679-9585


