reference (and plasmid) preprocessing workflow
Version 1

Workflow Type: Common Workflow Language

Workflow for preprocessing the reference file. It downloads the GenBank file from NCBI if no reference file is provided, and concatenates the plasmid GenBank file(s) with each other and with the reference file.
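The concatenation described above can be illustrated with a minimal sketch. This is not the workflow's actual merging script. It only relies on the GenBank flat-file convention that each record ends with a `//` line. The function name `concatenate_genbank` and the reference-first ordering in the usage lines are assumptions made for illustration.

```python
from typing import Iterable


def concatenate_genbank(records: Iterable[str]) -> str:
    """Join GenBank record texts into one multi-record GenBank text.

    Hypothetical helper: empty inputs are skipped and a missing
    '//' record terminator is added before joining.
    """
    parts = []
    for text in records:
        text = text.strip()
        if not text:
            continue  # skip empty inputs
        if not text.endswith("//"):
            text += "\n//"  # make sure the record is terminated
        parts.append(text)
    return "\n".join(parts) + "\n" if parts else ""


# Toy records, not real sequence data.
reference = "LOCUS       REF\nORIGIN\n//\n"
plasmid = "LOCUS       P1\nORIGIN\n"  # terminator missing on purpose
merged = concatenate_genbank([reference, plasmid])
print(merged)
```

A real merge would normally go through a GenBank parser so that malformed records are caught rather than passed through as text.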

This workflow on WorkflowHub: https://workflowhub.eu/projects/upcoming

All tool CWL files and other workflows can be found here:
Tools: https://git.wur.nl/ssb/automated-data-analysis/cwl/-/tree/main/tools
Workflows: https://git.wur.nl/ssb/automated-data-analysis/cwl/-/tree/main/workflows


Inputs

| ID | Name | Description | Type |
| --- | --- | --- | --- |
| accession_number | accession number | Accession number used to download a GenBank file from NCBI. Mandatory when no reference file is provided. | string |
| fasta_extraction_script | FASTA extraction script | Python script that extracts a FASTA file from GenBank files. Passed externally within the git structure to avoid having to host a new image. | File |
| gff3_extraction_script | GFF3 extraction script | BioPerl script that extracts a GFF3 file from GenBank files. Passed externally within the git structure to avoid having to host a new image. | File |
| merging_genbank_script | merging script | Python script to merge multiple GenBank files. Passed externally within the git structure to avoid having to host a new image. | File |
| plasmids | plasmid file(s) | Input plasmid GenBank files. | array of File |
| reference_file | reference GenBank file | Reference file in GenBank format. | File |
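As a rough illustration of how these inputs relate, the sketch below builds a CWL job object as a Python dictionary and enforces the one rule stated above: an accession number is required when no reference file is given. The function name `build_job` and the file paths are hypothetical. The three script inputs are left out for brevity and would need their own File entries in a real job file.

```python
from typing import Optional, Sequence


def cwl_file(path: str) -> dict:
    """Represent a path as a CWL File object."""
    return {"class": "File", "path": path}


def build_job(accession_number: Optional[str] = None,
              reference_file: Optional[str] = None,
              plasmids: Sequence[str] = ()) -> dict:
    """Build a partial job object for the workflow inputs (hypothetical helper)."""
    if accession_number is None and reference_file is None:
        raise ValueError(
            "accession_number is mandatory when no reference_file is given")
    job = {}
    if accession_number is not None:
        job["accession_number"] = accession_number
    if reference_file is not None:
        job["reference_file"] = cwl_file(reference_file)
    if plasmids:
        job["plasmids"] = [cwl_file(p) for p in plasmids]
    return job


# Placeholder paths, for illustration only.
job = build_job(reference_file="reference.gbk", plasmids=["plasmid1.gbk"])
print(job)
```

The resulting dictionary could be written out as YAML or JSON and passed to a CWL runner together with the workflow file.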

Steps

| ID | Name | Description |
| --- | --- | --- |
| determine_output | determine output | Determines relevant final outputs. |
| extract_fasta | extract FASTA | Extracts FASTA file from input reference file when no plasmids are provided. |
| extract_gff3 | extract GFF3 | Extracts GFF3 annotation file from the (merged) reference. |
| fetch_reference | fetch reference | Downloads the associated GenBank file from the supplied accession number. |
| merge_plasmids | merge plasmids | Merges plasmids when more than one is present. |
| merge_reference | merge plasmid(s) with reference | Merges the plasmid(s) with the reference GenBank file. |
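The conditions in the step descriptions can be read as a small decision procedure. The sketch below is one interpretation of those descriptions, not the workflow's CWL source. The function name `plan_steps`, the execution order, and the assumption that `fetch_reference` runs only when no reference file is supplied are all illustrative guesses.

```python
from typing import List


def plan_steps(has_reference_file: bool, n_plasmids: int) -> List[str]:
    """Return the step IDs expected to run for a given input combination."""
    steps = []
    if not has_reference_file:
        # Assumed: the reference is downloaded only when none is supplied.
        steps.append("fetch_reference")
    if n_plasmids > 1:
        # Plasmids are merged with each other when more than one is present.
        steps.append("merge_plasmids")
    if n_plasmids > 0:
        steps.append("merge_reference")
    else:
        # FASTA is extracted from the reference when no plasmids are provided.
        steps.append("extract_fasta")
    # The descriptions attach no condition to these two steps.
    steps.extend(["extract_gff3", "determine_output"])
    return steps


print(plan_steps(has_reference_file=False, n_plasmids=2))
```

In CWL itself, this kind of branching is usually expressed with conditional `when` expressions on the steps. Whether this workflow does so is not stated here.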

Outputs

| ID | Name | Description | Type |
| --- | --- | --- | --- |
| fasta_final | FASTA output file | Final FASTA output file. | File |
| genbank_final | GenBank output file | Final GenBank output file. | File |
| gff3 | GFF3 output file | Final GFF3 output file. | File |

Version History

Version 1 (earliest) Created 22nd Jul 2025 at 16:22 by Martijn Melissen

Initial commit

