**Workflow for preprocessing a reference file.**
Steps:
- When no reference GenBank file is provided, one is downloaded from NCBI based on an accession number.
- When multiple plasmid GenBank files are provided, they are merged into one file.
- When one or more plasmid GenBank files are provided, the reference is merged with the plasmid GenBank file(s) into one file. A FASTA file is also extracted.
- When no plasmid GenBank files are provided, a FASTA file is extracted from the reference GenBank file.
- A GFF3 file is extracted from the final GenBank file.
- The final step determines the relevant outputs.
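The conditional logic in the list above can be sketched in Python as follows. This is only an illustration of which steps would run for a given input combination, not the workflow's CWL definition. The function name `plan_steps` and its parameters are hypothetical; the step IDs are the ones listed in the Steps table.

```python
def plan_steps(has_reference_file: bool, n_plasmids: int) -> list[str]:
    """Return the step IDs expected to run, in order, for the given inputs.

    Hypothetical helper: it mirrors the conditions described in the text,
    not the actual CWL `when` expressions, which are not shown here.
    """
    steps = []
    if not has_reference_file:
        # No reference GenBank file supplied: download it by accession number.
        steps.append("fetch_reference")
    if n_plasmids > 1:
        # Several plasmid files are first merged into a single file.
        steps.append("merge_plasmids")
    if n_plasmids >= 1:
        # Reference and plasmid(s) are merged; a FASTA file is extracted too.
        steps.append("merge_reference")
    else:
        # No plasmids: FASTA comes straight from the reference GenBank file.
        steps.append("extract_fasta")
    # These two steps run regardless of the input combination.
    steps.append("extract_gff3")
    steps.append("determine_output")
    return steps


if __name__ == "__main__":
    print(plan_steps(has_reference_file=False, n_plasmids=0))
    print(plan_steps(has_reference_file=True, n_plasmids=3))
```

With a reference file and exactly one plasmid, `merge_plasmids` is skipped and only `merge_reference` runs before the GFF3 extraction.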
All tool CWL files and other workflows can be found here:
Tools: https://git.wur.nl/ssb/automated-data-analysis/cwl/-/tree/main/tools
Workflows: https://git.wur.nl/ssb/automated-data-analysis/cwl/-/tree/main/workflows
Inputs
| ID | Name | Description | Type |
|---|---|---|---|
| accession_number | accession number | Accession number used to download a GenBank file from NCBI. Mandatory when no reference file is provided. | |
| fasta_extraction_script | FASTA extraction script | Python script that extracts a FASTA file from GenBank files. Passed externally within the git structure to avoid having to host a new image. | |
| gff3_extraction_script | GFF3 extraction script | BioPerl script that extracts a GFF3 file from GenBank files. Passed externally within the git structure to avoid having to host a new image. | |
| merging_genbank_script | merging script | Python script that merges multiple GenBank files. Passed externally within the git structure to avoid having to host a new image. | |
| plasmids | plasmid file(s) | Input plasmid GenBank files. | |
| reference_file | reference GenBank file | Reference file in GenBank format. | |
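The dependency between `reference_file` and `accession_number` can be illustrated with a small Python sketch. The helper `resolve_reference` is hypothetical, and the use of the NCBI E-utilities `efetch` endpoint with `db=nuccore` and `rettype=gbwithparts` is an assumption: the tool this workflow actually uses to download the record is not described here. The sketch only builds the URL and does not perform the download.

```python
from typing import Optional
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint (assumed; the workflow's own fetch tool
# is not shown in this description).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def resolve_reference(reference_file: Optional[str],
                      accession_number: Optional[str]) -> str:
    """Return the local reference path, or an efetch URL for the accession.

    Hypothetical helper illustrating that accession_number is mandatory
    only when no reference_file is supplied.
    """
    if reference_file:
        return reference_file
    if not accession_number:
        raise ValueError(
            "accession_number is mandatory when no reference_file is given"
        )
    query = urlencode({
        "db": "nuccore",              # nucleotide database (assumption)
        "id": accession_number,
        "rettype": "gbwithparts",     # full GenBank record including sequence
        "retmode": "text",
    })
    return f"{EFETCH}?{query}"


if __name__ == "__main__":
    print(resolve_reference(None, "NC_000913.3"))
```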
Steps
| ID | Name | Description |
|---|---|---|
| determine_output | determine output | Determines relevant final outputs. |
| extract_fasta | extract FASTA | Extracts FASTA file from input reference file when no plasmids are provided. |
| extract_gff3 | extract GFF3 | Extracts GFF3 annotation file from the (merged) reference. |
| fetch_reference | fetch reference | Downloads the associated GenBank file from the supplied accession number. |
| merge_plasmids | merge plasmids | Merges the plasmid GenBank files when more than one is provided. |
| merge_reference | merge plasmid(s) with reference | Merges the plasmid(s) with the reference GenBank file. |
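As a rough illustration of what the merge and FASTA extraction steps do, the sketch below uses only the Python standard library. It is not the workflow's `merging_genbank_script` or `fasta_extraction_script`, whose contents are not shown here. It assumes that merging means concatenating records into one multi-record GenBank file (each record terminated by `//`) and that the FASTA header is taken from the `LOCUS` name; both function names are hypothetical.

```python
def merge_genbank(texts: list[str]) -> str:
    """Concatenate GenBank flat-file texts into one multi-record text.

    Assumption: a merged file is simply the records one after another,
    each closed by the '//' record terminator.
    """
    records = []
    for text in texts:
        body = text.strip()
        if not body:
            continue  # skip empty inputs
        if not body.endswith("//"):
            body += "\n//"  # add a missing record terminator
        records.append(body)
    return "\n".join(records) + "\n"


def genbank_to_fasta(text: str) -> str:
    """Extract one FASTA entry per GenBank record from the ORIGIN sections."""
    entries = []
    name, in_origin, seq = None, False, []
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            name = line.split()[1]  # second field of LOCUS is the record name
        elif line.startswith("ORIGIN"):
            in_origin, seq = True, []
        elif line.startswith("//"):
            if name is not None:
                # Sequence is written on a single line (no wrapping).
                entries.append(f">{name}\n{''.join(seq).upper()}")
            name, in_origin, seq = None, False, []
        elif in_origin:
            # Drop position numbers and spaces, keep only sequence letters.
            seq.append("".join(ch for ch in line if ch.isalpha()))
    return "\n".join(entries) + "\n"


if __name__ == "__main__":
    chrom = "LOCUS       chr1  8 bp\nORIGIN\n        1 acgtacgt\n//\n"
    plasmid = "LOCUS       pA  4 bp\nORIGIN\n        1 ttaa\n//\n"
    print(genbank_to_fasta(merge_genbank([chrom, plasmid])))
```

A production script would more likely rely on an established parser such as Biopython rather than line-based parsing; the hand-written version here keeps the example free of external dependencies.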
Outputs
| ID | Name | Description | Type |
|---|---|---|---|
| fasta_final | FASTA output file | Final FASTA output file. | |
| genbank_final | GenBank output file | Final GenBank output file. | |
| gff3 | GFF3 output file | Final GFF3 output file. | |
Version History
Version 1 (earliest) Created 22nd Jul 2025 at 16:22 by Martijn Melissen
Initial commit
Open
master
1c0d264
Created: 22nd Jul 2025 at 16:22
Last updated: 12th Aug 2025 at 11:40
https://orcid.org/0009-0005-0017-0928