This workflow begins from a set of genome assemblies of different samples, strains, or species. Each genome is first annotated with Funannotate. The predicted proteins are further annotated with BUSCO. Next, Proteinortho finds orthologs across the samples and groups them into orthogroups. Orthogroups in which all samples are represented are extracted. The orthologs in each orthogroup are aligned with ClustalW. The alignments are trimmed with ClipKIT, and the concatenated matrix is built with PhyKit. This matrix can be used for phylogeny reconstruction.
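The orthogroup extraction described above can be sketched in Python. This is a minimal illustration, not the workflow's own Filter step, whose exact condition is not listed on this page. It assumes the usual Proteinortho table layout: three leading columns (species count, gene count, connectivity), then one column per sample, with `*` marking a sample that has no protein in the group. The function name `complete_orthogroups` and the `single_copy` option are hypothetical.

```python
import csv
import io


def complete_orthogroups(table_text, single_copy=False):
    """Keep rows of a Proteinortho-style table where every sample is represented.

    Assumed layout: columns 1-3 are summary values, the remaining columns
    hold comma-separated protein IDs per sample, and '*' means "absent".
    """
    kept = []
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and the '#'-prefixed header line.
        if not row or row[0].startswith("#"):
            continue
        members = row[3:]
        # Drop orthogroups that are missing at least one sample.
        if any(cell == "*" for cell in members):
            continue
        # Optionally drop groups with more than one protein for any sample.
        if single_copy and any("," in cell for cell in members):
            continue
        kept.append(row)
    return kept


# Toy table with three samples (A, B, C); the values are made up.
example = (
    "# Species\tGenes\tAlg.-Conn.\tA.faa\tB.faa\tC.faa\n"
    "3\t3\t1\ta1\tb1\tc1\n"
    "2\t2\t1\ta2\t*\tc2\n"
    "3\t4\t0.8\ta3,a4\tb3\tc3\n"
)

complete = complete_orthogroups(example)
strict = complete_orthogroups(example, single_copy=True)
```

In the toy table, the second row is dropped because sample B is absent. With `single_copy=True`, the third row is also dropped because sample A contributes two proteins.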
Associated Tutorial
This workflow is part of the tutorial "Preparing genomic data for phylogeny reconstruction", available in the GTN.
Thanks to...
Workflow Author(s): Miguel Roncoroni
Tutorial Author(s): Miguel Roncoroni, Brigida Gallone
Inputs
| ID | Name | Description | Type |
|---|---|---|---|
| Input genomes as collection | #main/Input genomes as collection | n/a | |
Steps
| ID | Name | Tool ID |
|---|---|---|
| 1 | Replace Text | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/1.1.2 | 
| 2 | RepeatMasker | toolshed.g2.bx.psu.edu/repos/bgruening/repeat_masker/repeatmasker_wrapper/4.1.2-p1+galaxy0 | 
| 3 | Funannotate predict annotation | toolshed.g2.bx.psu.edu/repos/iuc/funannotate_predict/funannotate_predict/1.8.9+galaxy2 | 
| 4 | Extract ORF | toolshed.g2.bx.psu.edu/repos/bgruening/glimmer_gbk_to_orf/glimmer_gbk_to_orf/3.02 | 
| 5 | Regex Find And Replace | toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regex1/1.0.1 | 
| 6 | Collapse Collection | toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/4.2 | 
| 7 | Proteinortho | toolshed.g2.bx.psu.edu/repos/iuc/proteinortho/proteinortho/6.0.14+galaxy2.9.1 | 
| 8 | Busco | toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/4.1.4 | 
| 9 | Filter | Filter1 | 
| 10 | Proteinortho grab proteins | toolshed.g2.bx.psu.edu/repos/iuc/proteinortho_grab_proteins/proteinortho_grab_proteins/6.0.14+galaxy2.9.1 | 
| 11 | Regex Find And Replace | toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regex1/1.0.1 | 
| 12 | ClustalW | toolshed.g2.bx.psu.edu/repos/devteam/clustalw/clustalw/2.1 | 
| 13 | ClipKIT. Alignment trimming software for phylogenetics. | toolshed.g2.bx.psu.edu/repos/padge/clipkit/clipkit/0.1.0 | 
| 14 | PhyKit - Alignment-based functions | toolshed.g2.bx.psu.edu/repos/padge/phykit/phykit_alignment_based/0.1.0 | 
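Several of the steps above are text-processing steps on FASTA headers (Replace Text, Regex Find And Replace). The workflow's actual patterns are not shown on this page, so the following Python sketch only illustrates the general idea under an assumption: each header is reduced to its first whitespace-delimited token and prefixed with the sample name, so that proteins stay traceable once the proteomes are merged into one file. The function name `tag_headers` and the `sample_identifier` naming scheme are hypothetical.

```python
def tag_headers(fasta_text, sample):
    """Prefix every FASTA header with a sample name and shorten it.

    Assumption for illustration: only the first whitespace-delimited
    token of the original header is kept as the protein identifier.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            ident = parts[0] if parts else ""
            out.append(f">{sample}_{ident}")
        else:
            # Sequence lines are passed through unchanged.
            out.append(line)
    return "\n".join(out) + "\n"


# Made-up record; the identifier and sample name are placeholders.
tagged = tag_headers(">FUN_000001-T1 hypothetical protein\nMKV\n", "strainA")
```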
Outputs
| ID | Name | Description | Type |
|---|---|---|---|
| A partition file ready for input into RAxML or IQ-tree | #main/A partition file ready for input into RAxML or IQ-tree | n/a | |
| An occupancy file that summarizes the taxon occupancy per sequence | #main/An occupancy file that summarizes the taxon occupancy per sequence | n/a | |
| ClustalW on input dataset(s): clustal | #main/ClustalW on input dataset(s): clustal | n/a | |
| Concatenated fasta alignment file | #main/Concatenated fasta alignment file | n/a | |
| Proteinortho on input dataset(s): orthology-groups | #main/Proteinortho on input dataset(s): orthology-groups | n/a | |
| Proteinortho_extract_by_orthogroup | #main/Proteinortho_extract_by_orthogroup | n/a | |
| Trimmed alignment. | #main/Trimmed alignment. | n/a | |
| _anonymous_output_1 | #main/_anonymous_output_1 | n/a | |
| _anonymous_output_2 | #main/_anonymous_output_2 | n/a | |
| _anonymous_output_3 | #main/_anonymous_output_3 | n/a | |
| _anonymous_output_4 | #main/_anonymous_output_4 | n/a | |
| _anonymous_output_5 | #main/_anonymous_output_5 | n/a | |
| _anonymous_output_6 | #main/_anonymous_output_6 | n/a | |
| _anonymous_output_7 | #main/_anonymous_output_7 | n/a | |
| extracted_ORFs | #main/extracted_ORFs | n/a | |
| fasta_header_cleaned | #main/fasta_header_cleaned | n/a | |
| funannotate_predicted_proteins | #main/funannotate_predicted_proteins | n/a | |
| headers_shortened | #main/headers_shortened | n/a | |
| proteomes_to_one_file | #main/proteomes_to_one_file | n/a | |
| repeat_masked | #main/repeat_masked | n/a | |
| sample_names_to_headers | #main/sample_names_to_headers | n/a | |
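The concatenated alignment, partition file, and occupancy file listed above are related. The Python sketch below is not PhyKit's implementation; it shows one way the three could be derived from per-orthogroup alignments. Its assumptions are that missing taxa are padded with `-`, that partition lines follow a RAxML-style `AUTO, name=start-end` form, and that occupancy is the fraction of taxa present in each alignment. The function name `concatenate` and the toy data are hypothetical.

```python
def concatenate(alignments, missing="-"):
    """Concatenate per-orthogroup alignments into one supermatrix.

    alignments: dict mapping an alignment name to {taxon: aligned sequence}.
    Returns (supermatrix, partitions, occupancy).
    """
    taxa = sorted({t for aln in alignments.values() for t in aln})
    chunks = {t: [] for t in taxa}
    partitions = []
    occupancy = {}
    start = 1  # partition coordinates are 1-based and inclusive
    for name, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: sequences differ in length")
        length = lengths.pop()
        for t in taxa:
            # Pad taxa absent from this alignment (assumed padding symbol).
            chunks[t].append(aln.get(t, missing * length))
        partitions.append(f"AUTO, {name}={start}-{start + length - 1}")
        occupancy[name] = len(aln) / len(taxa)
        start += length
    supermatrix = {t: "".join(parts) for t, parts in chunks.items()}
    return supermatrix, partitions, occupancy


# Toy alignments; OG2 lacks taxon C to show the padding behaviour.
toy = {
    "OG1": {"A": "MK-V", "B": "MKLV", "C": "MRLV"},
    "OG2": {"A": "GG", "B": "GA"},
}
supermatrix, partitions, occupancy = concatenate(toy)
```

Because this workflow keeps only orthogroups in which all samples are represented, the padding branch would not normally be triggered; it is included so the sketch also handles incomplete groups.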
Created: 2nd Jun 2025 at 11:03