Galaxy Workflow Documentation: MS Finder Pipeline
This document outlines an MSFinder Galaxy workflow designed for peak annotation. The workflow enriches and filters the metadata of an MS spectral library, prepares the keys MSFinder requires, runs MSFinder, and post-processes the annotated peaks.
Step 1: Data Collection and Preprocessing
Retrieve InChI and SMILES for spectra in which they are missing, then filter out the spectra that still lack an InChI or SMILES.
1.1 MSMetaEnhancer: Collect InChI, Isomeric_smiles, and Nominal_mass
- Uses MSMetaEnhancer to collect InChI and Isomeric_smiles from the PubChem and IDSM databases.
- Uses MSMetaEnhancer to collect MW using RDKit (for GOLM).
1.2 Replace Key
- Renames the isomeric_smiles key to smiles using the replace text tool.
- Renames the MW key to parent_mass using the replace text tool (for GOLM).
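The renaming in step 1.2 amounts to a text substitution on the metadata lines of the library file. The following is a minimal sketch, assuming an MSP-style file in which each metadata line has the form `KEY: value`. The function name `rename_key` and the sample record are illustrative and not part of the workflow; the key names follow the Steps table further down.

```python
import re

def rename_key(msp_text: str, old: str, new: str) -> str:
    """Rename a metadata key at the start of a line, leaving values untouched."""
    # Anchor at line start and require the trailing colon so that the
    # key name is never rewritten when it appears inside a value.
    pattern = re.compile(rf"^{re.escape(old)}:", flags=re.MULTILINE | re.IGNORECASE)
    return pattern.sub(f"{new}:", msp_text)

# Hypothetical record, for illustration only.
record = (
    "NAME: Caffeine\n"
    "ISOMERIC_SMILES: CN1C=NC2=C1C(=O)N(C(=O)N2C)C\n"
    "NOMINAL_MASS: 194\n"
)
renamed = rename_key(record, "ISOMERIC_SMILES", "SMILES")
renamed = rename_key(renamed, "NOMINAL_MASS", "PARENT_MASS")
print(renamed)
```

Anchoring the pattern at the start of the line is the important design choice: a plain find-and-replace on `SMILES` would also touch any value that happens to contain the word.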
1.3 Matchms Filtering
- Filters out spectra with invalid SMILES or InChI from the dataset using Matchms filtering.
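The effect of this filter can be illustrated with a simplified sketch. It only checks that both fields are present and non-empty; the Matchms filter also validates the strings themselves, which requires a cheminformatics toolkit and is not reproduced here. The metadata dictionaries and the helper `has_structure` are hypothetical.

```python
def has_structure(metadata: dict) -> bool:
    """True when both 'smiles' and 'inchi' are present and non-empty."""
    return all(str(metadata.get(key) or "").strip() for key in ("smiles", "inchi"))

# Hypothetical library entries, for illustration only.
library = [
    {"name": "A", "smiles": "CCO", "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"},
    {"name": "B", "smiles": "", "inchi": "InChI=1S/CH4/h1H4"},
    {"name": "C", "inchi": None},
]

# Keep only entries that carry both structure identifiers.
kept = [entry for entry in library if has_structure(entry)]
print([entry["name"] for entry in kept])
```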
Step 2: Complex Removal and Subsetting Dataset
Removes coordination complexes from the dataset.
2.1 Remove Complexes and Subset Data
- Removes complexes from the dataset.
- Exports the metadata using Matchms metadata export, extracts the SMILES column, removes complexes using the Rem_Complex tool, and subsets the dataset to the remaining SMILES using Matchms subsetting.
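The export, extract, filter, and subset chain can be sketched as follows. The criterion used by the Rem_Complex tool is not described in this document, so the sketch substitutes a crude stand-in: it flags any SMILES with more than one dot-separated component. That stand-in is an assumption and will not match the real tool in every case; the row data and the helper `is_multicomponent` are hypothetical.

```python
def is_multicomponent(smiles: str) -> bool:
    """Stand-in test (assumption): more than one '.'-separated component."""
    return len([part for part in smiles.split(".") if part]) > 1

# Hypothetical exported metadata table, one row per spectrum.
rows = [
    {"compound_name": "entry_1", "smiles": "CCO"},
    {"compound_name": "entry_2", "smiles": "CC(=O)[O-].[Na+]"},
    {"compound_name": "entry_3", "smiles": "c1ccccc1"},
]

# 1) extract the SMILES column, 2) drop the flagged structures
allowed = {row["smiles"] for row in rows if not is_multicomponent(row["smiles"])}

# 3) subset the library to the spectra whose SMILES survived
subset = [row for row in rows if row["smiles"] in allowed]
print([row["compound_name"] for row in subset])
```

Filtering on the extracted column and then subsetting by value mirrors the workflow's structure, where the filter tool sees only the SMILES list and the spectra are matched back to it afterwards.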
Step 3: Data Key Manipulation
Adds the metadata that MSFinder requires for annotation.
3.1 Matchms Remove Key
- Removes existing keys such as adduct, charge, and ionmode from the dataset.
3.2 Matchms Add Key
- Adds necessary keys like charge, ionmode, and adduct to the dataset.
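Removing a key before adding it back guarantees that every spectrum ends up with the same, known value rather than whatever the source library contained. A minimal sketch of steps 3.1 and 3.2 on a single metadata dictionary follows; the helper `reset_keys` and the values chosen for ionmode, adduct, and charge are placeholders, not the settings used by the workflow.

```python
def reset_keys(metadata: dict, new_values: dict) -> dict:
    """Drop any existing copies of the given keys, then add the new values."""
    # Step 3.1: remove the existing keys (returns a copy, input is not mutated).
    cleaned = {key: value for key, value in metadata.items() if key not in new_values}
    # Step 3.2: add the keys back with uniform values.
    cleaned.update(new_values)
    return cleaned

# Hypothetical spectrum metadata with inconsistent legacy values.
spectrum = {"compound_name": "X", "ionmode": "n/a", "adduct": "unknown", "parent_mass": 180.06339}

# Placeholder values, chosen for illustration only.
updated = reset_keys(spectrum, {"ionmode": "positive", "adduct": "[M+H]+", "charge": 1})
print(updated)
```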
3.3 Matchms Filtering
- Derives the precursor m/z from the parent mass and adduct information using Matchms filtering.
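The derivation follows the usual relation m/z = (n * M + shift) / |z|, where M is the parent mass, n the number of molecules in the adduct, shift the mass added or removed by the adduct, and z the charge. The sketch below is a hedged illustration of that arithmetic with a small, hand-written adduct table; it is not the Matchms implementation, and the table `ADDUCTS` and function `precursor_mz` are assumptions made for the example.

```python
PROTON = 1.007276    # mass of a proton in Da
ELECTRON = 0.000549  # mass of an electron in Da

# adduct: (multiplier of M, mass shift in Da, charge); illustrative subset only
ADDUCTS = {
    "[M+H]+": (1, PROTON, 1),
    "[M-H]-": (1, -PROTON, -1),
    "[M]+": (1, -ELECTRON, 1),
    "[M+Na]+": (1, 22.989218, 1),
}

def precursor_mz(parent_mass: float, adduct: str) -> float:
    """Compute the precursor m/z from the neutral parent mass and the adduct."""
    multiplier, shift, charge = ADDUCTS[adduct]
    return (multiplier * parent_mass + shift) / abs(charge)

# Worked example: a neutral mass of 180.06339 Da observed as [M+H]+
print(round(precursor_mz(180.06339, "[M+H]+"), 6))
```

For the worked example the arithmetic is 180.06339 + 1.007276 = 181.070666, divided by a charge of 1.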
3.4 Matchms Convert
- Converts the dataset to Riken format for compatibility with MSFinder using matchms convert.
Step 4: Peak Annotation
4.1 Recetox-MSFinder
- Executes MSFinder with a 0.5 Da tolerance for both MS1 and MS2, including all element checks and an extended range for peak annotation.
Step 5: Error Handling and Refinement
Check whether the MSFinder output contains the annotation results or only a log file. If the output is a log file, remove the offending SMILES from the dataset using the Matchms subsetting tool and rerun MSFinder.
5.1 Error Handling
- Handles errors in peak annotation by removing SMILES that are not accepted by MSFinder.
- Reruns MSFinder after error correction or with different parameters (if applicable).
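The check, remove, and rerun cycle described in this step can be sketched as a bounded retry loop. How the log identifies the rejected SMILES is not specified in this document, so the sketch abstracts the MSFinder run behind a caller-supplied function that returns either `("results", payload)` or `("log", rejected_smiles)`. That interface, the name `annotate_with_retry`, and the `fake_runner` used for the demonstration are all hypothetical.

```python
from typing import Callable, List, Tuple

def annotate_with_retry(
    spectra: List[dict],
    run_annotation: Callable[[List[dict]], Tuple[str, list]],
    max_attempts: int = 5,
) -> list:
    """Rerun the annotation, dropping rejected SMILES after each failed run."""
    current = list(spectra)
    for _ in range(max_attempts):
        kind, payload = run_annotation(current)
        if kind == "results":
            return payload
        # The run produced a log: remove the spectra it complains about.
        rejected = set(payload)
        remaining = [s for s in current if s.get("smiles") not in rejected]
        if len(remaining) == len(current):
            raise RuntimeError("log reported no removable SMILES; stopping")
        current = remaining
    raise RuntimeError("annotation still failing after max_attempts")

def fake_runner(spectra: List[dict]) -> Tuple[str, list]:
    """Stand-in for an MSFinder run: rejects any SMILES containing '*'."""
    bad = [s["smiles"] for s in spectra if "*" in s["smiles"]]
    return ("log", bad) if bad else ("results", [s["smiles"] for s in spectra])

result = annotate_with_retry([{"smiles": "CCO"}, {"smiles": "C*"}], fake_runner)
print(result)
```

Bounding the number of attempts and stopping when nothing can be removed keeps the loop from cycling forever on a failure that is unrelated to the SMILES.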
Step 6: High-res Annotation
6.1 High-Res Peak Overwriting
- Uses the Use_Theoretical_mz_Annotations tool to overwrite the experimentally measured m/z values of peaks with the theoretical values given in the peak comments.
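The overwriting step can be illustrated with a short sketch. It assumes that each annotated peak carries a comment containing the text `Theoretical m/z` followed by a number, and that peaks without such a comment keep their measured value. Both points are assumptions about the data rather than a description of the Use_Theoretical_mz_Annotations tool; the peak tuples and the function `use_theoretical_mz` are hypothetical.

```python
import re

# Assumed comment layout: "... Theoretical m/z <number> ..."
THEORETICAL = re.compile(r"Theoretical m/z\s+([0-9]+(?:\.[0-9]+)?)")

def use_theoretical_mz(peaks):
    """Replace the measured m/z by the value quoted in the comment, if any."""
    updated = []
    for mz, intensity, comment in peaks:
        match = THEORETICAL.search(comment or "")
        new_mz = float(match.group(1)) if match else mz
        updated.append((new_mz, intensity, comment))
    return updated

# Hypothetical peaks: (measured m/z, intensity, comment)
peaks = [
    (70.1, 25.0, "Theoretical m/z 70.065126, Mass diff 0.035"),
    (91.0, 100.0, ""),
]
corrected = use_theoretical_mz(peaks)
print(corrected)
```

In the example, the first peak lies within the 0.5 Da tolerance used for annotation and receives the theoretical value, while the unannotated second peak is left unchanged.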
Inputs
ID | Name | Description | Type |
---|---|---|---|
Spectral library file | Spectral library file | n/a | |
Steps
ID | Name | Description |
---|---|---|
1 | MSMetaEnhancer: collect InChi using pubchem | toolshed.g2.bx.psu.edu/repos/recetox/msmetaenhancer/msmetaenhancer/0.3.0+galaxy3 |
2 | MSMetaEnhancer: collect Isomeric_smiles from IDSM | toolshed.g2.bx.psu.edu/repos/recetox/msmetaenhancer/msmetaenhancer/0.3.0+galaxy3 |
3 | MSMetaEnhancer: collect NOMINAL_MASS using RDkit | toolshed.g2.bx.psu.edu/repos/recetox/msmetaenhancer/msmetaenhancer/0.3.0+galaxy3 |
4 | Replace ISOMERIC_SMILES TO SMILES | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.4 |
5 | Replace NOMINAL_MASS TO PARENT_MASS | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.4 |
6 | Matchms Filtering: require Inchi and Smile | toolshed.g2.bx.psu.edu/repos/recetox/matchms_filtering/matchms_filtering/0.24.0+galaxy2 |
7 | Metadata export | toolshed.g2.bx.psu.edu/repos/recetox/matchms_metadata_export/matchms_metadata_export/0.24.0+galaxy1 |
8 | Convert CSV to tabular | csv_to_tabular |
9 | Extract smiles column | toolshed.g2.bx.psu.edu/repos/iuc/column_remove_by_header/column_remove_by_header/1.0 |
10 | Convert tabular to CSV | tabular_to_csv |
11 | Remove coordination complexes | toolshed.g2.bx.psu.edu/repos/recetox/rem_complex/rem_complex/1.0.0+galaxy2 |
12 | Subset spectra based of rem_complex output | toolshed.g2.bx.psu.edu/repos/recetox/matchms_subsetting/matchms_subsetting/0.24.0+galaxy5 |
13 | Matchms remove key: remove existing ionmode key | toolshed.g2.bx.psu.edu/repos/recetox/matchms_remove_key/matchms_remove_key/0.24.0+galaxy0 |
14 | Matchms remove key: remove existing adduct key | toolshed.g2.bx.psu.edu/repos/recetox/matchms_remove_key/matchms_remove_key/0.24.0+galaxy0 |
15 | Matchms remove key: remove existing precursor_mz key | toolshed.g2.bx.psu.edu/repos/recetox/matchms_remove_key/matchms_remove_key/0.24.0+galaxy0 |
16 | Matchms add key: add ionmode key | toolshed.g2.bx.psu.edu/repos/recetox/matchms_add_key/matchms_add_key/0.24.0+galaxy1 |
17 | Matchms add key: add adduct key | toolshed.g2.bx.psu.edu/repos/recetox/matchms_add_key/matchms_add_key/0.24.0+galaxy1 |
18 | Matchms add key: add charge key | toolshed.g2.bx.psu.edu/repos/recetox/matchms_add_key/matchms_add_key/0.24.0+galaxy1 |
19 | Matchms filtering: add precursor_mz | toolshed.g2.bx.psu.edu/repos/recetox/matchms_filtering/matchms_filtering/0.24.0+galaxy2 |
20 | matchms convert to riken | toolshed.g2.bx.psu.edu/repos/recetox/matchms_convert/matchms_convert/0.24.0+galaxy0 |
21 | RECETOX MsFinder | toolshed.g2.bx.psu.edu/repos/recetox/recetox_msfinder/recetox_msfinder/v3.5.2+galaxy4 |
22 | use theoretical m/z values | toolshed.g2.bx.psu.edu/repos/recetox/use_theoretical_mz_annotations/use_theoretical_mz_annotations/1.0.0+galaxy1 |
Outputs
ID | Name | Description | Type |
---|---|---|---|
_anonymous_output_1 | _anonymous_output_1 | n/a | |
_anonymous_output_2 | _anonymous_output_2 | n/a | |
_anonymous_output_3 | _anonymous_output_3 | n/a | |
_anonymous_output_4 | _anonymous_output_4 | n/a | |
MSMetaEnhancer on input dataset(s) | MSMetaEnhancer on input dataset(s) | n/a | |
Log of MSMetaEnhancer on input dataset(s) | Log of MSMetaEnhancer on input dataset(s) | n/a | |
_anonymous_output_5 | _anonymous_output_5 | n/a | |
_anonymous_output_6 | _anonymous_output_6 | n/a | |
_anonymous_output_7 | _anonymous_output_7 | n/a | |
_anonymous_output_8 | _anonymous_output_8 | n/a | |
_anonymous_output_9 | _anonymous_output_9 | n/a | |
_anonymous_output_10 | _anonymous_output_10 | n/a | |
_anonymous_output_11 | _anonymous_output_11 | n/a | |
_anonymous_output_12 | _anonymous_output_12 | n/a | |
_anonymous_output_13 | _anonymous_output_13 | n/a | |
_anonymous_output_14 | _anonymous_output_14 | n/a | |
_anonymous_output_15 | _anonymous_output_15 | n/a | |
_anonymous_output_16 | _anonymous_output_16 | n/a | |
_anonymous_output_17 | _anonymous_output_17 | n/a | |
_anonymous_output_18 | _anonymous_output_18 | n/a | |
_anonymous_output_19 | _anonymous_output_19 | n/a | |
_anonymous_output_20 | _anonymous_output_20 | n/a | |
_anonymous_output_21 | _anonymous_output_21 | n/a | |
_anonymous_output_22 | _anonymous_output_22 | n/a | |
_anonymous_output_23 | _anonymous_output_23 | n/a | |
Version History
Version 2 (latest) Created 6th Jun 2024 at 11:18 by Zargham Ahmad
Frozen
Version-2
8dd83f3
Version 1 (earliest) Created 20th May 2024 at 11:05 by Helge Hecht
Initial release of the workflow.
Frozen
Version-1
24479e7
Creators
Additional credit
Research Infrastructure RECETOX RI (No LM2018121) financed by the Ministry of Education, Youth and Sports, and Operational Programme Research, Development and Innovation - project CETOCOEN EXCELLENCE (No CZ.02.1.01/0.0/0.0/17_043/0009632).
Created: 20th May 2024 at 11:05
Last updated: 6th Jun 2024 at 10:58