Theoretical fragment substructure generation and in silico mass spectral library high-resolution upcycling workflow

Workflow Type: Galaxy
Work-in-progress

Galaxy Workflow Documentation: MS Finder Pipeline

This document outlines an MSFinder Galaxy workflow for peak annotation. The workflow preprocesses MS data, filters and enriches the spectral metadata, and then runs MSFinder.

Step 1: Data Collection and Preprocessing

Check whether InChI and SMILES are missing from the dataset, retrieve them where possible, and then filter out the spectra that still lack an InChI or SMILES.

1.1 MSMetaEnhancer: Collect InChi, Isomeric_smiles, and Nominal_mass

  • Uses MSMetaEnhancer to collect InChI and Isomeric_smiles from the PubChem and IDSM databases.
  • Uses MSMetaEnhancer to compute MW with RDKit (for GOLM).

1.2 Replace Key

  • Renames the isomeric_smiles key to smiles using the replace text tool.
  • Renames the MW key to parent_mass using the replace text tool (for GOLM).
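
The key renaming in this step can be sketched in a few lines of Python. This is only an illustration, assuming the library is an MSP-style text file with one `KEY: value` pair per line; the `KEY_RENAMES` mapping and the `rename_keys` helper are hypothetical names, not part of the Galaxy replace text tool.

```python
import re

# Hypothetical mapping of old metadata keys to the names used downstream.
KEY_RENAMES = {"ISOMERIC_SMILES": "SMILES", "MW": "PARENT_MASS"}


def rename_keys(msp_text, renames=KEY_RENAMES):
    """Rename metadata keys at the start of 'KEY: value' lines (case-insensitive)."""
    out = []
    for line in msp_text.splitlines():
        # Only a key without spaces directly followed by ':' is considered,
        # so peak lines and keys such as 'Num Peaks' are left untouched.
        match = re.match(r"^([^:\s]+):(.*)$", line)
        if match and match.group(1).upper() in renames:
            line = renames[match.group(1).upper()] + ":" + match.group(2)
        out.append(line)
    return "\n".join(out)


example = (
    "NAME: caffeine\n"
    "ISOMERIC_SMILES: CN1C=NC2=C1C(=O)N(C(=O)N2C)C\n"
    "MW: 194.08\n"
    "Num Peaks: 1\n"
    "82 100"
)
print(rename_keys(example))
```

Anchoring the match at the start of the line avoids touching values that happen to contain the old key name.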

1.3 Matchms Filtering

  • Filters out spectra with invalid SMILES or InChI using Matchms filtering.
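
The effect of this filter can be illustrated with plain Python dictionaries standing in for spectrum metadata. The checks below are deliberately shallow and are an assumption of this sketch; they are not the validation that matchms performs, and `has_structure` and `require_structure` are hypothetical helper names.

```python
def has_structure(metadata):
    """Return True when both 'smiles' and 'inchi' look usable.

    Assumption for this sketch: a non-empty SMILES and an InChI that starts
    with the standard 'InChI=' prefix count as valid.
    """
    smiles = (metadata.get("smiles") or "").strip()
    inchi = (metadata.get("inchi") or "").strip()
    return bool(smiles) and inchi.startswith("InChI=")


def require_structure(spectra):
    """Keep only the spectra that carry both identifiers."""
    return [s for s in spectra if has_structure(s)]


spectra = [
    {"name": "ethanol", "smiles": "CCO", "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"},
    {"name": "no_smiles", "smiles": "", "inchi": "InChI=1S/CH4/h1H4"},
    {"name": "no_inchi", "smiles": "C"},
]
kept = require_structure(spectra)
print([s["name"] for s in kept])
```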

Step 2: Complex Removal and Subsetting Dataset

Removes coordination complexes from the dataset.

2.1 Remove Complexes and Subset Data

  • Removes complexes from the dataset.
  • Exports the metadata with Matchms metadata export, extracts the SMILES column, removes complexes with the Rem_Complex tool, and subsets the dataset to the remaining SMILES with Matchms subsetting.
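
The export, filter, and subset sequence can be sketched as follows. The predicate `looks_like_complex` is a placeholder heuristic chosen only to make the example runnable (it flags multi-component SMILES); it is not the rule implemented by the Rem_Complex tool, and all function names here are hypothetical.

```python
def looks_like_complex(smiles):
    # Placeholder heuristic (assumption): flag SMILES with several
    # disconnected components, which are written with a '.' separator.
    return "." in smiles


def subset_without_complexes(spectra, is_complex=looks_like_complex):
    # 1. "Export" the SMILES column from the spectrum metadata.
    smiles_column = [s["smiles"] for s in spectra]
    # 2. Keep only the SMILES that the predicate does not flag.
    allowed = {smi for smi in smiles_column if not is_complex(smi)}
    # 3. Subset the spectra to those whose SMILES survived.
    return [s for s in spectra if s["smiles"] in allowed]


library = [
    {"name": "ethanol", "smiles": "CCO"},
    {"name": "multi_component", "smiles": "CC(=O)[O-].[Na+]"},
]
print([s["name"] for s in subset_without_complexes(library)])
```

Passing the predicate as an argument keeps the subsetting logic independent of how complexes are actually detected.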

Step 3: Data Key Manipulation

Adds the metadata that MSFinder requires for annotation.

3.1 Matchms Remove Key

  • Removes existing keys such as adduct, charge, and ionmode from the dataset.

3.2 Matchms Add Key

  • Adds necessary keys like charge, ionmode, and adduct to the dataset.

3.3 Matchms Filtering

  • Derives the precursor m/z from the parent mass and adduct using matchms filtering.
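
For reference, the derivation amounts to precursor m/z = (multiplier × parent mass + adduct mass shift) / |charge|. The sketch below works this out for three common adducts. The small `ADDUCTS` table and the `precursor_mz` helper are assumptions made for illustration and do not reproduce the adduct table shipped with matchms.

```python
PROTON_MASS = 1.007276466       # Da
ELECTRON_MASS = 0.000548580     # Da
SODIUM_ATOM_MASS = 22.98976928  # Da, monoisotopic 23Na

# adduct -> (mass multiplier, mass shift in Da, absolute charge)
ADDUCTS = {
    "[M+H]+": (1, PROTON_MASS, 1),
    "[M-H]-": (1, -PROTON_MASS, 1),
    "[M+Na]+": (1, SODIUM_ATOM_MASS - ELECTRON_MASS, 1),
}


def precursor_mz(parent_mass, adduct):
    """Compute the precursor m/z for a neutral parent mass and an adduct label."""
    multiplier, shift, charge = ADDUCTS[adduct]
    return (multiplier * parent_mass + shift) / charge


# Caffeine, C8H10N4O2, monoisotopic mass 194.080376 Da.
print(round(precursor_mz(194.080376, "[M+H]+"), 4))  # 195.0877
print(round(precursor_mz(194.080376, "[M-H]-"), 4))  # 193.0731
```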

3.4 Matchms Convert

  • Converts the dataset to Riken format for compatibility with MSFinder using matchms convert.

Step 4: Peak Annotation

4.1 Recetox-MSFinder

  • Executes MSFinder with a 0.5 Da tolerance for both MS1 and MS2, including all element checks and an extended range for peak annotation.
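
To make the 0.5 Da setting concrete, the following sketch matches a measured peak to the closest candidate fragment mass within an absolute tolerance. It is a generic illustration of tolerance matching, not MSFinder's annotation algorithm, and `match_within_tolerance` is a hypothetical helper.

```python
def match_within_tolerance(measured_mz, candidate_mzs, tolerance_da=0.5):
    """Return the candidate closest to measured_mz, or None if none is within tolerance."""
    best = None
    for candidate in candidate_mzs:
        error = abs(candidate - measured_mz)
        if error <= tolerance_da and (best is None or error < abs(best - measured_mz)):
            best = candidate
    return best


candidates = [77.0386, 91.0542, 105.0699]
print(match_within_tolerance(91.0, candidates))  # 91.0542
print(match_within_tolerance(92.0, candidates))  # None
```

A tolerance this wide accepts peaks recorded at roughly unit-mass precision, which is the situation in which replacing measured values with theoretical ones (Step 6) changes the data most.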

Step 5: Error Handling and Refinement

Check whether MSFinder returned the annotation results or a log file. If the output is a log file, remove the offending SMILES from the dataset using the matchms subsetting tool and rerun MSFinder.

5.1 Error Handling

  • Handles errors in peak annotation by removing SMILES that are not accepted by MSFinder.
  • Reruns MSFinder after error correction or with different parameters (if applicable).
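
The check, remove, and rerun loop can be sketched as below. In the workflow this step is carried out by inspecting the output and rerunning the tool; the code only illustrates the control flow. The `run_annotation` callable and its return shape (results or `None`, plus a set of rejected SMILES) are hypothetical stand-ins, since the format of the MSFinder log is not described here.

```python
def annotate_with_retry(spectra, run_annotation, max_rounds=3):
    """Rerun annotation, dropping rejected SMILES, until results are returned.

    run_annotation(spectra) is a hypothetical callable that returns
    (results, rejected_smiles); results is None when only a log was produced.
    """
    remaining = list(spectra)
    for _ in range(max_rounds):
        results, rejected = run_annotation(remaining)
        if results is not None:
            return results, remaining
        # Subset the dataset to spectra whose SMILES were not rejected.
        remaining = [s for s in remaining if s["smiles"] not in rejected]
    raise RuntimeError("annotation still failing after retries")


def fake_run(spectra):
    # Stand-in for the annotation tool: rejects SMILES containing a wildcard atom.
    bad = {s["smiles"] for s in spectra if "*" in s["smiles"]}
    if bad:
        return None, bad
    return [s["name"] + ":annotated" for s in spectra], set()


dataset = [
    {"name": "ethanol", "smiles": "CCO"},
    {"name": "wildcard", "smiles": "C*"},
]
results, used = annotate_with_retry(dataset, fake_run)
print(results)
```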

Step 6: High-res Annotation

6.1 High-Res Peak Overwriting

  • Uses the Use_Theoretical_mz_Annotations tool to overwrite the experimentally measured m/z values of peaks with the theoretical values from the peak comments.
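
A minimal sketch of the overwrite, assuming each peak is a `(mz, intensity, comment)` tuple and that the comment carries the value after the text `Theoretical m/z`. Both the tuple layout and the comment pattern are assumptions for illustration; peaks without a matching comment keep their measured value.

```python
import re

# Assumed comment pattern, e.g. "Theoretical m/z 91.0542, Annotation [C7H7]+".
THEORETICAL_MZ = re.compile(r"Theoretical m/z\s+([0-9]+(?:\.[0-9]+)?)")


def use_theoretical_mz(peaks):
    """Replace measured m/z values with the theoretical value found in the comment."""
    updated = []
    for mz, intensity, comment in peaks:
        match = THEORETICAL_MZ.search(comment or "")
        if match:
            mz = float(match.group(1))
        updated.append((mz, intensity, comment))
    return updated


peaks = [
    (91.0, 100.0, "Theoretical m/z 91.0542, Annotation [C7H7]+"),
    (65.0, 20.0, ""),
]
print(use_theoretical_mz(peaks))
```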

Inputs

ID | Name | Description | Type
Spectral library file | Spectral library file | n/a | File

Steps

ID Name Description
1 MSMetaEnhancer: collect InChi using pubchem toolshed.g2.bx.psu.edu/repos/recetox/msmetaenhancer/msmetaenhancer/0.4.0+galaxy1
2 MSMetaEnhancer: collect Isomeric_smiles from IDSM toolshed.g2.bx.psu.edu/repos/recetox/msmetaenhancer/msmetaenhancer/0.4.0+galaxy1
3 MSMetaEnhancer: collect MW using RDkit toolshed.g2.bx.psu.edu/repos/recetox/msmetaenhancer/msmetaenhancer/0.4.0+galaxy1
4 Replace ISOMERIC_SMILES TO SMILES toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.4
5 Replace MW TO PARENT_MASS toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.4
6 Matchms Filtering: require Inchi and Smile toolshed.g2.bx.psu.edu/repos/recetox/matchms_filtering/matchms_filtering/0.24.0+galaxy2
7 Metadata export toolshed.g2.bx.psu.edu/repos/recetox/matchms_metadata_export/matchms_metadata_export/0.24.0+galaxy1
8 Convert CSV to tabular csv_to_tabular
9 Extract smiles column toolshed.g2.bx.psu.edu/repos/iuc/column_remove_by_header/column_remove_by_header/1.0
10 Convert tabular to CSV tabular_to_csv
11 Remove coordination complexes toolshed.g2.bx.psu.edu/repos/recetox/rem_complex/rem_complex/1.0.0+galaxy2
12 Subset spectra based of rem_complex output toolshed.g2.bx.psu.edu/repos/recetox/matchms_subsetting/matchms_subsetting/0.24.0+galaxy5
13 Matchms remove key: remove existing ionmode key toolshed.g2.bx.psu.edu/repos/recetox/matchms_remove_key/matchms_remove_key/0.24.0+galaxy0
14 Matchms remove key: remove existing adduct key toolshed.g2.bx.psu.edu/repos/recetox/matchms_remove_key/matchms_remove_key/0.24.0+galaxy0
15 Matchms remove key: remove existing precursor_mz key toolshed.g2.bx.psu.edu/repos/recetox/matchms_remove_key/matchms_remove_key/0.25.0+galaxy0
16 Matchms add key: add ionmode key toolshed.g2.bx.psu.edu/repos/recetox/matchms_add_key/matchms_add_key/0.24.0+galaxy1
17 Matchms add key: add adduct key toolshed.g2.bx.psu.edu/repos/recetox/matchms_add_key/matchms_add_key/0.24.0+galaxy1
18 Matchms filtering: add precursor_mz toolshed.g2.bx.psu.edu/repos/recetox/matchms_filtering/matchms_filtering/0.25.0+galaxy1
19 matchms convert to riken toolshed.g2.bx.psu.edu/repos/recetox/matchms_convert/matchms_convert/0.24.0+galaxy0
20 RECETOX MsFinder toolshed.g2.bx.psu.edu/repos/recetox/recetox_msfinder/recetox_msfinder/v3.5.2+galaxy4
21 use theoretical m/z values toolshed.g2.bx.psu.edu/repos/recetox/use_theoretical_mz_annotations/use_theoretical_mz_annotations/1.0.0+galaxy1

Outputs

ID | Name | Description | Type
converted_library | converted_library | n/a | File
output | output | n/a | File
_anonymous_output_1 | _anonymous_output_1 | n/a | File

Version History

Version 2 (latest) Created 6th Jun 2024 at 11:18 by Zargham Ahmad

No revision comments

Frozen Version-2 8dd83f3

Version 1 (earliest) Created 20th May 2024 at 11:05 by Helge Hecht

Initial release of the workflow.


Frozen Version-1 24479e7
Additional credit

Research Infrastructure RECETOX RI (No LM2018121) financed by the Ministry of Education, Youth and Sports, and Operational Programme Research, Development and Innovation - project CETOCOEN EXCELLENCE (No CZ.02.1.01/0.0/0.0/17_043/0009632).

Citation
Ahmad, Z., Hecht, H., & Price, E. J. (2024). Theoretical fragment substructure generation and in silico mass spectral library high-resolution upcycling workflow. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.888.2
Activity


Created: 20th May 2024 at 11:05

Last updated: 6th Jun 2024 at 10:58
