Theoretical fragment substructure generation and in silico mass spectral library high-resolution upcycling workflow

Workflow Type: Galaxy
Work-in-progress

Galaxy Workflow Documentation: MSFinder Pipeline

This document outlines an MSFinder Galaxy workflow for peak annotation. The workflow enriches and filters the spectral metadata, removes coordination complexes, adds the keys MSFinder needs, runs MSFinder, and finally replaces measured peak m/z values with theoretical ones.

Step 1: Data Collection and Preprocessing

Retrieve InChI and SMILES identifiers for spectra where they are missing, then filter out any spectra that still lack them.

1.1 MSMetaEnhancer: Collect InChI, Isomeric_smiles, and Nominal_mass

  • Uses MSMetaEnhancer to collect InChI from PubChem and isomeric SMILES from IDSM.
  • Uses MSMetaEnhancer to collect the nominal mass (MW) using RDKit (for GOLM).

1.2 Replace Keys

  • Renames the isomeric_smiles key to smiles using the Replace Text tool.
  • Renames the MW (nominal mass) key to parent_mass using the Replace Text tool (for GOLM).
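The two Replace Text steps rename metadata keys in the exported text. A minimal sketch of that renaming, assuming MSP-style "key: value" lines and the key names from workflow steps 4 and 5 (the rename map and function name are hypothetical, not the Galaxy tool's code):

```python
import re

# Hypothetical rename map mirroring workflow steps 4 and 5.
KEY_RENAMES = {
    "isomeric_smiles": "smiles",
    "nominal_mass": "parent_mass",
}


def rename_msp_keys(msp_text: str, renames: dict[str, str]) -> str:
    """Rename metadata keys at the start of 'key: value' lines, ignoring case."""
    for old, new in renames.items():
        # Anchor at line start and require the colon so values are never rewritten.
        pattern = re.compile(rf"^{re.escape(old)}:", flags=re.IGNORECASE | re.MULTILINE)
        msp_text = pattern.sub(f"{new}:", msp_text)
    return msp_text


example = "NAME: x\nISOMERIC_SMILES: CCO\nNOMINAL_MASS: 46\n"
renamed = rename_msp_keys(example, KEY_RENAMES)
```

Anchoring the pattern at the start of a line keeps a key name that happens to appear inside a value (for example a compound name) from being rewritten.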

1.3 Matchms Filtering

  • Uses Matchms filtering to remove spectra that lack a valid SMILES or InChI.
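The filtering idea can be sketched on spectra represented as plain dictionaries. This is a simplified stand-in: it only checks that a SMILES is present and that the InChI carries the standard "InChI=" prefix, whereas the Matchms tool may validate the structures more strictly. The data layout and function names are assumptions for illustration.

```python
def has_structure(metadata: dict) -> bool:
    """Simplified check: non-empty SMILES and an InChI with the standard prefix."""
    smiles = (metadata.get("smiles") or "").strip()
    inchi = (metadata.get("inchi") or "").strip()
    return bool(smiles) and inchi.startswith("InChI=")


def require_smiles_and_inchi(spectra: list[dict]) -> list[dict]:
    """Keep only spectra whose metadata passes has_structure."""
    return [s for s in spectra if has_structure(s["metadata"])]


example = [
    {"metadata": {"smiles": "CCO", "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"}},
    {"metadata": {"smiles": "", "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"}},
    {"metadata": {"smiles": "CCO"}},
]
kept = require_smiles_and_inchi(example)
```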

Step 2: Complex Removal and Dataset Subsetting

Removes coordination complexes from the dataset.

2.1 Remove Complexes and Subset Data

  • Exports the metadata with Matchms metadata export and extracts the SMILES column, converting between CSV and tabular formats as needed.
  • Removes coordination complexes with the Rem_Complex tool.
  • Subsets the spectra to the remaining compounds with Matchms subsetting.
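The subsetting step keeps only spectra whose compound survived complex removal. A sketch of that logic, assuming the Rem_Complex output is a CSV table with a smiles column listing the retained compounds (the column name, data layout, and function names are hypothetical):

```python
import csv
import io


def kept_smiles_from_csv(csv_text: str, column: str = "smiles") -> set[str]:
    """Collect the SMILES that remain after complex removal."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip() for row in reader if row.get(column)}


def subset_by_smiles(spectra: list[dict], kept: set[str]) -> list[dict]:
    """Keep spectra whose SMILES appears in the retained set."""
    return [s for s in spectra if s["metadata"].get("smiles") in kept]


rem_complex_output = "compound_name,smiles\nethanol,CCO\nbenzene,c1ccccc1\n"
library = [
    {"metadata": {"smiles": "CCO"}},
    {"metadata": {"smiles": "Cl[Pt](Cl)(N)N"}},  # a coordination complex, not retained
    {"metadata": {"smiles": "c1ccccc1"}},
]
subset = subset_by_smiles(library, kept_smiles_from_csv(rem_complex_output))
```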

Step 3: Data Key Manipulation

Add the metadata that MSFinder requires for annotation.

3.1 Matchms Remove Key

  • Removes the existing ionmode, adduct, and precursor_mz keys from the dataset.

3.2 Matchms Add Key

  • Adds the ionmode, adduct, and charge keys required by MSFinder.
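Removing the old keys before adding new ones ensures that a stale precursor_mz cannot survive into the recomputation in step 3.3. A minimal sketch on a metadata dictionary; the added values ("positive", "[M+H]+", 1) are hypothetical, since the workflow description does not state which values it sets:

```python
def remove_keys(metadata: dict, keys: set[str]) -> dict:
    """Return a copy of metadata without the given keys."""
    return {k: v for k, v in metadata.items() if k not in keys}


def add_keys(metadata: dict, values: dict) -> dict:
    """Return a copy of metadata with the given keys set."""
    updated = dict(metadata)
    updated.update(values)
    return updated


original = {"ionmode": "negative", "adduct": "[M-H]-", "precursor_mz": 179.05, "smiles": "CCO"}
cleaned = remove_keys(original, {"ionmode", "adduct", "precursor_mz"})
# Hypothetical target values for illustration only.
prepared = add_keys(cleaned, {"ionmode": "positive", "adduct": "[M+H]+", "charge": 1})
```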

3.3 Matchms Filtering

  • Derives the precursor m/z from the parent mass and adduct using Matchms filtering.
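The derivation follows the usual adduct arithmetic, m/z = (n·M + Δ) / |z|, where M is the parent mass, n the number of molecules in the ion, Δ the adduct mass correction, and z the charge. A sketch with a small hypothetical adduct table (proton mass 1.007276 Da, Na+ 22.989218 Da); the Matchms filter uses its own adduct definitions:

```python
PROTON_MASS = 1.007276  # Da

# Hypothetical adduct table: (molecules per ion, mass correction in Da, charge)
ADDUCTS = {
    "[M+H]+": (1, PROTON_MASS, 1),
    "[M-H]-": (1, -PROTON_MASS, -1),
    "[M+Na]+": (1, 22.989218, 1),
    "[M+2H]2+": (1, 2 * PROTON_MASS, 2),
}


def precursor_mz(parent_mass: float, adduct: str) -> float:
    """Compute precursor m/z as (n * M + correction) / |charge|."""
    n_molecules, correction, charge = ADDUCTS[adduct]
    return (n_molecules * parent_mass + correction) / abs(charge)


glucose_mh = precursor_mz(180.0634, "[M+H]+")
```

If parent_mass is a nominal (integer) mass, as with the RDKit-derived value for GOLM, the resulting precursor m/z is only as precise as that input.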

3.4 Matchms Convert

  • Converts the dataset to RIKEN format with Matchms convert for compatibility with MSFinder.
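To show what the conversion produces, here is a sketch that writes one spectrum as an MSP-style text record. The key names (NAME, PRECURSORMZ, PRECURSORTYPE, IONMODE, SMILES, Num Peaks) are an assumption modeled on RIKEN-style MSP files; compare against actual Matchms convert output before relying on them.

```python
def to_msp_record(metadata: dict, peaks: list[tuple[float, float]]) -> str:
    """Format one spectrum as an MSP-style record (assumed RIKEN-like key names)."""
    lines = [
        f"NAME: {metadata.get('compound_name', 'unknown')}",
        f"PRECURSORMZ: {metadata['precursor_mz']}",
        f"PRECURSORTYPE: {metadata['adduct']}",
        f"IONMODE: {metadata['ionmode']}",
        f"SMILES: {metadata['smiles']}",
        f"Num Peaks: {len(peaks)}",
    ]
    # One tab-separated "m/z intensity" line per peak.
    lines += [f"{mz}\t{intensity}" for mz, intensity in peaks]
    return "\n".join(lines) + "\n"


record = to_msp_record(
    {"compound_name": "example", "precursor_mz": 181.070676, "adduct": "[M+H]+",
     "ionmode": "positive", "smiles": "OCC1OC(O)C(O)C(O)C1O"},
    [(85.03, 100.0), (57.07, 40.0)],
)
```

Records for a whole library would be joined with a blank line between them.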

Step 4: Peak Annotation

4.1 Recetox-MSFinder

  • Executes MSFinder with a 0.5 Da tolerance for both MS1 and MS2, including all element checks and an extended range for peak annotation.
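An absolute tolerance of 0.5 Da is wide by high-resolution standards. The sketch below shows the matching rule an absolute tolerance implies and converts it to ppm for comparison; it illustrates the arithmetic only and is not MSFinder's internal matching code.

```python
def within_tolerance(observed_mz: float, theoretical_mz: float, tolerance_da: float = 0.5) -> bool:
    """True if the observed m/z lies within an absolute window around the theoretical m/z."""
    return abs(observed_mz - theoretical_mz) <= tolerance_da


def da_to_ppm(tolerance_da: float, mz: float) -> float:
    """Express an absolute tolerance at a given m/z in parts per million."""
    return tolerance_da / mz * 1e6


matches = within_tolerance(181.0, 181.070676)
ppm_at_200 = da_to_ppm(0.5, 200.0)
```

At m/z 200 the 0.5 Da window corresponds to about 2500 ppm, i.e. a unit-resolution match rather than a high-resolution one.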

Step 5: Error Handling and Refinement

Check whether the MSFinder output contains the annotation results or only the log file. If it is the log file, remove the offending SMILES from the dataset using the Matchms subsetting tool and rerun MSFinder.

5.1 Error Handling

  • Handles errors in peak annotation by removing SMILES that are not accepted by MSFinder.
  • Reruns MSFinder after the error is corrected, or with different parameters if applicable.
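In the workflow this check is done by inspecting the output and rerunning manually. The loop below sketches the same decision automatically. It assumes a hypothetical wrapper that returns either ("results", payload) or ("log", rejected_smiles); the real MSFinder log does not necessarily report rejected SMILES in that form.

```python
def annotate_with_retries(spectra, run_msfinder, max_attempts=5):
    """Rerun a (hypothetical) MSFinder wrapper, dropping SMILES named in the log."""
    for _ in range(max_attempts):
        kind, payload = run_msfinder(spectra)
        if kind == "results":
            return payload, spectra
        # Log output: drop the rejected SMILES and try again.
        rejected = set(payload)
        remaining = [s for s in spectra if s["metadata"]["smiles"] not in rejected]
        if len(remaining) == len(spectra):
            raise RuntimeError("log did not name any SMILES present in the dataset")
        spectra = remaining
    raise RuntimeError(f"no results after {max_attempts} attempts")


def fake_msfinder(spectra):
    """Hypothetical stand-in that rejects any SMILES containing mercury."""
    bad = [s["metadata"]["smiles"] for s in spectra if "[Hg]" in s["metadata"]["smiles"]]
    if bad:
        return "log", bad
    return "results", [s["metadata"]["smiles"] for s in spectra]


library = [{"metadata": {"smiles": smi}} for smi in ("CCO", "C[Hg]Cl", "CCC")]
results, annotated = annotate_with_retries(library, fake_msfinder)
```

Raising when the log names nothing removable prevents an endless loop on errors that removing SMILES cannot fix, which is where a rerun with different parameters would apply instead.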

Step 6: High-Resolution Annotation

6.1 High-Res Peak Overwriting

  • Uses the Use_Theoretical_mz_Annotations tool to overwrite the experimentally measured m/z values of peaks with the theoretical values stored in the peak comments.
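A sketch of the overwrite step, assuming peak comments are stored in a dictionary keyed by measured m/z and that each comment carries the theoretical value as "theoretical_mz=<value>". That comment format is hypothetical; the actual annotation text written by MSFinder may differ. Peaks without a parsable comment keep their measured m/z.

```python
import re

# Hypothetical comment format for illustration only.
THEORETICAL_MZ = re.compile(r"theoretical_mz=([0-9]+(?:\.[0-9]+)?)")


def use_theoretical_mz(peaks: list[tuple[float, float]],
                       comments: dict[float, str]) -> list[tuple[float, float]]:
    """Replace measured m/z with the theoretical m/z parsed from each peak comment."""
    updated = []
    for mz, intensity in peaks:
        match = THEORETICAL_MZ.search(comments.get(mz, ""))
        new_mz = float(match.group(1)) if match else mz
        updated.append((new_mz, intensity))
    # Re-sort by m/z because replaced values can change the peak order.
    return sorted(updated)


peaks = [(85.0, 100.0), (57.0, 40.0)]
comments = {85.0: "C4H5O2+ theoretical_mz=85.028405"}
high_res = use_theoretical_mz(peaks, comments)
```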

Inputs

ID | Name | Description | Type
Spectral library file | Spectral library file | n/a | File

Steps

ID | Name | Tool ID
1 | MSMetaEnhancer: collect InChi using pubchem | toolshed.g2.bx.psu.edu/repos/recetox/msmetaenhancer/msmetaenhancer/0.3.0+galaxy3
2 | MSMetaEnhancer: collect Isomeric_smiles from IDSM | toolshed.g2.bx.psu.edu/repos/recetox/msmetaenhancer/msmetaenhancer/0.3.0+galaxy3
3 | MSMetaEnhancer: collect NOMINAL_MASS using RDkit | toolshed.g2.bx.psu.edu/repos/recetox/msmetaenhancer/msmetaenhancer/0.3.0+galaxy3
4 | Replace ISOMERIC_SMILES TO SMILES | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.4
5 | Replace NOMINAL_MASS TO PARENT_MASS | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.4
6 | Matchms Filtering: require Inchi and Smile | toolshed.g2.bx.psu.edu/repos/recetox/matchms_filtering/matchms_filtering/0.24.0+galaxy2
7 | Metadata export | toolshed.g2.bx.psu.edu/repos/recetox/matchms_metadata_export/matchms_metadata_export/0.24.0+galaxy1
8 | Convert CSV to tabular | csv_to_tabular
9 | Extract smiles column | toolshed.g2.bx.psu.edu/repos/iuc/column_remove_by_header/column_remove_by_header/1.0
10 | Convert tabular to CSV | tabular_to_csv
11 | Remove coordination complexes | toolshed.g2.bx.psu.edu/repos/recetox/rem_complex/rem_complex/1.0.0+galaxy2
12 | Subset spectra based on rem_complex output | toolshed.g2.bx.psu.edu/repos/recetox/matchms_subsetting/matchms_subsetting/0.24.0+galaxy5
13 | Matchms remove key: remove existing ionmode key | toolshed.g2.bx.psu.edu/repos/recetox/matchms_remove_key/matchms_remove_key/0.24.0+galaxy0
14 | Matchms remove key: remove existing adduct key | toolshed.g2.bx.psu.edu/repos/recetox/matchms_remove_key/matchms_remove_key/0.24.0+galaxy0
15 | Matchms remove key: remove existing precursor_mz key | toolshed.g2.bx.psu.edu/repos/recetox/matchms_remove_key/matchms_remove_key/0.24.0+galaxy0
16 | Matchms add key: add ionmode key | toolshed.g2.bx.psu.edu/repos/recetox/matchms_add_key/matchms_add_key/0.24.0+galaxy1
17 | Matchms add key: add adduct key | toolshed.g2.bx.psu.edu/repos/recetox/matchms_add_key/matchms_add_key/0.24.0+galaxy1
18 | Matchms add key: add charge key | toolshed.g2.bx.psu.edu/repos/recetox/matchms_add_key/matchms_add_key/0.24.0+galaxy1
19 | Matchms filtering: add precursor_mz | toolshed.g2.bx.psu.edu/repos/recetox/matchms_filtering/matchms_filtering/0.24.0+galaxy2
20 | matchms convert to riken | toolshed.g2.bx.psu.edu/repos/recetox/matchms_convert/matchms_convert/0.24.0+galaxy0
21 | RECETOX MsFinder | toolshed.g2.bx.psu.edu/repos/recetox/recetox_msfinder/recetox_msfinder/v3.5.2+galaxy4
22 | use theoretical m/z values | toolshed.g2.bx.psu.edu/repos/recetox/use_theoretical_mz_annotations/use_theoretical_mz_annotations/1.0.0+galaxy1

Outputs

ID / Name | Description | Type
_anonymous_output_1 to _anonymous_output_4 | n/a | File
MSMetaEnhancer on input dataset(s) | n/a | File
Log of MSMetaEnhancer on input dataset(s) | n/a | File
_anonymous_output_5 to _anonymous_output_23 | n/a | File

Version History

Version 2 (latest) Created 6th Jun 2024 at 11:18 by Zargham Ahmad

No revision comments

Frozen Version-2 8dd83f3

Version 1 (earliest) Created 20th May 2024 at 11:05 by Helge Hecht

Initial release of the workflow.


Frozen Version-1 24479e7
Creators and Submitter
Creators
Additional credit

Research Infrastructure RECETOX RI (No LM2018121) financed by the Ministry of Education, Youth and Sports, and Operational Programme Research, Development and Innovation - project CETOCOEN EXCELLENCE (No CZ.02.1.01/0.0/0.0/17_043/0009632).

Submitter
Citation
Ahmad, Z., Hecht, H., & Price, E. J. (2024). Theoretical fragment substructure generation and in silico mass spectral library high-resolution upcycling workflow. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.888.1
Activity

Views: 1901   Downloads: 228   Runs: 0

Created: 20th May 2024 at 11:05

Last updated: 6th Jun 2024 at 10:58

Attributions

None

Total size: 46.4 KB