Workflow Type: Python
Stable

SynProtX

An official implementation of our research paper "SynProtX: A Large-Scale Proteomics-Based Deep Learning Model for Predicting Synergistic Anticancer Drug Combinations".

SynProtX is a deep learning model that integrates large-scale proteomics data, molecular graphs, and chemical fingerprints to predict synergistic effects of anticancer drug combinations. It provides robust performance across tissue-specific and study-specific datasets, enhancing reproducibility and biological relevance in drug synergy prediction.

Setting up environment

We use Miniconda to manage the Python dependencies of this project. To reproduce our environment, run the following commands in a terminal:

conda env create -f env.yml
conda activate SynProtX

Downloading raw data

Datasets, hyperparameters, and model checkpoints can be downloaded through Zenodo.

Generating dataset

The download is a tarball. After extracting it, move all nested folders to the root of this project directory. You might also need to move all files in data/export up into the data folder. Alternatively, you can generate the required data yourself by running the Jupyter notebooks in the ipynb folder. To replicate our exported data, run the following files in order.

  • 01_drugcomb_clean.ipynb → cleandata_cancer.csv
  • 02_CCLE_gene_expression → CCLE_expression_cleaned.csv
  • 03_omics_preprocess → protein_omics_data_cleaned.csv
  • 04_drugcomb_gene_prot_clean → data_preprocessing_gene.pkl, data_drugcomb.pkl, data_preprocessing_protein.pkl
  • 05_graph_generate.ipynb → nps_intersected folder
  • 06_smiles_feat_generate.ipynb → smiles_graph_data.pkl
  • 07_to_ecfp6_deepsyn.ipynb → deepsyn_drug_row.npy, deepsyn_drug_col.npy

If the console shows an error indicating that a SMILES was not found, you MUST run 06_smiles_feat_generate.ipynb again to regenerate the data.
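The two routes described in this section (reusing the exported data, or regenerating it from the notebooks) can be sketched in Python. This is a minimal sketch and not part of the repository. The function names are hypothetical, and it assumes that the project root is the current directory, that the notebooks in ipynb carry the .ipynb extension, and that jupyter nbconvert is available in the active environment.

```python
from pathlib import Path
import shutil
import subprocess
import sys


def promote_exported_files(project_root="."):
    """Move everything in data/export up into data/; return the moved names."""
    data_dir = Path(project_root) / "data"
    export_dir = data_dir / "export"
    moved = []
    if not export_dir.is_dir():
        return moved  # no export folder, so there is nothing to move
    for item in sorted(export_dir.iterdir()):
        target = data_dir / item.name
        if target.exists():
            continue  # keep existing files instead of overwriting them
        shutil.move(str(item), str(target))
        moved.append(item.name)
    return moved


def ordered_notebooks(folder="ipynb"):
    """Return notebooks sorted by file name, so 01_... runs before 02_..."""
    return sorted(Path(folder).glob("*.ipynb"))


def nbconvert_command(notebook):
    """Build a command that executes one notebook in place via nbconvert."""
    return ["jupyter", "nbconvert", "--to", "notebook",
            "--execute", "--inplace", str(notebook)]


if __name__ == "__main__":
    if sys.argv[1:] == ["notebooks"]:
        # Route 2: regenerate the data by executing each notebook in order.
        for notebook in ordered_notebooks():
            subprocess.run(nbconvert_command(notebook), check=True)
    else:
        # Route 1 (default): reuse the exported data from the tarball.
        print("moved:", promote_exported_files())
```

Sorting by file name relies on the zero-padded numeric prefixes (01_ to 07_). With check=True, the loop stops at the first notebook that fails, so later notebooks never run on incomplete inputs.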

Training and testing

To train and test our model, run the following command:

python synprotx/.py -d  -m 

Possible options are listed below.

  • model is the name of the model to run. It must be one of gat, gcn, attentivefp, or gatfp.
  • --database/-d specifies the data source to train the model on. It must be one of almanac-breast, almanac-lung, almanac-ovary, almanac-skin, friedman, or oneil.
  • --mode/-m must be either clas, for the classification task, or regr, for the regression task. Defaults to clas.
  • The flags --no-feamol, --no-feagene, and --no-feaprot disable the molecule branch, gene expression branch, and protein expression branch, respectively, during propagation through the model.

Note: There are more options to configure. Execute python synprotx/.py -h for a more detailed description.
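As an illustration of how the options above combine, the following sketch validates a set of choices and assembles the corresponding command line. It is not the project's own argument parser. The build_command helper is hypothetical, and the script path synprotx/<model>.py is an assumption inferred from the option list, so check it against the repository before use.

```python
import shlex

# Allowed values, copied from the option list above.
MODELS = ("gat", "gcn", "attentivefp", "gatfp")
DATABASES = ("almanac-breast", "almanac-lung", "almanac-ovary",
             "almanac-skin", "friedman", "oneil")
MODES = ("clas", "regr")
BRANCH_FLAGS = ("--no-feamol", "--no-feagene", "--no-feaprot")


def build_command(model, database, mode="clas", disabled_branches=()):
    """Validate the documented options and return the command as an argv list."""
    if model not in MODELS:
        raise ValueError(f"model must be one of {MODELS}, got {model!r}")
    if database not in DATABASES:
        raise ValueError(f"database must be one of {DATABASES}, got {database!r}")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    unknown = [flag for flag in disabled_branches if flag not in BRANCH_FLAGS]
    if unknown:
        raise ValueError(f"unknown branch flags: {unknown}")
    # ASSUMPTION: each model has its own script at synprotx/<model>.py.
    return ["python", f"synprotx/{model}.py", "-d", database, "-m", mode,
            *disabled_branches]


if __name__ == "__main__":
    # Example: a regression run on the oneil data without the gene branch.
    print(shlex.join(build_command("gatfp", "oneil", "regr", ["--no-feagene"])))
```

Returning an argv list rather than a single string means the result can be passed directly to subprocess.run without shell quoting concerns.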

Version History

main @ 596087c (latest) Created 5th Jun 2025 at 20:44 by Bundit Boonyarit

Update README.md


main @ 9e9cc5c Created 5th Jun 2025 at 17:53 by Bundit Boonyarit

Add files via upload


Creators
  • Bundit Boonyarit
  • Matin Kositchutima
  • Tisorn Na Phattalung
  • Nattawin Yamprasert
  • Chanitra Thuwajit
  • Thanyada Rungrotmongkol
  • Sarana Nutanong
Citation
Boonyarit, B., Kositchutima, M., Na Phattalung, T., Yamprasert, N., Thuwajit, C., Rungrotmongkol, T., & Nutanong, S. (2025). SynProtX. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.1726.3
Activity

Created: 5th Jun 2025 at 17:29

Last updated: 5th Jun 2025 at 20:44
