SynProtX
The official implementation of our research paper "SynProtX: A Large-Scale Proteomics-Based Deep Learning Model for Predicting Synergistic Anticancer Drug Combinations".
SynProtX is a deep learning model that integrates large-scale proteomics data, molecular graphs, and chemical fingerprints to predict synergistic effects of anticancer drug combinations. It provides robust performance across tissue-specific and study-specific datasets, enhancing reproducibility and biological relevance in drug synergy prediction.
Setting up environment
We use Miniconda to manage the Python dependencies of this project. To reproduce our environment, run the following commands in a terminal:
conda env create -f env.yml
conda activate SynProtX
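As a quick sanity check after activation, the following minimal sketch reports whether the current Python process runs inside the SynProtX environment. It relies on the CONDA_DEFAULT_ENV variable that conda sets on activation; the helper names active_conda_env and check_env are illustrative and not part of this repository.

```python
import os

# Environment name taken from the `conda activate SynProtX` command above.
EXPECTED_ENV = "SynProtX"


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if none is active."""
    return environ.get("CONDA_DEFAULT_ENV")


def check_env(environ=os.environ):
    """Return True when the active conda environment matches EXPECTED_ENV."""
    return active_conda_env(environ) == EXPECTED_ENV


if __name__ == "__main__":
    name = active_conda_env()
    if check_env():
        print(f"OK: running inside conda environment {name!r}")
    else:
        print(f"Warning: expected {EXPECTED_ENV!r}, but the active environment is {name!r}")
```

If the warning appears, run conda activate SynProtX again in the same terminal before starting any notebook or training script.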
Downloading raw data
Datasets, hyperparameters, and model checkpoints can be downloaded through Zenodo.
Generating dataset
The download is a tarball. After extracting it, move all nested folders to the root of this project directory. You might also need to move all files in data/export up into the data folder. Alternatively, you can generate the required data yourself by running the Jupyter notebooks in the ipynb folder. To replicate our exported data, run the following notebooks in order:
- 01_drugcomb_clean.ipynb → cleandata_cancer.csv
- 02_CCLE_gene_expression → CCLE_expression_cleaned.csv
- 03_omics_preprocess → protein_omics_data_cleaned.csv
- 04_drugcomb_gene_prot_clean → data_preprocessing_gene.pkl, data_drugcomb.pkl, data_preprocessing_protein.pkl
- 05_graph_generate.ipynb → nps_intersected folder
- 06_smiles_feat_generate.ipynb → smiles_graph_data.pkl
- 07_to_ecfp6_deepsyn.ipynb → deepsyn_drug_row.npy, deepsyn_drug_col.npy
If the console shows an error indicating that a SMILES string was not found, you MUST run 06_smiles_feat_generate.ipynb again to regenerate the data.
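The file layout described in this section can be checked with a small script. The sketch below moves the contents of data/export up into data and then lists which of the exported files named in this section are still missing. It assumes that every output lives directly under the data folder, which this README does not state explicitly; the names flatten_export and missing_outputs are hypothetical helpers, not part of this repository.

```python
import shutil
from pathlib import Path

# Outputs named in this section. Their location directly under data/ is an
# assumption of this sketch, not something the README guarantees.
EXPECTED_OUTPUTS = [
    "cleandata_cancer.csv",
    "CCLE_expression_cleaned.csv",
    "protein_omics_data_cleaned.csv",
    "data_preprocessing_gene.pkl",
    "data_drugcomb.pkl",
    "data_preprocessing_protein.pkl",
    "nps_intersected",  # a folder, produced by 05_graph_generate.ipynb
    "smiles_graph_data.pkl",
    "deepsyn_drug_row.npy",
    "deepsyn_drug_col.npy",
]


def flatten_export(data_dir):
    """Move everything in data_dir/export up into data_dir; return the moved names."""
    data_dir = Path(data_dir)
    export_dir = data_dir / "export"
    moved = []
    if not export_dir.is_dir():
        return moved
    for item in sorted(export_dir.iterdir()):
        target = data_dir / item.name
        if target.exists():
            continue  # never overwrite an existing file or folder
        shutil.move(str(item), str(target))
        moved.append(item.name)
    return moved


def missing_outputs(data_dir):
    """Return the expected outputs that are absent from data_dir."""
    data_dir = Path(data_dir)
    return [name for name in EXPECTED_OUTPUTS if not (data_dir / name).exists()]


if __name__ == "__main__":
    # Read-only check; call flatten_export("data") first if files sit in data/export.
    print("missing:", missing_outputs("data"))
```

Run it from the project root. Any name reported as missing points to the notebook that still needs to be executed, for example smiles_graph_data.pkl for 06_smiles_feat_generate.ipynb.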
Training and testing
To train and test our model, run the following command:
python synprotx/<model>.py -d <database> -m <mode>
Possible options are listed below.
- model represents the name of the model to run. Must be one of gat, gcn, attentivefp, and gatfp.
- --database/-d specifies the data source to train the model on. Must be one of almanac-breast, almanac-lung, almanac-ovary, almanac-skin, friedman, oneil.
- --mode/-m must be either clas, for a classification task, or regr, for a regression task. Defaults to clas.
- Flags --no-feamol, --no-feagene, --no-feaprot disable the molecule branch, gene expression branch, and protein expression branch, respectively, during propagation through the model.
Note: More options are available. Run python synprotx/<model>.py -h for a detailed description.
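To avoid typos when launching many runs, the options listed above can be validated before the command is assembled. The sketch below is a hypothetical helper, not part of this repository: it only knows the values named in this README, it rejects any other flag even though the scripts accept more options, and it assumes that each model has its own script named after it under synprotx/.

```python
# Allowed values copied from the option list in this README.
MODELS = ("gat", "gcn", "attentivefp", "gatfp")
DATABASES = (
    "almanac-breast",
    "almanac-lung",
    "almanac-ovary",
    "almanac-skin",
    "friedman",
    "oneil",
)
MODES = ("clas", "regr")
BRANCH_FLAGS = ("--no-feamol", "--no-feagene", "--no-feaprot")


def build_command(model, database, mode="clas", flags=()):
    """Validate the options and return the argument list for one run."""
    if model not in MODELS:
        raise ValueError(f"model must be one of {MODELS}, got {model!r}")
    if database not in DATABASES:
        raise ValueError(f"database must be one of {DATABASES}, got {database!r}")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    unknown = [flag for flag in flags if flag not in BRANCH_FLAGS]
    if unknown:
        raise ValueError(f"unsupported flags: {unknown}")
    # Assumption: the script for each model is named after the model itself.
    return ["python", f"synprotx/{model}.py", "-d", database, "-m", mode, *flags]


if __name__ == "__main__":
    # Example: GAT on the O'Neil data, classification, gene branch disabled.
    print(" ".join(build_command("gat", "oneil", "clas", ["--no-feagene"])))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the project root, for instance inside a loop over all databases.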
Version History
main @ 596087c (latest) Created 5th Jun 2025 at 20:44 by Bundit Boonyarit
Update README.md
main @ 9e9cc5c Created 5th Jun 2025 at 17:53 by Bundit Boonyarit
Add files via upload
Created: 5th Jun 2025 at 17:29
Last updated: 5th Jun 2025 at 20:44
https://orcid.org/0000-0003-4425-2608