SynProtX
An official implementation of our research paper "SynProtX: A Large-Scale Proteomics-Based Deep Learning Model for Predicting Synergistic Anticancer Drug Combinations".
SynProtX is a deep learning model that integrates large-scale proteomics data, molecular graphs, and chemical fingerprints to predict synergistic effects of anticancer drug combinations. It provides robust performance across tissue-specific and study-specific datasets, enhancing reproducibility and biological relevance in drug synergy prediction.
Setting up environment
We use Miniconda to manage Python dependencies in this project. To reproduce our environment, run the following commands in a terminal:
conda env create -f env.yml
conda activate SynProtX
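Before running anything else, it can help to confirm that the intended environment is active. The sketch below assumes that `conda activate` exports the `CONDA_DEFAULT_ENV` variable (standard conda behaviour) and that the environment is named SynProtX, as in the command above. The helper name `in_expected_env` is hypothetical and not part of this project.

```python
import os

# Environment name taken from `conda activate SynProtX` above.
EXPECTED_ENV = "SynProtX"


def in_expected_env(environ=None, expected=EXPECTED_ENV):
    """Return True if the active conda environment is named `expected`.

    Hypothetical helper: it only inspects CONDA_DEFAULT_ENV, which conda
    sets on activation; it does not verify the installed packages.
    """
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV") == expected


if in_expected_env():
    print("SynProtX environment is active.")
else:
    print("Activate it first: conda activate SynProtX")
```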
Downloading raw data
Datasets, hyperparameters, and model checkpoints can be downloaded through Zenodo.
Generating dataset
The download is a tarball. After extracting it, move all nested folders to the root of this project directory. You might also need to move all files in data/export up to the data folder. Alternatively, if you prefer not to use the downloaded files, run the Jupyter notebooks in the ipynb folder to generate the required data yourself. To replicate our exported data, run the following notebooks in order. Each entry lists the notebook followed by the output it produces.
- 01_drugcomb_clean.ipynb → cleandata_cancer.csv
- 02_CCLE_gene_expression → CCLE_expression_cleaned.csv
- 03_omics_preprocess → protein_omics_data_cleaned.csv
- 04_drugcomb_gene_prot_clean → data_preprocessing_gene.pkl, data_drugcomb.pkl, data_preprocessing_protein.pkl
- 05_graph_generate.ipynb → nps_intersected folder
- 06_smiles_feat_generate.ipynb → smiles_graph_data.pkl
- 07_to_ecfp6_deepsyn.ipynb → deepsyn_drug_row.npy, deepsyn_drug_col.npy
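To see which of the outputs listed above are still missing, a small check like the following may be useful. It is a minimal sketch: the file names come from the list, but the assumption that they all sit directly in one directory (for example data) is ours, and `missing_outputs` is a hypothetical helper. The nps_intersected folder is left out because it is a directory rather than a single file.

```python
from pathlib import Path

# File names taken from the notebook list above. Their location is an
# assumption; pass the directory that matches your checkout.
EXPECTED_OUTPUTS = [
    "cleandata_cancer.csv",
    "CCLE_expression_cleaned.csv",
    "protein_omics_data_cleaned.csv",
    "data_preprocessing_gene.pkl",
    "data_drugcomb.pkl",
    "data_preprocessing_protein.pkl",
    "smiles_graph_data.pkl",
    "deepsyn_drug_row.npy",
    "deepsyn_drug_col.npy",
]


def missing_outputs(data_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected file names that are not present in `data_dir`."""
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).is_file()]


# Example: report anything not yet generated under ./data (assumed location).
for name in missing_outputs("data"):
    print(f"missing: {name}")
```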
If the console reports that a SMILES string was not found, you MUST run 06_smiles_feat_generate.ipynb again to regenerate the data.
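The notebooks can also be executed from a script instead of being opened one by one. The sketch below builds `jupyter nbconvert --execute` commands in the order given above. It assumes that Jupyter and nbconvert are installed in the environment, that every notebook lives in the ipynb folder, and that each one carries the .ipynb suffix (the list above omits it for notebooks 02 to 04). `nbconvert_command` and `run_all` are hypothetical helper names.

```python
import subprocess

# Notebook stems in execution order, taken from the list above.
NOTEBOOKS = [
    "01_drugcomb_clean",
    "02_CCLE_gene_expression",
    "03_omics_preprocess",
    "04_drugcomb_gene_prot_clean",
    "05_graph_generate",
    "06_smiles_feat_generate",
    "07_to_ecfp6_deepsyn",
]


def nbconvert_command(stem, folder="ipynb"):
    """Build the command that executes one notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute", "--inplace",
        f"{folder}/{stem}.ipynb",  # assumed path layout
    ]


def run_all(notebooks=NOTEBOOKS, runner=subprocess.run):
    """Run every notebook in order, stopping at the first failure."""
    for stem in notebooks:
        runner(nbconvert_command(stem), check=True)
```

Calling `run_all()` from the project root executes the seven notebooks sequentially. Because `check=True` raises on a non-zero exit code, a failing notebook stops the sequence before later notebooks consume incomplete data.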
Training and testing
To train and test our model, run the following command, replacing MODEL, DATABASE, and MODE with the values described below:
python synprotx/MODEL.py -d DATABASE -m MODE
Possible options are listed below.
- model is the name of the model to run. Must be one of gat, gcn, attentivefp, or gatfp.
- --database / -d specifies the data source to train the model on. Must be one of almanac-breast, almanac-lung, almanac-ovary, almanac-skin, friedman, or oneil.
- --mode / -m must be either clas, for a classification task, or regr, for a regression task. Defaults to clas.
- The flags --no-feamol, --no-feagene, and --no-feaprot disable the molecule branch, gene expression branch, and protein expression branch, respectively, when propagating through the model.
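The options above can be assembled and validated programmatically before a run is launched. The following is a minimal sketch, not part of the project: `build_command` is a hypothetical helper, and it assumes that each model has its own script named after it under synprotx/ (for example synprotx/gat.py). The allowed values are copied from the list above.

```python
MODELS = {"gat", "gcn", "attentivefp", "gatfp"}
DATABASES = {
    "almanac-breast", "almanac-lung", "almanac-ovary",
    "almanac-skin", "friedman", "oneil",
}
MODES = {"clas", "regr"}


def build_command(model, database, mode="clas",
                  no_feamol=False, no_feagene=False, no_feaprot=False):
    """Return the argument list for one training/testing run.

    Raises ValueError for values outside the documented choices.
    """
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if database not in DATABASES:
        raise ValueError(f"unknown database: {database!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")

    # Assumed script layout: one script per model under synprotx/.
    cmd = ["python", f"synprotx/{model}.py", "-d", database, "-m", mode]
    if no_feamol:
        cmd.append("--no-feamol")
    if no_feagene:
        cmd.append("--no-feagene")
    if no_feaprot:
        cmd.append("--no-feaprot")
    return cmd


# Example: regression on the O'Neil data without the protein branch.
print(" ".join(build_command("gat", "oneil", "regr", no_feaprot=True)))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the project root.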
Note: More options are available. Run python synprotx/MODEL.py -h, where MODEL is one of the model names above, for a more detailed description.
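One possible use of the --no-feamol, --no-feagene, and --no-feaprot flags is an ablation study that switches branches off in different combinations. The sketch below only enumerates flag combinations; it is an illustration under our own assumptions, not the procedure used in the paper. Whether the scripts accept all three branches disabled at once is not stated here, so that combination is excluded by default. `ablation_flag_sets` is a hypothetical helper.

```python
from itertools import combinations

# Branch-disabling flags documented in the options list above.
BRANCH_FLAGS = ["--no-feamol", "--no-feagene", "--no-feaprot"]


def ablation_flag_sets(flags=BRANCH_FLAGS, allow_all_disabled=False):
    """Enumerate flag combinations, from no flags up to all flags.

    The combination that disables every branch is skipped unless
    `allow_all_disabled` is True (an assumption, see the text above).
    """
    sets = []
    for k in range(len(flags) + 1):
        for combo in combinations(flags, k):
            if len(combo) == len(flags) and not allow_all_disabled:
                continue
            sets.append(list(combo))
    return sets


# Print one line of extra arguments per ablation setting.
for extra in ablation_flag_sets():
    print(" ".join(extra) if extra else "(full model, no flags)")
```

Each returned list can be appended to the base training command for the chosen model, database, and mode.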
Version History
main @ 596087c (latest) Created 5th Jun 2025 at 20:44 by Bundit Boonyarit
Update README.md
    main @ 9e9cc5c Created 5th Jun 2025 at 17:53 by Bundit Boonyarit
Add files via upload
Created: 5th Jun 2025 at 17:29
Last updated: 5th Jun 2025 at 20:44
 https://orcid.org/0000-0003-4425-2608