SynProtX

An official implementation of our research paper "SynProtX: A Large-Scale Proteomics-Based Deep Learning Model for Predicting Synergistic Anticancer Drug Combinations".

SynProtX is a deep learning model that integrates large-scale proteomics data, molecular graphs, and chemical fingerprints to predict synergistic effects of anticancer drug combinations. It provides robust performance across tissue-specific and study-specific datasets, enhancing reproducibility and biological relevance in drug synergy prediction.

Setting up environment

We use Miniconda to manage the Python dependencies of this project. To reproduce our environment, run the following commands in a terminal:

conda env create -f env.yml
conda activate SynProtX

Downloading raw data

Datasets, hyperparameters, and model checkpoints can be downloaded through Zenodo.
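The next section mentions that files in data/export may need to be moved up into the data folder after extraction. A minimal sketch of that step is shown below. It assumes the extracted archive leaves a data/export directory at the project root; the helper name flatten_export is hypothetical and not part of this repository.

```python
import shutil
from pathlib import Path
from typing import List


def flatten_export(data_dir: Path) -> List[str]:
    """Move every entry in data_dir/export up into data_dir.

    Returns the names of the entries that were moved, in sorted order.
    Entries whose name already exists in data_dir are skipped, not overwritten.
    """
    export_dir = data_dir / "export"
    moved: List[str] = []
    if not export_dir.is_dir():
        return moved  # nothing to flatten
    for entry in sorted(export_dir.iterdir()):
        target = data_dir / entry.name
        if target.exists():
            continue  # keep the existing file or folder untouched
        shutil.move(str(entry), str(target))
        moved.append(entry.name)
    return moved
```

Calling flatten_export(Path("data")) from the project root would move the exported files up one level and return their names. Skipping existing targets is a conservative choice, so rerunning the helper is harmless.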

Generating dataset

The download is a tarball. After extracting it, move all nested folders to the root of this project directory. You might also need to move all files in data/export up to the data folder. Alternatively, run the Jupyter notebooks in the ipynb folder to generate the required data yourself. To replicate our exported data, run the following files in order.

01_drugcomb_clean.ipynb → cleandata_cancer.csv
02_CCLE_gene_expression → CCLE_expression_cleaned.csv
03_omics_preprocess → protein_omics_data_cleaned.csv
04_drugcomb_gene_prot_clean → data_preprocessing_gene.pkl, data_drugcomb.pkl, data_preprocessing_protein.pkl
05_graph_generate.ipynb → nps_intersected folder
06_smiles_feat_generate.ipynb → smiles_graph_data.pkl
07_to_ecfp6_deepsyn.ipynb → deepsyn_drug_row.npy, deepsyn_drug_col.npy

If the console shows an error indicating that a SMILES string was not found, you MUST run 06_smiles_feat_generate.ipynb again to regenerate the data.
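To see which notebooks still need to be run, the notebook-to-output list above can be turned into a simple existence check. This is a sketch only: it assumes every output is written directly into one data directory, which the README does not state explicitly, and the names EXPECTED_OUTPUTS and missing_outputs are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# Notebook name -> expected outputs, copied from the list above.
EXPECTED_OUTPUTS: Dict[str, List[str]] = {
    "01_drugcomb_clean.ipynb": ["cleandata_cancer.csv"],
    "02_CCLE_gene_expression": ["CCLE_expression_cleaned.csv"],
    "03_omics_preprocess": ["protein_omics_data_cleaned.csv"],
    "04_drugcomb_gene_prot_clean": [
        "data_preprocessing_gene.pkl",
        "data_drugcomb.pkl",
        "data_preprocessing_protein.pkl",
    ],
    "05_graph_generate.ipynb": ["nps_intersected"],  # a folder, not a file
    "06_smiles_feat_generate.ipynb": ["smiles_graph_data.pkl"],
    "07_to_ecfp6_deepsyn.ipynb": ["deepsyn_drug_row.npy", "deepsyn_drug_col.npy"],
}


def missing_outputs(data_dir: Path) -> Dict[str, List[str]]:
    """Return, per notebook, the expected outputs that are absent from data_dir.

    Assumption: all outputs live directly under data_dir.
    """
    report: Dict[str, List[str]] = {}
    for notebook, outputs in EXPECTED_OUTPUTS.items():
        missing = [name for name in outputs if not (data_dir / name).exists()]
        if missing:
            report[notebook] = missing
    return report
```

Because the dictionary preserves the run order, the first key in the result of missing_outputs(Path("data")) is the earliest notebook whose outputs are incomplete.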

Training and testing

To train and test our model, run the following command:

python synprotx/.py -d  -m

Possible options are listed below.

model represents the name of the model to run. Must be one of gat, gcn, attentivefp, or gatfp.
--database/-d specifies the data source to train the model on. Must be one of almanac-breast, almanac-lung, almanac-ovary, almanac-skin, friedman, or oneil.
--mode/-m selects the task. Must be either clas (classification) or regr (regression). Defaults to clas.
The flags --no-feamol, --no-feagene, and --no-feaprot disable the molecule branch, gene expression branch, and protein expression branch, respectively, during propagation through the model.

Note: There are more options to configure. Execute python synprotx/.py -h for a more detailed description.
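For readers who want to see how the documented flags fit together, the sketch below mirrors them with argparse. It is an illustration, not the project's actual parser: the function name build_parser is hypothetical, treating --database as required is an assumption, and the additional options reported by -h are not covered.

```python
import argparse

# Allowed values, copied from the option list above.
DATABASES = [
    "almanac-breast", "almanac-lung", "almanac-ovary",
    "almanac-skin", "friedman", "oneil",
]
MODES = ["clas", "regr"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser covering only the options documented in this README."""
    parser = argparse.ArgumentParser(
        description="Illustrative mirror of the documented SynProtX options"
    )
    # Assumption: the data source must always be given explicitly.
    parser.add_argument("-d", "--database", choices=DATABASES, required=True,
                        help="data source to train the model on")
    parser.add_argument("-m", "--mode", choices=MODES, default="clas",
                        help="clas for classification, regr for regression")
    parser.add_argument("--no-feamol", action="store_true",
                        help="disable the molecule branch")
    parser.add_argument("--no-feagene", action="store_true",
                        help="disable the gene expression branch")
    parser.add_argument("--no-feaprot", action="store_true",
                        help="disable the protein expression branch")
    return parser
```

With this sketch, build_parser().parse_args(["-d", "oneil"]) yields mode "clas" with all three branches enabled, and a value outside the listed choices makes argparse exit with a usage error.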

Version History

main @ 596087c (latest) Created 5th Jun 2025 at 20:44 by Bundit Boonyarit

Update README.md

Frozen main 596087c

main @ 9e9cc5c Created 5th Jun 2025 at 17:53 by Bundit Boonyarit

Add files via upload

Frozen main 9e9cc5c
