SynProtX
This is the official implementation of our research paper "SynProtX: A Large-Scale Proteomics-Based Deep Learning Model for Predicting Synergistic Anticancer Drug Combinations".
SynProtX is a deep learning model that integrates large-scale proteomics data, molecular graphs, and chemical fingerprints to predict synergistic effects of anticancer drug combinations. It provides robust performance across tissue-specific and study-specific datasets, enhancing reproducibility and biological relevance in drug synergy prediction.
Setting up environment
We use Miniconda to manage Python dependencies in this project. To reproduce our environment, please run the following script in the terminal:
conda env create -f env.yml
conda activate SynProtX
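Before running notebooks or training, it can help to confirm that the right environment is active. The following is a minimal sketch, assuming the environment created from env.yml is named SynProtX (inferred from the activate command above) and relying on the CONDA_DEFAULT_ENV variable that conda sets on activation. The helper names are hypothetical and not part of this repository.

```python
import os
from typing import Mapping, Optional

# Assumption: env.yml creates an environment called "SynProtX",
# as suggested by `conda activate SynProtX`.
EXPECTED_ENV = "SynProtX"


def active_conda_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the active conda environment, or None if unset."""
    return environ.get("CONDA_DEFAULT_ENV")


def is_expected_env(environ: Mapping[str, str] = os.environ) -> bool:
    """True when the active conda environment matches EXPECTED_ENV."""
    return active_conda_env(environ) == EXPECTED_ENV


if __name__ == "__main__":
    if is_expected_env():
        print("Environment looks right:", EXPECTED_ENV)
    else:
        print("Active environment is", active_conda_env(), "- expected", EXPECTED_ENV)
```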
Downloading raw data
Datasets, hyperparameters, and model checkpoints can be downloaded through Zenodo.
Generating dataset
The download is a tarball. After extracting it, move all nested folders to the root of this project directory. You might also need to move all files in data/export up to the data folder. Otherwise, run the Jupyter Notebook files in the ipynb folder to generate the mandatory data. Run the following files in order if you want to replicate our exported data.
- 01_drugcomb_clean.ipynb → cleandata_cancer.csv
- 02_CCLE_gene_expression → CCLE_expression_cleaned.csv
- 03_omics_preprocess → protein_omics_data_cleaned.csv
- 04_drugcomb_gene_prot_clean → data_preprocessing_gene.pkl, data_drugcomb.pkl, data_preprocessing_protein.pkl
- 05_graph_generate.ipynb → nps_intersected folder
- 06_smiles_feat_generate.ipynb → smiles_graph_data.pkl
- 07_to_ecfp6_deepsyn.ipynb → deepsyn_drug_row.npy, deepsyn_drug_col.npy
If the console shows an error indicating that a SMILES string was not found, you MUST run 06_smiles_feat_generate.ipynb again to regenerate the data.
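The file handling described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not code from the repository: it assumes the generated files are expected directly inside the data folder, and the function names promote_exported_files and notebooks_to_run are hypothetical. The notebook and output names are copied as they appear in the list above.

```python
import shutil
from pathlib import Path
from typing import Dict, List

# Notebook -> expected outputs, in the order given in this README.
NOTEBOOK_OUTPUTS: Dict[str, List[str]] = {
    "01_drugcomb_clean.ipynb": ["cleandata_cancer.csv"],
    "02_CCLE_gene_expression": ["CCLE_expression_cleaned.csv"],
    "03_omics_preprocess": ["protein_omics_data_cleaned.csv"],
    "04_drugcomb_gene_prot_clean": [
        "data_preprocessing_gene.pkl",
        "data_drugcomb.pkl",
        "data_preprocessing_protein.pkl",
    ],
    "05_graph_generate.ipynb": ["nps_intersected"],  # a folder, not a file
    "06_smiles_feat_generate.ipynb": ["smiles_graph_data.pkl"],
    "07_to_ecfp6_deepsyn.ipynb": ["deepsyn_drug_row.npy", "deepsyn_drug_col.npy"],
}


def promote_exported_files(data_dir: Path) -> List[str]:
    """Move every entry of data_dir/export up into data_dir.

    Existing targets are left untouched. Returns the names that were moved.
    """
    export_dir = data_dir / "export"
    moved: List[str] = []
    if not export_dir.is_dir():
        return moved
    for item in sorted(export_dir.iterdir()):
        target = data_dir / item.name
        if target.exists():
            continue  # never overwrite data that is already in place
        shutil.move(str(item), str(target))
        moved.append(item.name)
    return moved


def notebooks_to_run(data_dir: Path) -> List[str]:
    """List notebooks, in order, with at least one output missing from data_dir.

    Assumption: outputs are looked up directly under data_dir.
    """
    return [
        notebook
        for notebook, outputs in NOTEBOOK_OUTPUTS.items()
        if any(not (data_dir / name).exists() for name in outputs)
    ]
```

Calling promote_exported_files(Path("data")) and then notebooks_to_run(Path("data")) from the project root would show which notebooks still need to be executed, provided the assumed layout matches your checkout.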
Training and testing
To train and test our model, run the following command:
python synprotx/.py -d -m
Possible options are listed below.
- model represents the name of the model to run. Must be one of gat, gcn, attentivefp, and gatfp.
- --database/-d specifies the data source to train the model on. Must be one of almanac-breast, almanac-lung, almanac-ovary, almanac-skin, friedman, and oneil.
- --mode/-m must be either clas, for a classification task, or regr, for a regression task. Defaults to clas.
- Flags --no-feamol, --no-feagene, and --no-feaprot disable the molecule branch, gene expression branch, and protein expression branch, respectively, during propagation through the model.
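To make the documented options concrete, here is a minimal argparse sketch that mirrors them. It is not the project's actual parser: the build_parser name, the help strings, and the choice to make --database required are assumptions made for illustration. The model name is left out because the command above does not show how it is passed.

```python
import argparse

# Allowed values as listed in this README.
DATABASES = [
    "almanac-breast",
    "almanac-lung",
    "almanac-ovary",
    "almanac-skin",
    "friedman",
    "oneil",
]
MODES = ["clas", "regr"]


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser for the documented options."""
    parser = argparse.ArgumentParser(
        description="Illustrative option parser (not the project's own)."
    )
    # Assumption: no default is documented for the database, so it is required here.
    parser.add_argument("-d", "--database", choices=DATABASES, required=True,
                        help="data source to train the model on")
    parser.add_argument("-m", "--mode", choices=MODES, default="clas",
                        help="clas for classification, regr for regression")
    # Each flag switches off one input branch of the model.
    parser.add_argument("--no-feamol", action="store_true",
                        help="disable the molecule branch")
    parser.add_argument("--no-feagene", action="store_true",
                        help="disable the gene expression branch")
    parser.add_argument("--no-feaprot", action="store_true",
                        help="disable the protein expression branch")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-d", "oneil", "--no-feagene"])
    print(args.database, args.mode, args.no_feamol, args.no_feagene, args.no_feaprot)
```

With this sketch, an unsupported database or mode value is rejected by argparse before any training code runs.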
Note: There are more options to configure. Execute python synprotx/.py -h for a more detailed description.
Version History
main @ 596087c (latest) Created 5th Jun 2025 at 20:44 by Bundit Boonyarit
Update README.md
main @ 9e9cc5c Created 5th Jun 2025 at 17:53 by Bundit Boonyarit
Add files via upload

Created: 5th Jun 2025 at 17:29
Last updated: 5th Jun 2025 at 20:44
