📄 Generalizable machine learning models for rapid antimicrobial resistance prediction in unseen healthcare settings
This repository contains the code used for the experiments in the paper:
Generalizable machine learning models for rapid antimicrobial resistance prediction in unseen healthcare settings
by Diane Duroux, Paul P. Meyer, Giovanni Visonà, and Niko Beerenwinkel.
⚙️ Install the dependencies
You can set up the project with either pip or uv.
Option A - pip:
Install the dependencies listed in `requirements.txt`:
```bash
pip install -r requirements.txt
```
Option B - uv:
We provide `pyproject.toml` and `uv.lock` files for macOS, Windows, and Linux.
Note: On Linux, or on any machine without Apple silicon, use the `pyproject.toml` file for macOS and regenerate `uv.lock` after installation (for example with `uv lock`).
```bash
# 0) Install uv (one-time)
# macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows (PowerShell):
iwr https://astral.sh/uv/install.ps1 -UseBasicParsing | iex
# 1) Ensure the pinned Python version is available (adjust if your pyproject.toml pins a different version)
uv python install 3.11
# 2) Create the exact environment from the lockfile
uv sync --frozen
# 3) Run your code inside the environment
uv run python -V
uv run python your_script.py
```
💻 AMR Classifier Training and Inference with ResMLP
The command in the example below trains a ResMLP model for AMR classification on the preprocessed DRIAMS data and writes its test-set predictions to disk.
📦 Output
In output//_results/, the script generates:
`test_set_seed0.csv`
➤ Contains the predictions, with the columns `species`, `sample_id`, `drug`, `response`, and `Prediction`.
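To summarize these predictions per drug, a sketch like the following can be used. It is not part of the repository, and it assumes that `response` holds binary labels (1 = resistant, 0 = susceptible) and that `Prediction` is a score where higher values mean "more likely resistant"; please check both against the training script before relying on the numbers. It uses only the Python standard library and computes AUROC as the fraction of (resistant, susceptible) pairs that are ranked correctly, with ties counting one half.

```python
import csv
import io
import math
from collections import defaultdict


def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return math.nan  # AUROC is undefined when only one class is present
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_drug_auroc(rows):
    """Group rows of a predictions CSV by drug and compute one AUROC per drug."""
    grouped = defaultdict(lambda: ([], []))
    for row in rows:
        labels, scores = grouped[row["drug"]]
        # ASSUMPTION: response is 0/1 (1 = resistant); Prediction is a resistance score.
        labels.append(int(float(row["response"])))
        scores.append(float(row["Prediction"]))
    return {drug: auroc(labels, scores) for drug, (labels, scores) in grouped.items()}


# Synthetic rows for illustration only (not real DRIAMS data).
toy_csv = """species,sample_id,drug,response,Prediction
species_X,s1,drug_A,1,0.9
species_X,s2,drug_A,0,0.2
species_X,s3,drug_A,0,0.6
species_X,s1,drug_B,0,0.7
species_X,s2,drug_B,1,0.4
species_X,s3,drug_B,1,0.7
"""
rows = list(csv.DictReader(io.StringIO(toy_csv)))
print(per_drug_auroc(rows))  # {'drug_A': 1.0, 'drug_B': 0.25}
```

For real output, open `test_set_seed0.csv` from the results folder with `newline=""` and pass `list(csv.DictReader(f))` to `per_drug_auroc`. Drugs whose test set contains only one class get `nan`.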
🛠 Required Arguments
| Argument | Description |
|---|---|
| `--driams_long_table` | Path to the metadata file for the current dataset. |
| `--spectra_matrix` | Path to the input mass spectra (either raw or MAE-encoded). |
| `--sample_embedding_dim` | Dimension of the spectra input: 6000 for raw spectra, or the MAE embedding dimension for MAE-encoded spectra. |
| `--drugs_df` | Path to the antimicrobial compound encoding file. |
| `--fingerprint_class` | Type of compound encoding: `morgan_1024`, `molformer_github`, or `selfies_flattened_one_hot`. |
| `--fingerprint_size` | Size of the encoding: 1024 (Morgan), 768 (Molformer), or 24160 (SELFIES). |
| `--split_type` | `specific` if the splits are predefined, otherwise `random`. |
| `--split_ids` | Path to the `data_splits.csv` file. |
| `--experiment_group` | Name of the output folder. |
| `--experiment_name` | Name of the output subfolder. |
| `--seed` | Random seed for reproducibility. |
| `--n_epochs` | Number of epochs for classifier training. |
| `--learning_rate` | Learning rate for the optimizer. |
| `--patience` | Number of epochs without improvement to wait before early stopping. |
| `--batch_size` | Batch size for classifier training. |

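
Several of these arguments must agree with each other. The sketch below is a hypothetical pre-flight check (not part of the repository): it encodes the encoding sizes listed in the table and compares `--sample_embedding_dim` with the width of the spectra matrix. The names `check_args`, `EXPECTED_FINGERPRINT_SIZE`, and `VALID_SPLIT_TYPES` are illustrative only.

```python
# Encoding sizes as listed for --fingerprint_class / --fingerprint_size in the table above.
EXPECTED_FINGERPRINT_SIZE = {
    "morgan_1024": 1024,
    "molformer_github": 768,
    "selfies_flattened_one_hot": 24160,
}
VALID_SPLIT_TYPES = ("specific", "random")


def check_args(fingerprint_class, fingerprint_size, sample_embedding_dim, spectra_width, split_type):
    """Hypothetical helper: return a list of problems; an empty list means none were found."""
    problems = []
    expected = EXPECTED_FINGERPRINT_SIZE.get(fingerprint_class)
    if expected is None:
        problems.append(f"unknown --fingerprint_class {fingerprint_class!r}")
    elif fingerprint_size != expected:
        problems.append(
            f"--fingerprint_size {fingerprint_size} does not match {expected} for {fingerprint_class}"
        )
    # The spectra input dimension must equal the number of features per sample.
    if sample_embedding_dim != spectra_width:
        problems.append(
            f"--sample_embedding_dim {sample_embedding_dim} does not match spectra width {spectra_width}"
        )
    if split_type not in VALID_SPLIT_TYPES:
        problems.append(f"--split_type must be one of {VALID_SPLIT_TYPES}, got {split_type!r}")
    return problems


# Values from the B2018 example (raw spectra, Morgan fingerprints).
print(check_args("morgan_1024", 1024, 6000, 6000, "specific"))  # []
print(check_args("molformer_github", 1024, 6000, 6000, "random"))
```

If `numpy` is installed, `numpy.load(path, mmap_mode="r").shape[1]` reads the spectra width without loading the whole array into memory, assuming the matrix is stored as samples × features.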
🚀 Example: ResMLP Training on DRIAMS B2018 with Raw Spectra + Morgan Fingerprints
```bash
ulimit -Sn 10000  # Optional: raise the open-file-descriptor limit if needed
python3 code/ResAMR_classifier.py \
  --driams_long_table ProcessedData/B2018/combined_long_table.csv \
  --spectra_matrix ProcessedData/B2018/rawSpectra_data.npy \
  --sample_embedding_dim 6000 \
  --drugs_df OriginalData/drug_fingerprints_Mol_selfies.csv \
  --fingerprint_class morgan_1024 \
  --fingerprint_size 1024 \
  --split_type specific \
  --split_ids ProcessedData/B2018/data_splits.csv \
  --experiment_group rawMS_MorganFing \
  --experiment_name ResMLP \
  --seed 0 \
  --n_epochs 2 \
  --learning_rate 0.0003 \
  --patience 10 \
  --batch_size 128
```
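Because `--seed` controls reproducibility, it can be useful to repeat this example over several seeds. The following is a hedged sketch, not a script shipped with the repository: it rebuilds the command above as an argument list and either prints it (dry run) or runs it with `subprocess`. `BASE_ARGS`, `build_command`, and `run_seeds` are hypothetical names, and the sketch assumes the script accepts its flags in any order.

```python
import shlex
import subprocess
import sys

# Arguments copied from the B2018 example above; --seed is appended per run.
BASE_ARGS = {
    "--driams_long_table": "ProcessedData/B2018/combined_long_table.csv",
    "--spectra_matrix": "ProcessedData/B2018/rawSpectra_data.npy",
    "--sample_embedding_dim": 6000,
    "--drugs_df": "OriginalData/drug_fingerprints_Mol_selfies.csv",
    "--fingerprint_class": "morgan_1024",
    "--fingerprint_size": 1024,
    "--split_type": "specific",
    "--split_ids": "ProcessedData/B2018/data_splits.csv",
    "--experiment_group": "rawMS_MorganFing",
    "--experiment_name": "ResMLP",
    "--n_epochs": 2,
    "--learning_rate": 0.0003,
    "--patience": 10,
    "--batch_size": 128,
}


def build_command(seed):
    """Return the training command for one seed as an argument list (no shell quoting needed)."""
    cmd = [sys.executable, "code/ResAMR_classifier.py"]  # current interpreter, e.g. the uv env
    for flag, value in BASE_ARGS.items():
        cmd += [flag, str(value)]
    cmd += ["--seed", str(seed)]
    return cmd


def run_seeds(seeds, dry_run=True):
    """Print (dry_run=True) or execute one training run per seed, stopping at the first failure."""
    for seed in seeds:
        cmd = build_command(seed)
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit code


run_seeds([0, 1, 2])  # dry run: only prints the commands
```

Run it from the repository root inside the project environment (for example `uv run python sweep_seeds.py`, where `sweep_seeds.py` is a hypothetical file name) and pass `dry_run=False` to actually train. The `ulimit` setting from the example still has to be applied in the calling shell.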
💰 Funding
This research was primarily supported by the ETH AI Center.
Version History
main @ 3ce9c42 (earliest) Created 17th Oct 2025 at 12:16 by Diane Duroux