Generalizable machine learning models for rapid antimicrobial resistance prediction in unseen healthcare settings
main @ 5fb358d

Workflow Type: Python


This repository contains the code used for the experiments in the paper:

Generalizable machine learning models for rapid antimicrobial resistance prediction in unseen healthcare settings
by Diane Duroux, Paul P. Meyer, Giovanni Visonà, and Niko Beerenwinkel.

⚙️ Install the dependencies

Clone the repository, unzip OriginalData.zip, and install the dependencies listed in requirements.txt:

pip install -r requirements.txt

💻 AMR Classifier Training and Inference with ResMLP

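The script code/ResAMR_classifier.py trains a ResMLP model for antimicrobial resistance (AMR) classification on the preprocessed DRIAMS data and writes predictions for the test set. An example command is given at the end of this section.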

📦 Output

In output//_results/, the script generates:

  • test_set_seed0.csv
    ➤ Test-set predictions with the columns species, sample_id, drug, response, and Prediction.
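The prediction file can be scored per drug with a few lines of standard-library Python. The sketch below is not part of the repository, and the helper names auroc and per_drug_auroc are hypothetical. It assumes that response holds the binary label (0 or 1) and that Prediction holds a continuous score such as a predicted probability. AUROC is computed as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counting as one half. The toy rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def auroc(labels, scores):
    """AUROC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when a drug has only one class in the test set
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_drug_auroc(csv_file):
    """Group rows of a test_set_seed0.csv-style file by drug and compute AUROC per drug."""
    labels, scores = defaultdict(list), defaultdict(list)
    for row in csv.DictReader(csv_file):
        # Assumption: response is a 0/1 label and Prediction a continuous score.
        labels[row["drug"]].append(int(float(row["response"])))
        scores[row["drug"]].append(float(row["Prediction"]))
    return {drug: auroc(labels[drug], scores[drug]) for drug in labels}


# Made-up toy rows using the documented column names.
toy = io.StringIO(
    "species,sample_id,drug,response,Prediction\n"
    "Escherichia coli,s1,Ceftriaxone,1,0.9\n"
    "Escherichia coli,s2,Ceftriaxone,0,0.2\n"
    "Escherichia coli,s3,Ceftriaxone,1,0.4\n"
    "Escherichia coli,s4,Ceftriaxone,0,0.6\n"
)
print(per_drug_auroc(toy))  # prints {'Ceftriaxone': 0.75}
```

To score a real run, pass `open(path, newline="")` on test_set_seed0.csv instead of the toy buffer. The pairwise loop is quadratic in the number of samples per drug; for large test sets, scikit-learn's `roc_auc_score` gives the same value more efficiently.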

🛠 Required Arguments

Argument                Description
--driams_long_table     Path to the metadata file for the current dataset.
--spectra_matrix        Path to the input mass spectra (raw or MAE-encoded).
--sample_embedding_dim  Dimension of the spectra input: 6000 for raw spectra, or the MAE embedding dimension for MAE-encoded spectra.
--drugs_df              Path to the antimicrobial compound encoding file.
--fingerprint_class     Compound encoding: 'morgan_1024', 'molformer_github', or 'selfies_flattened_one_hot'.
--fingerprint_size      Size of the compound encoding: 1024 (Morgan), 768 (Molformer), or 24160 (SELFIES).
--split_type            'specific' to use predefined splits (see --split_ids); otherwise 'random'.
--split_ids             Path to the data_splits.csv file.
--experiment_group      Name of the output folder.
--experiment_name       Name of the output subfolder.
--seed                  Random seed for reproducibility.
--n_epochs              Number of training epochs for the classifier.
--learning_rate         Learning rate for the optimizer.
--patience              Number of epochs to wait without improvement before stopping early.
--batch_size            Batch size for classifier training.
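Because --fingerprint_class and --fingerprint_size must agree, and --sample_embedding_dim must be 6000 for raw spectra, a quick pre-flight check can catch mismatches before a long run. The following is a hypothetical helper, not code from the repository. `check_encoding_args` parses only the three size-related flags with `ArgumentParser.parse_known_args` and ignores all other flags, using the sizes listed in the table above. The `raw_spectra` switch is an assumption of this sketch. Set it to False for MAE-encoded spectra, whose dimension the helper cannot know.

```python
import argparse
import shlex

# Encoding sizes as listed in the argument table (hypothetical constant names).
FINGERPRINT_SIZES = {
    "morgan_1024": 1024,
    "molformer_github": 768,
    "selfies_flattened_one_hot": 24160,
}
RAW_SPECTRA_DIM = 6000


def check_encoding_args(argv, raw_spectra=True):
    """Return a list of problems with the size-related flags in argv (empty if none)."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--fingerprint_class")
    parser.add_argument("--fingerprint_size", type=int)
    parser.add_argument("--sample_embedding_dim", type=int)
    # Unknown flags such as --seed or --drugs_df are collected here and ignored.
    args, _unknown = parser.parse_known_args(argv)

    problems = []
    expected = FINGERPRINT_SIZES.get(args.fingerprint_class)
    if expected is None:
        problems.append(f"unknown --fingerprint_class {args.fingerprint_class!r}")
    elif args.fingerprint_size != expected:
        problems.append(
            f"--fingerprint_size should be {expected} for "
            f"{args.fingerprint_class}, got {args.fingerprint_size}"
        )
    if raw_spectra and args.sample_embedding_dim != RAW_SPECTRA_DIM:
        problems.append(
            f"--sample_embedding_dim should be {RAW_SPECTRA_DIM} for raw spectra, "
            f"got {args.sample_embedding_dim}"
        )
    return problems


# A subset of the flags from the B2018 example command.
example = shlex.split(
    "--sample_embedding_dim 6000 --fingerprint_class morgan_1024 "
    "--fingerprint_size 1024 --seed 0 --batch_size 128"
)
print(check_encoding_args(example))  # prints []
```

Passing `--fingerprint_class molformer_github --fingerprint_size 1024` instead would report that 768 is expected.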

🚀 Example: ResMLP Training on DRIAMS B2018 with Raw Spectra + Morgan Fingerprints

ulimit -Sn 10000  # Optional: increase file descriptor limit if needed

python3 code/ResAMR_classifier.py \
    --driams_long_table ProcessedData/B2018/combined_long_table.csv \
    --spectra_matrix ProcessedData/B2018/rawSpectra_data.npy \
    --sample_embedding_dim 6000 \
    --drugs_df OriginalData/drug_fingerprints_Mol_selfies.csv \
    --fingerprint_class morgan_1024 \
    --fingerprint_size 1024 \
    --split_type specific \
    --split_ids ProcessedData/B2018/data_splits.csv \
    --experiment_group rawMS_MorganFing \
    --experiment_name ResMLP \
    --seed 0 \
    --n_epochs 2 \
    --learning_rate 0.0003 \
    --patience 10 \
    --batch_size 128
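The --patience flag controls early stopping. The sketch below shows one common formulation: count consecutive epochs without improvement and stop once that count reaches the patience. It assumes the script monitors a validation loss after every epoch, but the README does not say which quantity is tracked. `EarlyStopping` and `train` are hypothetical names, and `val_losses` stands in for the losses a real training loop would compute.

```python
class EarlyStopping:
    """Signal a stop once the monitored loss fails to improve for `patience` epochs in a row."""

    def __init__(self, patience, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # minimum decrease that counts as an improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def train(n_epochs, patience, val_losses):
    """Hypothetical training loop; returns the number of epochs actually run."""
    stopper = EarlyStopping(patience)
    for epoch in range(n_epochs):
        # A real loop would train one epoch here and then evaluate on validation data.
        if stopper.step(val_losses[epoch]):
            return epoch + 1
    return n_epochs


print(train(n_epochs=6, patience=3, val_losses=[0.70, 0.65, 0.66, 0.67, 0.68, 0.69]))  # prints 5
```

Under this reading, the example's --n_epochs 2 with --patience 10 means early stopping can never trigger. At most two epochs are run, so the counter cannot reach 10. For early stopping to take effect, --n_epochs has to exceed --patience.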

💰 Funding

This research was primarily supported by the ETH AI Center.

Version History

main @ 5fb358d (earliest) Created 21st Jul 2025 at 14:22 by Diane Duroux

Create LICENSE

