Generalizable machine learning models for rapid antimicrobial resistance prediction in unseen healthcare settings

Workflow Type: Shell Script

📄 Generalizable machine learning models for rapid antimicrobial resistance prediction in unseen healthcare settings

This repository contains the code used for the experiments in the paper:

Generalizable machine learning models for rapid antimicrobial resistance prediction in unseen healthcare settings
by Diane Duroux, Paul P. Meyer, Giovanni Visonà, and Niko Beerenwinkel.

⚙️ Install the dependencies

You can set up the project with either pip or uv.

Option A - pip:

Install the dependencies listed in requirements.txt:

pip install -r requirements.txt

Option B - uv:

We provide pyproject.toml and uv.lock for macOS, Windows, and Linux.

Note: On Linux, or on a machine without Apple silicon, please use the pyproject.toml file provided for Mac and regenerate uv.lock after installation.

# 0) Install uv (one-time)
# mac/linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
# windows (PowerShell):
iwr https://astral.sh/uv/install.ps1 -UseBasicParsing | iex
 
# 1) Ensure the pinned Python is available (adjust if your pyproject pins a version)
uv python install 3.11
 
# 2) Create the exact environment from the lockfile
uv sync --frozen
 
# 3) Run your code within the env
uv run python -V
uv run python your_script.py

💻 AMR Classifier Training and Inference with ResMLP

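The script code/ResAMR_classifier.py trains a ResMLP model for AMR classification on the preprocessed DRIAMS data and writes its predictions for the test set. The output, the required arguments, and an example command are described below.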

📦 Output

In output//_results/, the script generates:

  • test_set_seed0.csv
    ➤ Contains the predictions, with columns species, sample_id, drug, response, and Prediction.
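
As a quick sanity check on the predictions file, the sketch below computes per-drug accuracy using only the Python standard library. It assumes that response is a binary 0/1 label and that Prediction is a probability-like score in [0, 1]; neither is stated above. The 0.5 threshold and the helper names per_drug_accuracy and load_rows are illustrative and are not part of the repository.

```python
import csv
from collections import defaultdict


def per_drug_accuracy(rows, threshold=0.5):
    """Return {drug: accuracy} for rows with 'drug', 'response', 'Prediction' keys.

    Assumption: 'response' is 0/1 and 'Prediction' is a score in [0, 1].
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        label = int(float(row["response"]))
        # Binarize the score at the (assumed) decision threshold.
        predicted = int(float(row["Prediction"]) >= threshold)
        total[row["drug"]] += 1
        correct[row["drug"]] += int(predicted == label)
    return {drug: correct[drug] / total[drug] for drug in total}


def load_rows(path):
    """Read a predictions CSV such as test_set_seed0.csv into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))
```

Calling per_drug_accuracy(load_rows(path)) on the generated CSV would then give one accuracy value per antimicrobial. If Prediction turns out to be a hard label rather than a score, the threshold step has no effect for values of 0 and 1.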

🛠 Required Arguments

| Argument | Description |
| --- | --- |
| --driams_long_table | Path to the metadata file for the current dataset. |
| --spectra_matrix | Path to the input mass spectra (either raw or MAE-encoded). |
| --sample_embedding_dim | Dimension of the spectra input (6000 for raw, or same as for MAE). |
| --drugs_df | Path to the antimicrobial compound encoding file. |
| --fingerprint_class | Type of encoding: 'morgan_1024', 'molformer_github', or 'selfies_flattened_one_hot'. |
| --fingerprint_size | Size of the encoding: 1024 (Morgan), 768 (Molformer), or 24160 (SELFIES). |
| --split_type | Set to specific if splits are pre-defined, else random. |
| --split_ids | Path to the data_splits.csv file. |
| --experiment_group | Name of the output folder. |
| --experiment_name | Name of the output subfolder. |
| --seed | Random seed for reproducibility. |
| --n_epochs | Number of epochs for classifier training. |
| --learning_rate | Learning rate for the optimizer. |
| --patience | Number of epochs to wait before early stopping. |
| --batch_size | Batch size for classifier training. |
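
The arguments above can be mirrored in a small argparse sketch, which is handy for checking a command line before launching a long run. This is not the parser used in code/ResAMR_classifier.py: the argument types, the choices lists, and the helpers build_parser and check_fingerprint are assumptions made for illustration. Only the argument names and the encoding sizes (1024, 768, 24160) come from the table.

```python
import argparse

# Encoding sizes as listed in the argument table.
EXPECTED_FINGERPRINT_SIZE = {
    "morgan_1024": 1024,
    "molformer_github": 768,
    "selfies_flattened_one_hot": 24160,
}


def build_parser():
    """Hypothetical parser mirroring the documented arguments (types are assumed)."""
    parser = argparse.ArgumentParser(description="ResMLP AMR classifier (sketch)")
    for name in ("driams_long_table", "spectra_matrix", "drugs_df",
                 "split_ids", "experiment_group", "experiment_name"):
        parser.add_argument(f"--{name}", type=str, required=True)
    parser.add_argument("--fingerprint_class", required=True,
                        choices=sorted(EXPECTED_FINGERPRINT_SIZE))
    parser.add_argument("--split_type", required=True,
                        choices=["specific", "random"])
    for name in ("sample_embedding_dim", "fingerprint_size", "seed",
                 "n_epochs", "patience", "batch_size"):
        parser.add_argument(f"--{name}", type=int, required=True)
    parser.add_argument("--learning_rate", type=float, required=True)
    return parser


def check_fingerprint(args):
    """Raise ValueError if --fingerprint_size does not match --fingerprint_class."""
    expected = EXPECTED_FINGERPRINT_SIZE[args.fingerprint_class]
    if args.fingerprint_size != expected:
        raise ValueError(
            f"{args.fingerprint_class} expects size {expected}, "
            f"got {args.fingerprint_size}"
        )
    return args
```

Running check_fingerprint(build_parser().parse_args()) at the top of a wrapper script would catch a mismatched encoding size before any data is loaded.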

🚀 Example: ResMLP Training on DRIAMS B2018 with Raw Spectra + Morgan Fingerprints

ulimit -Sn 10000  # Optional: increase file descriptor limit if needed

python3 code/ResAMR_classifier.py \
    --driams_long_table ProcessedData/B2018/combined_long_table.csv \
    --spectra_matrix ProcessedData/B2018/rawSpectra_data.npy \
    --sample_embedding_dim 6000 \
    --drugs_df OriginalData/drug_fingerprints_Mol_selfies.csv \
    --fingerprint_class morgan_1024 \
    --fingerprint_size 1024 \
    --split_type specific \
    --split_ids ProcessedData/B2018/data_splits.csv \
    --experiment_group rawMS_MorganFing \
    --experiment_name ResMLP \
    --seed 0 \
    --n_epochs 2 \
    --learning_rate 0.0003 \
    --patience 10 \
    --batch_size 128
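
To repeat the example over several seeds, a thin Python wrapper can assemble the same command once per seed. This is a sketch under assumptions: the helpers build_command and run_seeds are hypothetical, and whether the script writes a separate test_set_seed file for each seed is inferred from the file name test_set_seed0.csv rather than documented. All paths and values are copied from the example command.

```python
import subprocess
import sys

# Arguments copied from the example command; values are kept as strings.
BASE_ARGS = {
    "driams_long_table": "ProcessedData/B2018/combined_long_table.csv",
    "spectra_matrix": "ProcessedData/B2018/rawSpectra_data.npy",
    "sample_embedding_dim": "6000",
    "drugs_df": "OriginalData/drug_fingerprints_Mol_selfies.csv",
    "fingerprint_class": "morgan_1024",
    "fingerprint_size": "1024",
    "split_type": "specific",
    "split_ids": "ProcessedData/B2018/data_splits.csv",
    "experiment_group": "rawMS_MorganFing",
    "experiment_name": "ResMLP",
    "n_epochs": "2",
    "learning_rate": "0.0003",
    "patience": "10",
    "batch_size": "128",
}


def build_command(seed):
    """Return the argument list for one training run with the given seed."""
    cmd = [sys.executable, "code/ResAMR_classifier.py"]
    for flag, value in BASE_ARGS.items():
        cmd += [f"--{flag}", value]
    cmd += ["--seed", str(seed)]
    return cmd


def run_seeds(seeds, dry_run=True):
    """Build one command per seed; execute them only when dry_run is False."""
    commands = [build_command(seed) for seed in seeds]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing run
    return commands
```

With the default dry_run=True, run_seeds([0, 1, 2]) only returns the commands so they can be inspected; passing dry_run=False launches the runs one after another from the repository root.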

💰 Funding

This research was primarily supported by the ETH AI Center.

Version History

main @ 3ce9c42 (earliest) Created 17th Oct 2025 at 12:16 by Diane Duroux

Add files via upload


Creator: Diane Duroux
Citation
Duroux, D. (2025). Generalizable machine learning models for rapid antimicrobial resistance prediction in unseen healthcare settings. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.1999.1