GADES reproducibility workflow
main @ 39b937a

Workflow Type: Docker
Stable

Article-GADES

This repository contains the scripts for generating and benchmarking the results of the GADES package for distance matrix calculation.

Installation

git lfs install
git clone https://github.com/lab-medvedeva/Article-GADES.git
cd Article-GADES

Place the real datasets, in MEX format, in the Datasets/Real folder.
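Before benchmarking, it can be useful to confirm that the files in Datasets/Real look like Matrix Market files. The sketch below assumes the datasets are stored as .mtx files (as in the HLCA_marrow.mtx example further down) and relies on the fact that a Matrix Market file begins with a `%%MatrixMarket` banner line. The function names `is_matrix_market` and `check_real_datasets` are hypothetical helpers, not part of the repository.

```shell
# Hypothetical helper (not part of the repository): succeed when the first
# line of the file starts with the Matrix Market banner "%%MatrixMarket".
is_matrix_market() {
    # head prints nothing when the file is missing or unreadable,
    # so such files fall through to the failure branch.
    case "$(head -n 1 "$1" 2>/dev/null)" in
        '%%MatrixMarket'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report every .mtx file in a directory (default: Datasets/Real) as ok/invalid.
check_real_datasets() {
    crd_dir="${1:-Datasets/Real}"
    for crd_file in "$crd_dir"/*.mtx; do
        [ -e "$crd_file" ] || continue   # the glob matched nothing
        if is_matrix_market "$crd_file"; then
            echo "ok      $crd_file"
        else
            echo "invalid $crd_file"
        fi
    done
}
```

Calling `check_real_datasets` from the repository root prints one line per .mtx file; the check only inspects the banner and does not validate the matrix body.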

Running benchmark using Docker Deployment

docker run --gpus all \
    -v $PWD/Datasets:/workspace/Article-GADES/Datasets \
    -v $PWD/results:/workspace/Article-GADES/results \
    akhtyamovpavel/article-gades 
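
The two `-v` flags bind-mount the Datasets and results folders from the current directory into the container. When a host directory given to `-v` does not exist, Docker creates it, usually owned by root, so it may help to create both folders beforehand. A minimal sketch; `prepare_mounts` is a hypothetical helper and not part of the repository:

```shell
# Hypothetical helper: create the host-side folders that the docker run
# command mounts, relative to a base directory (default: current directory).
prepare_mounts() {
    pm_base="${1:-$PWD}"
    # -p creates missing parents and is a no-op for existing directories.
    mkdir -p "$pm_base/Datasets/Real" "$pm_base/results"
}
```

Running `prepare_mounts` from the repository root before `docker run` leaves existing folders untouched. Note also that `--gpus all` generally requires the NVIDIA Container Toolkit on the host.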

Step 01. Generation of the datasets

Step 01.1. Generated Dense Datasets

cd ./scripts/MatricesGeneration
./generate_dense.sh ../../Datasets/

Step 01.2. Generated Sparse Datasets

cd ./scripts/MatricesGeneration
./generate_sparse.sh ../../Datasets/
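
Both generation steps begin with the same relative `cd`, so each one assumes it is started from the repository root; running them back to back in one shell session would fail at the second `cd`. One way around this, sketched under that assumption, is to run each script in a subshell. The helper name `run_in` is hypothetical:

```shell
# Hypothetical helper: run a command from inside a given directory without
# changing the caller's working directory. The function body is wrapped in
# parentheses, so it runs in a subshell and the cd does not leak out.
run_in() (
    cd "$1" || exit 1
    shift
    "$@"
)

# Example usage from the repository root (paths taken from the steps above):
#   run_in scripts/MatricesGeneration ./generate_dense.sh ../../Datasets/
#   run_in scripts/MatricesGeneration ./generate_sparse.sh ../../Datasets/
```

The same helper can be applied to the benchmarking scripts, which also start with a relative `cd`.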

Step 02. Benchmarking

Step 02.1. Generated Dense Datasets

cd ./scripts/Benchmarking

./run_benchmark_generated_dense.sh ../../

./run_benchmark_python_dense.sh ../../

Step 02.2. Generated Sparse Datasets


cd ./scripts/Benchmarking/

./run_benchmark_generated_sparse.sh ../../

Step 02.3. Real Datasets


cd ./scripts/Benchmarking/
./run_benchmark_real_python.sh  ../../results/RealDatasets//
./run_benchmark_real_R.sh  ../../results/RealDatasets//

Example:

./run_benchmark.sh ../../Datasets/Real/HLCA_marrow.mtx ../../results/RealDatasets/HLCA_marrow/
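
When several real datasets are present, the per-dataset invocation shown in the example can be generated rather than typed by hand. The sketch below only prints the commands (a dry run) and assumes that each dataset is a single .mtx file and that the output folder is named after the file, as in the HLCA_marrow example. `print_benchmark_commands` is a hypothetical helper, not part of the repository:

```shell
# Hypothetical helper: print one benchmark command per .mtx file, deriving
# the output directory from the file name
# (HLCA_marrow.mtx -> <results_dir>/HLCA_marrow/).
print_benchmark_commands() {
    pbc_datasets="$1"
    pbc_results="$2"
    for pbc_mtx in "$pbc_datasets"/*.mtx; do
        [ -e "$pbc_mtx" ] || continue            # the glob matched nothing
        pbc_name=$(basename "$pbc_mtx" .mtx)     # strip directory and suffix
        echo "./run_benchmark.sh $pbc_mtx $pbc_results/$pbc_name/"
    done
}
```

For instance, `print_benchmark_commands ../../Datasets/Real ../../results/RealDatasets` lists the commands for review; the output is not quoted, so it is only suitable for paths without spaces.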

Step 02.4. Ablation Study for the Batch Size Usage

Step 02.5. Ablation Study for the Memory Usage

cd ./scripts/Benchmarking
./run_benchmark_real_python_memory_usage.sh  ../../results/RealDatasetsBatchSizeFixedMemory//500/

Example:

./run_benchmark_real_python_memory_usage.sh ../../Datasets/CellLines.mtx ../../results/RealDatasetsBatchSizeFixedMemory/CellLines/500/

Step 03. Drawing charts

We split reproducibility notebooks into two parts:

  • Aggregation over datasets
  • Plotting charts

Aggregation

  1. For the generated dense datasets, use the GeneratedDatasetsCollector notebook.
  2. For the generated sparse datasets, use the GeneratedSparseCollector notebook.

Analyzing datasets

  1. Generated datasets are analyzed in the GeneratedDatasetAnalysis notebook.
  2. Real datasets are analyzed in the RealDatasetAnalysis notebook.
  3. The analysis of the ablation study can be found in the reproducibility notebook.
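
Before opening the notebooks, it may be worth confirming that the benchmarking steps actually wrote output. The sketch below counts the files under each first-level subfolder of the results directory; it makes no assumption about the file format, and `summarize_results` is a hypothetical helper, not part of the repository:

```shell
# Hypothetical helper: print "<subfolder> <file count>" for every first-level
# subfolder of the results directory (default: results).
summarize_results() {
    sr_root="${1:-results}"
    for sr_dir in "$sr_root"/*/; do
        [ -d "$sr_dir" ] || continue             # the glob matched nothing
        # Count regular files recursively; tr strips padding added by some wc builds.
        sr_count=$(find "$sr_dir" -type f | wc -l | tr -d ' ')
        echo "$(basename "$sr_dir") $sr_count"
    done
}
```

A subfolder reported with a count of 0 suggests that the corresponding benchmark has not been run yet.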

Version History

main @ 39b937a (earliest) Created 5th Sep 2024 at 11:35 by Pavel Akhtyamov

Added links to ablation study


Citation
Akhtyamov, P. (2024). GADES reproducibility workflow. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.1125.1
Created: 5th Sep 2024 at 11:35

Last updated: 5th Sep 2024 at 11:36
