Session at the Innovative HPC workflows for industry event (https://eflows4hpc.eu/event/innovative-hpc-workflows-for-industry/) describing how Workflow Provenance is recorded with COMPSs: the background on the tools used, how the recording has been designed, and how to use it and inspect the recorded metadata.
Creator: Raül Sirvent
Submitter: Raül Sirvent
Name: PhysioNet CascadeCSVM Kfold Contact Person: support-compss@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs Machine: MareNostrum5
Kfold to evaluate CascadeCSVM accuracy on PhysioNet dataset (https://b2drop.bsc.es/index.php/s/8Q8MefXX2rrzaWs). This application used dislib-0.9.0
Name: PhysioNet kNN Kfold Contact Person: support-compss@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs Machine: MareNostrum5
Kfold to evaluate kNN accuracy on PhysioNet dataset (https://b2drop.bsc.es/index.php/s/8Q8MefXX2rrzaWs). This application used dislib-0.9.0
Name: PhysioNet RF Kfold Contact Person: support-compss@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs Machine: MareNostrum5
Kfold to evaluate RandomForest accuracy on PhysioNet dataset (https://b2drop.bsc.es/index.php/s/8Q8MefXX2rrzaWs). This application used dislib-0.9.0
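The three K-fold entries above share the usual cross-validation pattern: split the samples into k folds, hold each fold out once for testing, and average the per-fold accuracy. The sketch below shows only the index bookkeeping in plain Python. It does not use dislib, and the names kfold_indices and mean_accuracy are hypothetical helpers introduced for illustration, not part of these workflows.

```python
from typing import Callable, Iterator, List, Tuple


def kfold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def mean_accuracy(n_samples: int, k: int,
                  score_fold: Callable[[List[int], List[int]], float]) -> float:
    """Average the score returned by score_fold(train, test) over all folds."""
    scores = [score_fold(train, test) for train, test in kfold_indices(n_samples, k)]
    return sum(scores) / len(scores)
```

Here score_fold stands in for "fit the estimator on the training rows, score it on the test rows"; in the listed workflows that role would be played by the CascadeCSVM, kNN, or RandomForest estimator.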
Name: GridSearchCV Contact Person: support-compss@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs Machine: MareNostrum5
GridSearch of kNN algorithm for the iris.csv dataset (https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv). This application used dislib-0.9.0
Name: GridSearchCV Contact Person: support-compss@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs Machine: MareNostrum5
GridSearch of kNN algorithm for the iris.csv dataset (https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv). This application used dislib-0.9.0
Sample workflow template that combines simulations with data analytics. It is not a real workflow, but it mimics this type of workflow. It illustrates how COMPSs invokes binaries. It can be extended to invoke MPI applications.
Lysozyme in water full COMPSs application, using dataset_small
Wordcount merge version COMPSs application
Wordcount reduce version COMPSs application
K-means COMPSs application
Cholesky factorisation COMPSs application
Cluster Comparison COMPSs application
Lysozyme in water sample COMPSs application
Lysozyme in water full COMPSs application run at MareNostrum IV, using full dataset with two workers
Name: Dislib Distributed Training - Cache ON Contact Person: cristian.tatu@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs Machine: Minotauro-MN4
PyTorch distributed training of CNN on GPU and leveraging COMPSs GPU Cache for deserialization speedup. Launched using 32 GPUs (16 nodes). Dataset: Imagenet Version dislib-0.9 Version PyTorch 1.7.1+cu101
Average task execution time: 36 seconds
Type: COMPSs
Creators: Cristian Tatu, The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)
Submitter: Cristian Tatu
Name: K-Means GPU Cache ON Contact Person: cristian.tatu@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs Machine: Minotauro-MN4
K-Means running on the GPU leveraging COMPSs GPU Cache for deserialization speedup. Launched using 32 GPUs (16 nodes). Parameters used: K=40 and 32 blocks of size (1_000_000, 1200). It creates a block for each GPU. Total dataset shape is (32_000_000, 1200). Version dislib-0.9
Average task execution time: 16 seconds
Type: COMPSs
Creators: Cristian Tatu, The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)
Submitter: Cristian Tatu
Calculates the Fibonacci series up to a specified length.
Type: COMPSs
Creator: Uploading this Workflow under the guidance of Raül Sirvent.
Submitter: Ashish Bhawel
Name: Incrementation and Fibonacci Access Level: public License Agreement: Apache2 Platform: COMPSs
Description
Brief Overview: Demonstrates COMPSs task parallelism with increment and Fibonacci computations. Helps to understand COMPSs.
Detailed Description:
- Performs multiple increments of input values in parallel using COMPSs.
- Concurrently calculates Fibonacci numbers using recursive COMPSs tasks.
- Demonstrates task synchronization via compss_wait_on.
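The pattern described in this list can be sketched as follows. This is a minimal illustration, not the uploaded workflow's code: the function names increment, add, fibonacci, and main are hypothetical, and the way the original structures its recursion is not shown in the listing, so the recursive task graph here is an assumption. The try/except fallback is also an addition so the sketch runs sequentially when PyCOMPSs is not installed.

```python
try:
    from pycompss.api.task import task
    from pycompss.api.api import compss_wait_on
except ImportError:
    # Fallback (assumption for illustration): run everything sequentially.
    def task(*args, **kwargs):
        def decorator(func):
            return func
        return decorator

    def compss_wait_on(obj):
        return obj


@task(returns=1)
def increment(value):
    """Task: return the value increased by one."""
    return value + 1


@task(returns=1)
def add(a, b):
    """Task: return the sum of two values."""
    return a + b


def fibonacci(n):
    """Build the task graph for the n-th Fibonacci number (F(0)=0, F(1)=1)."""
    if n < 2:
        return n
    return add(fibonacci(n - 1), fibonacci(n - 2))


def main(values, times, n):
    # Each value is incremented independently, so the chains can run in parallel.
    for _ in range(times):
        values = [increment(v) for v in values]
    fib = fibonacci(n)
    # Synchronize: fetch the actual results from the runtime.
    values = compss_wait_on(values)
    fib = compss_wait_on(fib)
    return values, fib
```

For example, main([1, 2, 3], 3, 10) increments each of the three values three times and computes the tenth Fibonacci number before synchronizing on both results.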
Execution
...
Type: COMPSs
Creators: Ashish Bhawel, Uploading this Workflow under the guidance of Raül Sirvent.
Submitter: Ashish Bhawel
Name: Matrix multiplication with Files, reproducibility example Contact Person: support-compss@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs
Description
Matrix multiplication is a binary operation that takes a pair of matrices and produces another matrix.
If A is an n×m matrix and B is an m×p matrix, the result AB of their multiplication is an n×p matrix defined only if the number of columns m in A is equal to the number of rows m in B. When multiplying ...
Name: Matrix multiplication with Files, reproducibility example, without data persistence Contact Person: support-compss@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs
Description
Matrix multiplication is a binary operation that takes a pair of matrices and produces another matrix.
If A is an n×m matrix and B is an m×p matrix, the result AB of their multiplication is an n×p matrix defined only if the number of columns m in A is equal to the number ...
Monte Carlo Pi Estimation Program Description
This program is a Monte Carlo simulation designed to estimate the value of Pi using PyCOMPSs.
Tasks in the Program
- Count Points in Circle Task (count_points_in_circle):
- Generates random points within a square with side length 1.
- Counts points falling within the inscribed circle (x^2 + y^2 <= 1).
- Input: Number of points to generate (num_points)
- Output: Tuple containing count of points within the circle and list of generated ...
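A plain-Python sketch of the task described above, without the PyCOMPSs decorators. It assumes points are drawn from the unit square [0, 1) x [0, 1), so the test x^2 + y^2 <= 1 selects a quarter of the unit circle and Pi is estimated as 4 * inside / total. The seed parameter and the estimate_pi helper are hypothetical additions for illustration; how the original program combines the counts is not shown in the listing.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def count_points_in_circle(num_points: int,
                           seed: Optional[int] = None) -> Tuple[int, List[Point]]:
    """Return (count of points with x^2 + y^2 <= 1, list of generated points)."""
    rng = random.Random(seed)  # seed is an assumption, added for reproducibility
    points = [(rng.random(), rng.random()) for _ in range(num_points)]
    inside = sum(1 for x, y in points if x * x + y * y <= 1.0)
    return inside, points


def estimate_pi(num_points: int, seed: Optional[int] = None) -> float:
    """Estimate Pi from the fraction of points inside the quarter circle."""
    inside, _ = count_points_in_circle(num_points, seed)
    return 4.0 * inside / num_points
```

In a PyCOMPSs version, several calls to count_points_in_circle could run as independent tasks, with their counts summed before the final division.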
COMPSs Matrix Multiplication, out-of-core using files. Hypermatrix size used 2x2 blocks (MSIZE=2), block size used 2x2 elements (BSIZE=2)
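The blocked structure named in this entry can be illustrated in memory. The sketch below reuses the MSIZE and BSIZE values from the description, but it is an assumption-laden simplification: the listed application keeps each block in a file and runs the block products as COMPSs tasks, whereas here blocks are nested Python lists and the helper names are hypothetical.

```python
MSIZE = 2  # blocks per dimension of the hypermatrix
BSIZE = 2  # elements per dimension of each block


def zero_block():
    """Return a BSIZE x BSIZE block filled with zeros."""
    return [[0.0] * BSIZE for _ in range(BSIZE)]


def multiply_accumulate(a, b, c):
    """Compute c += a @ b in place for BSIZE x BSIZE blocks."""
    for i in range(BSIZE):
        for j in range(BSIZE):
            c[i][j] += sum(a[i][k] * b[k][j] for k in range(BSIZE))


def blocked_matmul(A, B):
    """Multiply two MSIZE x MSIZE hypermatrices of blocks."""
    C = [[zero_block() for _ in range(MSIZE)] for _ in range(MSIZE)]
    for i in range(MSIZE):
        for j in range(MSIZE):
            for k in range(MSIZE):
                # Each call touches one block of A, one of B and one of C.
                multiply_accumulate(A[i][k], B[k][j], C[i][j])
    return C
```

Each multiply_accumulate call is the natural unit of work to turn into a task, since it only needs three blocks at a time regardless of the full matrix size.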
Lysozyme in water full COMPSs application
Lysozyme in water full COMPSs application, using dataset_small
Lysozyme in water full COMPSs application
Name: KMeans Contact Person: support-compss@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs Machine: MareNostrum5
KMeans for clustering the housing.csv dataset (https://github.com/sonarsushant/California-House-Price-Prediction/blob/master/housing.csv). This application used dislib-0.9.0
Name: TruncatedSVD (Randomized SVD) Contact Person: support-compss@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs Machine: MareNostrum5
TruncatedSVD (Randomized SVD) for computing just 456 singular values out of a (4.5M x 850) size matrix. The input matrix represents a CFD transient simulation of air moving past a cylinder. This application used dislib-0.9.0
Name: Dislib Distributed Training - Cache OFF Contact Person: cristian.tatu@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs Machine: Minotauro-MN4
PyTorch distributed training of CNN on GPU. Launched using 32 GPUs (16 nodes). Dataset: Imagenet Version dislib-0.9 Version PyTorch 1.7.1+cu101
Average task execution time: 84 seconds
Type: COMPSs
Creators: Cristian Tatu, The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)
Submitter: Cristian Tatu
Name: K-Means GPU Cache OFF Contact Person: cristian.tatu@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs Machine: Minotauro-MN4
K-Means running on GPUs. Launched using 32 GPUs (16 nodes). Parameters used: K=40 and 32 blocks of size (1_000_000, 1200). It creates a block for each GPU. Total dataset shape is (32_000_000, 1200). Version dislib-0.9
Average task execution time: 194 seconds
Type: COMPSs
Creators: Cristian Tatu, The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)
Submitter: Cristian Tatu
Name: Matmul GPU Case 1 Cache-ON Contact Person: cristian.tatu@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs Machine: Minotauro-MN4
Matmul running on the GPU leveraging COMPSs GPU Cache for deserialization speedup. Launched using 32 GPUs (16 nodes). Performs C = A @ B Where A: shape (320, 56_900_000) block_size (10, 11_380_000) B: shape (56_900_000, 10) block_size (11_380_000, 10) C: shape (320, 10) block_size ...
Type: COMPSs
Creators: Cristian Tatu, The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)
Submitter: Cristian Tatu
Name: Matmul GPU Case 1 Cache-OFF Contact Person: cristian.tatu@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs 3.3 Machine: Minotauro-MN4
Matmul running on the GPU without Cache. Launched using 32 GPUs (16 nodes). Performs C = A @ B Where A: shape (320, 56_900_000) block_size (10, 11_380_000) B: shape (56_900_000, 10) block_size (11_380_000, 10) C: shape (320, 10) block_size (10, 10) Total dataset size 291 ...
Type: COMPSs
Creators: Cristian Tatu, The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)
Submitter: Cristian Tatu
PyCOMPSs implementation of Probabilistic Tsunami Forecast (PTF). PTF explicitly treats data- and forecast-uncertainties, enabling alert level definitions according to any predefined level of conservatism, which is connected to the average balance of missed-vs-false-alarms. Run of the Kos-Bodrum 2017 event test-case with 1000 scenarios, 8h tsunami simulation for each and forecast calculations for partial and full ensembles with focal mechanism and tsunami data updates.
Type: COMPSs
Creators: Louise Cordrie, Jorge Ejarque, Carlos Sánchez Linares, Jacopo Selva, Jorge Macías, Steven J. Gibbons, Fabrizio Bernardi, Roberto Tonini, Rosa M. Badia, Sonia Scardigno, Stefano Lorito, Finn Løvholt, Fabrizio Romano, Manuela Volpe, Alessandro D'Anca, Marc de la Asunción, Manuel J. Castro
Submitter: Jorge Ejarque
PyCOMPSs implementation of Probabilistic Tsunami Forecast (PTF). PTF explicitly treats data- and forecast-uncertainties, enabling alert level definitions according to any predefined level of conservatism, which is connected to the average balance of missed-vs-false-alarms. Run of the Boumerdes-2003 event test-case with 1000 scenarios, 8h tsunami simulation for each and forecast calculations for partial and full ensembles with focal mechanism and tsunami data updates.
Type: COMPSs
Creators: Louise Cordrie, Jorge Ejarque, Carlos Sánchez Linares, Jacopo Selva, Jorge Macías, Steven J. Gibbons, Fabrizio Bernardi, Roberto Tonini, Rosa M. Badia, Sonia Scardigno, Stefano Lorito, Finn Løvholt, Fabrizio Romano, Manuela Volpe, Alessandro D'Anca, Marc de la Asunción, Manuel J. Castro
Submitter: Jorge Ejarque
Name: Random Forest Contact Person: support-compss@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs Machine: MareNostrum4
This is an example of the Random Forest algorithm from dislib. To show its usage, the code generates a synthetic input matrix. The results are printed to the screen. This application used dislib-0.9.0
Name: Lanczos SVD Contact Person: support-compss@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs Machine: MareNostrum4
Lanczos SVD for computing singular values needed to reach an epsilon of 1e-3 on a matrix of (150000, 150). The input matrix is generated synthetically. This application used dislib-0.9.0
Type: COMPSs
Creators: Fernando Vázquez-Novoa, Workflows and Distributed Computing
Submitter: Fernando Vázquez-Novoa
Name: Word Count Contact Person: support-compss@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs
Description
Wordcount is an application that counts the number of words for a given set of files.
To allow parallelism, the file is divided into blocks that are processed separately and merged afterwards.
Results are printed to a Pickle binary file, so they can be checked using: python -mpickle result.txt
This example also shows how to manually add input or ...
Type: COMPSs
Creators: Javier Conejero, The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)
Submitter: Raül Sirvent
Name: Increment Contact Person: support-compss@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs
Description
Increment is an application that takes three different values and increases them a given number of times.
The purpose of this application is to show parallelism between the different increments.
Execution instructions
Usage:
runcompss --lang=python src/increment.py N initValue1 initValue2 initValue3
where:
- N: Number of times to increase ...
Type: COMPSs
Creators: Javier Conejero, The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)
Submitter: Raül Sirvent
Contact Person: support-compss@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs
Description
Simple is an application that takes one value and increases it by five units. The purpose of this application is to show how tasks are managed by COMPSs.
Execution instructions
Usage:
runcompss --lang=python src/simple.py initValue
where:
- initValue: Initial value for counter
Execution Examples
runcompss --lang=python src/simple.py 1
runcompss
...
Type: COMPSs
Creators: Javier Conejero, The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)
Submitter: Raül Sirvent
Name: TruncatedSVD (Randomized SVD) Contact Person: support-compss@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs Machine: MareNostrum4
TruncatedSVD (Randomized SVD) for computing just 456 singular values out of a (3.6M x 1200) size matrix. The input matrix represents a CFD transient simulation of air moving past a cylinder. This application used dislib-0.9.0
Type: COMPSs
Creators: Cristian Tatu, The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)
Submitter: Cristian Tatu
Name: Word Count Contact Person: support-compss@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs
Description
Wordcount is an application that counts the number of words for a given set of files.
To allow parallelism every file is treated separately and merged afterwards.
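The count-then-merge pattern described here can be sketched in plain Python. This is an illustration only: count_words and merge_counts are hypothetical names, whitespace splitting is an assumed tokenization, and in the listed application the per-file counting would run as parallel COMPSs tasks rather than sequentially.

```python
from collections import Counter
from typing import Iterable


def count_words(text: str) -> Counter:
    """Count word occurrences in one file's contents (split on whitespace)."""
    return Counter(text.split())


def merge_counts(partials: Iterable[Counter]) -> Counter:
    """Merge the per-file counts into a single total."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # adds counts key by key
    return total
```

Because each count_words call depends only on its own input, the partial counts can be computed in any order before the merge.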
Execution instructions
Usage:
runcompss --lang=python src/wordcount.py datasetPath
where:
- datasetPath: Absolute path of the file to parse (e.g. /home/compss/tutorial_apps/python/wordcount/data/) ...
Type: COMPSs
Creators: Javier Conejero, The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)
Submitter: Raül Sirvent
Name: Matrix multiplication with Objects Contact Person: support-compss@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs
Description
Matrix multiplication is a binary operation that takes a pair of matrices and produces another matrix.
If A is an n×m matrix and B is an m×p matrix, the result AB of their multiplication is an n×p matrix defined only if the number of columns m in A is equal to the number of rows m in B. When multiplying A and B, the ...
Type: COMPSs
Creators: Javier Conejero, The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)
Submitter: Raül Sirvent
Name: Matrix multiplication with Files Contact Person: support-compss@bsc.es Access Level: public License Agreement: Apache2 Platform: COMPSs
Description
Matrix multiplication is a binary operation that takes a pair of matrices and produces another matrix.
If A is an n×m matrix and B is an m×p matrix, the result AB of their multiplication is an n×p matrix defined only if the number of columns m in A is equal to the number of rows m in B. When multiplying A and B, the elements ...
Type: COMPSs
Creators: Javier Conejero, The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)
Submitter: Raül Sirvent
A demonstration workflow for Reduced Order Modeling (ROM) within the eFlows4HPC project, implemented using Kratos Multiphysics, EZyRB, COMPSs, and dislib.
Type: COMPSs
Creators: Jose Raul Bravo Martinez, Sebastian Ares de Parga Regalado, Riccardo Rossi Bernecoli, Jorge Ejarque
Submitter: Raül Sirvent
Lysozyme in Water simplest version, from COMPSs Tutorial. The original idea of this workflow comes from http://www.mdtutorials.com/gmx/lysozyme/index.html
BackTrackBB is a program for detection and space-time location of seismic sources based on multi-scale, frequency-selective statistical coherence of the wave field recorded by dense large-scale seismic networks and local antennas. The method is designed to enhance the coherence of the signal's statistical features across the array of sensors and consists of three steps: signal processing, space-time imaging, and detection and location.
Source with inputs and outputs included (too big for ...