Distributed computing aims to offer tools and mechanisms that enable the sharing, selection, and aggregation of a wide variety of geographically distributed computational resources in a transparent way. The team's research builds on the group's past expertise and extends it to those aspects of distributed computing that can benefit from it. The team at BSC has a strong focus on programming models, resource management, and scheduling in distributed computing environments. Current trends in virtualisation have led to the appearance of Cloud computing, a topic also covered by this team. The group's activities are mostly centred on the COMPSs project.
Space: eFlows4HPC
SEEK ID: https://workflowhub.eu/projects/172
Public web page: https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing
Organisms: No Organisms specified
WorkflowHub PALs: No PALs for this Team
Team created: 30th Jun 2023
Related items
Teams: Cluster Emergent del Cervell Humà, Workflows and Distributed Computing, WP6 - Tsunamis, WP7 - Earthquakes, WP8 - Anthropogenic geophysical extremes, WP5 - Volcanoes, Pillar I: Manufacturing, Pillar II: Climate, Pillar III: Urgent computing for natural hazards, eFlows4HPC general, COMPSs Tutorials
Organizations: Barcelona Supercomputing Center (BSC-CNS)
ORCID: https://orcid.org/0000-0003-0606-2512
Expertise: Workflows, Programming Models, High Performance Computing, Distributed Computing, Provenance
Tools: COMPSs
Established Researcher at the Workflows and Distributed Computing Group, Computer Sciences department, Barcelona Supercomputing Center.
The eFlows4HPC project aims to provide a workflow software stack and an additional set of services to enable the integration of HPC simulations and modelling with big data analytics and machine learning in scientific and industrial applications. The project is also developing the HPC Workflows as a Service (HPCWaaS) methodology, which aims to provide tools that simplify the development, deployment, execution and reuse of workflows. The project demonstrates its advances through three application Pillars ...
Teams: Cluster Emergent del Cervell Humà, Workflows and Distributed Computing, Pillar I: Manufacturing, Pillar II: Climate, Pillar III: Urgent computing for natural hazards, eFlows4HPC general, COMPSs Tutorials
Web page: https://eflows4hpc.eu
Authors: Raul Sirvent, Javier Conejero, Francesc Lordan, Jorge Ejarque, Laura Rodriguez-Navas, Jose M. Fernandez, Salvador Capella-Gutierrez, Rosa M. Badia
Date Published: 1st Nov 2022
Publication Type: Proceedings
DOI: 10.1109/WORKS56498.2022.00006
Citation: 2022 IEEE/ACM Workshop on Workflows in Support of Large-Scale Science (WORKS), pp. 1-9, IEEE
Provenance registration is becoming more and more important, as we increase the size and number of experiments performed using computers. In particular, when provenance is recorded in HPC environments, it must be efficient and scalable. In this paper, we propose a provenance registration method for scientific workflows, efficient enough to run in supercomputers (thus, it could run in other ...
Creator: Raül Sirvent
Submitter: Raül Sirvent
Session at the Innovative HPC workflows for industry event (https://eflows4hpc.eu/event/innovative-hpc-workflows-for-industry/) describing how Workflow Provenance is recorded with COMPSs: the background on the tools used, how the recording has been designed, and how to use it and inspect the metadata.
Creator: Raül Sirvent
Submitter: Raül Sirvent
Name: Random Forest
Contact Person: support-compss@bsc.es
Access Level: public
License Agreement: Apache2
Platform: COMPSs
Machine: MareNostrum4
This is an example of the Random Forest algorithm from dislib. To show the usage, the code generates a synthetic input matrix. The results are printed to the screen. This application used dislib-0.9.0
Name: GridSearchCV
Contact Person: support-compss@bsc.es
Access Level: public
License Agreement: Apache2
Platform: COMPSs
Machine: MareNostrum5
GridSearch of kNN algorithm for the iris.csv dataset (https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv). This application used dislib-0.9.0
Name: Matrix Multiplication
Contact Person: support-compss@bsc.es
Access Level: public
License Agreement: Apache2
Platform: COMPSs
Description
Matrix multiplication is a binary operation that takes a pair of matrices and produces another matrix.
If A is an n×m matrix and B is an m×p matrix, the result AB of their multiplication is an n×p matrix defined only if the number of columns m in A is equal to the number of rows m in B. When multiplying A and B, the elements of the ...
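The shape rule in the description can be illustrated with a minimal sketch in plain Python. This is not the COMPSs application itself; the function name `matmul` and the sample matrices are assumptions chosen for illustration only.

```python
def matmul(a, b):
    """Multiply an n x m matrix by an m x p matrix (both lists of rows)."""
    n, m = len(a), len(a[0])
    if len(b) != m:
        # The product is only defined when columns of A equal rows of B.
        raise ValueError("columns of A must equal rows of B")
    p = len(b[0])
    # c[i][j] is the dot product of row i of A with column j of B.
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


a = [[1, 2, 3],
     [4, 5, 6]]      # 2 x 3
b = [[7, 8],
     [9, 10],
     [11, 12]]       # 3 x 2
c = matmul(a, b)     # 2 x 2
print(c)
```

Swapping the arguments, `matmul(b, a)`, is also defined here (3×2 times 2×3) and yields a 3×3 matrix, whereas multiplying `a` by itself raises `ValueError` because the inner dimensions differ.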
Lysozyme in water full COMPSs application
Name: KMeans
Contact Person: support-compss@bsc.es
Access Level: public
License Agreement: Apache2
Platform: COMPSs
Machine: MareNostrum5
KMeans for clustering the housing.csv dataset (https://github.com/sonarsushant/California-House-Price-Prediction/blob/master/housing.csv). This application used dislib-0.9.0
Name: TruncatedSVD (Randomized SVD)
Contact Person: support-compss@bsc.es
Access Level: public
License Agreement: Apache2
Platform: COMPSs
Machine: MareNostrum5
TruncatedSVD (Randomized SVD) for computing just 456 singular values out of a (4.5M x 850) size matrix. The input matrix represents a CFD transient simulation of air moving past a cylinder. This application used dislib-0.9.0
Name: SparseLU
Contact Person: support-compss@bsc.es
Access Level: public
License Agreement: Apache2
Platform: COMPSs
Description
The Sparse LU application computes an LU matrix factorization on a sparse blocked matrix. The matrix size (number of blocks) and the block size are parameters of the application.
As the algorithm progresses, the area of the matrix that is accessed is smaller; concretely, at each iteration, the 0th row and column of the current matrix are discarded. ...
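The shrinking working area described above can be seen in a much simpler setting. The following is a rough sketch of a dense, unblocked LU factorization without pivoting (Doolittle form), not the blocked sparse algorithm used by the application; the function name `lu_in_place` and the 2×2 input are assumptions for illustration, and nonzero pivots are assumed.

```python
def lu_in_place(a):
    """Dense LU without pivoting on a square list-of-rows matrix.

    On return, the strict lower triangle holds L (unit diagonal implied)
    and the upper triangle, including the diagonal, holds U.
    Assumes every pivot a[k][k] is nonzero.
    """
    n = len(a)
    for k in range(n):
        # From step k on, only rows and columns k+1..n-1 are updated:
        # the active trailing submatrix loses one row and one column.
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]                  # multiplier, stored as L[i][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]    # update trailing submatrix
    return a


m = lu_in_place([[4.0, 3.0],
                 [6.0, 3.0]])
print(m)
```

In a blocked variant, each scalar operation would act on a whole block, and blocks that are entirely empty could be skipped, but the pattern of a trailing region that shrinks every iteration is the same.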
COMPSs Matrix Multiplication, out-of-core using files. Hypermatrix size used 2x2 blocks (MSIZE=2), block size used 2x2 elements (BSIZE=2)
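As a rough sketch of the blocked layout mentioned here, the code below multiplies two hypermatrices of MSIZE×MSIZE blocks, each block holding BSIZE×BSIZE elements. It is plain sequential Python that keeps every block in memory; the actual application stores blocks in files and runs under COMPSs. All function names (`make_hypermatrix`, `multiply_accumulate`, `blocked_matmul`, `flatten`) are hypothetical.

```python
MSIZE = 2  # blocks per dimension of the hypermatrix
BSIZE = 2  # elements per dimension of each block


def make_hypermatrix(value_fn):
    """Build an MSIZE x MSIZE grid of BSIZE x BSIZE blocks.

    value_fn(row, col) gives the element at global position (row, col).
    """
    return [[[[value_fn(bi * BSIZE + i, bj * BSIZE + j)
               for j in range(BSIZE)]
              for i in range(BSIZE)]
             for bj in range(MSIZE)]
            for bi in range(MSIZE)]


def multiply_accumulate(a_blk, b_blk, c_blk):
    """Compute c_blk += a_blk @ b_blk in place."""
    for i in range(BSIZE):
        for j in range(BSIZE):
            c_blk[i][j] += sum(a_blk[i][k] * b_blk[k][j] for k in range(BSIZE))


def blocked_matmul(a, b):
    """Multiply two hypermatrices block by block."""
    c = make_hypermatrix(lambda row, col: 0)
    for i in range(MSIZE):
        for j in range(MSIZE):
            for k in range(MSIZE):
                # Each call is an independent unit of work on three blocks.
                multiply_accumulate(a[i][k], b[k][j], c[i][j])
    return c


def flatten(h):
    """Reassemble a hypermatrix into a plain list of rows."""
    n = MSIZE * BSIZE
    return [[h[r // BSIZE][col // BSIZE][r % BSIZE][col % BSIZE]
             for col in range(n)]
            for r in range(n)]


a = make_hypermatrix(lambda row, col: row + col)
identity = make_hypermatrix(lambda row, col: 1 if row == col else 0)
c = blocked_matmul(a, identity)
print(flatten(c))
```

With MSIZE=2 the triple loop issues 2×2×2 = 8 block multiply-accumulate calls; the calls that update the same `c[i][j]` depend on each other, while calls for different output blocks are independent.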
Name: Matrix multiplication with Files, reproducibility example, without data persistence
Contact Person: support-compss@bsc.es
Access Level: public
License Agreement: Apache2
Platform: COMPSs
Description
Matrix multiplication is a binary operation that takes a pair of matrices and produces another matrix.
If A is an n×m matrix and B is an m×p matrix, the result AB of their multiplication is an n×p matrix defined only if the number of columns m in A is equal to the number ...
Name: Matrix multiplication with Files, reproducibility example
Contact Person: support-compss@bsc.es
Access Level: public
License Agreement: Apache2
Platform: COMPSs
Description
Matrix multiplication is a binary operation that takes a pair of matrices and produces another matrix.
If A is an n×m matrix and B is an m×p matrix, the result AB of their multiplication is an n×p matrix defined only if the number of columns m in A is equal to the number of rows m in B. When multiplying ...
Name: Matmul GPU Case 1 Cache-ON
Contact Person: cristian.tatu@bsc.es
Access Level: public
License Agreement: Apache2
Platform: COMPSs
Machine: Minotauro-MN4
Matmul running on the GPU leveraging COMPSs GPU Cache for deserialization speedup. Launched using 32 GPUs (16 nodes). Performs C = A @ B, where:
A: shape (320, 56_900_000), block_size (10, 11_380_000)
B: shape (56_900_000, 10), block_size (11_380_000, 10)
C: shape (320, 10), block_size ...
Type: COMPSs
Creators: Cristian Tatu, The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)
Submitter: Cristian Tatu
Name: Matmul GPU Case 1 Cache-OFF
Contact Person: cristian.tatu@bsc.es
Access Level: public
License Agreement: Apache2
Platform: COMPSs 3.3
Machine: Minotauro-MN4
Matmul running on the GPU without Cache. Launched using 32 GPUs (16 nodes). Performs C = A @ B, where:
A: shape (320, 56_900_000), block_size (10, 11_380_000)
B: shape (56_900_000, 10), block_size (11_380_000, 10)
C: shape (320, 10), block_size (10, 10)
Total dataset size 291 ...
Type: COMPSs
Creators: Cristian Tatu, The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)
Submitter: Cristian Tatu
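The shapes and block sizes listed for the Matmul GPU entries imply a particular grid of blocks per array. The sketch below works that arithmetic out; the helper name `block_grid` is hypothetical, and the count of block products assumes a straightforward blocked multiplication with one product per (row block of A, column block of B, inner block) triple, which the source does not state.

```python
def block_grid(shape, block_size):
    """Number of blocks along each dimension (ceiling division)."""
    return tuple(-(-s // b) for s, b in zip(shape, block_size))


a_grid = block_grid((320, 56_900_000), (10, 11_380_000))
b_grid = block_grid((56_900_000, 10), (11_380_000, 10))
c_grid = block_grid((320, 10), (10, 10))

# Assumed: one block product per (i, j, k) combination.
block_products = a_grid[0] * b_grid[1] * a_grid[1]

print("A blocks:", a_grid)
print("B blocks:", b_grid)
print("C blocks:", c_grid)
print("block products:", block_products)
```

The inner dimension of A and the outer dimension of B split into the same number of blocks, which is what allows the block products to line up.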
Name: K-Means GPU Cache OFF
Contact Person: cristian.tatu@bsc.es
Access Level: public
License Agreement: Apache2
Platform: COMPSs
Machine: Minotauro-MN4
K-Means running on GPUs. Launched using 32 GPUs (16 nodes). Parameters used: K=40 and 32 blocks of size (1_000_000, 1200). It creates a block for each GPU. Total dataset shape is (32_000_000, 1200). Version dislib-0.9
Average task execution time: 194 seconds
Type: COMPSs
Creators: Cristian Tatu, The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)
Submitter: Cristian Tatu
Name: K-Means GPU Cache ON
Contact Person: cristian.tatu@bsc.es
Access Level: public
License Agreement: Apache2
Platform: COMPSs
Machine: Minotauro-MN4
K-Means running on the GPU leveraging COMPSs GPU Cache for deserialization speedup. Launched using 32 GPUs (16 nodes). Parameters used: K=40 and 32 blocks of size (1_000_000, 1200). It creates a block for each GPU. Total dataset shape is (32_000_000, 1200). Version dislib-0.9
Average task execution time: 16 seconds
Type: COMPSs
Creators: Cristian Tatu, The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)
Submitter: Cristian Tatu
Name: Dislib Distributed Training - Cache ON
Contact Person: cristian.tatu@bsc.es
Access Level: public
License Agreement: Apache2
Platform: COMPSs
Machine: Minotauro-MN4
PyTorch distributed training of CNN on GPU and leveraging COMPSs GPU Cache for deserialization speedup. Launched using 32 GPUs (16 nodes). Dataset: Imagenet. Versions: dislib-0.9, PyTorch 1.7.1+cu101
Average task execution time: 36 seconds
Type: COMPSs
Creators: Cristian Tatu, The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)
Submitter: Cristian Tatu
Name: Dislib Distributed Training - Cache OFF
Contact Person: cristian.tatu@bsc.es
Access Level: public
License Agreement: Apache2
Platform: COMPSs
Machine: Minotauro-MN4
PyTorch distributed training of CNN on GPU. Launched using 32 GPUs (16 nodes). Dataset: Imagenet. Versions: dislib-0.9, PyTorch 1.7.1+cu101
Average task execution time: 84 seconds
Type: COMPSs
Creators: Cristian Tatu, The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)
Submitter: Cristian Tatu
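The average task execution times reported for the K-Means and Dislib Distributed Training entries can be compared directly. The sketch below computes the Cache-OFF to Cache-ON ratio for each pair; the dictionary layout and variable names are assumptions, and since these are per-task averages, the ratios are only indicative and say nothing about end-to-end run time.

```python
# Average task execution times in seconds, as listed in the entries above.
timings = {
    "K-Means": {"cache_off": 194, "cache_on": 16},
    "Dislib Distributed Training": {"cache_off": 84, "cache_on": 36},
}

# Ratio of Cache-OFF time to Cache-ON time for each workload.
speedups = {name: t["cache_off"] / t["cache_on"] for name, t in timings.items()}

for name, ratio in speedups.items():
    print(f"{name}: about {ratio:.1f}x faster per task with the GPU cache")
```

On these figures the per-task improvement is roughly twelvefold for K-Means and a little over twofold for the distributed training run.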
Lysozyme in water full COMPSs application run at MareNostrum IV, using full dataset with two workers
PyCOMPSs implementation of Probabilistic Tsunami Forecast (PTF). PTF explicitly treats data- and forecast-uncertainties, enabling alert level definitions according to any predefined level of conservatism, which is connected to the average balance of missed-vs-false-alarms. Run of the Kos-Bodrum 2017 event test-case with 1000 scenarios, 8h tsunami simulation for each and forecast calculations for partial and full ensembles with focal mechanism and tsunami data updates.
Type: COMPSs
Creators: Louise Cordrie, Jorge Ejarque, Carlos Sánchez Linares, Jacopo Selva, Jorge Macías, Steven J. Gibbons, Fabrizio Bernardi, Roberto Tonini, Rosa M. Badia, Sonia Scardigno, Stefano Lorito, Finn Løvholt, Fabrizio Romano, Manuela Volpe, Alessandro D'Anca, Marc de la Asunción, Manuel J. Castro
Submitter: Jorge Ejarque