Publications


Abstract

Development of the needed extensions of the EuroScienceGateway components (Pulsar and Galaxy) to automate and facilitate the integration of user-provided computing and storage resources.

Project: EuroScienceGateway was funded by the European Union programme Horizon Europe (HORIZON-INFRA-2021-EOSC-01-04) under grant agreement number 101057388 and by UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee grant number 10038963.
Document: D4.1 Bring Your Own Infrastructure
Work Package: Work Package 4. Building blocks for a sustainable operating model.
Task:
- Task 4.1 Bring Your Own Compute (BYOC)
- Task 4.2 Bring Your Own Storage (BYOS)
Lead Beneficiary: EGI
Contributing Beneficiary: AGH-UST, ALU-FR, EGI, INFN, and VIB

Executive Summary

This deliverable presents the activities carried out in tasks 4.1 “Bring Your Own Compute (BYOC)” and 4.2 “Bring Your Own Storage (BYOS)”, under Work Package 4 “Building blocks for a sustainable operating model”. The overall goal of tasks 4.1 and 4.2 is to make it easier for Galaxy users to connect their accounts in Galaxy to existing, externally managed compute and storage resources. The benefits are twofold: 1) Galaxy administrators do not need to operate and maintain additional IT infrastructure and 2) Galaxy users get extra capacity to execute workflows that are beyond their assigned quotas in Galaxy.

Authors: Maiken Pedersen, Sanjay Kumar Srikakulam, Paul De Geest, Enol Fernandez-del-Castillo, Andrea Cristofori, Sebastian Luna-Valero, Marco Antonio Tangaro, Stefano Nicotri

Date Published: 26th Aug 2024

Publication Type: Tech report

Abstract

Development of the Open Infrastructure and Pulsar Network to support distributed job execution and scalable Galaxy deployments across Europe.

Project: EuroScienceGateway was funded by the European Union programme Horizon Europe (HORIZON-INFRA-2021-EOSC-01-04) under grant agreement number 101057388 and by UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee grant number 10038963.
Document: D3.1 Operations documentation on the Open Infrastructure deployment
Work Package: Work Package 3. Pulsar Network: Distributed heterogeneous compute.
Tasks:
- Task 3.1 Develop and maintain an Open Infrastructure based deployment model for Pulsar endpoints.
- Task 3.3 Build a European-wide network of Pulsar sites.
- Task 3.5 Developing and maintaining national or domain-driven Galaxy servers.
Lead Beneficiary: INFN
Contributing Beneficiary: INFN, ALU-FR, CNRS, CESNET, UiB, BSC, VIB, IISAS, TUBITAK and CNR

Executive Summary

Work Package 3 of the EuroScienceGateway project is divided into 5 tasks, aimed at bringing into production (TRL9) the Pulsar Network, a distributed computing network that allows public Galaxy servers to offload jobs to remote computing clusters provided by project partners. Specifically, this deliverable describes the work done in tasks 3.1, 3.2, 3.3 and 3.5. The main objectives of WP3 are: 1) to simplify the deployment and management of new Pulsar and Galaxy endpoints (T3.1 and T3.5), 2) to make Pulsar compatible with the GA4GH TES specifications (T3.2), and 3) to deploy new Pulsar endpoints (T3.3).

Authors: Stefano Nicotri, Marco Antonio Tangaro, Federico Zambelli, Miroslav Ruda, Ales Krenek, Björn Grüning, Sanjay Kumar Srikakulam, Anthony Bretaudeau, Sondre Batalden, María Chavero Díez, Paul De Geest

Date Published: 27th Aug 2024

Publication Type: Journal

Abstract

WorkflowHub is a registry of computational workflows, provided as an EOSC Service by ELIXIR-UK, and used by over 200 different research projects, institutions and virtual collaborations. For this milestone of EuroScienceGateway (ESG), the project has developed an onboarding guide for WorkflowHub and registered in WorkflowHub the initial ESG workflows that have been developed and maintained by the project.

Authors: Stian Soiland-Reyes, Björn Grüning, Paul De Geest

Date Published: 29th Feb 2024

Publication Type: Tech report

Abstract

Effective resource scheduling is critical in high-performance computing (HPC) and high-throughput computing (HTC) environments, where traditional scheduling systems struggle with resource contention, data locality, and fault tolerance. Meta-scheduling, which abstracts multiple schedulers for unified job allocation, addresses these challenges. Galaxy, a widely used platform for data-intensive computational analysis, employs the Total Perspective Vortex (TPV) system for resource scheduling. With over 550,000 users, Galaxy aims to optimize scheduling efficiency in large-scale environments. While TPV offers flexibility, its decision-making can be enhanced by incorporating real-time resource availability and job status. This paper introduces the TPV Broker, a meta-scheduling framework that integrates real-time resource data to enable dynamic, data-aware scheduling. TPV Broker enhances scalability, resource utilization, and scheduling efficiency in Galaxy, offering potential for further improvements in distributed computing environments.

Authors: Abdulrahman Azab, Paul De Geest, Sanjay Kumar Srikakulam, Tomáš Vondra, Mira Kuntz, Björn Grüning

Date Published: 1st Feb 2025

Publication Type: Unpublished

Abstract

Project: EuroScienceGateway was funded by the European Union programme Horizon Europe (HORIZON-INFRA-2021-EOSC-01-04) under grant agreement number 101057388 and by UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee grant number 10038963.
Document: D4.2 Publication on the smart job scheduler implementation
Work Package: Work Package 4. Building blocks for a sustainable operating model.
Task:
- Task 4.3 Implement a smart job-scheduling system across Europe
Lead Beneficiary: EGI
Contributing Beneficiary: ALU-FR, CESNET, EGI, UiO, and VIB

Executive Summary

Galaxy is currently using the Total Perspective Vortex (TPV) to schedule millions of jobs for hundreds of thousands of users globally. While TPV has proven to be a robust meta-scheduling tool for Galaxy in recent years, there are areas of improvement that have been addressed in the EuroScienceGateway project:
- Gathering live usage metrics from across the distributed computing endpoints connected to Galaxy in order to distribute the load across all sites.
- Adding latitude and longitude attributes to data stores and computing endpoints to allocate jobs as close as possible to the location of the data.
- Visualizing job distribution across sites with an intuitive dashboard.

As a result, the EuroScienceGateway project has developed two new tools:
- TPV Broker: efficient meta-scheduling of jobs, taking into account real-time usage metrics and data-locality information.
- Galaxy Job Radar: a web dashboard to easily visualize the allocation of jobs across all sites.

The EuroScienceGateway project has significantly improved the meta-scheduling of jobs for Galaxy, resulting in shorter waiting times for users to see their jobs completed and improved resource utilization across all sites.

Authors: Abdulrahman Azab, Sanjay Kumar Srikakulam, Paul De Geest, Tomáš Vondrák, Björn Grüning, Mira Kuntz, Enol Fernandez-del-Castillo, Sebastian Luna-Valero

Date Published: 27th Feb 2025

Publication Type: Tech report

Abstract

Recording the provenance of scientific computation results is key to the support of traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to lack interoperable adoption across workflow management systems. In this work we present Workflow Run RO-Crate, an extension of RO-Crate (Research Object Crate) and Schema.org to capture the provenance of the execution of computational workflows at different levels of granularity and bundle together all their associated objects (inputs, outputs, code, etc.). The model is supported by a diverse, open community that runs regular meetings, discussing development, maintenance and adoption aspects. Workflow Run RO-Crate is already implemented by several workflow management systems, allowing interoperable comparisons between workflow runs from heterogeneous systems. We describe the model, its alignment to standards such as W3C PROV, and its implementation in six workflow systems. Finally, we illustrate the application of Workflow Run RO-Crate in two use cases of machine learning in the digital image analysis domain.

Authors: Simone Leo, Michael R. Crusoe, Laura Rodríguez-Navas, Raül Sirvent, Alexander Kanitz, Paul De Geest, Rudolf Wittner, Luca Pireddu, Daniel Garijo, José M. Fernández, Iacopo Colonnelli, Matej Gallo, Tazro Ohta, Hirotaka Suetake, Salvador Capella-Gutierrez, Renske de Wit, Bruno P. Kinoshita, Stian Soiland-Reyes

Date Published: 10th Sep 2024

Publication Type: Journal

Copyright © 2008 - 2025 The University of Manchester and HITS gGmbH