Publications


Abstract

Computational workflows, regardless of their portability or maturity, represent major investments of both effort and expertise. They are first-class, publishable research objects in their own right. They are key to sharing methodological know-how for reuse, reproducibility, and transparency. Consequently, applying the FAIR principles to workflows [goble_2019, wilkinson_2025] is an inevitable step to make them Findable, Accessible, Interoperable, and Reusable. Making workflows FAIR would reduce duplication of effort, assist in the reuse of best-practice approaches and community-supported standards, and ensure that workflows as digital objects can support reproducible and robust science. FAIR workflows also encourage interdisciplinary collaboration, enabling workflows developed in one field to be repurposed and adapted for use in other research domains. FAIR workflows draw from both the FAIR data [wilkinson_2016] and software [barker_2022] principles. Workflows propose explicit method abstractions and tight bindings to data, so many of the data principles apply. Meanwhile, as executable pipelines with a strong emphasis on code composition and data flow between steps, the software principles apply, too. As workflows are chiefly concerned with the processing and creation of data, they also have an important role to play in ensuring and supporting data FAIRification. The FAIR principles for software and data mandate the use of persistent identifiers (PIDs) and machine-actionable metadata associated with workflows to enable findability, accessibility, interoperability, and reusability. Implementing the principles requires a PID and metadata framework with appropriate programmatic protocols, an accompanying ecosystem of services, tools, guidelines, policies, and best practices, as well as the buy-in of existing workflow systems such that they adapt in order to adopt.
The European EOSC-Life Workflow Collaboratory is an example of such a digital infrastructure for the biosciences: it includes a metadata standards framework for describing workflows (i.e. RO-Crate, Bioschemas, and CWL) that is managed and used by dedicated new FAIR workflow services and programmatic APIs for interoperability and metadata access, such as those proposed by the Global Alliance for Genomics and Health (GA4GH) [rehm_2021]. The WorkflowHub registry supports workflow Findability and Accessibility, while workflow testing services like LifeMonitor support long-term Reusability, Usability, and Reproducibility. Existing workflow management systems/languages and packaging solutions are incorporated and adapted to promote portability, composability, interoperability, provenance collection, and reusability, and to use and support these FAIR services. In this chapter, we will introduce the FAIR principles for workflows and the connections between FAIR workflows and the FAIR ecosystems in which they live, using the EOSC-Life Collaboratory as a concrete example. We will also introduce other community efforts that are easing the ways that workflows are shared and reused by others, and we will discuss how the variations in different workflow settings impact their FAIR perspective.
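
A hedged, minimal sketch of the kind of machine-actionable metadata this abstract describes: a Workflow RO-Crate's `ro-crate-metadata.json`, built here as a Python dictionary. The workflow filename and names below are hypothetical placeholders, not taken from any actual crate; consult the Workflow RO-Crate profile for the authoritative details.

```python
import json

# Minimal sketch of a Workflow RO-Crate metadata document (JSON-LD).
# "workflow.ga" and the "name" values are hypothetical placeholders.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {   # Metadata descriptor: points at the crate root it describes.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": {"@id": "./"},
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        },
        {   # Root dataset: declares the main workflow entity.
            "@id": "./",
            "@type": "Dataset",
            "name": "Example analysis workflow crate",
            "mainEntity": {"@id": "workflow.ga"},
        },
        {   # The workflow file itself, typed as a computational workflow.
            "@id": "workflow.ga",
            "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
            "name": "Example Galaxy workflow",
        },
    ],
}

print(json.dumps(crate, indent=2))
```

Registries such as WorkflowHub consume crates of this general shape to index workflows with PIDs and machine-actionable metadata.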

Authors: Sean R. Wilkinson, Johan Gustafsson, Finn Bacall, Khalid Belhajjame, Salvador Capella, José María Fernández González, Jacob Fosso Tande, Luiz Gadelha, Daniel Garijo, Patricia Grubel, Björn Grüning, Farah Zaib Khan, Sehrish Kanwal, Simone Leo, Stuart Owen, Luca Pireddu, Line Pouchard, Laura Rodriguez-Navas, Beatriz Serrano-Solano, Stian Soiland-Reyes, Baiba Vilne, Alan Williams, Merridee Ann Wouters, Frederik Coppens, Carole Goble

Date Published: 21st May 2025

Publication Type: InBook

Abstract

Development of the Open Infrastructure and Pulsar Network to support distributed job execution and scalable Galaxy deployments across Europe.

Project: EuroScienceGateway was funded by the European Union programme Horizon Europe (HORIZON-INFRA-2021-EOSC-01-04) under grant agreement number 101057388 and by UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee grant number 10038963.
Document: D3.1 Operations documentation on the Open Infrastructure deployment
Work Package: Work Package 3. Pulsar Network: Distributed heterogeneous compute.
Tasks:
- Task 3.1 Develop and maintain an Open Infrastructure based deployment model for Pulsar endpoints.
- Task 3.3 Build a European-wide network of Pulsar sites.
- Task 3.5 Develop and maintain national or domain-driven Galaxy servers.
Lead Beneficiary: INFN
Contributing Beneficiaries: INFN, ALU-FR, CNRS, CESNET, UiB, BSC, VIB, IISAS, TUBITAK and CNR

Executive Summary: Work Package 3 of the EuroScienceGateway project is divided into 5 tasks, aimed at bringing into production (TRL9) the Pulsar Network, a distributed computing network that allows public Galaxy servers to offload jobs to remote computing clusters provided by project partners. Specifically, this deliverable describes the work done in tasks 3.1, 3.2, 3.3 and 3.5. The main objectives of WP3 are: 1) to simplify the deployment and management of new Pulsar and Galaxy endpoints (T3.1 and T3.5), 2) to make Pulsar compatible with the GA4GH TES specification (T3.2), and 3) to deploy new Pulsar endpoints (T3.3).
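
The GA4GH TES compatibility targeted in Task 3.2 amounts to describing a job in a standard JSON form. Below is a hedged Python sketch of a minimal TES task; the task name, container image tag, and the example endpoint in the comment are hypothetical placeholders.

```python
import json

# Sketch of a minimal GA4GH TES task description. A TES task names one or
# more executors, each a container image plus a command to run inside it.
# The task name and image tag here are hypothetical placeholders.
task = {
    "name": "example-version-check",
    "executors": [
        {
            "image": "quay.io/biocontainers/samtools:1.19",  # hypothetical tag
            "command": ["samtools", "--version"],
        }
    ],
}

body = json.dumps(task)
# A real client would POST `body` to a TES endpoint, e.g. (hypothetical URL):
#   POST https://pulsar.example.org/ga4gh/tes/v1/tasks
print(body)
```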

Authors: Stefano Nicotri, Marco Antonio Tangaro, Federico Zambelli, Miroslav Ruda, Ales Krenek, Björn Grüning, Sanjay Kumar Srikakulam, Anthony Bretaudeau, Sondre Batalden, María Chavero Díez, Paul De Geest

Date Published: 27th Aug 2024

Publication Type: Journal

Abstract

This EuroScienceGateway report gives an overview of FAIR Digital Objects (FDO), considering their use for computational workflows as scholarly objects. EuroScienceGateway has progressed the technologies Signposting and RO-Crate for implementing Workflow FDOs with the registry WorkflowHub and the workflow system Galaxy, and initiated work with academic publishers to encourage workflow citation practices. Here we document how WorkflowHub supports research software best practices for workflows and assists in building FAIR computational workflows. Provenance of workflow executions has been made possible in an interoperable way across many workflow systems using Workflow Run Crate profiles, including from Galaxy. Finally, this report explores how Workflow FDOs are exposed and can be utilised, e.g. gathered in knowledge graphs and given tighter workflow system integration.
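
Signposting, one of the technologies this report progresses, works by advertising machine-readable descriptions in HTTP `Link` headers. The sketch below parses such a header to locate an RO-Crate metadata document; the header value and URLs are hypothetical examples, not actual WorkflowHub output.

```python
import re

# A hypothetical Link header, as a Signposting-enabled landing page
# might return it: a "describedby" link to machine-readable metadata
# and a "cite-as" link to a persistent identifier.
link_header = (
    '<https://example.org/workflow/1/ro-crate-metadata.json>; '
    'rel="describedby"; type="application/ld+json", '
    '<https://doi.org/10.0000/example>; rel="cite-as"'
)

def parse_links(value):
    """Parse a Link header value into a list of (target, params) pairs."""
    links = []
    for part in re.split(r',\s*(?=<)', value):
        m = re.match(r'<([^>]+)>(.*)', part)
        if not m:
            continue
        target, rest = m.group(1), m.group(2)
        params = dict(re.findall(r';\s*(\w[\w-]*)="?([^";]+)"?', rest))
        links.append((target, params))
    return links

# Follow the "describedby" relation to find the RO-Crate metadata URL.
described_by = [t for t, p in parse_links(link_header) if p.get("rel") == "describedby"]
print(described_by)
```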

Authors: Stian Soiland-Reyes, Eli Chadwick, Finn Bacall, Jose M. Fernandez, Björn Grüning, Hakan Bayındır

Date Published: 28th Aug 2024

Publication Type: Tech report

Abstract

The concept of publishing workflows as scholarly objects is being recognised and practiced through repositories like WorkflowHub and principles for FAIR Computational Workflows. This deliverable describes how the evolving landscape of the European Open Science Cloud (EOSC) can facilitate workflow publishing in a federated and distributed manner, exemplified by how workflows for Galaxy are published.

Authors: Stian Soiland-Reyes, Eli Chadwick, Armin Dadras, Björn Grüning, Catalin Condurache, Sebastian Luna-Valero, Volodymyr Savchenko

Date Published: 8th Feb 2025

Publication Type: Tech report

Abstract

WorkflowHub is a registry of computational workflows, provided as an EOSC Service by ELIXIR-UK and used by over 200 different research projects, institutions, and virtual collaborations. For this milestone of EuroScienceGateway (ESG), the project has developed an onboarding guide for WorkflowHub and registered in WorkflowHub the initial ESG workflows that have been developed and maintained by the project.

Authors: Stian Soiland-Reyes, Björn Grüning, Paul De Geest

Date Published: 29th Feb 2024

Publication Type: Tech report

Abstract

This deliverable provides the final project summary of EuroScienceGateway (ESG), a Horizon Europe and EOSC initiative (Grant Agreement 101057388, Sept 2022–Aug 2025) coordinated by Albert-Ludwigs-Universität Freiburg. It summarizes ESG’s main achievements, impacts, FAIR data management, sustainability and exploitation plans, and dissemination outcomes. Technically, ESG delivered a production-grade, federated research gateway built on Galaxy and an expanded Pulsar Network, enabling scalable, data-intensive analysis across heterogeneous European compute and storage. Key innovations include Bring-Your-Own-Compute/Storage (BYOC/BYOS), a smart meta-scheduler (TPV Broker), the Galaxy Job Radar dashboard, and streamlined deployment and administration tooling, altogether improving throughput, data locality, and operational transparency. The project operationalized FAIR principles for computational workflows by packaging and publishing Workflow RO-Crates with persistent identifiers via WorkflowHub, advancing EOSC interoperability. Federated AAI (e.g., EGI Check-in, LS Login, IAM4NFDI) supports secure access across institutions. ESG contributed >20 workflows, >40 tutorials, and >10 peer-reviewed publications, and collaborated with 20+ initiatives. Six national Galaxy instances and 10+ Pulsar endpoints were launched; the European Galaxy instance achieved ISO/IEC 27001 certification. Community impact was substantial: registered users on the European Galaxy portal grew from ~30,000 to >130,000, with monthly active users doubling to >6,000, underpinned by >20 online and onsite workshops and large-scale training through the Galaxy Training Network and Training-Infrastructure-as-a-Service (TIaaS). Sustainability is ensured through distributed governance, national and institutional hosting of Galaxy/Pulsar services, continued curation of workflows and training materials, and alignment with EOSC service models and funding pathways.
The report closes with exploitation routes for beneficiaries and stakeholders and a record of dissemination and outreach activities across the European research ecosystem.

Authors: Armin Dadras, Oana Kaiser, Björn Grüning, Sebastian Luna-Valero, Enol Fernandez-del-Castillo

Date Published: 20th Aug 2025

Publication Type: Tech report

Abstract

Effective resource scheduling is critical in high-performance computing (HPC) and high-throughput computing (HTC) environments, where traditional scheduling systems struggle with resource contention, data locality, and fault tolerance. Meta-scheduling, which abstracts multiple schedulers for unified job allocation, addresses these challenges. Galaxy, a widely used platform for data-intensive computational analysis, employs the Total Perspective Vortex (TPV) system for resource scheduling. With over 550,000 users, Galaxy aims to optimize scheduling efficiency in large-scale environments. While TPV offers flexibility, its decision-making can be enhanced by incorporating real-time resource availability and job status. This paper introduces the TPV Broker, a meta-scheduling framework that integrates real-time resource data to enable dynamic, data-aware scheduling. TPV Broker enhances scalability, resource utilization, and scheduling efficiency in Galaxy, offering potential for further improvements in distributed computing environments.
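
To illustrate the idea of data-aware meta-scheduling, here is a toy sketch that ranks compute destinations by combining a real-time load metric with a data-locality bonus. The destination names, metrics, and weighting are hypothetical and are not the actual TPV Broker algorithm.

```python
# Toy destination list: queue lengths stand in for live usage metrics,
# and the locality flag marks where the job's input data already lives.
# All values are hypothetical.
destinations = [
    {"name": "site-a", "queued_jobs": 120, "holds_input_data": False},
    {"name": "site-b", "queued_jobs": 300, "holds_input_data": True},
    {"name": "site-c", "queued_jobs": 250, "holds_input_data": False},
]

def score(dest, locality_weight=200):
    """Lower is better: queue length, minus a bonus when the data is local."""
    bonus = locality_weight if dest["holds_input_data"] else 0
    return dest["queued_jobs"] - bonus

best = min(destinations, key=score)
print(best["name"])
```

In this toy scoring, site-b wins despite its longer queue because the input data is already local, which is the trade-off a data-aware scheduler makes.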

Authors: Abdulrahman Azab, Paul De Geest, Sanjay Kumar Srikakulam, Tomáš Vondra, Mira Kuntz, Björn Grüning

Date Published: 1st Feb 2025

Publication Type: Unpublished

Abstract

Project: EuroScienceGateway was funded by the European Union programme Horizon Europe (HORIZON-INFRA-2021-EOSC-01-04) under grant agreement number 101057388 and by UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee grant number 10038963.
Document: D4.2 Publication on the smart job scheduler implementation
Work Package: Work Package 4. Building blocks for a sustainable operating model.
Task:
- Task 4.3 Implement a smart job-scheduling system across Europe
Lead Beneficiary: EGI
Contributing Beneficiaries: ALU-FR, CESNET, EGI, UiO, and VIB

Executive Summary: Galaxy currently uses the Total Perspective Vortex (TPV) to schedule millions of jobs for hundreds of thousands of users globally. While TPV has proven to be a robust meta-scheduling tool for Galaxy in recent years, there are areas of improvement that have been addressed in the EuroScienceGateway project:
- Gathering live usage metrics from across the distributed computing endpoints connected to Galaxy in order to distribute the load across all sites.
- Adding latitude and longitude attributes to data stores and computing endpoints to allocate jobs as close as possible to the location of the data.
- Visualizing job distribution across sites with an intuitive dashboard.
As a result, the EuroScienceGateway project has developed two new tools:
- TPV Broker, for the efficient meta-scheduling of jobs taking into account real-time usage metrics and data-locality information.
- Galaxy Job Radar, a web dashboard to easily visualize the allocation of jobs across all sites.
The EuroScienceGateway project has significantly improved the meta-scheduling of jobs for Galaxy, resulting in shorter waiting times for users' jobs to complete and improved resource utilization across all sites.
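
The latitude/longitude attributes described above enable distance-based job placement. As a hedged illustration (the endpoint names and coordinates are hypothetical, and this is not the TPV Broker's actual logic), the sketch below picks the endpoint closest to a data store using great-circle distance.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical coordinates for a data store and three compute endpoints.
data_store = {"lat": 48.0, "lon": 7.85}
endpoints = {
    "endpoint-no": {"lat": 60.39, "lon": 5.32},
    "endpoint-it": {"lat": 41.11, "lon": 16.87},
    "endpoint-de": {"lat": 48.00, "lon": 7.85},
}

# Allocate the job to the endpoint nearest the data.
closest = min(
    endpoints,
    key=lambda name: haversine_km(data_store["lat"], data_store["lon"],
                                  endpoints[name]["lat"], endpoints[name]["lon"]),
)
print(closest)
```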

Authors: Abdulrahman Azab, Sanjay Kumar Srikakulam, Paul De Geest, Tomáš Vondrák, Björn Grüning, Mira Kuntz, Enol Fernandez-del-Castillo, Sebastian Luna-Valero

Date Published: 27th Feb 2025

Publication Type: Tech report

Abstract

This preprint outlines the development and deployment of the European Pulsar Network (EPN), a federated, scalable architecture enabling distributed job execution across national and European Galaxy instances. Built within the Horizon Europe EuroScienceGateway project, the EPN leverages the Galaxy workflow system and the Pulsar job execution service to offload computational workloads to remote endpoints seamlessly and securely. The work introduces an Open Infrastructure (OI) framework that automates provisioning, deployment, and monitoring using Terraform, Ansible, and Jenkins. The preprint highlights deployments across thirteen Pulsar nodes and six national Galaxy portals, illustrating how the EPN supports reproducible, FAIR-aligned data analysis while abstracting infrastructure complexity for researchers.

Authors: Marco Antonio Tangaro, Stefano Nicotri, Björn Grüning, Sanjay Kumar Srikakulam, Armin Dadras, Oana Kaiser, Mira Kuntz, Anthony Bretaudeau, Paul De Geest, Sebastian Luna-Valero, María Chavero Díez, José María Fernández González, Salvador Capella-Gutierrez, Josep Lluís Gelpí, Jan Astalos, Boris Jurič, Miroslav Ruda, Łukasz Opioła, Hakan Bayındır, Silvia Gioiosa, Gaetanomaria De Sanctis, Federico Zambelli

Date Published: 7th Aug 2025

Publication Type: Unpublished

Abstract

Workflows have become a core part of computational scientific analysis in recent years. Automated computational workflows multiply the power of researchers, potentially turning “hand-cranked” data processing by informaticians into robust factories for complex research output. However, in order for a piece of software to be usable as a workflow-ready tool, it may require alteration from its likely origin as a standalone tool. Research software is often created in response to the need to answer a research question with the minimum expenditure of time and money in resource-constrained projects. The level of quality might range from “it works on my computer” to mature and robust projects with support across multiple operating systems. Despite a significant increase in the uptake of workflow tools, there is little specific guidance for writing software intended to slot in as a tool within a workflow, or on converting an existing standalone research-quality software tool into a reusable, composable, well-behaved citizen within a larger workflow. In this paper we present 10 simple rules for how a software tool can be prepared for workflow use.
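
As a flavour of the kind of guidance such rules give (this example is illustrative, not one of the paper's actual rules), a workflow-ready tool typically takes explicit input and output paths, avoids interactive prompts, and returns a non-zero exit code on failure so a workflow engine can detect errors:

```python
import argparse
import os
import sys
import tempfile

def main(argv=None):
    """A tiny line-counting tool written to be workflow-friendly."""
    parser = argparse.ArgumentParser(description="Count lines in a file.")
    parser.add_argument("--input", required=True, help="path to the input file")
    parser.add_argument("--output", required=True, help="path to the output file")
    args = parser.parse_args(argv)
    try:
        with open(args.input) as f:
            n = sum(1 for _ in f)
    except OSError as e:
        # Report to stderr and signal failure via the exit code,
        # so a workflow engine can detect and handle the error.
        print(f"error: {e}", file=sys.stderr)
        return 1
    with open(args.output, "w") as f:
        f.write(f"{n}\n")
    return 0

# Demonstration on a temporary file (no interactive input required):
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "in.txt")
dst = os.path.join(workdir, "out.txt")
with open(src, "w") as f:
    f.write("first line\nsecond line\n")
exit_code = main(["--input", src, "--output", dst])
print(exit_code)
```

Explicit paths and exit codes are what let a workflow engine compose the tool with other steps and retry or fail cleanly.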

Authors: Paul Brack, Peter Crowther, Stian Soiland-Reyes, Stuart Owen, Douglas Lowe, Alan R. Williams, Quentin Groom, Mathias Dillen, Frederik Coppens, Björn Grüning, Ignacio Eguinoa, Philip Ewels, Carole Goble

Date Published: 24th Mar 2022

Publication Type: Journal
