Publications


Abstract

WorkflowHub is a registry of computational workflows, provided as an EOSC Service by ELIXIR-UK and used by over 200 different research projects, institutions and virtual collaborations. For this milestone of EuroScienceGateway (ESG), the project has developed an onboarding guide for WorkflowHub and registered the initial ESG workflows developed and maintained by the project.

Authors: Stian Soiland-Reyes, Björn Grüning, Paul De Geest

Date Published: 29th Feb 2024

Publication Type: Tech report

Abstract

The WorkflowHub Knowledge Graph has been improved and its generation made more robust. When this work was last reported, a complete knowledge graph had been generated but several criticisms were made. The previous graph was:
- Verbose and hard for a human to read or navigate
- Had unresolvable URIs as root data entities
- Contained many duplicate entries
- Contained sparse metadata from only a single source

Work has successfully been undertaken to address all of these points. The graph now uses partially resolvable, more human-readable URIs for root data entities. Steps have been added to the generation software to add metadata from additional sources (enrichment) and to remove duplicate entries (consolidation). Several areas of the codebase have been refactored and improved, to help ensure repeatability and longevity. The new knowledge graph still has areas that could be improved: partially resolvable URIs should be migrated to fully resolvable alternatives, and further enrichment processes should be added to afford greater de-duplication.

Authors: Eli Chadwick, Oliver Woolland, Volodymyr Savchenko, Finn Bacall, Alexander Hambley, José María Fernández González, Armin Dadras, Stian Soiland-Reyes

Date Published: 1st Aug 2025

Publication Type: Tech report

Abstract

Computational workflows describe the complex multi-step methods that are used for data collection, data preparation, analytics, predictive modelling, and simulation that lead to new data products. They can inherently contribute to the FAIR data principles: by processing data according to established metadata; by creating metadata themselves during the processing of data; and by tracking and recording data provenance. These properties aid data quality assessment and contribute to secondary data usage. Moreover, workflows are digital objects in their own right. This paper argues that FAIR principles for workflows need to address their specific nature in terms of their composition of executable software steps, their provenance, and their development.

Authors: Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, Daniel Schober

Date Published: 2020

Publication Type: Journal

Abstract

This deliverable provides the final project summary of EuroScienceGateway (ESG), a Horizon Europe and EOSC initiative (Grant Agreement 101057388, Sept 2022–Aug 2025) coordinated by Albert-Ludwigs-Universität Freiburg. It summarizes ESG's main achievements, impacts, FAIR data management, sustainability and exploitation plans, and dissemination outcomes. Technically, ESG delivered a production-grade, federated research gateway built on Galaxy and an expanded Pulsar Network, enabling scalable, data-intensive analysis across heterogeneous European compute and storage. Key innovations include Bring-Your-Own-Compute/Storage (BYOC/BYOS), a smart meta-scheduler (TPV Broker), the Galaxy Job Radar dashboard, and streamlined deployment/admin tooling, altogether improving throughput, data locality, and operational transparency. The project operationalized FAIR principles for computational workflows by packaging and publishing Workflow RO-Crates with persistent identifiers via WorkflowHub, advancing EOSC interoperability. Federated AAI (e.g., EGI Check-in, LS Login, IAM4NFDI) supports secure access across institutions. ESG contributed >20 workflows, >40 tutorials, and >10 peer-reviewed publications, and collaborated with 20+ initiatives. Six national Galaxy instances and 10+ Pulsar endpoints were launched; the European Galaxy instance achieved ISO/IEC 27001 certification. Community impact was substantial: registered users on the European Galaxy portal grew from ~30,000 to >130,000, with monthly actives doubling to >6,000, underpinned by >20 online/onsite workshops and large-scale training through the Galaxy Training Network and Training-Infrastructure-as-a-Service (TIaaS). Sustainability is ensured through distributed governance, national/institutional hosting of Galaxy/Pulsar services, continued curation of workflows and training materials, and alignment with EOSC service models and funding pathways.
The report closes with exploitation routes for beneficiaries and stakeholders and a record of dissemination and outreach activities across the European research ecosystem.

Authors: Armin Dadras, Oana Kaiser, Björn Grüning, Sebastian Luna-Valero, Enol Fernandez-del-Castillo

Date Published: 20th Aug 2025

Publication Type: Tech report

Abstract

This report provides an in-depth analysis of the sustainability of the Galaxy platform, a globally recognized open-source system for data analysis, workflow management, and scientific collaboration. Developed under the EuroScienceGateway project and supported by the European Union’s Horizon Europe program (Grant Agreement No. 101057388), the report evaluates Galaxy through the lenses of desirability, feasibility, and viability using a robust analytical framework derived from design thinking and open-source community health metrics (CHAOSS). The report presents empirical data on Galaxy's rapid growth in user adoption, job execution volume, infrastructure robustness, contributor engagement, community governance, and scientific impact. It highlights Galaxy’s ability to democratize access to advanced computational tools, support reproducible science, and maintain long-term sustainability through a distributed community and institutional support. This document is a valuable resource for funders, policymakers, and stakeholders in the open science and digital research infrastructure community, illustrating why Galaxy represents a low-risk, high-reward investment in the future of data-driven research.

Author: Smitesh Jain

Date Published: 17th Jul 2025

Publication Type: Tech report

Abstract

We here introduce the concept of Canonical Workflow Building Blocks (CWBB), a methodology of describing and wrapping computational tools, in order for them to be utilized in a reproducible manner from multiple workflow languages and execution platforms. We argue such practice is a necessary requirement for FAIR Computational Workflows [Goble 2020] to improve widespread adoption and reuse of a computational method across workflow language barriers.

Authors: Stian Soiland-Reyes, Genís Bayarri, Pau Andrio, Robin Long, Douglas Lowe, Ania Niewielska, Adam Hospital

Date Published: 7th Mar 2021

Publication Type: Journal

Abstract

A widely used standard for portable multilingual data analysis pipelines would enable considerable benefits to scholarly publication reuse, research/industry collaboration, regulatory cost control, and to the environment. Published research that used multiple computer languages for their analysis pipelines would include a complete and reusable description of that analysis that is runnable on a diverse set of computing environments. Researchers would be able to collaborate on and reuse these pipelines more easily, adding or exchanging components regardless of the programming language used; collaborations with and within industry would be easier; approval of new medical interventions that rely on such pipelines would be faster. Time would be saved and environmental impact reduced, as these descriptions contain enough information for advanced optimization without user intervention. Workflows are widely used in data analysis pipelines, enabling innovation and decision-making for modern society. In many domains the analysis components are numerous and written in multiple different computer languages by third parties. However, lacking a standard for reusable and portable multilingual workflows, reusing published multilingual workflows, collaborating on open problems, and optimizing their execution would be severely hampered. Moreover, only a widely used standard for multilingual data analysis pipelines would enable considerable benefits to research-industry collaboration, regulatory cost control, and to preserving the environment. Prior to the start of the CWL project, there was no standard for describing multilingual analysis pipelines in a portable and reusable manner. Even today, although there exist hundreds of single-vendor and other single-source systems that run workflows, none is a general, community-driven, and consensus-built standard. Preprint, submitted to Communications of the ACM (CACM).

Authors: Michael R. Crusoe, Sanne Abeln, Alexandru Iosup, Peter Amstutz, John Chilton, Nebojša Tijanić, Hervé Ménager, Stian Soiland-Reyes, Carole Goble

Date Published: 14th May 2021

Publication Type: Unpublished

Abstract

Effective resource scheduling is critical in high-performance computing (HPC) and high-throughput computing (HTC) environments, where traditional scheduling systems struggle with resource contention, data locality, and fault tolerance. Meta-scheduling, which abstracts multiple schedulers for unified job allocation, addresses these challenges. Galaxy, a widely used platform for data-intensive computational analysis, employs the Total Perspective Vortex (TPV) system for resource scheduling. With over 550,000 users, Galaxy aims to optimize scheduling efficiency in large-scale environments. While TPV offers flexibility, its decision-making can be enhanced by incorporating real-time resource availability and job status. This paper introduces the TPV Broker, a meta-scheduling framework that integrates real-time resource data to enable dynamic, data-aware scheduling. TPV Broker enhances scalability, resource utilization, and scheduling efficiency in Galaxy, offering potential for further improvements in distributed computing environments.

Authors: Abdulrahman Azab, Paul De Geest, Sanjay Kumar Srikakulam, Tomáš Vondra, Mira Kuntz, Björn Grüning

Date Published: 1st Feb 2025

Publication Type: Unpublished

Abstract

Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have returned the long-standing vision of automated workflow composition into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the “big picture” of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation.
Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.

Authors: Anna-Lena Lamprecht, Magnus Palmblad, Jon Ison, Veit Schwämmle, Mohammad Sadnan Al Manir, Ilkay Altintas, Christopher J. O. Baker, Ammar Ben Hadj Amor, Salvador Capella-Gutierrez, Paulos Charonyktakis, Michael R. Crusoe, Yolanda Gil, Carole Goble, Timothy J. Griffin, Paul Groth, Hans Ienasescu, Pratik Jagtap, Matúš Kalaš, Vedran Kasalica, Alireza Khanteymoori, Tobias Kuhn, Hailiang Mei, Hervé Ménager, Steffen Möller, Robin A. Richardson, Vincent Robert, Stian Soiland-Reyes, Robert Stevens, Szoke Szaniszlo, Suzan Verberne, Aswin Verhoeven, Katherine Wolstencroft

Date Published: 2021

Publication Type: Journal

Abstract

Research Object Crate (RO-Crate) is a lightweight method to package research outputs along with their metadata. Signposting provides a simple yet powerful approach to navigate scholarly objects on the Web. Combining these technologies forms a "webby" implementation of the FAIR Digital Object principles which is suitable for retrofitting to existing data infrastructures, or even for ad-hoc research objects using regular Web hosting platforms. Here we give an update on recent community development and adoption of RO-Crate and Signposting. It is notable that programmatic access and more detailed profiles have received high attention, as well as several FDO implementations that use RO-Crate.
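As a concrete sketch of the combination this abstract describes (all identifiers, filenames and URLs below are illustrative, not taken from the paper): an RO-Crate is a directory or archive whose ro-crate-metadata.json file describes the packaged outputs as JSON-LD, roughly along these lines:

```json
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": { "@id": "https://w3id.org/ro/crate/1.1" },
      "about": { "@id": "./" }
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "Example research object",
      "description": "A minimal crate packaging one data file.",
      "hasPart": [{ "@id": "results.csv" }]
    },
    {
      "@id": "results.csv",
      "@type": "File",
      "name": "Example results table"
    }
  ]
}
```

A Signposting-enabled landing page could then advertise this metadata with a typed HTTP Link header such as `Link: <https://example.org/crate/ro-crate-metadata.json>; rel="describedby"; type="application/ld+json"`, letting agents navigate from the human-readable page to the machine-readable crate without any repository-specific API.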

Authors: Stian Soiland-Reyes, Peter Sefton, Simone Leo, Leyla Jael Castro, Claus Weiland, Herbert Van de Sompel

Date Published: 18th Mar 2025

Publication Type: Journal

Copyright © 2008 - 2025 The University of Manchester and HITS gGmbH