Publications

Abstract

Preprint: https://arxiv.org/abs/2110.02168 The landscape of workflow systems for scientific applications is notoriously convoluted with hundreds of seemingly equivalent workflow systems, many isolated research claims, and a steep learning curve. To address some of these challenges and lay the groundwork for transforming workflows research and development, the WorkflowsRI and ExaWorks projects partnered to bring the international workflows community together. This paper reports on discussions and findings from two virtual "Workflows Community Summits" (January and April, 2021). The overarching goals of these workshops were to develop a view of the state of the art, identify crucial research challenges in the workflows community, articulate a vision for potential community efforts, and discuss technical approaches for realizing this vision. To this end, participants identified six broad themes: FAIR computational workflows; AI workflows; exascale challenges; APIs, interoperability, reuse, and standards; training and education; and building a workflows community. We summarize discussions and recommendations for each of these themes.

Authors: Rafael Ferreira da Silva, Henri Casanova, Kyle Chard, Ilkay Altintas, Rosa M Badia, Bartosz Balis, Taina Coleman, Frederik Coppens, Frank Di Natale, Bjoern Enders, Thomas Fahringer, Rosa Filgueira, Grigori Fursin, Daniel Garijo, Carole Goble, Dorran Howell, Shantenu Jha, Daniel S. Katz, Daniel Laney, Ulf Leser, Maciej Malawski, Kshitij Mehta, Loic Pottier, Jonathan Ozik, J. Luc Peterson, Lavanya Ramakrishnan, Stian Soiland-Reyes, Douglas Thain, Matthew Wolf

Date Published: 1st Nov 2021

Publication Type: Journal

Abstract

The term “scientific workflow” has evolved over the last two decades to encompass a broad range of compositions of interdependent compute tasks and data movements. It has also become an umbrella term for processing in modern scientific applications. Today, many scientific applications can be considered as workflows made of multiple dependent steps, and hundreds of workflow systems have been developed to manage and run these scientific workflows. However, no turnkey solution has emerged from the field to address the diversity of scientific processes and the infrastructure on which they are supposed to be implemented. Instead, new research problems requiring the execution of scientific workflows with some novel feature often lead to the development of an entirely new workflow system. A direct consequence of this situation is that many existing workflow management systems (WMSs) share some salient features, offer similar functionalities, and can manage the same categories of workflows but at the same time also have some distinct capabilities that can be important for specific applications. This situation makes researchers who develop workflows face the complex question of selecting a WMS. This selection can be driven by technical considerations, to find the system that is the most appropriate for their application and for the computing and storage resources available to them, or other factors such as reputation, adoption, strong community support, or long-term sustainability. To address this problem, a group of WMS developers and practitioners joined their efforts to produce a community-based terminology of WMSs. This paper summarizes their findings and introduces this new terminology to characterize WMSs. This terminology is composed of five axes: workflow structure and characteristics, composition, orchestration, data management, and metadata capture. Each axis comprises several concepts that capture the prominent features of WMSs. Based on this terminology, this paper also presents a classification of 23 existing WMSs according to the proposed axes and terms.

Authors: Frédéric Suter, Tainã Coleman, İlkay Altintaş, Rosa M. Badia, Bartosz Balis, Kyle Chard, Iacopo Colonnelli, Ewa Deelman, Paolo Di Tommaso, Thomas Fahringer, Carole Goble, Shantenu Jha, Daniel S. Katz, Johannes Köster, Ulf Leser, Kshitij Mehta, Hilary Oliver, J.-Luc Peterson, Giovanni Pizzi, Loïc Pottier, Raül Sirvent, Eric Suchyta, Douglas Thain, Sean R. Wilkinson, Justin M. Wozniak, Rafael Ferreira da Silva

Date Published: 2026

Publication Type: Journal

Abstract

Computational workflows, regardless of their portability or maturity, represent major investments of both effort and expertise. They are first-class, publishable research objects in their own right. They are key to sharing methodological know-how for reuse, reproducibility, and transparency. Consequently, the application of the FAIR principles to workflows [goble_2019, wilkinson_2025] is inevitable to enable them to be Findable, Accessible, Interoperable, and Reusable. Making workflows FAIR would reduce duplication of effort, assist in the reuse of best practice approaches and community-supported standards, and ensure that workflows as digital objects can support reproducible and robust science. FAIR workflows also encourage interdisciplinary collaboration, enabling workflows developed in one field to be repurposed and adapted for use in other research domains. FAIR workflows draw from both FAIR data [wilkinson_2016] and software [barker_2022] principles. Workflows propose explicit method abstractions and tight bindings to data, hence making many of the data principles apply. Meanwhile, as executable pipelines with a strong emphasis on code composition and data flow between steps, the software principles apply, too. As workflows are chiefly concerned with the processing and creation of data, they also have an important role to play in ensuring and supporting data FAIRification. The FAIR Principles for software and data mandate the use of persistent identifiers (PID) and machine actionable metadata associated with workflows to enable findability, accessibility, interoperability, and reusability. To implement the principles requires a PID and metadata framework with appropriate programmatic protocols, an accompanying ecosystem of services, tools, guidelines, policies, and best practices, as well as the buy-in of existing workflow systems such that they adapt in order to adopt. 
The European EOSC-Life Workflow Collaboratory is an example of such a digital infrastructure for the Biosciences: it includes a metadata standards framework for describing workflows (i.e. RO-Crate, Bioschemas, and CWL), that is managed and used by dedicated new FAIR workflow services and programmatic APIs for interoperability and metadata access such as those proposed by the Global Alliance for Genomics and Health (GA4GH) [rehm_2021]. The WorkflowHub registry supports workflow Findability and Accessibility, while workflow testing services like LifeMonitor support long-term Reusability, Usability and Reproducibility. Existing workflow management systems/languages and packaging solutions are incorporated and adapted to promote portability, composability, interoperability, provenance collection and reusability, and to use and support these FAIR services. In this chapter, we will introduce the FAIR principles for workflows, the connections between FAIR workflows, and the FAIR ecosystems in which they live, using the EOSC-Life Collaboratory as a concrete example. We will also introduce other community efforts that are easing the ways that workflows are shared and reused by others, and we will discuss how the variations in different workflow settings impact their FAIR perspective.

Authors: Sean R. Wilkinson, Johan Gustafsson, Finn Bacall, Khalid Belhajjame, Salvador Capella, José María Fernández González, Jacob Fosso Tande, Luiz Gadelha, Daniel Garijo, Patricia Grubel, Björn Grüning, Farah Zaib Khan, Sehrish Kanwal, Simone Leo, Stuart Owen, Luca Pireddu, Line Pouchard, Laura Rodriguez-Navas, Beatriz Serrano-Solano, Stian Soiland-Reyes, Baiba Vilne, Alan Williams, Merridee Ann Wouters, Frederik Coppens, Carole Goble

Date Published: 21st May 2025

Publication Type: InBook

Abstract

Motivation: Protein-protein interactions (PPIs) can be used for many applications, such as inferring protein functions or supporting the drug discovery process. For the human species, there is a lot of validated information and functional annotation for the proteins in the interactome. In other species, the known interactome is much smaller than in human, and many proteins have few or no annotations by specialists. Understanding the interactomes of other species helps to trace evolutionary characteristics, compare important biological processes, and build interactomes for new organisms from more closely related organisms instead of relying only on the human interactome. Results: In this study, we evaluate the performance of the PredPrIn workflow in predicting the interactome of seven organisms in terms of scalability and precision, showing that PredPrIn achieves over 70% precision and takes less than three days even on the largest datasets. We performed a transfer learning analysis, predicting each organism's interactome from every other organism, and showed an implication of their evolutionary relationship in the number of ortholog proteins shared between these organisms. We also present a functional enrichment analysis showing the proportion of shared annotations between predicted positive and false interactions, and the extraction of topological features of each organism's interactome, such as proteins acting as hubs and as bridges between modules. For each organism, one of the most frequent biological processes was selected, and the proteins and pairs present in it were compared, in terms of quantity, between the interactome available in the HINT database for that organism and the one predicted by PredPrIn. In this comparison, we showed that we covered the proteins and pairs covered in HINT and also enriched these processes for almost all organisms. 
Conclusions: In this work, we have proved the efficiency of the PredPrIn workflow for protein interaction prediction in seven different organisms using scalability, performance and transfer learning analyses. We have also made cross-species interactome comparisons showing the most frequent biological processes for each organism, as well as the topological features of each organism's interactome, showing consistency with hypotheses about biological networks. Finally, we described the enrichment made by PredPrIn in selected biological processes, showing that its predictions were important for enhancing information about these organisms' interactomes.

Author: Yasmmin C Martins

Date Published: 7th Jun 2023

Publication Type: Journal

Abstract

Not specified

Authors: Sean R. Wilkinson, Meznah Aloqalaa, Khalid Belhajjame, Michael R. Crusoe, Bruno de Paula Kinoshita, Luiz Gadelha, Daniel Garijo, Ove Johan Ragnar Gustafsson, Nick Juty, Sehrish Kanwal, Farah Zaib Khan, Johannes Köster, Karsten Peters-von Gehlen, Line Pouchard, Randy K. Rannow, Stian Soiland-Reyes, Nicola Soranzo, Shoaib Sufi, Ziheng Sun, Baiba Vilne, Merridee A. Wouters, Denis Yuen, Carole Goble

Date Published: 1st Dec 2025

Publication Type: Journal

Abstract

Provenance registration is becoming more and more important as we increase the size and number of experiments performed using computers. In particular, when provenance is recorded in HPC environments, it must be efficient and scalable. In this paper, we propose a provenance registration method for scientific workflows that is efficient enough to run on supercomputers (and thus could also run in other environments with more relaxed restrictions, such as distributed ones). It must also be scalable in order to deal with large workflows, which are more typically used in HPC. We also target transparency for the user, shielding them from having to specify how provenance must be recorded. We implement our design using the COMPSs programming model as a Workflow Management System (WfMS) and use RO-Crate as a well-established specification to record and publish provenance. Experiments are provided, demonstrating the run time efficiency and scalability of our solution.

Authors: Raul Sirvent, Javier Conejero, Francesc Lordan, Jorge Ejarque, Laura Rodriguez-Navas, Jose M. Fernandez, Salvador Capella-Gutierrez, Rosa M. Badia

Date Published: 1st Nov 2022

Publication Type: Proceedings

Abstract

In recent years, the improvement of software and hardware performance has made biomolecular simulations a mature tool for the study of biological processes. Simulation length and the size and complexity of the analyzed systems make simulations both complementary and compatible with other bioinformatics disciplines. However, the characteristics of the software packages used for simulation have prevented the adoption of the technologies accepted in other bioinformatics fields like automated deployment systems, workflow orchestration, or the use of software containers. We present here a comprehensive exercise to bring biomolecular simulations to the “bioinformatics way of working”. The exercise has led to the development of the BioExcel Building Blocks (BioBB) library. BioBBs are built as Python wrappers to provide an interoperable architecture. BioBBs have been integrated in a chain of usual software management tools to generate data ontologies, documentation, installation packages, software containers and ways of integration with workflow managers, which make them usable in most computational environments.

Authors: Pau Andrio, Adam Hospital, Javier Conejero, Luis Jordá, Marc Del Pino, Laia Codo, Stian Soiland-Reyes, Carole Goble, Daniele Lezzi, Rosa M. Badia, Modesto Orozco, Josep Ll. Gelpi

Date Published: 1st Dec 2019

Publication Type: Journal

Abstract

Development of the needed extensions of the EuroScienceGateway components (Pulsar and Galaxy) to automate and facilitate the integration of user-provided computing and storage resources. Project: EuroScienceGateway was funded by the European Union programme Horizon Europe (HORIZON-INFRA-2021-EOSC-01-04) under grant agreement number 101057388 and by UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee grant number 10038963. Document: D4.1 Bring Your Own Infrastructure. Work Package: Work Package 4. Building blocks for a sustainable operating model. Tasks: Task 4.1 Bring Your Own Compute (BYOC); Task 4.2 Bring Your Own Storage (BYOS). Lead Beneficiary: EGI. Contributing Beneficiaries: AGH-UST, ALU-FR, EGI, INFN, and VIB. Executive Summary: This deliverable presents the activities carried out in tasks 4.1 “Bring Your Own Compute (BYOC)” and 4.2 “Bring Your Own Storage (BYOS)”, under Work Package 4 “Building blocks for a sustainable operating model”. The overall goal of tasks 4.1 and 4.2 is to make it easier for Galaxy users to connect their accounts in Galaxy to existing, externally managed compute and storage resources. The benefits are twofold: 1) Galaxy administrators do not need to operate and maintain additional IT infrastructure and 2) Galaxy users get extra capacity to execute workflows that are beyond their assigned quotas in Galaxy.

Authors: Maiken Pedersen, Sanjay Kumar Srikakulam, Paul De Geest, Enol Fernandez-del-Castillo, Andrea Cristofori, Sebastian Luna-Valero, Marco Antonio Tangaro, Stefano Nicotri

Date Published: 26th Aug 2024

Publication Type: Tech report

Abstract

Identification of honey bees (Apis mellifera) from various parts of the world is essential for the protection of their biodiversity. The identification can be based on wing measurements, which are inexpensive and easily available. Developing such identification requires reference samples from various parts of the world. We provide a collection of 26481 honey bee fore wing images from 13 countries in Europe: Austria (AT), Croatia (HR), Greece (GR), Moldova (MD), Montenegro (ME), Poland (PL), Portugal (PT), Romania (RO), Serbia (RS), Slovenia (SI), Spain (ES), Turkey (TR). For each country there are three files starting with the two-letter country code (indicated earlier in the parentheses): XX-wing-images.zip, XX-raw-coordinates.csv and XX-data.csv, which contain wing images, raw landmark coordinates and geographic coordinates, respectively. Files with the prefix EU contain combined data from all countries.

Authors: Andrzej Oleksa, Eliza Căuia, Adrian Siceanu, Zlatko Puškadija, Marin Kovačić, M. Alice Pinto, Pedro João Rodrigues, Fani Hatjina, Leonidas Charistos, Maria Bouga, Janez Prešern, Irfan Kandemir, Slađan Rašić, Szilvia Kusza, Adam Tofilski

Date Published: 1st Oct 2022

Publication Type: Journal

Abstract

Project: EuroScienceGateway was funded by the European Union programme Horizon Europe (HORIZON-INFRA-2021-EOSC-01-04) under grant agreement number 101057388 and by UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee grant number 10038963. Document: D 5.1 Community onboarding cookbook published. Work Package: Work Package 5: Community engagement, adoption and onboarding. Task: Task 5.4 Mentoring and onboarding new communities. Lead Beneficiary: University of Oslo. Contributing Beneficiary: All partners. Executive Summary: The onboarding cookbook is a set of documents describing the creation of the Galaxy communities of practice called Special Interest Groups (SIG). It outlines the necessary steps which shall precede every SIG creation: a search for already existing SIGs which match the required profile; a clear definition of the future SIG's goals; setup of the SIG's administrative bodies and routines; and planning of future publications and training. The Cookbook also lists the minimum prerequisites required before the creation of a new SIG. The goal of these recommendations is to empower the SIG creators to run a Galaxy community. Success stories of already onboarded communities with different levels of maturity are shared, focusing on their experience in using Galaxy and building their SIGs (Milestone 12 - Interim report on the activities of the 3 early adopters and assessment of the take up by their respective communities).

Author: Vazov Nikolay

Date Published: 29th Feb 2024

Publication Type: Journal
