Publications

59 publications

Abstract

Background: Omics and, increasingly, multi-omics cancer datasets are available in public databases such as the Gene Expression Omnibus (GEO), the International Cancer Genome Consortium, and The Cancer Genome Atlas Program. Most of these databases provide at least gene expression data for the samples in each project. Multi-omics has been an advantageous strategy for personalized medicine, but few works explore strategies to extract knowledge relying only on gene expression for tasks such as disease outcome prediction and drug response simulation. The models and information acquired in projects based only on expression data could provide a decision-making background for future projects that include other levels of omics data, such as DNA methylation or miRNAs.

Results: We extended previous methodologies for predicting disease outcome from the combination of protein interaction networks and gene expression profiling by proposing an automated pipeline that performs graph feature encoding and subsequent outcome classification of patient networks derived from RNA-Seq. We integrated biological networks from protein interactions with gene expression profiling to assess patient specificity, combining the treatment/control ratio with the patient's normalized counts of the differentially expressed genes. We also tackled disease outcome prediction from the gene set enrichment perspective, combining gene expression with pathway gene set information as the feature source for this task. Finally, we explored the drug response perspective of cancer, again evaluating the relationship between gene expression profiling and single sample gene set enrichment analysis (ssGSEA), and proposing a workflow to perform drug response screening according to the patient's enriched pathways.

Conclusion: We showed the importance of patient network modeling for the clinical task of disease outcome prediction using a graph kernel matrix strategy, and how ssGSEA improved prediction using only transcriptomic data combined with pathway scores. We also demonstrated a detailed screening analysis showing the impact of pathway-based gene sets and normalization types on the drug response simulation. We deployed two fully automated screening workflows following the FAIR principles for the disease outcome prediction and drug response simulation tasks.
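
As a rough illustration of the ssGSEA-plus-classifier idea described above, the sketch below computes per-patient pathway scores and feeds them to an outcome classifier. It assumes the gseapy and scikit-learn libraries; the file names, gene set collection, and label column are placeholders, and the paper's actual pipeline (graph feature encoding, kernel matrices) is more involved.

```python
# Hedged sketch of the ssGSEA-based outcome prediction idea, assuming the
# gseapy (>=1.0) and scikit-learn libraries. File names, the gene set
# collection, and the label column are placeholders.
import gseapy as gp
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Normalized RNA-Seq expression, samples as rows and genes as columns.
expr = pd.read_csv("expression.csv", index_col=0)

# Single-sample GSEA: one enrichment score per pathway per patient.
# gseapy expects genes as rows, hence the transpose; res2d is long-format
# (Name/Term/NES) in gseapy 1.x, so pivot it into a patients x pathways table.
ss = gp.ssgsea(data=expr.T, gene_sets="KEGG_2016",
               outdir=None, sample_norm_method="rank")
scores = ss.res2d.pivot(index="Name", columns="Term", values="NES")

# Clinical outcome labels aligned to the same patients.
labels = pd.read_csv("outcomes.csv", index_col=0).loc[scores.index, "outcome"]

# Classify patients from pathway scores rather than raw gene counts.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, scores.values, labels.values, cv=5).mean())
```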

Author: Yasmmin Martins

Date Published: 28th Sep 2023

Publication Type: Journal

Abstract

The Linking Open Data (LOD) cloud is a global data space for publishing and linking structured data on the Web, with the aim of facilitating the integration, exchange, and processing of data. The LOD cloud already includes many datasets related to the biological area. Nevertheless, most of the datasets about protein interactions do not use metadata standards. This means that they do not follow the LOD requirements and, consequently, hamper data integration. This problem has an impact on information retrieval, especially with respect to dataset provenance and reuse in further prediction experiments. This paper proposes an ontology to describe and unite the four main kinds of data in a single prediction experiment environment: (i) information about the experiment itself; (ii) description of and references to the datasets used in the experiment; (iii) information about each protein involved in the candidate pairs, i.e., the biological information that describes them, which normally involves integration with other datasets; and, finally, (iv) information about the prediction scores, organized by evidence, and the final prediction. Additionally, we present case studies that illustrate the relevance of our proposal by showing how queries can retrieve useful information.
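
A minimal sketch of the kind of description and retrieval the abstract refers to, using rdflib. The ex: vocabulary below is invented for illustration and is not the ontology proposed in the paper.

```python
# Hedged sketch: describing a PPI prediction experiment as RDF and querying
# it with SPARQL. The ex: namespace, class and property names are invented
# placeholders, not the paper's ontology terms.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/ppi-experiment#")
g = Graph()
g.bind("ex", EX)

# (i) the experiment itself, (ii) the datasets it used,
# (iii) the proteins in a candidate pair, (iv) the prediction scores.
g.add((EX.exp1, RDF.type, EX.PredictionExperiment))
g.add((EX.exp1, EX.usedDataset, EX.hint_dataset))
g.add((EX.pair1, EX.partOfExperiment, EX.exp1))
g.add((EX.pair1, EX.proteinA, EX.P04637))
g.add((EX.pair1, EX.proteinB, EX.P38398))
g.add((EX.pair1, EX.finalScore, Literal(0.92)))

# Provenance-style query: which datasets back the high-scoring predictions?
q = """
SELECT ?pair ?dataset ?score WHERE {
  ?pair ex:partOfExperiment ?exp ; ex:finalScore ?score .
  ?exp ex:usedDataset ?dataset .
  FILTER(?score > 0.9)
}"""
for row in g.query(q, initNs={"ex": EX}):
    print(row.pair, row.dataset, row.score)
```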

Authors: Yasmmin Cortes Martins, Maria Cláudia Cavalcanti, Luis Willian Pacheco Arge, Artur Ziviani, Ana Tereza Ribeiro de Vasconcelos

Date Published: 2019

Publication Type: Journal

Abstract

Description: Effective resource scheduling is critical in high-performance computing (HPC) and high-throughput computing (HTC) environments, where traditional scheduling systems struggle with resource contention, data locality, and fault tolerance. Meta-scheduling, which abstracts multiple schedulers for unified job allocation, addresses these challenges. Galaxy, a widely used platform for data-intensive computational analysis, employs the Total Perspective Vortex (TPV) system for resource scheduling. With over 550,000 users, Galaxy aims to optimize scheduling efficiency in large-scale environments. While TPV offers flexibility, its decision-making can be enhanced by incorporating real-time resource availability and job status. This paper introduces the TPV Broker, a meta-scheduling framework that integrates real-time resource data to enable dynamic, data-aware scheduling. TPV Broker enhances scalability, resource utilization, and scheduling efficiency in Galaxy, offering potential for further improvements in distributed computing environments.
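
A minimal sketch of the meta-scheduling idea, assuming invented Destination fields and a toy scoring rule; it illustrates the concept, not the TPV Broker interface itself.

```python
# Hedged sketch of the core meta-scheduling idea: rank compute destinations
# by live utilization and send the job to the least-loaded one that fits.
# The Destination fields and the metrics themselves are assumptions for
# illustration, not the actual TPV Broker API.
from dataclasses import dataclass

@dataclass
class Destination:
    name: str
    total_cores: int
    used_cores: int      # would come from a live metrics feed in practice
    queued_jobs: int

def score(dest: Destination, cores_needed: int) -> float:
    """Lower is better: penalize utilization and queue depth."""
    if dest.total_cores - dest.used_cores < cores_needed:
        return float("inf")  # cannot fit the job right now
    utilization = dest.used_cores / dest.total_cores
    return utilization + 0.1 * dest.queued_jobs

def pick_destination(dests, cores_needed):
    best = min(dests, key=lambda d: score(d, cores_needed))
    return None if score(best, cores_needed) == float("inf") else best

dests = [Destination("pulsar-it", 512, 480, 12),
         Destination("pulsar-no", 256, 64, 1)]
print(pick_destination(dests, cores_needed=8).name)  # -> pulsar-no
```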

Authors: Abdulrahman Azab, Paul De Geest, Sanjay Kumar Srikakulam, Tomáš Vondra, Mira Kuntz, Björn Grüning

Date Published: 1st Feb 2025

Publication Type: Unpublished

Abstract

Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have brought the long-standing vision of automated workflow composition back into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the “big picture” of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation. Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.
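
To make the idea of automated composition concrete, here is a toy type-driven search over annotated tools. Tool names and data types are invented, and real composition systems use much richer semantic domain models than plain string types.

```python
# Hedged sketch of type-driven workflow composition: given tools annotated
# with input/output data types (a minimal "semantic domain model"), search
# for a chain that turns the available data into the requested result.
from collections import deque

TOOLS = {                      # tool: (input types, output type)
    "aligner":       ({"raw_reads", "genome"}, "aligned_reads"),
    "peak_caller":   ({"aligned_reads"}, "peaks"),
    "motif_finder":  ({"peaks", "genome"}, "motifs"),
}

def compose(available: set, goal: str):
    """Breadth-first search over tool applications; returns a tool sequence."""
    queue = deque([(frozenset(available), [])])
    seen = {frozenset(available)}
    while queue:
        have, plan = queue.popleft()
        if goal in have:
            return plan
        for tool, (needs, makes) in TOOLS.items():
            if needs <= have and makes not in have:
                nxt = frozenset(have | {makes})
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, plan + [tool]))
    return None

print(compose({"raw_reads", "genome"}, "motifs"))
# -> ['aligner', 'peak_caller', 'motif_finder']
```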

Authors: Anna-Lena Lamprecht, Magnus Palmblad, Jon Ison, Veit Schwämmle, Mohammad Sadnan Al Manir, Ilkay Altintas, Christopher J. O. Baker, Ammar Ben Hadj Amor, Salvador Capella-Gutierrez, Paulos Charonyktakis, Michael R. Crusoe, Yolanda Gil, Carole Goble, Timothy J. Griffin, Paul Groth, Hans Ienasescu, Pratik Jagtap, Matúš Kalaš, Vedran Kasalica, Alireza Khanteymoori, Tobias Kuhn, Hailiang Mei, Hervé Ménager, Steffen Möller, Robin A. Richardson, Vincent Robert, Stian Soiland-Reyes, Robert Stevens, Szoke Szaniszlo, Suzan Verberne, Aswin Verhoeven, Katherine Wolstencroft

Date Published: 2021

Publication Type: Journal

Abstract

Semantic web standards have shown their importance over the last 20 years in promoting data formalization and interlinking between existing knowledge graphs. In this context, several ontologies and data integration initiatives have emerged in recent years for the biological area, such as the broadly used Gene Ontology, which contains metadata to annotate gene function and subcellular location. Another important subject in the biological area is protein–protein interactions (PPIs), which have applications such as protein function inference. Current PPI databases have heterogeneous export methods that challenge their integration and analysis. Presently, several ontologies covering some concepts of the PPI domain are available to promote interoperability across datasets. However, efforts to establish guidelines for automatic semantic data integration and analysis of PPIs in these datasets are limited. Here, we present PPIntegrator, a system that semantically describes data related to protein interactions. We also introduce an enrichment pipeline to generate, predict and validate new potential host–pathogen datasets by transitivity analysis. PPIntegrator contains a data preparation module to organize data from three reference databases and a triplification and data fusion module to describe the provenance information and results. This work provides an overview of the PPIntegrator system applied to integrate and compare host–pathogen PPI datasets from four bacterial species using our proposed transitivity analysis pipeline. We also demonstrate some critical queries to analyze this kind of data and highlight the importance and usage of the semantic data generated by our system.
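
One plausible reading of the transitivity analysis, sketched below: interactions observed with one pathogen's proteins are transferred to equivalent (e.g., orthologous) proteins of another species. Identifiers are invented, and the actual PPIntegrator pipeline works over semantically described RDF data rather than plain tuples.

```python
# Hedged sketch of a transitivity-style inference over host-pathogen PPIs:
# if host protein h interacts with pathogen protein p1, and p1 is mapped as
# equivalent (e.g. orthologous) to p2 in another bacterial species, propose
# the candidate pair (h, p2). Identifiers are invented placeholders.
known_ppis = {("hostP53", "pathA_protX"), ("hostAKT1", "pathA_protY")}
orthologs = {("pathA_protX", "pathB_prot1"), ("pathA_protY", "pathB_prot2")}

def infer_by_transitivity(ppis, ortho_pairs):
    ortho = {}
    for a, b in ortho_pairs:          # symmetric mapping between species
        ortho.setdefault(a, set()).add(b)
        ortho.setdefault(b, set()).add(a)
    candidates = set()
    for host, pathogen in ppis:
        for mapped in ortho.get(pathogen, ()):
            if (host, mapped) not in ppis:
                candidates.add((host, mapped))
    return candidates

print(sorted(infer_by_transitivity(known_ppis, orthologs)))
# -> [('hostAKT1', 'pathB_prot2'), ('hostP53', 'pathB_prot1')]
```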

Authors: Yasmmin Côrtes Martins, Artur Ziviani, Maiana de Oliveira Cerqueira e Costa, Maria Cláudia Reis Cavalcanti, Marisa Fabiana Nicolás, Ana Tereza Ribeiro de Vasconcelos

Date Published: 2023

Publication Type: Journal

Abstract

Research Object Crate (RO-Crate) is a lightweight method to package research outputs along with their metadata. Signposting provides a simple yet powerful approach to navigate scholarly objects on the Web. Combining these technologies forms a "webby" implementation of the FAIR Digital Object (FDO) principles which is suitable for retrofitting to existing data infrastructures, or even for ad hoc research objects using regular Web hosting platforms. Here we give an update on recent community development and adoption of RO-Crate and Signposting. Notably, programmatic access and more detailed profiles have received high attention, as have several FDO implementations that use RO-Crate.
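
A small sketch of the two technologies side by side, assuming the ro-crate-py library; the file name and URLs below are placeholders.

```python
# Hedged sketch: packaging an output as an RO-Crate with ro-crate-py
# ("rocrate" on PyPI), plus the kind of HTTP Link header a landing page
# would emit for FAIR Signposting so agents can discover the crate.
from rocrate.rocrate import ROCrate

crate = ROCrate()
crate.root_dataset["name"] = "Example analysis results"
crate.add_file("results.csv", properties={
    "name": "Result table",
    "encodingFormat": "text/csv",
})
crate.write("my-crate")  # emits ro-crate-metadata.json next to the data

# FAIR Signposting: a landing page can advertise the crate's metadata with
# a Link header along these lines.
link_header = (
    '<https://example.org/my-crate/ro-crate-metadata.json>; '
    'rel="describedby"; type="application/ld+json"; '
    'profile="https://w3id.org/ro/crate"'
)
print(link_header)
```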

Authors: Stian Soiland-Reyes, Peter Sefton, Simone Leo, Leyla Jael Castro, Claus Weiland, Herbert Van de Sompel

Date Published: 18th Mar 2025

Publication Type: Journal

Abstract

Description: Documentation of the cross-domain adoption of the EuroScienceGateway (ESG) project, showcasing how Galaxy was used and extended to meet the data analysis needs of researchers across biodiversity, climate science, astrophysics, materials science, and biomedical domains. This record outlines ESG’s impact on the onboarding of diverse scientific communities, enabling scalable, reproducible, and FAIR-compliant workflows. Through targeted outreach, infrastructure integration, and community-driven tool development, the project successfully onboarded new user groups and demonstrated Galaxy’s adaptability across multiple scientific verticals. Over 800 tools were integrated into Galaxy during the past 3 years, and dozens of reusable workflows were published to support sensitive data handling, high-throughput image analysis, simulation environments, and federated compute. The deliverable documents use cases, domain-specific onboarding models, training efforts, and collaborative success stories, including the development of the Galaxy Codex and strategic alignment with EOSC, ELIXIR, and NFDI initiatives.

Project: EuroScienceGateway was funded by the European Union’s Horizon Europe programme (HORIZON-INFRA-2021-EOSC-01) under grant agreement number 101057388.
Document: D5.2 Publication of the usage of EuroScienceGateway by multiple communities
Work Package: Work Package 5. Community engagement, adoption and onboarding
Tasks: Task 5.1 Biodiversity and Climate Science; Task 5.2 Materials Science; Task 5.3 Astrophysics; Task 5.4 Mentoring and onboarding new communities
Lead Beneficiary: University of Oslo (UiO)
Contributing Beneficiaries: UiO, ALU-FR, CNRS, UNIFI, UKRI, EPFL, UP, BSC

Authors: Armin Dadras, Denys Savchenko, Andrii Neronov, Volodymyr Savchenko, Nikolay Vazov, Jean Iaquinta, Eva Alloza, María Chavero Díez, Anthony Bretaudeau

Date Published: 18th Aug 2025

Publication Type: Tech report

Abstract

Description: Documentation of the design, deployment, and operationalization of the European Pulsar Network, developed within the EuroScienceGateway (ESG) project. This deliverable outlines how the Pulsar Network enables scalable, federated, and interoperable remote job execution across European Galaxy servers and compute infrastructures. This record showcases the technical architecture, automation strategies, and monitoring solutions behind the distributed execution framework, supporting reproducible workflows and efficient resource sharing. The network connects 13 Pulsar endpoints across 10 countries, integrated with six national Galaxy servers and the European Galaxy server. Deployments span public clouds, institutional HPCs, and EOSC resources, unified under a secure, open-source infrastructure stack using Terraform, Ansible, RabbitMQ, CVMFS, and SABER. The deliverable demonstrates how ESG addressed interoperability and scalability challenges through open infrastructure tooling, cross-institutional coordination, and continuous monitoring. It provides a replicable model for distributed compute resource integration and highlights Galaxy's extensibility in federated scientific computing.

Project: EuroScienceGateway, funded by the European Union’s Horizon Europe programme (HORIZON-INFRA-2021-EOSC-01) under grant agreement number 101057388.
Document: D3.2 Publication on the Pulsar Network, integrated in workflow management systems
Work Package: Work Package 3. Pulsar Network: Distributed heterogeneous compute
Tasks:
- Task 3.1 Develop and maintain an Open Infrastructure-based deployment model for Pulsar endpoints
- Task 3.2 Add GA4GH Task Execution Service (TES) API to Pulsar
- Task 3.3 Build a European-wide network of Pulsar sites
- Task 3.4 Add TES support to WfExS (Workflow Execution Service)
- Task 3.5 Developing and maintaining national or domain-driven Galaxy servers
Lead Beneficiary: CNR
Contributing Beneficiaries: CNR, INFN, ALU-FR, CNRS, CESNET, UiO, UB, EPFL, AGH/AGH-UST, BSC, VIB, IISAS, TUBITAK, UNIMAN
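
Since the deliverable lists adding a GA4GH Task Execution Service (TES) API to Pulsar (Task 3.2), the sketch below shows what a minimal task submission to any TES-compliant endpoint looks like. The endpoint URL is a placeholder and authentication is omitted.

```python
# Hedged sketch: submitting a minimal GA4GH TES task with the requests
# library. The endpoint URL is a placeholder; the JSON shape follows the
# public TES specification, not a Pulsar-specific API.
import requests

TES = "https://pulsar.example.org/ga4gh/tes/v1"

task = {
    "name": "hello-pulsar",
    "executors": [{
        "image": "alpine:3.19",
        "command": ["echo", "hello from the Pulsar network"],
    }],
}

resp = requests.post(f"{TES}/tasks", json=task, timeout=30)
resp.raise_for_status()
task_id = resp.json()["id"]

# Poll the task state (QUEUED -> INITIALIZING -> RUNNING -> COMPLETE).
state = requests.get(f"{TES}/tasks/{task_id}", timeout=30).json()["state"]
print(task_id, state)
```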

Authors: Armin Dadras, Marco Antonio Tangaro

Date Published: 1st Aug 2025

Publication Type: Tech report

Abstract

Project: EuroScienceGateway was funded by the European Union programme Horizon Europe (HORIZON-INFRA-2021-EOSC-01-04) under grant agreement number 101057388 and by UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee, grant number 10038963.
Document: D4.2 Publication on the smart job scheduler implementation
Work Package: Work Package 4. Building blocks for a sustainable operating model
Task: Task 4.3 Implement a smart job-scheduling system across Europe
Lead Beneficiary: EGI
Contributing Beneficiaries: ALU-FR, CESNET, EGI, UiO, and VIB

Executive Summary: Galaxy currently uses the Total Perspective Vortex (TPV) to schedule millions of jobs for hundreds of thousands of users globally. While TPV has proven to be a robust meta-scheduling tool for Galaxy in recent years, there were areas for improvement that have been addressed in the EuroScienceGateway project:
- Gathering live usage metrics from the distributed computing endpoints connected to Galaxy, in order to distribute the load across all sites.
- Adding latitude and longitude attributes to data stores and computing endpoints, to allocate jobs as close as possible to the location of the data.
- Visualizing job distribution across sites with an intuitive dashboard.

As a result, the EuroScienceGateway project has developed two new tools:
- TPV Broker, for efficient meta-scheduling of jobs taking into account real-time usage metrics and data-locality information.
- Galaxy Job Radar, a web dashboard to easily visualize the allocation of jobs across all sites.

The EuroScienceGateway project has significantly improved the meta-scheduling of jobs for Galaxy, resulting in shorter waiting times for users and improved resource utilization across all sites.
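
A toy sketch of the latitude/longitude data-locality idea mentioned above: rank endpoints by great-circle distance to the data. Coordinates and endpoint names are illustrative, not actual TPV Broker configuration.

```python
# Hedged sketch of data-locality scheduling: with latitude/longitude
# attached to data stores and compute endpoints, jobs can be sent to the
# endpoint nearest the data. All names and coordinates are placeholders.
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

endpoints = {                 # endpoint: (latitude, longitude)
    "pulsar-freiburg": (48.00, 7.85),
    "pulsar-oslo":     (59.91, 10.75),
    "pulsar-prague":   (50.08, 14.44),
}
data_location = (48.14, 11.58)  # e.g. a data store near Munich

nearest = min(endpoints,
              key=lambda e: haversine_km(*endpoints[e], *data_location))
print(nearest)  # -> pulsar-freiburg
```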

Authors: Abdulrahman Azab, Sanjay Kumar Srikakulam, Paul De Geest, Tomáš Vondrák, Björn Grüning, Mira Kuntz, Enol Fernandez-del-Castillo, Sebastian Luna-Valero

Date Published: 27th Feb 2025

Publication Type: Tech report
