Publications

Abstract

The third Dutch national airborne laser scanning flight campaign (AHN3, Actueel Hoogtebestand Nederland), conducted between 2014 and 2019 during the leaf-off season (October–April) across the whole Netherlands, provides a free and open-access, country-wide dataset with ∼700 billion points and a point density of ∼10(–20) points/m². The AHN3 point cloud was obtained with Light Detection And Ranging (LiDAR) technology and contains for each point the x, y, z coordinates and additional characteristics (e.g. return number, intensity value, scan angle rank and GPS time). Moreover, the point cloud has been pre-processed by ‘Rijkswaterstaat’ (the executive agency of the Dutch Ministry of Infrastructure and Water Management), comes with a Digital Terrain Model (DTM) and a Digital Surface Model (DSM), and is delivered with a pre-classification of each point into one of six classes (0: Never Classified, 1: Unclassified, 2: Ground, 6: Building, 9: Water, 26: Reserved [bridges etc.]). However, no detailed information on vegetation structure is available from the AHN3 point cloud. We processed the AHN3 point cloud (∼16 TB uncompressed data volume) into 10 m resolution raster layers of ecosystem structure at a national extent, using a novel high-throughput workflow called ‘Laserfarm’ and a cluster of virtual machines with fast central processing units, high memory nodes and associated big data storage for managing the large number of files. The raster layers (available as GeoTIFF files) capture 25 LiDAR metrics of vegetation structure, including ecosystem height (e.g. 95th percentiles of normalized z), ecosystem cover (e.g. pulse penetration ratio, canopy cover, and density of vegetation points within defined height layers), and ecosystem structural complexity (e.g. skewness and variability of vertical vegetation point distribution).
The raster layers make use of the Dutch projected coordinate system (EPSG:28992 Amersfoort / RD New), are each ∼1 GB in size, and can be readily used by ecologists in a geographic information system (GIS) or analytical open-source software such as R and Python. Even though the class ‘1: Unclassified’ mainly includes vegetation points, other objects such as cars, fences, and boats can also be present in this class, introducing potential biases in the derived data products. We therefore validated the raster layers of ecosystem structure using >180,000 hand-labelled LiDAR points in 100 randomly selected sample plots (10 m × 10 m each) across the Netherlands. Besides vegetation, objects such as boats, fences, and cars were identified in the sampled plots. However, the misclassification rate of vegetation points (i.e. non-vegetation points that were assumed to be vegetation) was low (∼0.05) and the accuracy of the 25 LiDAR metrics derived from the AHN3 point cloud was high (∼90%). To minimize existing inaccuracies in this country-wide data product (e.g. ships on water bodies, chimneys on roofs, or cars on roads that might be incorrectly used as vegetation points), we provide an additional mask that captures water bodies, buildings and roads generated from the Dutch cadaster dataset. This newly generated country-wide ecosystem structure data product provides new opportunities for ecology and biodiversity science, e.g. for mapping the 3D vegetation structure of a variety of ecosystems or for modelling biodiversity, species distributions, abundance and ecological niches of animals and their habitats.

Authors: W. Daniel Kissling, Yifang Shi, Zsófia Koma, Christiaan Meijer, Ou Ku, Francesco Nattino, Arie C. Seijmonsbergen, Meiert W. Grootes

Date Published: 1st Feb 2023

Publication Type: Journal
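
As a rough illustration of two of the metric types named in the abstract above (a height percentile and the pulse penetration ratio), the following minimal Python sketch computes them for the points of a single raster cell. It is not the Laserfarm or Laserchicken implementation: the function name `cell_metrics`, the normalization against the lowest point in the cell, and the definition of the pulse penetration ratio as the share of ground points among all points are assumptions made for this example. Only the ground class code 2 is taken from the abstract.

```python
import numpy as np

GROUND = 2  # AHN3 class code for ground points, as listed in the abstract


def cell_metrics(z, classification):
    """Return (95th percentile of normalized z, pulse penetration ratio)
    for the points that fall in one raster cell (hypothetical helper)."""
    z = np.asarray(z, dtype=float)
    classification = np.asarray(classification)

    # Assumption: normalize heights against the lowest point in the cell.
    z_norm = z - z.min()

    # Height metric: 95th percentile of the normalized heights.
    p95 = float(np.percentile(z_norm, 95))

    # Cover metric (assumed definition): share of ground points among all points.
    ppr = float(np.count_nonzero(classification == GROUND) / classification.size)
    return p95, ppr


# Five points in one cell: two ground returns and three unclassified returns.
p95, ppr = cell_metrics([0.0, 1.0, 2.0, 3.0, 4.0], [2, 2, 1, 1, 1])
```

In a real workflow the same calculation would be repeated for every 10 m cell of the tiled point cloud, and a production pipeline would typically restrict the height percentile to vegetation points.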

Abstract

Development of the Open Infrastructure and Pulsar Network to support distributed job execution and scalable Galaxy deployments across Europe.
Project: EuroScienceGateway was funded by the European Union programme Horizon Europe (HORIZON-INFRA-2021-EOSC-01-04) under grant agreement number 101057388 and by UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee grant number 10038963.
Document: D3.1 Operations documentation on the Open Infrastructure deployment
Work Package: Work Package 3. Pulsar Network: Distributed heterogeneous compute.
Tasks:
- Task 3.1 Develop and maintain an Open Infrastructure based deployment model for Pulsar endpoints.
- Task 3.3 Build a European-wide network of Pulsar sites.
- Task 3.5 Developing and maintaining national or domain-driven Galaxy servers.
Lead Beneficiary: INFN
Contributing Beneficiary: INFN, ALU-FR, CNRS, CESNET, UiB, BSC, VIB, IISAS, TUBITAK and CNR
Executive Summary: Work Package 3 of the EuroScienceGateway project is divided into 5 tasks, aimed at bringing into production (TRL9) the Pulsar Network, a distributed computing network that allows public Galaxy servers to offload jobs to remote computing clusters provided by project partners. Specifically, this deliverable describes the work done in tasks 3.1, 3.2, 3.3 and 3.5. The main objectives of WP3 are: 1) to simplify the deployment and management of new Pulsar and Galaxy endpoints (T3.1 and T3.5), 2) to make Pulsar compatible with the GA4GH TES specifications (T3.2), and 3) to deploy new Pulsar endpoints (T3.3).

Authors: Stefano Nicotri, Marco Antonio Tangaro, Federico Zambelli, Miroslav Ruda, Ales Krenek, Björn Grüning, Sanjay Kumar Srikakulam, Anthony Bretaudeau, Sondre Batalden, María Chavero Díez, Paul De Geest

Date Published: 27th Aug 2024

Publication Type: Journal

Abstract

This dataset provides a standardized collection of rasterized Light Detection And Ranging (LiDAR) metrics in GeoTIFF format, derived from country-wide airborne laser scanning (ALS) data across seven demonstration sites in five European countries: Mols Bjerge National Park (Denmark), Reserve Naturelle Nationale du Bagnas (France), Oostvaardersplassen (Netherlands), Salisbury Plain (United Kingdom), Knepp Estate (United Kingdom), Monks Wood (United Kingdom), and the island of Comino (Malta). The sites range in areal size from 0.08 km² to 54 km² and include habitat types such as forests, broadleaf and conifer woodlands, small plantations, dry and wet grasslands, marshes, reedbeds, arable fields, farmland, scrublands and Mediterranean garigue. A total of 35 LiDAR metrics were calculated, of which 28 represent vegetation structural attributes. These include vegetation height (seven metrics), vegetation cover (fourteen metrics), and vegetation vertical variability (seven metrics). Additionally, seven metrics describe point density (one metric), eigenvalues (three metrics), and normal vectors (three metrics). The rasterized LiDAR metrics have a spatial resolution of 10 m, with coverage and extent defined by shapefiles corresponding to each demonstration site. The raw ALS point clouds were clipped to the site boundaries and processed with the ‘Laserfarm’ workflow, a standardized computational workflow that includes modular pipelines for re-tiling, normalization, feature extraction, and rasterization. Laserfarm employs the feature extraction module of the open-source ‘Laserchicken’ software to compute the LiDAR metrics. The workflow was implemented using the IT services of the Dutch national facility for information and communication technology, SURF. The clipped LiDAR point clouds are available through a public repository, except for the LiDAR point clouds from Comino, Malta, which are not publicly available.
The 35 rasterized LiDAR metrics (GeoTIFF files, 10 m resolution) from all sites, including Comino, as well as the corresponding site boundary shapefiles (geospatial vector format), are provided in a Zenodo repository. Additionally, the Jupyter Notebooks with Python code for executing the Laserfarm workflow are available to facilitate reproducibility and further computational applications. Users should note that the rasterized LiDAR metrics may contain zero or NA values, particularly over water surfaces, with the pulse penetration ratio metric potentially indicating false high vegetation cover over water. Users may reclassify or mask areas with zero values accordingly. Some pixels exhibit abnormal vegetation height values, which can be filtered before analysis. Certain striping patterns, likely resulting from overlapping flight lines and increased point density, are present in some metrics, though their overall impact appears minimal. This dataset enables diverse applications, including canopy height measurements, mapping of hedgerows, treelines, and forest patches, as well as characterizing vegetation density, vertical stratification, and habitat openness. It supports landscape-scale habitat analysis and contributes to the standardization of vegetation metrics from ALS data for site-specific ecological monitoring (e.g., Natura 2000). Moreover, the dataset demonstrates the automated execution of LiDAR data processing workflows, which is crucial for establishing a transnational and multi-site biodiversity and ecosystem observation network.

Authors: W. Daniel Kissling, Wessel Mulder, Jinhu Wang, Yifang Shi

Date Published: 1st Jun 2025

Publication Type: Journal
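
The abstract above advises users to mask zero values and to filter abnormal vegetation heights before analysis. A minimal Python sketch of that clean-up step on an in-memory raster array might look as follows. The function name `clean_height_raster` and the 60 m cut-off are hypothetical choices for this example and are not taken from the dataset documentation; reading the GeoTIFF into an array is left out.

```python
import numpy as np


def clean_height_raster(height, max_height=60.0):
    """Set zero-valued and implausibly high cells to NaN (hypothetical helper).

    max_height is an assumed threshold in metres; pick one that suits the site.
    """
    height = np.asarray(height, dtype=float)

    # Cells equal to zero (e.g. over water) or above the threshold are invalid.
    invalid = (height == 0) | (height > max_height)

    # Existing NaN cells stay NaN because np.where copies them through unchanged.
    return np.where(invalid, np.nan, height)


# A 2 x 2 example raster: a zero cell, a valid cell, an outlier and a NaN cell.
cleaned = clean_height_raster([[0.0, 5.0], [120.0, np.nan]])
```

Using NaN rather than deleting cells keeps the raster shape intact, so the cleaned array still lines up with the other metric layers of the same site.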

Abstract

Coordinates of 19 landmarks from honey bee (Apis mellifera) worker wings. They represent 1832 workers, 187 colonies, 25 subspecies and four evolutionary lineages. The material was obtained from the Morphometric Bee Data Bank in Oberursel, Germany.

Authors: Anna Nawrocka, Irfan Kandemir, Stefan Fuchs, Adam Tofilski

Date Published: 1st Apr 2018

Publication Type: Journal

Abstract

Considerable efforts have been made to build the Web of Data. One of the main challenges has to do with how to identify the most related datasets to connect to. Another challenge is to publish a local dataset into the Web of Data, following the Linked Data principles. The present work is based on the idea that a set of activities should guide the user on the publication of a new dataset into the Web of Data. It presents the specification and implementation of two initial activities, which correspond to the crawling and ranking of a selected set of existing published datasets. The proposed implementation is based on the focused crawling approach, adapting it to address the Linked Data principles. Moreover, the dataset ranking is based on a quick glimpse into the content of the selected datasets. Additionally, the paper presents a case study in the Biomedical area to validate the implemented approach, and it shows promising results with respect to scalability and performance.

Authors: Yasmmin Cortes Martins, Fábio Faria da Mota, Maria Cláudia Cavalcanti

Date Published: 2016

Publication Type: Journal

Abstract

The ongoing coronavirus disease 2019 (COVID-19) pandemic, triggered by the emerging SARS-CoV-2 virus, represents a global public health challenge. Therefore, the development of effective vaccines is urgently needed to prevent and control virus spread. One vaccine production strategy uses in silico epitope prediction from the virus genome by immunoinformatic approaches, which assist in selecting candidate epitopes for in vitro and clinical trial research. This study introduces the EpiCurator workflow to predict and prioritize epitopes from SARS-CoV-2 genomes by combining a series of computational filtering tools. To validate the workflow's effectiveness, SARS-CoV-2 genomes retrieved from the GISAID database were analyzed. We identified 11 epitopes in the receptor-binding domain (RBD) of Spike glycoprotein, an important antigenic determinant, not previously described in the literature or published on the Immune Epitope Database (IEDB). Interestingly, these epitopes combine several important properties: they are recognized in sequences of the current variants of concern and show high antigenicity, conservancy, and broad population coverage. The RBD epitopes served as the source for a multi-epitope design used for in silico validation of their immunogenic potential. The overall quality of the multi-epitope was computationally validated, supporting its capacity to trigger an effective immune response, since it shows stability, high antigenicity and strong interactions with Toll-Like Receptors (TLR). Taken together, the findings of the current study demonstrate the efficacy of the workflow for epitope discovery, providing target candidates for immunogen development.

Authors: Cristina S. Ferreira, Yasmmin C. Martins, Rangel Celso Souza, Ana Tereza R. Vasconcelos

Date Published: 2021

Publication Type: Journal

Abstract

This EuroScienceGateway report gives an overview of FAIR Digital Objects (FDO), considering their use for computational workflows as scholarly objects. EuroScienceGateway has progressed the technologies Signposting and RO-Crate for implementing Workflow FDOs with the registry WorkflowHub and the workflow system Galaxy, and initiated work with academic publishers to encourage workflow citation practices. Here we document how WorkflowHub supports research software best practices for workflows and assists in building FAIR Computational Workflows. Provenance of workflow executions has been made possible in an interoperable way across many workflow systems using Workflow Run Crate profiles, including from Galaxy. Finally, this report explores how Workflow FDOs are exposed and can be utilised, e.g. by gathering them in knowledge graphs and through tighter workflow system integration.

Authors: Stian Soiland-Reyes, Eli Chadwick, Finn Bacall, Jose M. Fernandez, Björn Grüning, Hakan Bayındır

Date Published: 28th Aug 2024

Publication Type: Tech report

Abstract

The concept of publishing workflows as scholarly objects is being recognised and practised through repositories like WorkflowHub and principles for FAIR Computational Workflows. This deliverable describes how the evolving landscape of the European Open Science Cloud (EOSC) can facilitate workflow publishing in a federated and distributed manner, exemplified by how workflows for Galaxy are published.

Authors: Stian Soiland-Reyes, Eli Chadwick, Armin Dadras, Björn Grüning, Catalin Condurache, Sebastian Luna-Valero, Volodymyr Savchenko

Date Published: 8th Feb 2025

Publication Type: Tech report

Abstract

WorkflowHub is a registry of computational workflows, provided as an EOSC Service by ELIXIR-UK, and used by over 200 different research projects, institutions and virtual collaborations. For this milestone of EuroScienceGateway (ESG), the project has developed an onboarding guide for WorkflowHub and registered in WorkflowHub the initial ESG workflows that have been developed and maintained by the project.

Authors: Stian Soiland-Reyes, Björn Grüning, Paul De Geest

Date Published: 29th Feb 2024

Publication Type: Tech report

Abstract

The WorkflowHub Knowledge Graph has been improved and its generation made more robust. When this work was last reported, a complete knowledge graph had been generated but several criticisms were made. The previous graph:
- Was verbose and hard for a human to read or navigate
- Had unresolvable URIs as root data entities
- Contained many duplicate entries
- Contained sparse metadata from only a single source
Work has successfully been undertaken to address all of these points. The graph now uses partially resolvable, more human-readable URIs for root data entities. Steps have been added to the generation software to add metadata from additional sources (enrichment) and to remove duplicate entries (consolidation). Several areas of the codebase have been refactored and improved, to help ensure repeatability and longevity. The new knowledge graph still has areas that could be improved. Partially resolvable URIs should be migrated to fully resolvable alternatives. Further enrichment processes should be added to afford greater de-duplication.

Authors: Eli Chadwick, Oliver Woolland, Volodymyr Savchenko, Finn Bacall, Alexander Hambley, José María Fernández González, Armin Dadras, Stian Soiland-Reyes

Date Published: 1st Aug 2025

Publication Type: Tech report
