Workflow Type: Common Workflow Language
Work-in-progress

CWL + RO-Crate Workflow Descriptions

This repository stores computational workflows described in the Common Workflow Language (CWL) and enriched with Research Object Crate (RO-Crate) metadata that conforms to the Workflow Run RO-Crate profile.

Each workflow is contained in its own directory (e.g., WF5201, WF6101, ...). Inside each workflow directory you will typically find at least:

  • The CWL workflow definition (with the same name as the directory, e.g., WF5201.cwl).
  • The RO-Crate metadata file (ro-crate-metadata.json).

Additional files supporting the workflow may also be included.
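
The layout described above can be checked mechanically. The following is a minimal sketch, not part of the repository's tooling: the function names are hypothetical, and it assumes that workflow directories are exactly those whose names start with "WF".

```python
from pathlib import Path

METADATA_FILE = "ro-crate-metadata.json"


def missing_files(workflow_dir: Path) -> list[str]:
    """Return the expected file names that are absent from one workflow directory."""
    expected = [f"{workflow_dir.name}.cwl", METADATA_FILE]
    return [name for name in expected if not (workflow_dir / name).is_file()]


def check_repository(repo_root: Path) -> dict[str, list[str]]:
    """Map each incomplete workflow directory name to the files it lacks."""
    report = {}
    for entry in sorted(repo_root.iterdir()):
        # Assumption: only directories named "WF..." hold workflows.
        if entry.is_dir() and entry.name.startswith("WF"):
            missing = missing_files(entry)
            if missing:
                report[entry.name] = missing
    return report
```

Calling check_repository(Path(".")) from the repository root returns an empty dictionary when every workflow directory contains both files.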

Overview

This document explains how to represent workflows by combining:

  • CWL (Common Workflow Language): Used to define the computational steps, data flows, and tools.
  • RO-Crate: Used to capture associated metadata (e.g., authorship, licenses, software, datasets) for the workflow.

By separating the abstract workflow definition from its metadata description, you can leverage existing tools for visualization, editing, and validation of your workflows while maintaining a clear structure.

Our Approach

We represent workflows using a combination of CWL and RO-Crate:

  • CWL: Captures the abstract definition of the workflow, detailing its computational steps, data flows, and the tools utilized. It does not include the implementation details of each operation.
  • RO-Crate: Provides rich metadata for the overall repository, the workflow file(s), software, and datasets. This metadata allows you to understand the context, provenance, and related details of the workflow components.

This separation provides flexibility by keeping the abstract workflow definition (CWL) distinct from the descriptive metadata (RO-Crate), while the two remain linked: the RO-Crate metadata refers to the CWL workflow file it describes.

Describing a Workflow using CWL + RO-Crate

To fully describe a workflow, you must separate the workflow definition (using CWL) from the metadata description (using RO-Crate).

Defining the CWL Workflow

  1. Identify Global Inputs and Outputs:
    Decide which data enter the workflow (inputs) and which final results it produces (outputs). Optionally, expose intermediate results as outputs if they are of interest.

  2. Create the CWL File:
    Write the CWL file in YAML format. Start with the file header, which declares the CWL version, the document class, and any requirements:

    cwlVersion: v1.2
    class: Workflow
    
    requirements:
      MultipleInputFeatureRequirement: {}
      SubworkflowFeatureRequirement: {}
    

    [NOTE] The requirements section depends on your workflow. For example, SubworkflowFeatureRequirement is mandatory if any step runs another workflow, and MultipleInputFeatureRequirement is needed when a step input is linked to more than one source.

  3. Declare Global Inputs and Outputs:

    inputs:
      DT5210: Directory
      DT5211: Directory
    
    outputs:
      DT5208:
        type: Directory
        outputSource: SS5213/DT5208
    

    [NOTE] Although Directory is commonly used to represent a dataset, you might choose a different type. Refer to the CWL documentation for additional types.

Defining Workflow Steps

Each workflow step (or subworkflow) is declared under the workflow's steps section and follows a consistent structure:

SS5205:
  in:
    DT5210: DT5210
  run:
    class: Operation
    inputs:
      DT5210: Directory
    outputs:
      DT5201: File
      DT5203: Directory
  out:
    - DT5201
    - DT5203

Key elements are:

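  • in: Maps each input of the step to its source, which is either a global workflow input or the output of another step (written as step/output).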
  • run:
    • For operations: Uses the Operation class to abstract away the underlying execution details.
    • For subworkflows: Points to another CWL file.
  • out: Lists the output data produced by the step.
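
The relationship between in, run, and out can be made concrete with a small consistency check. This is a sketch under stated assumptions: check_step is a hypothetical helper, the step is given as a Python dictionary (as a YAML parser would produce from the CWL file), and in, inputs, and outputs use the map form shown in the example above.

```python
def check_step(step: dict) -> list[str]:
    """Report names used in a step's in/out that its inline Operation does not declare."""
    run = step.get("run")
    if not isinstance(run, dict):
        # run points to another CWL file (subworkflow); nothing to compare here.
        return []
    problems = []
    declared_inputs = set(run.get("inputs", {}))
    declared_outputs = set(run.get("outputs", {}))
    for name in step.get("in", {}):
        if name not in declared_inputs:
            problems.append(f"input {name} is not declared in run.inputs")
    for name in step.get("out", []):
        if name not in declared_outputs:
            problems.append(f"output {name} is not declared in run.outputs")
    return problems


# The SS5205 step from the example above, written as a dictionary.
ss5205 = {
    "in": {"DT5210": "DT5210"},
    "run": {
        "class": "Operation",
        "inputs": {"DT5210": "Directory"},
        "outputs": {"DT5201": "File", "DT5203": "Directory"},
    },
    "out": ["DT5201", "DT5203"],
}
```

For the SS5205 step, check_step(ss5205) returns an empty list because every name in in and out is declared by the Operation.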

Connecting Steps via Data Dependencies

CWL does not require an explicit execution order. Instead, dependencies are determined by connecting outputs to inputs:

ST520102:
  in:
    DT5201: ST520101/DT5201
  run: ST520102.cwl
  out:
    - DT5255

This connection means ST520102 depends on the output (DT5201) of ST520101 and will execute after it, while still allowing independent steps to run in parallel.
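
To illustrate how an execution order follows from these connections, the sketch below derives batches of steps that could run in parallel. It is an illustration only, not how any particular CWL runner is implemented. The step table is hypothetical apart from ST520102 (the inputs of ST520101 and the whole of ST520103 are invented for the example), and only the shorthand string form of a source is handled.

```python
from graphlib import TopologicalSorter

# Hypothetical step table. A source written as "STEP/OUTPUT" refers to another
# step's output; a bare name refers to a global workflow input.
steps = {
    "ST520101": {"in": {"DT5210": "DT5210"}, "out": ["DT5201"]},
    "ST520102": {"in": {"DT5201": "ST520101/DT5201"}, "out": ["DT5255"]},
    "ST520103": {"in": {"DT5211": "DT5211"}, "out": ["DT5256"]},
}


def step_dependencies(steps: dict) -> dict[str, set[str]]:
    """Map each step to the set of steps whose outputs it consumes."""
    deps = {}
    for name, step in steps.items():
        upstream = set()
        for source in step["in"].values():
            if "/" in source:
                upstream.add(source.split("/", 1)[0])
        deps[name] = upstream
    return deps


def execution_batches(steps: dict) -> list[list[str]]:
    """Group steps into batches; steps in the same batch have no mutual dependency."""
    sorter = TopologicalSorter(step_dependencies(steps))
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

Here ST520101 and ST520103 land in the first batch because neither consumes another step's output, while ST520102 waits for ST520101.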

Validating Your Workflow and Metadata

  • CWL Validation:
    Use cwltool to check your CWL files for syntax errors (cwltool --validate) and to export the workflow graph in Graphviz dot format (cwltool --print-dot), which you can render to verify the workflow structure.

  • RO-Crate Validation:
    Validate your ro-crate-metadata.json file with tools such as the RO-Crate Validator (Python) and explore your RO-Crate interactively with ro-crate-html-js.
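
Before running a full validator, a quick structural sanity check of ro-crate-metadata.json can catch obvious mistakes. The sketch below is not a substitute for the RO-Crate Validator and does not check the Workflow Run RO-Crate profile. check_ro_crate is a hypothetical helper that only looks for the JSON-LD context, the entity graph, and the metadata descriptor pointing to a root dataset.

```python
import json


def check_ro_crate(metadata: dict) -> list[str]:
    """Return a list of basic structural problems found in parsed RO-Crate metadata."""
    problems = []
    if "@context" not in metadata:
        problems.append("missing @context")
    graph = metadata.get("@graph")
    if not isinstance(graph, list):
        problems.append("missing @graph list")
        return problems
    # Index entities by their @id for lookup.
    entities = {e.get("@id"): e for e in graph if isinstance(e, dict)}
    descriptor = entities.get("ro-crate-metadata.json")
    if descriptor is None:
        problems.append("missing metadata descriptor entity")
        return problems
    about = descriptor.get("about")
    root_id = about.get("@id") if isinstance(about, dict) else None
    if root_id is None or root_id not in entities:
        problems.append("metadata descriptor does not point to a root dataset")
    return problems


def check_ro_crate_file(path: str) -> list[str]:
    """Load a metadata file from disk and run the structural check on it."""
    with open(path, encoding="utf-8") as handle:
        return check_ro_crate(json.load(handle))
```

For example, check_ro_crate_file("WF5201/ro-crate-metadata.json") returns an empty list when these basic elements are present.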


Inputs

ID      Name  Description                Type
DT6102  n/a   NEAMTHM18                  Directory
DT6103  n/a   EIDA seismic data archive  Directory
DT6104  n/a   SLSMF sea level data       Directory
DT6105  n/a   GNSS displacements         Directory
DT6109  n/a   topo-bathymetric grids     Directory

Steps

ID        Name  Description
ST610101  n/a   SS6101
ST610102  n/a   SS6102
ST610103  n/a   SS6103
ST610104  n/a   SS6104
ST610105  n/a   SS6105
ST610106  n/a   n/a
ST610107  n/a   SS6113
ST610108  n/a   SS6114
ST610109  n/a   n/a
ST610110  n/a   SS6117
ST610111  n/a   SS6118

Outputs

ID      Name  Description                     Type
DT6101  n/a   Scenario Library                Directory
DT6106  n/a   list of earthquake scenarios    Directory
DT6107  n/a   list of scenario probabilities  Directory
DT6108  n/a   list of landslide scenarios     Directory
DT6110  n/a   Tsunami intensities             Directory
DT6111  n/a   Ground deformation              Directory
DT6112  n/a   Tsunami hazard curves           Directory
DT6113  n/a   Hazard visual products          Directory

Version History

main @ c324ab2 (latest) Created 21st Feb 2025 at 13:26 by Raül Sirvent

update preview


Frozen main c324ab2

main @ 3923678 (earliest) Created 20th Dec 2024 at 14:54 by Raül Sirvent

add ro-crate-preview


Frozen main 3923678
Creators and Submitter
Creators
Not specified
Additional credit

Marco Salvi

Submitter

Created: 20th Dec 2024 at 14:54

Last updated: 21st Feb 2025 at 13:26

Total size: 451 KB