The WorkflowHub Project
The WorkflowHub project is a community framework for enabling scientific workflow research and development. It provides foundational tools for analyzing workflow execution traces and for generating synthetic, yet realistic, workflow traces. These traces can be used to develop new techniques, algorithms, and systems that address the challenges of efficiently and robustly executing ever larger workflows on increasingly complex distributed infrastructures. The figure below shows an overview of the workflow research life cycle, which integrates the three axes of the WorkflowHub project:
The first axis (Workflow Traces) of the WorkflowHub project targets the collection and curation of open-access traces of production workflow executions from various scientific applications, shared in a common trace format (i.e., the WorkflowHub JSON Format).
The second axis (Workflow Generator) of the WorkflowHub project targets the generation of realistic synthetic workflow traces based on workflow execution profiles extracted from execution traces.
The third axis (Workflow Simulator) of the WorkflowHub project fosters the use of simulation for, among other purposes, the development, evaluation, and verification of scheduling and resource provisioning algorithms (e.g., multi-objective function optimization) and the evaluation of current and emerging computing platforms (e.g., clouds, IoT, extreme scale).
The WorkflowHub Python Package
To allow users to easily interact with workflow execution traces and synthetic workflows, the WorkflowHub project provides a collection of tools released as an open source Python package, which enables:
- Analysis of traces of actual workflow executions;
- Production of recipe structures for creating the workflow recipes used in workflow generation; and
- Generation of realistic synthetic workflow traces.
The Python package documentation provides all necessary information on how to install and use the available tools.
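As a rough illustration of the kind of trace analysis listed above, the following sketch summarizes task runtimes per task type. It uses only the Python standard library, not the WorkflowHub package itself. The trace layout (a `workflow.jobs` list whose entries carry `name`, `runtime`, and `parents`), the sample data, and the helper `summarize_runtimes` are all simplifying assumptions made for this example; consult the package documentation for the actual trace format and API.

```python
import json
from collections import defaultdict
from statistics import mean

# Hypothetical, simplified trace used only for illustration.
# The actual WorkflowHub JSON Format is assumed to carry more fields.
SAMPLE_TRACE = """
{
  "name": "example-workflow",
  "workflow": {
    "jobs": [
      {"name": "split_0", "runtime": 4.0,  "parents": []},
      {"name": "align_0", "runtime": 10.0, "parents": ["split_0"]},
      {"name": "align_1", "runtime": 14.0, "parents": ["split_0"]},
      {"name": "merge_0", "runtime": 6.0,  "parents": ["align_0", "align_1"]}
    ]
  }
}
"""


def summarize_runtimes(trace):
    """Group jobs by task type and report count and mean runtime per type.

    The task type is assumed to be the job name without its trailing
    "_<index>" suffix (e.g. "align_1" -> "align").
    """
    runtimes_by_type = defaultdict(list)
    for job in trace["workflow"]["jobs"]:
        task_type = job["name"].rsplit("_", 1)[0]
        runtimes_by_type[task_type].append(job["runtime"])
    return {
        task_type: {"count": len(values), "mean_runtime": mean(values)}
        for task_type, values in runtimes_by_type.items()
    }


trace = json.loads(SAMPLE_TRACE)
summary = summarize_runtimes(trace)
for task_type, stats in sorted(summary.items()):
    print(f"{task_type}: {stats['count']} job(s), "
          f"mean runtime {stats['mean_runtime']:.1f} s")
```

Per-type summaries of this sort are one plausible input to a workflow recipe, since a generator needs to know how many tasks of each type to create and how long they typically run.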