This BioExcel best practice guide outlines the development process for writing a workflow using the Common Workflow Language (CWL), from creating and selecting tools like BioBB, through early experimentation, reuse and testing, to optimization and ensuring reproducibility before publication in workflow repositories.
Creators: Stian Soiland-Reyes, Douglas Lowe, Robin Long
Submitter: Stian Soiland-Reyes
Authors: Michael R. Crusoe, Sanne Abeln, Alexandru Iosup, Peter Amstutz, John Chilton, Nebojša Tijanić, Hervé Ménager, Stian Soiland-Reyes, Carole Goble
Date Published: 14th May 2021
Publication Type: Unpublished
Citation: arXiv 2105.07028 [cs.DC]
Keynote at German Conference on Bioinformatics 2021 https://gcb2021.de/ FAIR Computational Workflows Computational workflows capture precise descriptions of the steps and data dependencies needed to carry out computational data pipelines, analysis and simulations in many areas of Science, including the Life Sciences. The use of computational workflows to manage these multi-step computational processes has accelerated in the past few years driven by the need for scalable data processing, the ...
Creator: Carole Goble
Submitter: Carole Goble
This is a Nextflow implementation of the GATK Somatic Short Variant Calling workflow. This workflow can be used to discover somatic short variants (SNVs and indels) from tumour and matched normal BAM files following GATK's Best Practices Workflow. The workflow is currently optimised to run efficiently and at scale on the National Computational Infrastructure, Gadi.
Type: Nextflow
Creators: Nandan Deshpande, Tracy Chew, Cali Willet, Georgina Samaha
Submitter: Georgina Samaha
PAIRED-END workflow. Align reads on fasta reference/assembly using bwa mem, get a consensus, variants, mutation explanations.
IMPORTANT:
- For the "bcftools call" consensus step, the --ploidy file is in "Données partagées" (Shared Data). It must be imported into your history and provided to the workflow (it tells bcftools to perform haploid variant calling).
- SELECT THE MOST SUITABLE VADR MODEL for annotation (see vadr parameters).
A CWL-based pipeline for calling small germline variants, namely SNPs and small INDELs, by processing data from Whole-genome Sequencing (WGS) or Targeted Sequencing (e.g., Whole-exome sequencing; WES) experiments.
On the respective GitHub folder are available:
- The CWL wrappers and subworkflows for the workflow
- A pre-configured YAML template, based on validation analysis of publicly available HTS data
Briefly, the workflow performs the following steps:
- Quality control of Illumina reads ...
Type: Common Workflow Language
Creators: Konstantinos Kyritsis, Nikolaos Pechlivanis, Fotis Psomopoulos
Submitter: Konstantinos Kyritsis
A CWL-based pipeline for calling small germline variants, namely SNPs and small INDELs, by processing data from Whole-genome Sequencing (WGS) or Targeted Sequencing (e.g., Whole-exome sequencing; WES) experiments.
On the respective GitHub folder are available:
- The CWL wrappers and subworkflows for the workflow
- A pre-configured YAML template, based on validation analysis of publicly available HTS data
Briefly, the workflow performs the following steps:
- Quality control of Illumina reads ...
Type: Common Workflow Language
Creators: Konstantinos Kyritsis, Nikolaos Pechlivanis, Fotis Psomopoulos
Submitter: Konstantinos Kyritsis
A CWL-based pipeline for processing ChIP-Seq data (FASTQ format) and performing:
- Peak calling
- Consensus peak count table generation
- Detection of super-enhancer regions
- Differential binding analysis
On the respective GitHub folder are available:
- The CWL wrappers for the workflow
- A pre-configured YAML template, based on validation analysis of publicly available HTS data
- Tables of metadata (EZH2_metadata_CLL.csv and H3K27me3_metadata_CLL.csv), based on the same validation ...
Type: Common Workflow Language
Creators: Konstantinos Kyritsis, Nikolaos Pechlivanis, Fotis Psomopoulos
Submitter: Konstantinos Kyritsis
A CWL-based pipeline for processing RNA-Seq data (FASTQ format) and performing differential gene/transcript expression analysis.
On the respective GitHub folder are available:
- The CWL wrappers for the workflow
- A pre-configured YAML template, based on validation analysis of publicly available HTS data
- A table of metadata (mrna_cll_subsets_phenotypes.csv), based on the same validation analysis, to serve as an input example for the design of comparisons during differential expression ...
Type: Common Workflow Language
Creators: Konstantinos Kyritsis, Nikolaos Pechlivanis, Fotis Psomopoulos
Submitter: Konstantinos Kyritsis
This repository hosts the Metabolome Annotation Workflow (MAW). The workflow takes MS2 .mzML format data files as input in R. It performs spectral database dereplication using the R package Spectra and compound database dereplication using SIRIUS or MetFrag. Final candidate selection is done in Python using RDKit and PubChemPy.
Type: Common Workflow Language
Creators: Mahnoor Zulfiqar, Michael R. Crusoe, Luiz Gadelha, Christoph Steinbeck, Maria Sorokina, Kristian Peters
Submitter: Mahnoor Zulfiqar
Joint multi-omics dimensionality reduction approaches for CAKUT data using peptidome and proteome data
Brief description: Cantini et al. (2020) evaluated 9 representative joint dimensionality reduction (jDR) methods for multi-omics integration and analysis. The methods are Regularized Generalized Canonical Correlation Analysis (RGCCA), Multiple co-inertia analysis (MCIA), Multi-Omics Factor Analysis (MOFA), Multi-Study Factor Analysis (MSFA), iCluster, Integrative NMF ...
Type: Snakemake
Creators: Ozan Ozisik, Juma Bayjan, Cenna Doornbos, Friederike Ehrhart, Matthias Haimel, Laura Rodriguez-Navas, José Mª Fernández, Eleni Mina, Daniël Wijnbergen
Submitter: Juma Bayjan
In this analysis, we created an extended pathway, using the WikiPathways repository (Version 20210110) and the three -omics datasets. For this, each of the three -omics datasets was first analyzed to identify differentially expressed elements, and pathways associated with the significant miRNA-protein links were detected. A miRNA-protein link is deemed significant, and may possibly be implying causality, if both a miRNA and its target are significantly differentially expressed.
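The significance rule for miRNA-protein links described above can be sketched in a few lines of Python. This is a minimal illustration and not the workflow's own code: the function name, the dictionary-based inputs, the p-value cutoff of 0.05 and the toy identifiers are all assumptions introduced here, since the description does not state how significance is computed.

```python
from typing import Dict, List, Tuple

ALPHA = 0.05  # assumed cutoff; the source description does not give one


def significant_links(
    links: List[Tuple[str, str]],
    mirna_pvalues: Dict[str, float],
    protein_pvalues: Dict[str, float],
    alpha: float = ALPHA,
) -> List[Tuple[str, str]]:
    """Keep links where both the miRNA and its target pass the cutoff."""
    kept = []
    for mirna, protein in links:
        # Entries missing from either table are treated as not significant.
        mirna_ok = mirna_pvalues.get(mirna, 1.0) < alpha
        protein_ok = protein_pvalues.get(protein, 1.0) < alpha
        if mirna_ok and protein_ok:
            kept.append((mirna, protein))
    return kept


# Toy, made-up values for illustration only (hypothetical identifiers).
example_links = [("miR-A", "P1"), ("miR-A", "P2"), ("miR-B", "P1")]
example_mirna = {"miR-A": 0.01, "miR-B": 0.20}
example_protein = {"P1": 0.03, "P2": 0.40}

result = significant_links(example_links, example_mirna, example_protein)
print(result)
```

Only the link whose miRNA and target are both below the cutoff is kept; a link with a single significant end is dropped, which matches the "both" condition in the description.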
The peptidome and ...
Type: Snakemake
Creators: Woosub Shin, Friederike Ehrhart, Juma Bayjan, Cenna Doornbos, Ozan Ozisik
Submitter: Juma Bayjan
MGnify (http://www.ebi.ac.uk/metagenomics) provides a free to use platform for the assembly, analysis and archiving of microbiome data derived from sequencing microbial populations that are present in particular environments. Over the past 2 years, MGnify (formerly EBI Metagenomics) has more than doubled the number of publicly available analysed datasets held within the resource. Recently, an updated approach to data analysis has been unveiled (version 5.0), replacing the previous single pipeline ...
Type: Common Workflow Language
Creators: Alex L Mitchell, Alexandre Almeida, Martin Beracochea, Miguel Boland, Josephine Burgin, Guy Cochrane, Michael R Crusoe, Varsha Kale, Simon C Potter, Lorna J Richardson, Ekaterina Sakharova, Maxim Scheremetjew, Anton Korobeynikov, Alex Shlemov, Olga Kunyavskaya, Alla Lapidus, Robert D Finn
Submitter: Martin Beracochea
MGnify (http://www.ebi.ac.uk/metagenomics) provides a free to use platform for the assembly, analysis and archiving of microbiome data derived from sequencing microbial populations that are present in particular environments. Over the past 2 years, MGnify (formerly EBI Metagenomics) has more than doubled the number of publicly available analysed datasets held within the resource. Recently, an updated approach to data analysis has been unveiled (version 5.0), replacing the previous single pipeline ...
Type: Common Workflow Language
Creators: Alex L Mitchell, Alexandre Almeida, Martin Beracochea, Miguel Boland, Josephine Burgin, Guy Cochrane, Michael R Crusoe, Varsha Kale, Simon C Potter, Lorna J Richardson, Ekaterina Sakharova, Maxim Scheremetjew, Anton Korobeynikov, Alex Shlemov, Olga Kunyavskaya, Alla Lapidus, Robert D Finn
Submitter: Martin Beracochea
RNASeq-DE @ NCI-Gadi processes RNA sequencing data (single, paired and/or multiplexed) for differential expression (raw FASTQ to counts). This pipeline consists of multiple stages and is designed for the National Computational Infrastructure's (NCI) Gadi supercomputer, leveraging multiple nodes to run each stage in parallel.
Infrastructure_deployment_metadata: Gadi (NCI)
Local Cromwell implementation of GATK4 germline variant calling pipeline
See the GATK website for more information on this toolset
Assumptions
- Using hg38 human reference genome build
- Running 'locally', i.e. not using HPC/SLURM scheduling, or containers. This repo was specifically tested on a Pawsey Nimbus 16 CPU, 64GB RAM virtual machine, primarily running in the /data volume storage partition.
- Starting from short-read Illumina paired-end fastq ...
microPIPE was developed to automate high-quality complete bacterial genome assembly using Oxford Nanopore Sequencing in combination with Illumina sequencing.
To build microPIPE we evaluated the performance of several tools at each step of bacterial genome assembly, including basecalling, assembly, and polishing. Results at each step were validated using the high-quality ST131 Escherichia coli strain EC958 (GenBank: HG941718.1). After appraisal of each step, we selected the best combination of ...
Type: Nextflow
Creators: Valentine Murigneux, Leah W Roberts, Brian M Forde, Minh-Duy Phan, Nguyen Thi Khanh Nhu, Adam D Irwin, Patrick N A Harris, David L Paterson, Mark A Schembri, David M Whiley, Scott A Beatson
Submitter: Valentine Murigneux
This BioExcel best practice guide discusses the workflow engines available for the Common Workflow Language (CWL).
Creators: Robin Long, Douglas Lowe, Stian Soiland-Reyes
Submitter: Stian Soiland-Reyes