Teams: NBIS, ERGA Assembly
Organizations: NBIS – National Bioinformatics Infrastructure Sweden
https://orcid.org/0000-0003-1675-0677
Expertise: Bioinformatics, Genomics, Scientific workflow development, Workflows
I'm a bioinformatician for the National Bioinformatics Infrastructure Sweden. I specialise in de novo genome assembly and workflow development with Nextflow. I'm also a Nextflow ambassador and nf-core maintainer.
Teams: Biodiversity Genomics Europe (general)
Organizations: BIOPOLIS Association (BIOPOLIS-CIBIO)
https://orcid.org/0000-0001-8650-7248
Expertise: Molecular Biology, phylogenetics, evolution
Tools: Databases, Genomics, Genetic analysis
Expertise: Bioinformatics
Expertise: Genomics, Metagenomics, NGS, Python, evolution
Tools: Genomics, Python, Snakemake, Transcriptomics
Teams: Cimorgh IT solutions
Organizations: cimorgh IT
Expertise: Bioinformatics, Genomics, Metagenomics, Microbiology, NGS, Python, R, bash, WDL
Tools: Mathematical Modelling, R, WDL
Expertise: Bioinformatics, Genomics, Scientific workflow development
Expertise: Bioinformatics, Genomics, Machine Learning
Tools: Python, R, Machine Learning
I am a Ph.D. student in Gong lab. I am interested in cancer genomics, including the mining of genetic risk determinants in cancer, functional prediction of genetic variants, tumor-associated molecular epidemiology, large-scale data integration, analysis, and mining, as well as the construction of bioinformatical data platforms.
Teams: Galaxy Training Network
Organizations: Erasmus University Medical Centre
https://orcid.org/0000-0003-3803-468X
Expertise: Genomics, amplicon analysis, Microbiology
Tools: Galaxy
Post-doc at ErasmusMC, Galaxy Training Network (GTN) Lead
Expertise: Bioinformatics, Genomics, Metagenomics, Data Management
Tools: CWL, Jupyter notebook, Nextflow, Molecular Biology, Workflows, Microbiology, Transcriptomics, Perl, Python, R
Expertise: Bioinformatics, Computer Science, Data Management, Genetics, Genomics, Machine Learning, Metagenomics, NGS, Scientific workflow development, Software Engineering
Tools: Databases, Galaxy, Genomics, Jupyter notebook, Machine Learning, Nextflow, nf-core, PCR, Perl, Python, R, rtPCR, Snakemake, Transcriptomics, Virology, Web, Web services, Workflows
Dad, husband and PhD. Scientist, technologist and engineer. Bibliophile. Philomath. Passionate about science, medicine, research, computing and all things geeky!
Teams: MAB - ATGC
Organizations: Centre National de la Recherche Scientifique (CNRS)
https://orcid.org/0000-0003-3791-3973
Expertise: Bioinformatics, Genomics, algorithm, Machine Learning, Metagenomics, NGS, Computer Science
Tools: Transcriptomics, Genomics, Python, C/C++, Web services, Workflows
This is a project-specific guide for the Biodiversity Genomics Europe (BGE) project's use of WorkflowHub.
Authors: Cristina S. Ferreira, Yasmmin C. Martins, Rangel Celso Souza, Ana Tereza R. Vasconcelos
Date Published: 2021
Publication Type: Journal
DOI: 10.7717/peerj.12548
Citation: PeerJ 9:e12548
Pipelines used by the genome assembly teams that are part of the Biodiversity Genomics Europe project
Collection of Galaxy workflows for generating results used for creating ERGA-BGE Reports
For a given genome, two workflows should be run: the assembly evaluation (ASM analyses), and the annotation evaluation (ANNOT analyses)
Depending on the kind of data used for the genome assembly, you should choose HiFi or ONT (Illumina) workflows for ASM analyses
Collection of de-novo genome assembly workflows written for implementation in Galaxy
Input data should be Oxford Nanopore raw reads plus Illumina WGS reads and Illumina 3-dimensional Chromatin Conformation Capture (HiC) reads
Executing all workflows will output one scaffolded collapsed assembly and the complete QC analyses
Please run the workflows in order: WF0 (there are two, one for ONT, and another one for Illumina that can be used independently for the WGS and HiC reads), WF1, WF2, WF3, WF4
Maintainers: Diego De Panis
Number of items: 6
Tags: Assembly, Bioinformatics, Galaxy, Genomics, Genome assembly, ONT, illumina, Hi-C
This is a general collection of workflows used by or developed by members of the BGE project.
Collection of de-novo genome assembly workflows written for implementation in Galaxy
Input data should be PacBio HiFi reads and Illumina 3-dimensional Chromatin Conformation Capture (HiC) reads
Executing all workflows will output two scaffolded haplotype assemblies and the complete QC analyses
Please run the workflows in order: WF0 (there are two, one for HiFi and one for Illumina HiC), WF1, WF2, WF3, WF4
Maintainers: Tom Brown, Diego De Panis
Number of items: 6
Tags: Assembly, Bioinformatics, Galaxy, Genomics, Genome assembly, HiFi, Hi-C
The Vertebrate Genomes Pipelines in Galaxy are intended to allow a user to generate high-quality near error-free assemblies of species from a user's own data or from the GenomeArk database.
Pairwise alignment pipeline (genome to genome or reads to genome)
Assembly Evaluation for ERGA-BGE Reports
One Assembly, HiFi WGS reads + HiC reads
The workflow requires the following:
- Species Taxonomy ID number
- NCBI Genome assembly accession code
- BUSCO Lineage
- WGS accurate reads accession code
- NCBI HiC reads accession code
The workflow will get the data and process it to generate genome profiling (genomescope, smudgeplot -optional-), assembly stats (gfastats), merqury stats (QV, completeness), BUSCO, snailplot, contamination blobplot, and HiC ...
Assembly Evaluation for ERGA-BGE Reports
One Assembly, Illumina WGS reads + HiC reads
The workflow requires the following:
- Species Taxonomy ID number
- NCBI Genome assembly accession code
- BUSCO Lineage
- WGS accurate reads accession code
- NCBI HiC reads accession code
The workflow will get the data and process it to generate genome profiling (genomescope, smudgeplot -optional-), assembly stats (gfastats), merqury stats (QV, completeness), BUSCO, snailplot, contamination blobplot, and ...
The workflow requires the user to provide:
- ENSEMBL link address of the annotation GFF3 file
- ENSEMBL link address of the assembly FASTA file
- NCBI taxonomy ID
- BUSCO lineage
- OMArk database
The workflow will produce statistics of the annotation based on AGAT, BUSCO and OMArk.
cfDNA UniFlow is a unified, standardized, and ready-to-use workflow for processing whole genome sequencing (WGS) cfDNA samples from liquid biopsies. It includes essential steps for pre-processing raw cfDNA samples, quality control and reporting. Additionally, several optional utility functions like GC bias correction and estimation of copy number state are included. Finally, we provide specialized methods for extracting coverage derived signals and visualizations comparing cases and controls. ...
deepconsensus 1.2 snakemake pipeline
This snakemake-based workflow takes in a subreads.bam and results in a deepconsensus.fastq
- no methylation calls !
The metadata id of the subreads file needs to be: "m[numeric][numeric][numeric].subreads.bam"
Chunking (how many subjobs) and ccs min quality filter can be adjusted in the config.yaml
the checkpoint model for deepconsensus1.2 should be accessible like this: gsutil cp -r gs://brain-genomics-public/research/deepconsensus/models/v1.2/model_checkpoint/* ...
Annotation of assembled bacterial genomes to detect genes, potential plasmids, integrons and insertion sequence (IS) elements.
Type: Galaxy
Creators: ABRomics, Pierre Marin, Clea Siguret, abromics-consortium
Submitter: WorkflowHub Bot
Short paired-end read analysis to provide quality analysis, read cleaning and taxonomy assignation
Type: Galaxy
Creators: ABRomics, Pierre Marin, Clea Siguret, abromics-consortium
Submitter: WorkflowHub Bot
Antimicrobial resistance gene detection from assembled bacterial genomes
Type: Galaxy
Creators: ABRomics, Pierre Marin, Clea Siguret, abromics-consortium
Submitter: WorkflowHub Bot
Assembly of bacterial paired-end short read data with generation of quality metrics and reports
Type: Galaxy
Creators: ABRomics, Pierre Marin, Clea Siguret, abromics-consortium
Submitter: WorkflowHub Bot
Variant Interpretation Pipeline (VIP) that annotates, filters and reports prioritized causal variants in humans, see https://github.com/molgenis/vip for more information.
An open-source analysis pipeline to detect germline or somatic variants from whole genome or targeted sequencing
Pipeline for the identification of circular DNAs
A mapping-based pipeline for creating a phylogeny from bacterial whole genome sequences
GSC (Genotype Sparse Compression)
Genotype Sparse Compression (GSC) is an advanced tool for lossless compression of VCF files, designed to efficiently store and manage VCF files in a compressed format. It accepts VCF/BCF files as input and utilizes advanced compression techniques to significantly reduce storage requirements while ensuring fast query capabilities. In our study, we successfully compressed the VCF files from the 1000 Genomes Project (1000Gpip3), consisting of 2504 samples and 80 ...
Parabricks-Genomics-nf is a GPU-enabled pipeline for alignment and germline short variant calling for short read sequencing data. The pipeline utilises NVIDIA's Clara Parabricks toolkit to dramatically speed up the execution of best practice bioinformatics tools. Currently, this pipeline is configured specifically for NCI's Gadi HPC.
NVIDIA's Clara Parabricks can deliver a significant ...
HiC contact map generation
Snakemake pipeline for the generation of .pretext and .mcool files for visualisation of HiC contact maps with the software PretextView and HiGlass, respectively.
Prerequisites
This pipeline has been tested using Snakemake v7.32.4 and requires conda for installation of required tools. To run the pipeline use the command:
snakemake --use-conda
A set of configuration and running scripts is provided for execution on a Slurm queueing system. After configuring ...
This is a Nextflow implementation of the GATK Somatic Short Variant Calling workflow. This workflow can be used to discover somatic short variants (SNVs and indels) from tumour and matched normal BAM files following GATK's Best Practices Workflow. The workflow is currently optimised to run efficiently and at scale on the National Computational Infrastructure's Gadi HPC.
Type: Nextflow
Creators: Nandan Deshpande, Tracy Chew, Cali Willet, Georgina Samaha
Submitter: Georgina Samaha
Workflow for creating a large disease network from various datasets and databases for IBM, and applying the active subnetwork identification method MOGAMUN.
Type: Common Workflow Language
Creators: Daphne Wijnbergen, Mridul Johari
Submitter: Daphne Wijnbergen
ANNOTATO - Annotation workflow To Annotate Them Oll
ERGA Protein-coding gene annotation workflow.
Adapted from the work of Sagane Joye:
https://github.com/sdind/genome_annotation_workflow
Prerequisites
The following programs are required to run the workflow and the listed versions were tested. It should be noted that older versions of snakemake are not compatible with newer versions of singularity, as noted here: https://github.com/nextflow-io/nextflow/issues/1659.
conda v 23.7.3
...
CLAWS (CNAG's Long-read Assembly Workflow in Snakemake)
Snakemake Pipeline used for de novo genome assembly @CNAG. It has been developed for Snakemake v6.0.5.
It accepts Oxford Nanopore Technologies (ONT) reads, PacBio HiFi reads, Illumina paired-end data, Illumina 10X data and Hi-C reads. It does the preprocessing of the reads, assembly, polishing, purge_dups, scaffolding and different evaluation steps. By default it will preprocess the reads, run Flye + Hypo + purge_dups + yahs and evaluate ...
Type: Snakemake
Creators: Jessica Gomez-Garrido, Fernando Cruz (CNAG), Francisco Camara (CNAG), Tyler Alioto (CNAG)
Submitter: Jessica Gomez-Garrido
ARA (Automated Record Analysis): An automatic pipeline for exploration of SRA datasets with sequences as a query
Requirements
- Docker
  - Please check out the Docker installation guide.
or
- Mamba package manager
  - Please check out the mamba or micromamba official installation guide.
  - We prefer mamba over conda since it is faster and uses ...
prepareChIPs
This is a simple snakemake workflow template for preparing single-end ChIP-Seq data.
The steps implemented are:
- Download raw fastq files from SRA
- Trim and Filter raw fastq files using AdapterRemoval
- Align to the supplied genome using bowtie2
- Deduplicate Alignments using Picard MarkDuplicates
- Call Macs2 Peaks using macs2
A pdf of the rulegraph is available here
Full details for each step are given below. Any additional ...
A CWL-based pipeline for calling small germline variants, namely SNPs and small INDELs, by processing data from Whole-genome Sequencing (WGS) or Targeted Sequencing (e.g., Whole-exome sequencing; WES) experiments.
On the respective GitHub folder are available:
- The CWL wrappers and subworkflows for the workflow
- A pre-configured YAML template, based on validation analysis of publicly available HTS data
Briefly, the workflow performs the following steps:
- Quality control of Illumina reads ...
Type: Common Workflow Language
Creators: Konstantinos Kyritsis, Nikolaos Pechlivanis, Fotis Psomopoulos
Submitter: Konstantinos Kyritsis
Purge dups
This snakemake pipeline is designed to be run using as input a contig-level genome and PacBio reads. This pipeline has been tested with snakemake v7.32.4. Raw long-read sequencing files and the input contig genome assembly must be given in the config.yaml file. To execute the workflow run:
snakemake --use-conda --cores N
Or configure the cluster.json and run using the ./run_cluster command
MoMofy
Module for integrative Mobilome prediction
Bacteria can acquire genetic material through horizontal gene transfer, allowing them to rapidly adapt to changing environmental conditions. These mobile genetic elements can be classified into three main categories: plasmids, phages, and integrons. Autonomous elements are those capable of excising themselves from the chromosome, reintegrating elsewhere, and potentially modifying the host's physiology. Small integrative elements like insertion ...
IGVreport-nf
- Description
- Diagram
- User guide
- Workflow summaries
- Metadata
- Component tools
- Required (minimum) inputs/parameters
- Additional notes
- Help/FAQ/Troubleshooting
- Acknowledgements/citations/credits
Description
Quickly generate [IGV .html
...
GermlineStructuralV-nf is a pipeline for identifying structural variant events in human Illumina short read whole genome sequence data. GermlineStructuralV-nf identifies structural variant and copy number events from BAM files using Manta, Smoove, and TIDDIT. Variants are then merged using SURVIVOR, ...
Type: Nextflow
Creators: Georgina Samaha, Marina Kennerson, Tracy Chew, Sarah Beecroft
Submitter: Georgina Samaha
IndexReferenceFasta-nf
===========
Workflow for Metagenomics from bins to metabolic models (GEMs)
Summary
- Prodigal gene prediction
- CarveMe genome scale metabolic model reconstruction
- MEMOTE for metabolic model testing
- SMETANA Species METabolic interaction ANAlysis
Other UNLOCK workflows on WorkflowHub: https://workflowhub.eu/projects/16/workflows?view=default
All tool CWL files and other workflows can be found here: Tools: https://gitlab.com/m-unlock/cwl Workflows: https://gitlab.com/m-unlock/cwl/workflows
**How ...
Workflow for LongRead Quality Control and Filtering
- NanoPlot (read quality control) before and after filtering
- Filtlong (read trimming)
- Kraken2 taxonomic read classification before and after filtering
- Minimap2 read filtering based on given references
Other UNLOCK workflows on WorkflowHub: https://workflowhub.eu/projects/16/workflows?view=default
All tool CWL files and other workflows can be found here: https://gitlab.com/m-unlock/cwl/workflows
**How to setup and use an UNLOCK ...
Type: Common Workflow Language
Creators: Bart Nijsse, Jasper Koehorst, Germán Royval
Submitter: Bart Nijsse
- Deprecated -
See our updated hybrid assembly workflow: https://workflowhub.eu/workflows/367
And other workflows: https://workflowhub.eu/projects/16#workflows
Workflow for sequencing with ONT Nanopore data, from basecalled reads to (meta)assembly and binning
- Workflow Nanopore Quality
- Kraken2 taxonomic classification of FASTQ reads
- Flye (de-novo assembly)
- Medaka (assembly polishing)
- metaQUAST (assembly quality reports)
When Illumina reads are provided:
- Workflow ...
Type: Common Workflow Language
Creators: Bart Nijsse, Jasper Koehorst, Germán Royval
Submitter: Jasper Koehorst
Workflow for Illumina Quality Control and Filtering
Multiple paired datasets will be merged into a single paired dataset.
Summary:
- FastQC on raw data files
- fastp for read quality trimming
- BBduk for phiX and (optional) rRNA filtering
- Kraken2 for taxonomic classification of reads (optional)
- BBmap for (contamination) filtering using given references (optional)
- FastQC on filtered (merged) data
Other UNLOCK workflows on WorkflowHub: https://workflowhub.eu/projects/16/workflows?view=default ...
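The merge of multiple paired datasets into a single paired dataset, mentioned in the summary above, can be illustrated with a minimal sketch. This is not the UNLOCK workflow's own implementation; the function name `merge_paired` and the file names are hypothetical. The only point it shows is that forward and reverse files must be concatenated in the same order so that read pairs stay aligned.

```python
import shutil


def merge_paired(pairs, out_r1, out_r2):
    """Concatenate several (forward, reverse) FASTQ pairs into one pair.

    `pairs` is a list of (r1_path, r2_path) tuples. Both outputs are written
    in the same dataset order, which keeps the n-th forward read matched
    with the n-th reverse read.
    """
    with open(out_r1, "wb") as merged_r1, open(out_r2, "wb") as merged_r2:
        for r1_path, r2_path in pairs:
            with open(r1_path, "rb") as src:
                shutil.copyfileobj(src, merged_r1)
            with open(r2_path, "rb") as src:
                shutil.copyfileobj(src, merged_r2)
```

A call such as `merge_paired([("a_1.fq", "a_2.fq"), ("b_1.fq", "b_2.fq")], "merged_1.fq", "merged_2.fq")` would write the reads of dataset a followed by those of dataset b to both outputs.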
Bootstrapping-for-BQSR @ NCI-Gadi is a pipeline for bootstrapping a variant resource to enable GATK base quality score recalibration (BQSR) for non-model organisms that lack a publicly available variant resource. This implementation is optimised for the National Computational Infrastructure's Gadi HPC. Multiple rounds of bootstrapping can be performed. Users can use Fastq-to-bam @ NCI-Gadi and Germline-ShortV @ NCI-Gadi to ...
Local Cromwell implementation of GATK4 germline variant calling pipeline
See the GATK website for more information on this toolset
Assumptions
- Using hg38 human reference genome build
- Running 'locally' i.e. not using HPC/SLURM scheduling, or containers. This repo was specifically tested on Pawsey Nimbus 16 CPU, 64GB RAM virtual machine, primarily running in the /data volume storage partition.
- Starting from short-read Illumina paired-end fastq ...
Fastq-to-BAM @ NCI-Gadi is a genome alignment workflow that takes raw FASTQ files, aligns them to a reference genome and outputs analysis-ready BAM files. This workflow is designed for the National Computational Infrastructure's (NCI) Gadi supercomputer, leveraging multiple nodes on NCI Gadi to run all stages of the workflow in parallel, either massively parallel using the scatter-gather approach or parallel by sample. It consists of a number of stages and follows the BROAD Institute's best practice ...
Type: Shell Script
Creators: Cali Willet, Tracy Chew, Georgina Samaha, Rosemarie Sadsad, Andrey Bliznyuk, Ben Menadue, Rika Kobayashi, Matthew Downton, Yue Sun
Submitter: Georgina Samaha
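The scatter-gather approach named in the Fastq-to-BAM @ NCI-Gadi description can be sketched in a few lines. This is a generic illustration under stated assumptions, not the workflow's own code: the function names `scatter`, `process_chunk` and `scatter_gather` are hypothetical, and `process_chunk` is only a stand-in for real per-chunk work such as aligning a block of reads.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(items, n_chunks):
    """Split items into at most n_chunks contiguous, near-equal chunks."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Hypothetical stand-in for per-chunk work (here: uppercase each read)."""
    return [read.upper() for read in chunk]


def scatter_gather(items, n_chunks=4):
    """Scatter items into chunks, process them in parallel, gather in order."""
    chunks = scatter(items, n_chunks)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # Executor.map yields results in the order the chunks were submitted.
        results = list(pool.map(process_chunk, chunks))
    return [item for part in results for item in part]
```

On an HPC system each chunk would typically be a separate job or task on its own node rather than a thread; the thread pool here only keeps the sketch self-contained.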
SLURM HPC Cromwell implementation of GATK4 germline variant calling pipeline
See the GATK website for more information on this toolset
Assumptions
- Using hg38 human reference genome build
- Running using HPC/SLURM scheduling. This repo was specifically tested on Pawsey Zeus machine, primarily running in the /scratch partition.
- Starting from short-read Illumina paired-end fastq files as input
Dependencies
The following versions have been ...
Germline-ShortV @ NCI-Gadi is an implementation of the BROAD Institute's best practice workflow for germline short variant discovery. This implementation is optimised for the National Computational Infrastructure's Gadi HPC, utilising scatter-gather parallelism to enable use of multiple nodes with high CPU or memory efficiency. This workflow requires sample BAM files, which can be generated using the Fastq-to-bam @ NCI-Gadi pipeline. Germline-ShortV can be applied ...
Type: Shell Script
Creators: Rosemarie Sadsad, Georgina Samaha, Tracy Chew, Cali Willet
Submitter: Tracy Chew
ORSON combines state-of-the-art tools for annotation processes within a Nextflow pipeline: sequence similarity search (PLAST, BLAST or Diamond), functional annotation retrieval (BeeDeeM) and functional prediction (InterProScan). When required, BUSCO completeness evaluation and eggNOG Orthogroup annotation can be activated. While ORSON results can be analyzed through the command line, they are also compatible with the BlastViewer and Blast2GO graphical tools.
Type: Nextflow
Creators: Cyril Noel, Alexandre Cormier, Patrick Durand, Laura Leroi, Pierre Cuzin
Submitter: Patrick Durand