WHALE: (W)orkflow for (H)uman-genome (A)nalysis of (L)ong-read (E)xperiments
Introduction
WHALE is a bioinformatics pipeline based on Nextflow and nf-core for long-read DNA sequencing analysis. It takes a samplesheet as input and performs quality control, alignment, variant calling and annotation.
Pipeline summary
- Read QC (FastQC)
- Present QC for raw reads (MultiQC)
- Alignment (Minimap2)
- Variant calling
  - Single Nucleotide Variant (SNV) calling (DeepVariant, Clair3, NanoCaller)
  - Structural Variant (SV) calling (Sniffles2, CuteSV, SVIM)
- Merge variant calling
- Annotation
Usage
First, prepare a samplesheet with your input data. Depending on which step of the analysis you want to run, the input data type can be: fastq, bam (and bai), vcf or bed. The samplesheet should look as follows:
samplesheet.csv:
sample,fastq
A123,/path/to/your/input/file/A123.fastq.gz
B456,/path/to/your/input/file/B456.fastq.gz
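For many samples, the samplesheet can be generated rather than typed by hand. The following is a minimal sketch, not part of WHALE: `make_samplesheet` is a hypothetical helper, and it assumes that each sample ID is simply the file name without the `.fastq.gz` suffix.

```shell
# Hypothetical helper (not shipped with WHALE): print a FASTQ samplesheet
# for every *.fastq.gz file found in the given directory.
# Assumption: the sample ID is the file name minus ".fastq.gz".
make_samplesheet() {
    local fastq_dir="$1"
    local f
    echo "sample,fastq"
    for f in "$fastq_dir"/*.fastq.gz; do
        # Skip the literal pattern when the directory has no matching files.
        [ -e "$f" ] || continue
        echo "$(basename "$f" .fastq.gz),$f"
    done
}
```

For example, `make_samplesheet /path/to/your/input/file > samplesheet.csv` would write one row per FASTQ file, with the header shown above.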
There are two types of full analysis:
- SNV analysis: -profile snv_analysis
- SV analysis: -profile sv_analysis
Each full analysis can start with:
- Alignment: --step mapping (input data type: fastq) (default)
- Variant calling: --step variant_calling (input data type: bam and bai)
A specific step of the analysis can be executed:
- SNV calling (and merge): -profile snv_calling (input data type: bam and bai)
- SV calling (and merge): -profile sv_calling (input data type: bam and bai)
- SNV annotation: -profile snv_annotation (input data type: vcf)
- SV annotation: -profile sv_annotation (input data type: bed)
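The relationship between profile, step and input data type described above can be summarised as a small lookup. This is only an illustrative sketch: `expected_input` is a hypothetical function that restates the lists above and does not query the pipeline itself.

```shell
# Hypothetical lookup (not part of WHALE): print the input data type that
# a given analysis profile and step expect, as listed in this README.
# The step defaults to "mapping", matching the pipeline default.
expected_input() {
    local profile="$1"
    local step="${2:-mapping}"
    case "$profile" in
        snv_analysis|sv_analysis)
            case "$step" in
                mapping)         echo "fastq" ;;
                variant_calling) echo "bam and bai" ;;
                *) echo "unknown step: $step" >&2; return 1 ;;
            esac
            ;;
        snv_calling|sv_calling) echo "bam and bai" ;;
        snv_annotation)         echo "vcf" ;;
        sv_annotation)          echo "bed" ;;
        *) echo "unknown profile: $profile" >&2; return 1 ;;
    esac
}
```

For instance, `expected_input sv_analysis variant_calling` prints `bam and bai`, while `expected_input snv_annotation` prints `vcf`.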
Profiles to use in the CCC (UAM):
- -profile uam,singularity,batch
- -profile uam_allcontigs,singularity,batch
Profiles to use on the server:
- -profile tblabserver,singularity
- -profile tblabserver_allcontigs,singularity
Examples
SNV and SV analysis starting with variant calling on the server:
nextflow run WHALE \
   -profile snv_analysis,sv_analysis,tblabserver,singularity \
   --input samplesheet.csv \
   --outdir  \
   --step variant_calling
SV calling in the CCC:
nextflow run WHALE \
   -profile sv_calling,uam,singularity,batch \
   --input samplesheet.csv \
   --outdir 
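Other combinations follow the same pattern: one or more analysis profiles, then the platform profiles, then the input, output directory and optional step. The sketch below only assembles and prints such a command as a dry run; `whale_cmd` is a hypothetical helper, and the output directory name `results` used in the note after it is an assumption, not a pipeline default.

```shell
# Hypothetical dry-run helper (not part of WHALE): print a nextflow command
# built from the flags documented above, without running anything.
# Arguments: comma-separated profiles, samplesheet, output directory,
# and an optional step (mapping or variant_calling).
whale_cmd() {
    local profiles="$1"
    local samplesheet="$2"
    local outdir="$3"
    local step="${4:-}"
    printf 'nextflow run WHALE -profile %s --input %s --outdir %s' \
        "$profiles" "$samplesheet" "$outdir"
    # --step is only appended when given; the pipeline defaults to mapping.
    if [ -n "$step" ]; then
        printf ' --step %s' "$step"
    fi
    printf '\n'
}
```

Calling `whale_cmd snv_analysis,uam,singularity,batch samplesheet.csv results` prints the command for a full SNV analysis from fastq in the CCC, which can be reviewed before it is run.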
Pipeline output
WHALE will create the following subdirectories in the output directory:
- alignment
- snv_calling
- snv_merge
- snv_annotation
- sv_calling
- sv_merge
- sv_annotation
- overlapping_sv_samples
- multiqc
- pipeline_info
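After a run, a quick check of the output directory can show which of these subdirectories are present. This is a minimal sketch with a hypothetical function name, `check_whale_output`; it assumes a run that covers every step, so a partial run (for example, SV calling only) would be expected to report some directories as missing.

```shell
# Hypothetical check (not part of WHALE): report which of the documented
# subdirectories are absent from an output directory.
# Returns 0 when all are present and 1 otherwise.
check_whale_output() {
    local outdir="$1"
    local status=0
    local d
    for d in alignment snv_calling snv_merge snv_annotation \
             sv_calling sv_merge sv_annotation overlapping_sv_samples \
             multiqc pipeline_info; do
        if [ ! -d "$outdir/$d" ]; then
            echo "missing: $d"
            status=1
        fi
    done
    return "$status"
}
```

The function prints one `missing:` line per absent subdirectory and nothing when the output is complete.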
Citations
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
Illustration by Yolanda Benítez
Version History
master @ 1036e1d (earliest) Created 12th Aug 2025 at 10:56 by Yolanda Benítez Quesada
Merge pull request #1 from RafaFariasVarona/profile_uam
update master branch