WHALE: (W)orkflow for (H)uman-genome (A)nalysis of (L)ong-read (E)xperiments
Introduction
WHALE is a bioinformatics pipeline based on Nextflow and nf-core for long-read DNA sequencing analysis. It takes a samplesheet as input and performs quality control, alignment, variant calling and annotation.
Pipeline summary
- Read QC (FastQC)
- Present QC for raw reads (MultiQC)
- Alignment (Minimap2)
- Variant calling
  - Single Nucleotide Variant (SNV) calling (DeepVariant, Clair3, NanoCaller)
  - Structural Variant (SV) calling (Sniffles2, CuteSV, SVIM)
- Merge variant calling
- Annotation
Usage
First, prepare a samplesheet with your input data. Depending on which step of the analysis you want to run, the input data type can be: fastq, bam (and bai), vcf or bed. The samplesheet should look as follows:
samplesheet.csv:
sample,fastq
A123,/path/to/your/input/file/A123.fastq.gz
B456,/path/to/your/input/file/B456.fastq.gz
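A malformed samplesheet is easiest to catch before launching the pipeline. The following is a minimal Python sketch of such a pre-flight check, not part of WHALE itself. It only covers the fastq layout shown above; the function name `check_samplesheet` is hypothetical, and the `.fastq.gz` / `.fq.gz` extension rule is an assumption based on the example paths. Samplesheets for bam, vcf or bed input may use columns that are not shown here.

```python
import csv
import io

# Header copied from the fastq samplesheet example above.
EXPECTED_HEADER = ["sample", "fastq"]


def check_samplesheet(text):
    """Return a list of problems found in a fastq samplesheet given as CSV text."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]  # skip blank lines
    if not rows or rows[0] != EXPECTED_HEADER:
        return ["header must be: " + ",".join(EXPECTED_HEADER)]

    problems = []
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != 2 or not all(field.strip() for field in row):
            problems.append(f"row {line_no}: expected 2 non-empty fields")
            continue
        fastq = row[1]
        # Assumption: reads are gzipped FASTQ, as in the example paths.
        if not fastq.endswith((".fastq.gz", ".fq.gz")):
            problems.append(f"row {line_no}: '{fastq}' does not look like a gzipped FASTQ")
    return problems
```

Calling `check_samplesheet(open("samplesheet.csv").read())` returns an empty list when every row passes these checks.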
There are two types of full analysis:
- SNV analysis: -profile snv_analysis
- SV analysis: -profile sv_analysis
Each full analysis can start with:
- Alignment: --step mapping (input data type: fastq) (default)
- Variant calling: --step variant_calling (input data type: bam and bai)
A specific step of the analysis can be executed:
- SNV calling (and merge): -profile snv_calling (input data type: bam and bai)
- SV calling (and merge): -profile sv_calling (input data type: bam and bai)
- SNV annotation: -profile snv_annotation (input data type: vcf)
- SV annotation: -profile sv_annotation (input data type: bed)
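The relation between entry point and input data type can be written as a small lookup, which helps when deciding what to put in the samplesheet. This is an illustrative Python sketch only: the function name `required_input` is hypothetical, and the values are copied from the lists above.

```python
# Input data type per entry point, copied from the lists above.
STEP_INPUT = {"mapping": "fastq", "variant_calling": "bam and bai"}
PROFILE_INPUT = {
    "snv_calling": "bam and bai",
    "sv_calling": "bam and bai",
    "snv_annotation": "vcf",
    "sv_annotation": "bed",
}
FULL_ANALYSIS = {"snv_analysis", "sv_analysis"}


def required_input(profile, step="mapping"):
    """Return the input data type expected for a profile.

    For the full-analysis profiles the answer depends on --step,
    which defaults to "mapping" as stated above.
    """
    if profile in FULL_ANALYSIS:
        return STEP_INPUT[step]
    return PROFILE_INPUT[profile]
```

For example, `required_input("sv_analysis", "variant_calling")` gives `"bam and bai"`, while `required_input("sv_annotation")` gives `"bed"`.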
Profiles to use in the CCC (UAM):
- -profile uam,singularity,batch
- -profile uam_allcontigs,singularity,batch
Profiles to use on the server:
- -profile tblabserver,singularity
- -profile tblabserver_allcontigs,singularity
Examples
SNV and SV analysis starting with variant calling on the server:
nextflow run WHALE \
-profile snv_analysis,sv_analysis,tblabserver,singularity \
--input samplesheet.csv \
--outdir <OUTDIR> \
--step variant_calling
SV calling in the CCC:
nextflow run WHALE \
-profile sv_calling,uam,singularity,batch \
--input samplesheet.csv \
--outdir <OUTDIR>
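When launching many runs, commands like the two above can be assembled programmatically. The Python sketch below is a hypothetical helper, not part of WHALE. The site labels (`ccc`, `server` and their `_allcontigs` variants), the function name `build_command` and the output directory `results` are all assumptions made for illustration. It also assumes that `--step` is only meaningful together with the full-analysis profiles, which is how this README describes it; the pipeline itself may behave differently.

```python
import shlex

# Profile and step names copied from the Usage section.
WORKFLOW_PROFILES = {
    "snv_analysis", "sv_analysis",
    "snv_calling", "sv_calling",
    "snv_annotation", "sv_annotation",
}
FULL_ANALYSIS = {"snv_analysis", "sv_analysis"}
STEPS = {"mapping", "variant_calling"}
# Keys are hypothetical labels; values are the profile sets listed in Usage.
SITE_PROFILES = {
    "ccc": ["uam", "singularity", "batch"],
    "ccc_allcontigs": ["uam_allcontigs", "singularity", "batch"],
    "server": ["tblabserver", "singularity"],
    "server_allcontigs": ["tblabserver_allcontigs", "singularity"],
}


def build_command(workflows, site, samplesheet, outdir, step=None):
    """Return the `nextflow run WHALE ...` command as a list of arguments."""
    unknown = set(workflows) - WORKFLOW_PROFILES
    if not workflows or unknown:
        raise ValueError(f"missing or unknown workflow profiles: {sorted(unknown)}")
    if site not in SITE_PROFILES:
        raise ValueError(f"unknown site: {site}")

    profiles = list(workflows) + SITE_PROFILES[site]
    cmd = ["nextflow", "run", "WHALE",
           "-profile", ",".join(profiles),
           "--input", samplesheet,
           "--outdir", outdir]
    if step is not None:
        if step not in STEPS:
            raise ValueError(f"unknown step: {step}")
        # Assumption: --step only applies to the full-analysis profiles.
        if not set(workflows) <= FULL_ANALYSIS:
            raise ValueError("--step is only described for snv_analysis / sv_analysis")
        cmd += ["--step", step]
    return cmd


if __name__ == "__main__":
    # "results" is a placeholder output directory.
    print(shlex.join(build_command(["sv_calling"], "ccc", "samplesheet.csv", "results")))
```

Returning a list rather than a string lets the result be passed directly to `subprocess.run` without shell quoting issues; `shlex.join` is used only for display.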
Pipeline output
WHALE will create the following subdirectories in the output directory:
- alignment
- snv_calling
- snv_merge
- snv_annotation
- sv_calling
- sv_merge
- sv_annotation
- overlapping_sv_samples
- multiqc
- pipeline_info
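After a run, it can be handy to see which of these subdirectories were actually produced. A minimal Python sketch follows; the function name `summarize_outdir` is hypothetical. A run limited to a single profile presumably creates only some of these subdirectories, so a missing one is not necessarily an error.

```python
from pathlib import Path

# Subdirectory names copied from the list above.
EXPECTED_SUBDIRS = [
    "alignment",
    "snv_calling", "snv_merge", "snv_annotation",
    "sv_calling", "sv_merge", "sv_annotation",
    "overlapping_sv_samples",
    "multiqc",
    "pipeline_info",
]


def summarize_outdir(outdir):
    """Map each expected subdirectory name to True if it exists under outdir."""
    base = Path(outdir)
    return {name: (base / name).is_dir() for name in EXPECTED_SUBDIRS}
```

For instance, `[name for name, present in summarize_outdir(outdir).items() if not present]` lists the subdirectories that were not created.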
Citations
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
Illustration by Yolanda Benítez
Version History
master @ 1036e1d (earliest) Created 12th Aug 2025 at 10:56 by Yolanda Benítez Quesada
Merge pull request #1 from RafaFariasVarona/profile_uam
update master branch