A CWL-based pipeline for calling small germline variants, namely SNPs and small INDELs, from whole-genome sequencing (WGS) or targeted sequencing (e.g., whole-exome sequencing; WES) data.
The corresponding GitHub folder provides:
- The CWL wrappers and subworkflows for the workflow
- A pre-configured YAML template, based on validation analysis of publicly available HTS data
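As a rough illustration of what such a template might hold, the sketch below writes a small CWL input object using a few of the input IDs listed in the Inputs table. All values (paths, thread counts, scatter count) are placeholders, not the validated settings shipped with the workflow. The value types are assumptions too, since the table does not state them. CWL runners accept input objects in JSON as well as YAML, so only the standard library is used here.

```python
import json

# Placeholder values only: the real template on GitHub holds the validated settings.
# Input IDs come from the Inputs table; the types (Directory, File, int, string) are assumed.
job = {
    "raw_files_directory": {"class": "Directory", "path": "/path/to/fastq"},  # hypothetical path
    "reference_genome": {"class": "File", "path": "/path/to/reference.fa"},   # hypothetical path
    "bwa_mem_num_threads": 4,
    "samtools_sort_threads": 4,
    "gatk_splitintervals_scatter_count": 8,
    "picard_addorreplacereadgroups_rgpl": "ILLUMINA",
}


def write_job(path, inputs):
    """Serialize the input object to a JSON job file and return its path."""
    with open(path, "w") as fh:
        json.dump(inputs, fh, indent=2)
    return path
```

A file written this way could be passed to a CWL runner together with the workflow description, for example as the second argument to `cwltool`.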
Briefly, the workflow performs the following steps:
- Quality control of Illumina reads (FastQC)
- Trimming of the reads (e.g., removal of adapter and/or low-quality sequences) (Trim Galore)
- Mapping to reference genome (BWA-MEM)
- Conversion of mapped reads from SAM (Sequence Alignment Map) to BAM (Binary Alignment Map) format (samtools)
- Sorting mapped reads based on read names (samtools)
- Adding information regarding paired-end reads (e.g., CIGAR field information) (samtools)
- Re-sorting mapped reads based on chromosomal coordinates (samtools)
- Adding basic Read-Group information regarding sample name, platform unit, platform (e.g., ILLUMINA), library and identifier (picard AddOrReplaceReadGroups)
- Marking PCR and/or optical duplicate reads (picard MarkDuplicates)
- Collection of summary statistics (samtools)
- Creation of indexes for coordinate-sorted BAM files to enable fast random access (samtools)
- Splitting the reference genome into a predefined number of intervals for parallel processing (GATK SplitIntervals)
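The mapping and BAM clean-up steps listed above can be sketched as an ordered list of command lines for one paired-end sample. This is a minimal sketch, not the CWL wrappers themselves: the file names, read-group values, and the `picard` and `gatk` launcher names are assumptions, and the function only builds argument lists without running anything.

```python
import shlex


def preprocessing_commands(sample, ref, fq1, fq2, intervals, threads=4, scatter=8):
    """Build argv lists for mapping and BAM clean-up; nothing is executed."""
    t = str(threads)
    # Hypothetical intermediate file names, chosen for illustration only.
    sam = f"{sample}.sam"
    bam = f"{sample}.bam"
    name_sorted = f"{sample}.namesort.bam"
    fixmate = f"{sample}.fixmate.bam"
    coord_sorted = f"{sample}.sorted.bam"
    rg = f"{sample}.rg.bam"
    dedup = f"{sample}.dedup.bam"
    return [
        ["bwa", "mem", "-t", t, "-o", sam, ref, fq1, fq2],
        ["samtools", "view", "-b", "-o", bam, sam],                  # SAM -> BAM
        ["samtools", "sort", "-n", "-@", t, "-o", name_sorted, bam], # sort by read name
        ["samtools", "fixmate", name_sorted, fixmate],               # needs name-sorted input
        ["samtools", "sort", "-@", t, "-o", coord_sorted, fixmate],  # sort by coordinate
        ["picard", "AddOrReplaceReadGroups", f"I={coord_sorted}", f"O={rg}",
         f"RGID={sample}", f"RGSM={sample}", "RGLB=lib1", "RGPL=ILLUMINA", "RGPU=unit1"],
        ["picard", "MarkDuplicates", f"I={rg}", f"O={dedup}", f"M={sample}.dup_metrics.txt"],
        ["samtools", "flagstat", dedup],
        ["samtools", "index", dedup],
        ["gatk", "SplitIntervals", "-R", ref, "-L", intervals,
         "--scatter-count", str(scatter), "-O", "intervals"],
    ]


# Print the commands as shell-quoted strings for inspection.
for argv in preprocessing_commands("sampleA", "ref.fa", "A_1.fq.gz", "A_2.fq.gz", "targets.interval_list"):
    print(shlex.join(argv))
```

Name-sorting comes before `samtools fixmate` because that tool expects mates to be adjacent, and the coordinate sort follows because duplicate marking and indexing operate on coordinate-sorted BAM files.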
At this point the single-sample workflow is applied. Multiple samples are accepted as input, but they are not merged into a unified VCF file; instead, each sample is processed separately at every step, producing one VCF file per sample:
- Application of Base Quality Score Recalibration (BQSR) (GATK BaseRecalibrator, GatherBQSRReports and ApplyBQSR tools)
- Variant calling (GATK HaplotypeCaller)
- Merging of all genomic interval-split gVCF files for each sample (GATK MergeVcfs)
- Separate annotation of SNPs and INDELs based on pretrained Convolutional Neural Network (CNN) models (GATK SelectVariants, CNNScoreVariants and FilterVariantTranches tools)
- (Optional) Independent step of hard-filtering (GATK VariantFiltration)
- Variant filtering based on the information added during CNN-based scoring (or hard-filtering) and/or custom filters (bcftools)
- Normalization of INDELs (split multiallelic sites) (bcftools)
- Annotation of the final dataset of filtered variants with genomic, population-related and/or clinical information (ANNOVAR)
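The per-sample scatter-gather pattern behind the variant-calling steps above can be sketched as follows. Each sample gets one HaplotypeCaller call per interval file, followed by a single merge, and no data are shared across samples. The file names are hypothetical, and the `-ERC GVCF` mode is an assumption based on the gVCF outputs mentioned in the list. The function only plans the commands and does not run them.

```python
def per_sample_calling_plan(samples, intervals, ref):
    """Map each sample to its scattered HaplotypeCaller calls and one gather call."""
    plan = {}
    for sample in samples:
        bam = f"{sample}.bqsr.bam"  # hypothetical name of the recalibrated BAM
        shards = [f"{sample}.{i:04d}.g.vcf.gz" for i in range(len(intervals))]
        # Scatter: one call per interval file produced by the interval-splitting step.
        scatter = [
            ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam,
             "-L", interval, "-O", shard, "-ERC", "GVCF"]  # GVCF mode is assumed
            for interval, shard in zip(intervals, shards)
        ]
        # Gather: merge this sample's shards only, never shards of other samples.
        gather = ["gatk", "MergeVcfs"]
        for shard in shards:
            gather += ["-I", shard]
        gather += ["-O", f"{sample}.g.vcf.gz"]
        plan[sample] = {"scatter": scatter, "gather": gather}
    return plan
```

Keeping the plan keyed by sample mirrors the single-sample design: the scatter calls for one sample can run in parallel, and its merge depends only on that sample's own shards.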
Inputs
| ID | Name | Description | Type | 
|---|---|---|---|
| raw_files_directory | n/a | n/a | |
| input_file_split | n/a | n/a | |
| input_file_split_fwd_single | n/a | n/a | |
| input_file_split_rev | n/a | n/a | |
| input_qc_check | n/a | n/a | |
| input_trimming_check | n/a | n/a | |
| tg_quality | n/a | n/a | |
| tg_length | n/a | n/a | |
| tg_compression | n/a | n/a | |
| tg_do_not_compress | n/a | n/a | |
| tg_strigency | n/a | n/a | |
| tg_trim_suffix | n/a | n/a | |
| reference_genome | n/a | n/a | |
| bwa_mem_sec_shorter_split_hits | n/a | n/a | |
| bwa_mem_num_threads | n/a | n/a | |
| samtools_view_uncompressed | n/a | n/a | |
| samtools_view_collapsecigar | n/a | n/a | |
| samtools_view_readswithoutbits | n/a | n/a | |
| samtools_view_fastcompression | n/a | n/a | |
| samtools_view_samheader | n/a | n/a | |
| samtools_view_count | n/a | n/a | |
| samtools_view_readsingroup | n/a | n/a | |
| samtools_view_readtagtostrip | n/a | n/a | |
| samtools_view_readsquality | n/a | n/a | |
| samtools_view_readswithbits | n/a | n/a | |
| samtools_view_cigar | n/a | n/a | |
| samtools_view_iscram | n/a | n/a | |
| samtools_view_threads | n/a | n/a | |
| samtools_view_randomseed | n/a | n/a | |
| samtools_view_region | n/a | n/a | |
| samtools_view_readsinlibrary | n/a | n/a | |
| samtools_fixmate_threads | n/a | n/a | |
| samtools_fixmate_output_format | n/a | n/a | |
| samtools_sort_compression_level | n/a | n/a | |
| samtools_sort_threads | n/a | n/a | |
| samtools_sort_memory | n/a | n/a | |
| samtools_flagstat_threads | n/a | n/a | |
| picard_addorreplacereadgroups_rgpl | n/a | n/a | |
| gatk_splitintervals_include_intervalList | n/a | n/a | |
| gatk_splitintervals_exclude_intervalList | n/a | n/a | |
| gatk_splitintervals_scatter_count | n/a | n/a | |
| sub_bqsr_known_sites_1 | n/a | n/a | |
| sub_bqsr_known_sites_2 | n/a | n/a | |
| sub_bqsr_known_sites_3 | n/a | n/a | |
| sub_bqsr_interval_padding | n/a | n/a | |
| sub_hc_native_pairHMM_threads | n/a | n/a | |
| sub_hc_java_options | n/a | n/a | |
| VariantFiltration_window | n/a | n/a | |
| VariantFiltration_cluster | n/a | n/a | |
| VariantFiltration_filter_name_snp | n/a | n/a | |
| VariantFiltration_filter_snp | n/a | n/a | |
| VariantFiltration_filter_name_indel | n/a | n/a | |
| VariantFiltration_filter_indel | n/a | n/a | |
| FilterVariantTranches_resource_1 | n/a | n/a | |
| FilterVariantTranches_resource_2 | n/a | n/a | |
| FilterVariantTranches_resource_3 | n/a | n/a | |
| bcftools_view_include_hard_filters | n/a | n/a | |
| bcftools_view_include_CNN_filters | n/a | n/a | |
| bcftools_view_threads | n/a | n/a | |
| bcftools_norm_threads | n/a | n/a | |
| bcftoomls_norm_multiallelics | n/a | n/a | |
| table_annovar_database_location | n/a | n/a | |
| table_annovar_build_over | n/a | n/a | |
| table_annovar_remove | n/a | n/a | |
| table_annovar_protocol | n/a | n/a | |
| table_annovar_operation | n/a | n/a | |
| table_annovar_na_string | n/a | n/a | |
| table_annovar_vcfinput | n/a | n/a | |
| table_annovar_otherinfo | n/a | n/a | |
| table_annovar_convert_arg | n/a | n/a | |
Steps
| ID | Name | Description | 
|---|---|---|
| get_raw_files | n/a | n/a | 
| split_single_paired | n/a | n/a | 
| trim_galore_single | n/a | n/a | 
| trim_galore_paired | n/a | n/a | 
| fastqc_raw | n/a | n/a | 
| fastqc_single_trimmed | n/a | n/a | 
| fastqc_paired_trimmed | n/a | n/a | 
| cp_fastqc_raw_zip | n/a | n/a | 
| cp_fastqc_single_zip | n/a | n/a | 
| cp_fastqc_paired_zip | n/a | n/a | 
| rename_fastqc_raw_html | n/a | n/a | 
| rename_fastqc_single_html | n/a | n/a | 
| rename_fastqc_paired_html | n/a | n/a | 
| check_trimming | n/a | n/a | 
| rg_extraction_single | n/a | n/a | 
| bwa_mem_single | n/a | n/a | 
| split_paired_read1_read2 | n/a | n/a | 
| rg_extraction_paired | n/a | n/a | 
| bwa_mem_paired | n/a | n/a | 
| gather_bwa_sam_files | n/a | n/a | 
| samtools_view_conversion | n/a | n/a | 
| samtools_sort_by_name | n/a | n/a | 
| samtools_fixmate | n/a | n/a | 
| samtools_sort | n/a | n/a | 
| picard_addorreplacereadgroups | n/a | n/a | 
| picard_markduplicates | n/a | n/a | 
| samtools_flagstat | n/a | n/a | 
| samtools_view_count_total | n/a | n/a | 
| gatk_splitintervals | n/a | n/a | 
| samtools_index | n/a | n/a | 
| gatk_bqsr_subworkflow | n/a | n/a | 
| gatk_applybqsr | n/a | n/a | 
| samtools_index_2 | n/a | n/a | 
| gatk_haplotypecaller_subworkflow | n/a | n/a | 
| gatk_SelectVariants_snps | n/a | n/a | 
| gatk_SelectVariants_indels | n/a | n/a | 
| gatk_VariantFiltration_snps | n/a | n/a | 
| gatk_VariantFiltration_indels | n/a | n/a | 
| bgzip_snps | n/a | n/a | 
| tabix_snps | n/a | n/a | 
| bgzip_indels | n/a | n/a | 
| tabix_indels | n/a | n/a | 
| bcftools_concat | n/a | n/a | 
| bcftools_view_hard_filter | n/a | n/a | 
| bcftools_norm_hard_filter | n/a | n/a | 
| table_annovar_hard_filtered | n/a | n/a | 
| gatk_CNNScoreVariants | n/a | n/a | 
| gatk_FilterVariantTranches | n/a | n/a | 
| bcftools_view_filter_cnn | n/a | n/a | 
| bcftools_norm_cnn | n/a | n/a | 
| table_annovar_cnn_filtered | n/a | n/a | 
Outputs
| ID | Name | Description | Type | 
|---|---|---|---|
| o_trim_galore_single_fq | n/a | n/a | |
| o_trim_galore_single_reports | n/a | n/a | |
| o_trim_galore_paired_fq | n/a | n/a | |
| o_trim_galore_paired_reports | n/a | n/a | |
| o_fastqc_raw_html | n/a | n/a | |
| o_fastqc_single_html | n/a | n/a | |
| o_fastqc_paired_html | n/a | n/a | |
| o_fastqc_raw_zip | n/a | n/a | |
| o_fastqc_single_zip | n/a | n/a | |
| o_fastqc_paired_zip | n/a | n/a | |
| o_gather_bwa_sam_files | n/a | n/a | |
| o_samtools_view_conversion | n/a | n/a | |
| o_samtools_sort_by_name | n/a | n/a | |
| o_samtools_fixmate | n/a | n/a | |
| o_samtools_sort | n/a | n/a | |
| o_picard_addorreplacereadgroups | n/a | n/a | |
| o_picard_markduplicates | n/a | n/a | |
| o_picard_markduplicates_metrics | n/a | n/a | |
| o_samtools_flagstat | n/a | n/a | |
| o_samtools_view_count_total | n/a | n/a | |
| o_samtools_index | n/a | n/a | |
| o_gatk_bqsr_subworkflow | n/a | n/a | |
| o_gatk_ApplyBQSR | n/a | n/a | |
| o_samtools_index_2 | n/a | n/a | |
| o_gatk_splitintervals | n/a | n/a | |
| o_gatk_HaplotypeCaller | n/a | n/a | |
| o_tabix_snps | n/a | n/a | |
| o_tabix_indels | n/a | n/a | |
| o_bcftools_concat | n/a | n/a | |
| o_bcftools_view_hard_filter | n/a | n/a | |
| o_bcftools_norm_hard_filter | n/a | n/a | |
| o_gatk_CNNScoreVariants | n/a | n/a | |
| o_gatk_FilterVariantTranches | n/a | n/a | |
| o_bcftools_view_filter_cnn | n/a | n/a | |
| o_bcftools_norm_cnn | n/a | n/a | |
| o_table_annovar_cnn_filtered_multianno_vcf | n/a | n/a | |
| o_table_annovar_cnn_filtered_multianno_txt | n/a | n/a | |
| o_table_annovar_cnn_filtered_avinput | n/a | n/a | |
| o_table_annovar_hard_filtered_multianno_vcf | n/a | n/a | |
| o_table_annovar_hard_filtered_multianno_txt | n/a | n/a | |
| o_table_annovar_hard_filtered_avinput | n/a | n/a | |
Version History
Version 1 (earliest) Created 5th Jul 2023 at 10:48 by Konstantinos Kyritsis
Initial commit
Frozen Version-1 (be8c585)

 https://orcid.org/0000-0001-8035-341X
