This genomics pipeline performs germline variant calling on a single sample, adapted from the GATK Best Practices workflow.
This workflow is a reference pipeline for using the Janis Python framework (pipelines assistant).
- Alignment: bwa-mem
- Variant-Calling: GATK HaplotypeCaller
- Outputs the final variants in the VCF format.
Resources
This pipeline has been tested using the HG38 reference set, available on Google Cloud Storage through: https://console.cloud.google.com/storage/browser/genomics-public-data/references/hg38/v0/
This pipeline expects the assembly reference to be accompanied by the same secondary files as in that storage (".fai", ".amb", ".ann", ".bwt", ".pac", ".sa", "^.dict"). The known sites (snps_dbsnp, snps_1000gp, known_indels, mills_indels) should be gzipped and tabix-indexed.
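The secondary-file requirement above can be checked before launching a run. The following is a minimal sketch, not part of the pipeline itself: it assumes the leading "^" in "^.dict" follows the CWL secondary-file convention (strip one extension from the primary file name, then append the suffix), and that the tabix index uses tabix's default ".tbi" suffix. The function names and the example paths are hypothetical.

```python
from pathlib import Path
from typing import List

# Patterns quoted in the text above for the reference FASTA.
REFERENCE_PATTERNS = [".fai", ".amb", ".ann", ".bwt", ".pac", ".sa", "^.dict"]
# Assumption: known-sites VCFs carry tabix's default index suffix.
KNOWN_SITES_PATTERNS = [".tbi"]


def secondary_path(primary: Path, pattern: str) -> Path:
    """Resolve one secondary-file pattern against a primary file path."""
    base = primary
    while pattern.startswith("^"):
        base = base.with_suffix("")  # each caret drops one extension
        pattern = pattern[1:]
    return base.with_name(base.name + pattern)


def missing_secondaries(primary: Path, patterns: List[str]) -> List[Path]:
    """Return the expected companion files that do not exist on disk."""
    expected = [secondary_path(primary, p) for p in patterns]
    return [p for p in expected if not p.exists()]
```

For a hypothetical `refs/genome.fasta`, `secondary_path` resolves ".fai" to `refs/genome.fasta.fai` and "^.dict" to `refs/genome.dict`; calling `missing_secondaries(Path("refs/dbsnp.vcf.gz"), KNOWN_SITES_PATTERNS)` would report a missing `refs/dbsnp.vcf.gz.tbi`.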
Infrastructure_deployment_metadata: Spartan (Unimelb)
Inputs
ID | Name | Description | Type |
---|---|---|---|
sample_name | n/a | Sample name from which to generate the readGroupHeaderLine for BwaMem | |
fastqs | n/a | An array of FastqGz pairs. These are aligned separately and then merged, so that multiple sets of reads contribute to a higher depth of coverage | |
reference | n/a | The reference genome to which the reads are aligned. This requires a number of indexes (these can be generated with the 'IndexFasta' pipeline). This pipeline has been tested using the HG38 reference set, and expects the assembly references to be as they appear in the GCP example. For example, HG38: https://console.cloud.google.com/storage/browser/genomics-public-data/references/hg38/v0/ with secondary files (".fai", ".amb", ".ann", ".bwt", ".pac", ".sa", "^.dict"). | |
snps_dbsnp | n/a | From the GATK resource bundle, passed to BaseRecalibrator as ``known_sites`` | |
snps_1000gp | n/a | From the GATK resource bundle, passed to BaseRecalibrator as ``known_sites``. Accessible from the HG38 genomics-public-data Google Cloud bucket: https://console.cloud.google.com/storage/browser/genomics-public-data/references/hg38/v0/ | |
known_indels | n/a | From the GATK resource bundle, passed to BaseRecalibrator as ``known_sites`` | |
mills_indels | n/a | From the GATK resource bundle, passed to BaseRecalibrator as ``known_sites`` | |
gatk_intervals | n/a | List of intervals over which to split the GATK variant calling. If no interval is provided, one interval for each chromosome in the reference will be generated. | |
cutadapt_adapters | n/a | Specifies a contaminant list for Cutadapt: a list of named sequences used to decide which overrepresented sequences from the FastQC report are valid adapters to trim with Cutadapt. The file must contain sets of named adapters in the form: ``name[tab]sequence``. Lines prefixed with a hash will be ignored. | |
align_and_sort_sortsam_tmpDir | n/a | Undocumented option | |
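The `cutadapt_adapters` file format described in the table (``name[tab]sequence``, with hash-prefixed lines ignored) can be illustrated with a small parser. This is a sketch under assumptions rather than the pipeline's own parsing code: the function name is hypothetical, and tolerating runs of several tabs between name and sequence is an added convenience that the description does not state.

```python
import re
from typing import Dict


def parse_adapters(text: str) -> Dict[str, str]:
    """Parse adapter definitions of the form name<TAB>sequence."""
    adapters: Dict[str, str] = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comment lines prefixed with a hash.
        if not stripped or stripped.startswith("#"):
            continue
        # Assumption: one or more tabs separate the name from the sequence.
        fields = re.split(r"\t+", stripped)
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected name<TAB>sequence")
        name, sequence = fields
        adapters[name.strip()] = sequence.strip().upper()
    return adapters


# Illustrative input only; the entries are example adapter definitions.
example = (
    "# name<TAB>sequence\n"
    "Illumina Universal Adapter\tAGATCGGAAGAG\n"
    "\n"
    "Nextera Transposase Sequence\t\tCTGTCTCTTATA\n"
)
adapters = parse_adapters(example)
```

A line that has no tab-separated sequence raises `ValueError`, which surfaces a malformed file early instead of silently trimming with an incomplete adapter set.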
Steps
ID | Name | Description |
---|---|---|
fastqc | FastQC | n/a |
getfastqc_adapters | Parse FastQC Adaptors | n/a |
align_and_sort | Align and sort reads | n/a |
merge_and_mark | Merge and Mark Duplicates | n/a |
calculate_performancesummary_genomefile | Generate genome for BedtoolsCoverage | n/a |
performance_summary | Performance summary workflow (whole genome) | n/a |
generate_gatk_intervals | Generating genomic intervals by chromosome | n/a |
_evaluate_prescatter-bqsr-intervals | n/a | n/a |
bqsr | GATK Base Recalibration on Bam | Perform base quality score recalibration |
_evaluate_prescatter-vc_gatk-intervals | n/a | n/a |
vc_gatk | GATK4 Germline Variant Caller | n/a |
vc_gatk_merge | GATK4: Gather VCFs | n/a |
vc_gatk_compressvcf | BGZip | n/a |
vc_gatk_sort_combined | BCFTools: Sort | n/a |
vc_gatk_uncompress | UncompressArchive | n/a |
vc_gatk_addbamstats | Annotate Bam Stats to Germline Vcf Workflow | n/a |
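When `gatk_intervals` is not supplied, the `generate_gatk_intervals` step produces one interval per chromosome. The idea can be sketched from the reference's ".fai" index, whose first two tab-separated columns are the sequence name and its length. This is an illustration only: the function names are hypothetical, BED output is an assumption, and the actual step may filter contigs or emit a different format.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]


def intervals_from_fai(fai_text: str) -> List[Interval]:
    """Build one whole-sequence interval per entry of a FASTA index."""
    intervals: List[Interval] = []
    for line in fai_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        name, length = fields[0], int(fields[1])
        # BED coordinates are 0-based and half-open: [0, length).
        intervals.append((name, 0, length))
    return intervals


def to_bed(intervals: List[Interval]) -> str:
    """Render intervals as BED lines (chrom, start, end)."""
    return "".join(f"{chrom}\t{start}\t{end}\n" for chrom, start, end in intervals)


# Illustrative two-line index; the offset columns are made-up values.
example_fai = (
    "chr1\t248956422\t112\t70\t71\n"
    "chr2\t242193529\t252513167\t70\t71\n"
)
bed = to_bed(intervals_from_fai(example_fai))
```

Each resulting interval can then be handed to a separate variant-calling job, with the per-interval VCFs gathered afterwards, which mirrors the scatter and merge pattern in the step list above.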
Outputs
ID | Name | Description | Type |
---|---|---|---|
out_fastqc_reports | n/a | A zip file of the FastQC quality report. | |
out_bam | n/a | Aligned and indexed bam. | |
out_performance_summary | n/a | A text file of performance summary of bam | |
out_variants_gatk | n/a | Merged variants from the GATK caller | |
out_variants_gatk_split | n/a | Unmerged variants from the GATK caller (by interval) | |
out_variants_bamstats | n/a | n/a | |
Version History
Version 1 (earliest) Created 12th Nov 2021 at 02:30 by Richard Lupat
Added/updated 2 files
master
2e7a0bb
Additional credit
Michael Franklin; Jiaan Yu; Juny Kesumadewi
Created: 12th Nov 2021 at 02:30
Last updated: 12th Nov 2021 at 02:41