A CWL-based pipeline for processing RNA-Seq data (FASTQ format) and performing differential gene/transcript expression analysis.
The respective GitHub folder provides:
- The CWL wrappers for the workflow
- A pre-configured YAML template, based on validation analysis of publicly available HTS data
- A table of metadata (mrna_cll_subsets_phenotypes.csv), based on the same validation analysis, to serve as an input example for the design of comparisons during differential expression analysis
Briefly, the workflow performs the following steps:
- Quality control of Illumina reads (FastQC)
- Trimming of the reads (e.g., removal of adapter and/or low-quality sequences) (Trim Galore)
- (Optional) custom processing of the reads using FASTA/Q Trimmer (part of the FASTX-toolkit)
- Mapping to reference genome (HISAT2)
- Conversion of mapped reads from SAM (Sequence Alignment Map) to BAM (Binary Alignment Map) format (samtools)
- Sorting mapped reads based on chromosomal coordinates (samtools)
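The steps above can be sketched as a sequence of command lines for a single-end sample. This is only an illustration, not the CWL wrappers themselves: the function name, the output file names, the threshold values (20) and the assumption of gzipped input are all hypothetical, and the optional FASTX step is left out. The command-line options used are standard ones for FastQC, Trim Galore, HISAT2 and samtools.

```python
from pathlib import Path


def preprocessing_commands(fastq, index_prefix, outdir, threads=4):
    """Build (but do not run) the commands for one single-end FASTQ file."""
    fastq, outdir = Path(fastq), Path(outdir)

    # Derive a sample name by stripping the usual FASTQ suffixes.
    name = fastq.name
    for suffix in (".gz", ".fastq", ".fq"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]

    # Assumption: gzipped single-end input, for which Trim Galore's
    # default output name is <name>_trimmed.fq.gz.
    trimmed = outdir / f"{name}_trimmed.fq.gz"
    sam = outdir / f"{name}.sam"
    bam = outdir / f"{name}.bam"
    sorted_bam = outdir / f"{name}.sorted.bam"

    return [
        # Quality control of the raw reads.
        ["fastqc", "--outdir", str(outdir), str(fastq)],
        # Adapter and quality trimming (illustrative thresholds).
        ["trim_galore", "--quality", "20", "--length", "20",
         "--output_dir", str(outdir), str(fastq)],
        # Mapping; --dta reports alignments tailored for transcript assemblers.
        ["hisat2", "-p", str(threads), "--dta", "-x", str(index_prefix),
         "-U", str(trimmed), "-S", str(sam)],
        # SAM to BAM conversion.
        ["samtools", "view", "-b", "-o", str(bam), str(sam)],
        # Coordinate sorting (the samtools sort default).
        ["samtools", "sort", "-@", str(threads), "-o", str(sorted_bam), str(bam)],
    ]


for cmd in preprocessing_commands("reads/sample1.fastq.gz", "index/genome", "results"):
    print(" ".join(cmd))
```

Returning argument lists rather than running the tools keeps the sketch inspectable; each list could be passed to `subprocess.run` once the tools are installed.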
Subsequently, two independent workflows are implemented for differential expression analysis at the transcript and gene level.
First, following the reference protocol for HISAT, StringTie and Ballgown transcript expression analysis, StringTie along with a reference transcript annotation GTF (Gene Transfer Format) file (if one is available) is used to:
- Assemble transcripts for each RNA-Seq sample using the previous read alignments (BAM files)
- Generate a global, non-redundant set of transcripts observed in any of the RNA-Seq samples
- Estimate transcript abundances and generate read coverage tables for each RNA-Seq sample, based on the global, merged set of transcripts (rather than the reference) which is observed across all samples
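The three StringTie stages listed above could look roughly like the following command lines. This is a sketch under assumptions: the function name, directory layout and file names are hypothetical and do not come from the workflow's CWL wrappers. The options shown (`-G`, `--merge`, `-e`, `-B`, `-p`, `-o`) are standard StringTie options.

```python
from pathlib import Path


def stringtie_commands(bams, outdir, guide_gtf=None, threads=4):
    """Build (but do not run) the assembly, merge and expression commands."""
    outdir = Path(outdir)
    # The reference annotation is optional, so -G is added only if given.
    guide = ["-G", str(guide_gtf)] if guide_gtf else []
    merged = outdir / "merged.gtf"

    assemble, expression, sample_gtfs = [], [], []
    for bam in map(Path, bams):
        sample = bam.name.split(".")[0]  # hypothetical naming convention
        gtf = outdir / "assembly" / f"{sample}.gtf"
        sample_gtfs.append(str(gtf))

        # Stage 1: assemble transcripts per sample from the sorted BAM.
        assemble.append(
            ["stringtie", str(bam), "-p", str(threads), *guide, "-o", str(gtf)]
        )

        # Stage 3: estimate abundances against the merged transcript set.
        # -e restricts estimation to the transcripts given with -G;
        # -B writes the Ballgown tables next to the output GTF.
        expression.append(
            ["stringtie", str(bam), "-e", "-B", "-p", str(threads),
             "-G", str(merged),
             "-o", str(outdir / "ballgown" / sample / f"{sample}.gtf")]
        )

    # Stage 2: merge per-sample assemblies into one non-redundant set.
    merge = ["stringtie", "--merge", *guide, "-o", str(merged), *sample_gtfs]
    return {"assemble": assemble, "merge": merge, "expression": expression}


commands = stringtie_commands(
    ["results/s1.sorted.bam", "results/s2.sorted.bam"],
    "results/stringtie",
    guide_gtf="annotation.gtf",
)
print(" ".join(commands["merge"]))
```

Note that the expression stage points `-G` at the merged GTF rather than the reference annotation, which is what lets potentially novel transcripts enter the downstream analysis.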
The Ballgown program is then used to load the coverage tables generated in the previous step and perform statistical analyses for differential expression at the transcript level. Notably, the StringTie-Ballgown protocol applied here was selected to include potentially novel transcripts in the analysis.
Second, featureCounts is used to count reads that are mapped to selected genomic features, in this case genes by default, and generate a table of read counts per gene and sample. This table is passed as input to DESeq2 to perform differential expression analysis at the gene level. Both Ballgown and DESeq2 R scripts, along with their respective CWL wrappers, were designed to receive as input various parameters, such as experimental design, contrasts of interest, numeric thresholds, and hidden batch effects.
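To make the gene-level hand-off concrete, the sketch below reads a featureCounts table and drops genes with few reads before differential expression analysis. It is an illustration only: the workflow does this in R, the function name is hypothetical, and treating `deseq2_min_sum_of_reads` as a threshold on the per-gene total across samples is an assumption. The table layout (a leading `#` comment line, then `Geneid`, `Chr`, `Start`, `End`, `Strand`, `Length`, then one column per BAM file) is the standard featureCounts output.

```python
import csv
import io

# featureCounts writes six annotation columns before the per-sample counts.
ANNOTATION_COLUMNS = 6


def read_counts(handle, min_sum_of_reads=0):
    """Return (sample names, {gene: counts}) from a featureCounts table."""
    # Skip the leading comment line that records the program and command.
    lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(lines, delimiter="\t")

    header = next(rows)
    samples = header[ANNOTATION_COLUMNS:]

    counts = {}
    for row in rows:
        if not row:
            continue
        values = [int(v) for v in row[ANNOTATION_COLUMNS:]]
        # Assumed pre-filter: keep genes whose total count reaches the threshold.
        if sum(values) >= min_sum_of_reads:
            counts[row[0]] = values
    return samples, counts


# Made-up two-gene table, for illustration only.
example = io.StringIO(
    "# Program:featureCounts (illustrative header line)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\ts1.bam\ts2.bam\n"
    "geneA\tchr1\t1\t100\t+\t100\t5\t7\n"
    "geneB\tchr1\t200\t300\t-\t101\t0\t1\n"
)
samples, counts = read_counts(example, min_sum_of_reads=10)
print(samples, counts)
```

With a threshold of 10, only `geneA` (total 12) is retained; `geneB` (total 1) is filtered out.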
Inputs
ID | Name | Description | Type |
---|---|---|---|
raw_files_directory | n/a | n/a | |
input_file_split | n/a | n/a | |
input_file_split_fwd_single | n/a | n/a | |
input_file_split_rev | n/a | n/a | |
input_qc_check | n/a | n/a | |
input_trimming_check | n/a | n/a | |
premapping_input_check | n/a | n/a | |
tg_quality | n/a | n/a | |
tg_length | n/a | n/a | |
tg_compression | n/a | n/a | |
tg_do_not_compress | n/a | n/a | |
tg_trim_suffix | n/a | n/a | |
tg_strigency | n/a | n/a | |
fastx_first_base_to_keep | n/a | n/a | |
fastx_last_base_to_keep | n/a | n/a | |
hisat2_num_of_threads | n/a | n/a | |
hisat2_alignments_tailored_trans_assemb | n/a | n/a | |
hisat2_idx_directory | n/a | n/a | |
hisat2_idx_basename | n/a | n/a | |
hisat2_known_splicesite_infile | n/a | n/a | |
samtools_view_isbam | n/a | n/a | |
samtools_view_collapsecigar | n/a | n/a | |
samtools_view_uncompressed | n/a | n/a | |
samtools_view_fastcompression | n/a | n/a | |
samtools_view_samheader | n/a | n/a | |
samtools_view_count | n/a | n/a | |
samtools_view_readswithoutbits | n/a | n/a | |
samtools_view_readsingroup | n/a | n/a | |
samtools_view_readtagtostrip | n/a | n/a | |
samtools_view_readsquality | n/a | n/a | |
samtools_view_readswithbits | n/a | n/a | |
samtools_view_cigar | n/a | n/a | |
samtools_view_iscram | n/a | n/a | |
samtools_view_threads | n/a | n/a | |
samtools_view_randomseed | n/a | n/a | |
samtools_view_region | n/a | n/a | |
samtools_view_readsinlibrary | n/a | n/a | |
samtools_sort_compression_level | n/a | n/a | |
samtools_sort_threads | n/a | n/a | |
samtools_sort_memory | n/a | n/a | |
samtools_sort_sort_by_name | n/a | n/a | |
stringtie_guide_gff | n/a | n/a | |
stringtie_transcript_merge_mode | n/a | n/a | |
stringtie_out_gtf | n/a | n/a | |
stringtie_expression_estimation_mode | n/a | n/a | |
stringtie_ballgown_table_files | n/a | n/a | |
stringtie_cpus | n/a | n/a | |
stringtie_verbose | n/a | n/a | |
stringtie_min_isoform_abundance | n/a | n/a | |
stringtie_junction_coverage | n/a | n/a | |
stringtie_min_read_coverage | n/a | n/a | |
stringtie_conservative_mode | n/a | n/a | |
bg_phenotype_file | n/a | n/a | |
bg_phenotype | n/a | n/a | |
bg_samples | n/a | n/a | |
bg_timecourse | n/a | n/a | |
bg_feature | n/a | n/a | |
bg_measure | n/a | n/a | |
bg_confounders | n/a | n/a | |
bg_custom_model | n/a | n/a | |
bg_mod | n/a | n/a | |
bg_mod0 | n/a | n/a | |
featureCounts_number_of_threads | n/a | n/a | |
featureCounts_annotation_file | n/a | n/a | |
featureCounts_output_file | n/a | n/a | |
featureCounts_read_meta_feature_overlap | n/a | n/a | |
deseq2_metadata | n/a | n/a | |
deseq2_design | n/a | n/a | |
deseq2_samples | n/a | n/a | |
deseq2_min_sum_of_reads | n/a | n/a | |
deseq2_reference_level | n/a | n/a | |
deseq2_phenotype | n/a | n/a | |
deseq2_contrast | n/a | n/a | |
deseq2_numerator | n/a | n/a | |
deseq2_denominator | n/a | n/a | |
deseq2_lfcThreshold | n/a | n/a | |
deseq2_pAdjustMethod | n/a | n/a | |
deseq2_alpha | n/a | n/a | |
deseq2_parallelization | n/a | n/a | |
deseq2_cores | n/a | n/a | |
deseq2_transformation | n/a | n/a | |
deseq2_blind | n/a | n/a | |
deseq2_hypothesis | n/a | n/a | |
deseq2_reduced | n/a | n/a | |
deseq2_hidden_batch_effects | n/a | n/a | |
deseq2_hidden_batch_row_means | n/a | n/a | |
deseq2_hidden_batch_method | n/a | n/a | |
deseq2_variables | n/a | n/a | |
Steps
ID | Name | Description |
---|---|---|
get_raw_files | n/a | n/a |
split_single_paired | n/a | n/a |
trim_galore_single | n/a | n/a |
trim_galore_paired | n/a | n/a |
fastqc_raw | n/a | n/a |
fastqc_single_trimmed | n/a | n/a |
fastqc_paired_trimmed | n/a | n/a |
cp_fastqc_raw_zip | n/a | n/a |
cp_fastqc_single_zip | n/a | n/a |
cp_fastqc_paired_zip | n/a | n/a |
rename_fastqc_raw_html | n/a | n/a |
rename_fastqc_single_html | n/a | n/a |
rename_fastqc_paired_html | n/a | n/a |
fastx_trimmer_single | n/a | n/a |
fastx_trimmer_paired | n/a | n/a |
check_for_fastx_and_produce_names | n/a | n/a |
hisat2_for_single_reads | n/a | n/a |
hisat2_for_paired_reads | n/a | n/a |
collect_hisat2_sam_files | n/a | n/a |
samtools_view | n/a | n/a |
samtools_sort | n/a | n/a |
stringtie_transcript_assembly | n/a | n/a |
stringtie_merge | n/a | n/a |
stringtie_expression | n/a | n/a |
ballgown_de | n/a | n/a |
featureCounts | n/a | n/a |
DESeq2_analysis | n/a | n/a |
Outputs
ID | Name | Description | Type |
---|---|---|---|
o_trim_galore_single_fq | n/a | n/a | |
o_trim_galore_single_reports | n/a | n/a | |
o_trim_galore_paired_fq | n/a | n/a | |
o_trim_galore_paired_reports | n/a | n/a | |
o_fastqc_raw_html | n/a | n/a | |
o_fastqc_single_html | n/a | n/a | |
o_fastqc_paired_html | n/a | n/a | |
o_fastqc_raw_zip | n/a | n/a | |
o_fastqc_single_zip | n/a | n/a | |
o_fastqc_paired_zip | n/a | n/a | |
o_fastx_trimmer_single | n/a | n/a | |
o_fastx_trimmer_paired | n/a | n/a | |
o_hisat2_for_single_reads_reports | n/a | n/a | |
o_hisat2_for_paired_reads_reports | n/a | n/a | |
o_collect_hisat2_sam_files | n/a | n/a | |
o_samtools_view | n/a | n/a | |
o_samtools_sort | n/a | n/a | |
o_stringtie_transcript_assembly_gtf | n/a | n/a | |
o_stringtie_merge | n/a | n/a | |
o_stringtie_expression_gtf | n/a | n/a | |
o_stringtie_expression_outdir | n/a | n/a | |
o_ballgown_de_results | n/a | n/a | |
o_ballgown_object | n/a | n/a | |
o_ballgown_de_custom_model | n/a | n/a | |
o_featureCounts | n/a | n/a | |
o_deseq2_de_results | n/a | n/a | |
o_deseq2_dds_object | n/a | n/a | |
o_deseq2_res_lfcShrink_object | n/a | n/a | |
o_deseq2_transformed_object | n/a | n/a | |
Version History
Version 1 (earliest) Created 5th Jul 2023 at 09:44 by Konstantinos Kyritsis
Initial commit
Frozen
Version-1
a80a6c7
Created: 5th Jul 2023 at 09:44
Last updated: 5th Jul 2023 at 10:15