Introduction
vibbits/rnaseq-editing is a bioinformatics pipeline for analysing RNA sequencing data from organisms with a reference genome and annotation, followed by a prediction step for RNA editing sites using RDDpred.
The pipeline is largely based on the nf-core/rnaseq pipeline.
The original nf-core pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers, making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!
Pipeline summary
- Merge re-sequenced FastQ files (cat)
- Read QC (FastQC)
- Adapter and quality trimming (Trimmomatic)
- Read alignment and quantification (STAR)
- Sort and index alignments (SAMtools)
- Prediction of RNA editing sites (RDDpred)
- Extensive quality control:
  - Present QC for raw reads, alignment, gene biotypes, sample similarity and strand-specificity checks (MultiQC, R)
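The first step, merging re-sequenced FastQ files with cat, works because gzip files can be concatenated member by member: the result is still a valid gzip file that decompresses to the combined reads. The sketch below illustrates this idea only; the file names are hypothetical, and the pipeline's own merge module may name and organise its files differently.

```shell
#!/usr/bin/env bash
# Sketch: merge two gzipped FastQ files from the same library (e.g. one
# sample sequenced twice) with plain `cat`. File names are hypothetical.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Two tiny gzipped FastQ files standing in for two sequencing runs.
printf '@read1\nACGT\n+\nIIII\n' | gzip > sampleA_run1_R1.fastq.gz
printf '@read2\nTTGA\n+\nIIII\n' | gzip > sampleA_run2_R1.fastq.gz

# Concatenated gzip members form a valid gzip stream, so no
# decompression is needed before merging.
cat sampleA_run1_R1.fastq.gz sampleA_run2_R1.fastq.gz > sampleA_R1.merged.fastq.gz

# Each FastQ record spans 4 lines, so count lines and divide by 4.
n_reads=$(( $(gzip -dc sampleA_R1.merged.fastq.gz | wc -l) / 4 ))
echo "reads in merged file: ${n_reads}"
```

For paired-end data the same concatenation would be done separately for the R1 and R2 files, keeping the run order identical so that mates stay in sync.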
Quick Start
- Install Nextflow (>=21.04.0).
- Install Docker on a Linux operating system. Note: This pipeline does not currently support running on macOS.
- Download the pipeline via git clone, download the associated training data files for RDDpred into the assets folder, and download the reference data to
```shell
git clone https://github.com/vibbits/rnaseq-editing.git
cd rnaseq-editing/assets
# download training data file for RDDpred
wget -c
# download reference data for your genome; we provide the genome and an indexed genome for STAR 2.7.3a
```
- Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply pass that profile name with -profile in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
- Start running your own analysis using Docker locally!
```shell
nextflow run vibbits/rnaseq-editing \
    --input samplesheet.csv \
    --genome hg19 \
    -profile docker
```
- An executable Python script called fastq_dir_to_samplesheet.py has been provided if you would like to auto-create an input samplesheet from a directory containing FastQ files before you run the pipeline (requires Python 3 installed locally), e.g.
```shell
wget -L https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py
python3 fastq_dir_to_samplesheet.py <FASTQ_DIR> samplesheet.csv --strandedness reverse
```
- The final analysis was executed on the Azure platform using Azure Kubernetes Service (AKS). AKS has to be set up with a standard node pool called sys next to a scalable node pool called cpumem, which uses Standard_E8ds_v4 as the node size for computation. Furthermore, persistent volume claims (PVCs) were set up for the input and work folders of the Nextflow runs. The PVC input stores the reference data as well as the FastQ files, while the PVC work stores the temporary Nextflow files of the individual runs as well as the output files.
- See the config file used for the final execution run of RNAseq editing for the human samples and reference genome hg19.
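The run command above reads its samples from samplesheet.csv. If you prefer to write this file by hand, the sketch below assumes the pipeline keeps the nf-core/rnaseq samplesheet layout (`sample,fastq_1,fastq_2,strandedness`, with strandedness one of unstranded, forward or reverse). The sample names and the /input/fastq paths are hypothetical. Rows that share a sample name, such as the two CONTROL_REP1 lanes, are the re-sequenced files that would be merged with cat before QC.

```shell
#!/usr/bin/env bash
# Sketch: write an input samplesheet by hand, assuming the nf-core/rnaseq
# column layout. Sample names and FastQ paths are hypothetical.
set -euo pipefail

cat > samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2,strandedness
CONTROL_REP1,/input/fastq/CONTROL_REP1_L001_R1.fastq.gz,/input/fastq/CONTROL_REP1_L001_R2.fastq.gz,reverse
CONTROL_REP1,/input/fastq/CONTROL_REP1_L002_R1.fastq.gz,/input/fastq/CONTROL_REP1_L002_R2.fastq.gz,reverse
TREATED_REP1,/input/fastq/TREATED_REP1_R1.fastq.gz,/input/fastq/TREATED_REP1_R2.fastq.gz,reverse
EOF

# List the distinct sample names; the two CONTROL_REP1 rows describe
# re-sequenced runs of one library and collapse to a single sample.
cut -d, -f1 samplesheet.csv | tail -n +2 | sort -u
```

For single-end data the fastq_2 column would be left empty in each row.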
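The AKS setup described above (a scalable cpumem node pool plus the input and work PVCs) can be expressed as a Nextflow configuration. The sketch below writes such a file from the shell; it is not the config used for the final run. It assumes the standard Nextflow Kubernetes settings (process.executor, k8s.storageClaimName, k8s.storageMountPath, k8s.pod), hypothetical mount paths /input and /workspace, and that AKS labels the nodes of the cpumem pool with agentpool=cpumem.

```shell
#!/usr/bin/env bash
# Sketch: generate a Nextflow config for the AKS setup described above.
# Assumptions (not from the original run): mount paths /input and
# /workspace, and the AKS node label agentpool=cpumem.
set -euo pipefail

cat > aks.config <<'EOF'
process {
    executor = 'k8s'
    // Schedule task pods on the scalable cpumem node pool (assumed label).
    pod = [nodeSelector: 'agentpool=cpumem']
}

k8s {
    // PVC "work": Nextflow work directory and output files.
    storageClaimName = 'work'
    storageMountPath = '/workspace'
    // PVC "input": reference data and FastQ files (assumed mount path).
    pod = [volumeClaim: 'input', mountPath: '/input']
}
EOF

echo "wrote $(wc -l < aks.config) lines to aks.config"
```

The file could then be passed to a run with -c aks.config. With the k8s executor, the Nextflow head process usually has to run inside the cluster (for example via nextflow kuberun) so that it can reach the work PVC; check the Nextflow Kubernetes documentation for the version you use.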
Documentation
The nf-core/rnaseq pipeline comes with documentation about the pipeline usage, parameters and output.
Credits
These scripts were written to provide a reproducible data analysis pipeline up to the downstream processing, which uses dedicated R scripts for exploratory analysis and plotting. The general structure of the pipeline is based on the data analysis steps of our recent paper ADAR1 interaction with Z-RNA promotes editing of endogenous double-stranded RNA and prevents MDA5-dependent immune activation.
Note: The nf-core scripts this pipeline is based on were originally written for use at the National Genomics Infrastructure, part of SciLifeLab in Stockholm, Sweden, by Phil Ewels (@ewels) and Rickard Hammarén (@Hammarn).
The RNAseq pipeline was re-written in Nextflow DSL2 by Harshil Patel (@drpatelh) from The Bioinformatics & Biostatistics Group at The Francis Crick Institute, London.
Citations
The nf-core publication is cited here as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
Version History
master @ 6b921d6 (earliest) Created 27th Jan 2022 at 10:44 by Alexander Botzki
update RDDpred version (Docker) to 1.1.4
Created: 27th Jan 2022 at 10:44
Last updated: 4th Nov 2022 at 20:36