scRNA-Seq pipelines

Here we provide the tools to analyze single-cell RNA-Seq experiments. The analysis workflow is based on the Bioconductor packages scater and scran and on two Bioconductor workflows: Lun ATL, McCarthy DJ, & Marioni JC. A step-by-step workflow for low-level analysis of single-cell RNA-seq data. F1000Res. 2016 Aug 31 [revised 2016 Oct 31];5:2122; and Amezquita RA, Lun ATL, et al. Orchestrating single-cell analysis with Bioconductor. Nat Methods. 2020 Feb;17(2):137-145.

Implemented protocols

MARS-Seq (massively parallel single-cell RNA-sequencing): The protocol is based on the publications of Jaitin DA, et al. (2014). Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science (New York, N.Y.), 343(6172), 776–779. https://doi.org/10.1126/science.1247651 and Keren-Shaul H., et al. (2019). MARS-seq2.0: an experimental and analytical pipeline for indexed sorting combined with single-cell RNA sequencing. Nature Protocols. https://doi.org/10.1038/s41596-019-0164-4. The MARS-Seq library preparation protocol is given here. The sequencing reads are demultiplexed according to the respective pool barcodes before they are used as input for the analysis pipeline.
Smart-seq2: Libraries are generated using the Smart-seq2 kit.

Pipeline Workflow

All analysis steps are illustrated in the pipeline flowchart. Specify the desired analysis details for your data in the respective essential.vars.groovy file (see below) and run the selected pipeline, marsseq.pipeline.groovy or smartsseq.pipeline.groovy, as described here. Parameters can be fine-tuned after the initial analysis, e.g. for plotting and QC thresholding. For this purpose, a customisable sc.report.Rmd file is generated in the output reports folder after running the pipeline. Go through the steps and modify the default settings where appropriate. The sc.report.Rmd file can then be converted to a final HTML report using the knitr R package.

The pipelines include:

FastQC, MultiQC and other tools for raw data quality control
Adapter trimming with Cutadapt
Mapping to the genome using STAR
Generation of bigWig tracks for visualisation of alignments
Quantification with featureCounts (Subread) and UMI-tools (if UMIs are used for deduplication)
Downstream analysis in R using a pre-designed markdown report file (sc.report.Rmd). Modify this file to fit your custom parameters and thresholds and render it to your final HTML report. The Rmd file uses, among others, the following tools and methods:
- QC: the scater package.
- Normalization: the scran package.
- Differential expression analysis: the scde package.
- Trajectory analysis (pseudotime): the monocle package.
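
To illustrate what UMI-based deduplication means for the quantification step, here is a minimal Python sketch. It is a simplified illustration and not the algorithm implemented in UMI-tools, which additionally accounts for sequencing errors in the UMIs. The record layout `(cell, gene, umi)`, the function name `count_unique_umis` and the example reads are assumptions made for this example.

```python
from collections import defaultdict


def count_unique_umis(records):
    """Count distinct UMIs per (cell, gene) pair.

    `records` is an iterable of (cell_barcode, gene_id, umi) tuples, one per
    aligned read. Reads sharing cell, gene and UMI are treated as PCR
    duplicates and counted once. Hypothetical helper for illustration only.
    """
    seen = defaultdict(set)
    for cell, gene, umi in records:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Made-up example reads
reads = [
    ("cellA", "Gapdh", "ACGT"),
    ("cellA", "Gapdh", "ACGT"),  # same cell, gene and UMI: PCR duplicate
    ("cellA", "Gapdh", "TTGA"),
    ("cellB", "Gapdh", "ACGT"),  # same UMI in another cell: counted separately
]
counts = count_unique_umis(reads)
print(counts)
```

Without UMIs, every read would be counted, so the duplicate read in `cellA` would inflate the expression estimate for that gene.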

Pipeline parameter settings

essential.vars.groovy: essential parameters describing the experiment
- project folder name
- reference genome
- experiment design
- adapter sequence, etc.
Additional (more specialized) parameters can be given in the var.groovy files of the individual pipeline modules.
targets.txt: comma-separated text file describing the analysed samples. The following columns are required:
- sample: sample identifier. Must be a unique substring of the input sample file name (e.g. common prefixes and suffixes may be removed). These names are grepped against the count file names to merge targets.txt with the count data.
- plate: plate ID (number)
- row: plate row (letter)
- col: plate column (number)
- cells: 0c/1c/10c (control wells)
- group: default variable for cell grouping (e.g. by condition)
For pool-based libraries such as MARS-Seq, the following columns are additionally required:
- pool: the pool ID comprises all cells from one library pool (i.e. a set of unique cell barcodes; the cell barcodes are re-used in other pools). Must be a unique substring of the input sample file name. For pool-based designs, the pool ID is grepped against the respective count data file name instead of the sample name as stated above.
- barcode: cell barcodes used as cell identifier in the count files. After merging the count data with targets.txt, the barcodes are replaced with sample IDs given in the sample column (i.e. here, sample names need not be a substring of input sample file name).
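
The matching rules described for targets.txt can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the pipeline's own code (the pipeline performs the merge in R). The helper names `read_targets`, `match_files` and `barcode_to_sample`, the count file names and the table contents are all hypothetical.

```python
import csv
import io

REQUIRED = ["sample", "plate", "row", "col", "cells", "group"]
POOL_EXTRA = ["pool", "barcode"]  # only needed for pool-based designs


def read_targets(text, pool_based=False):
    """Parse comma-separated targets text and check the required columns."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    needed = REQUIRED + (POOL_EXTRA if pool_based else [])
    missing = [c for c in needed if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"targets.txt lacks columns: {missing}")
    return rows


def match_files(rows, count_files, key="sample"):
    """Map each distinct `key` value to the single count file containing it."""
    mapping = {}
    for value in dict.fromkeys(r[key] for r in rows):
        hits = [f for f in count_files if value in f]
        if len(hits) != 1:  # the ID must be a unique substring
            raise ValueError(f"{key} '{value}' matches {len(hits)} files")
        mapping[value] = hits[0]
    return mapping


def barcode_to_sample(rows, pool):
    """Translate cell barcodes of one pool into the IDs of the sample column."""
    return {r["barcode"]: r["sample"] for r in rows if r["pool"] == pool}


# Made-up pool-based example: barcodes are re-used across pools
targets_text = """sample,plate,row,col,cells,group,pool,barcode
cell_001,1,A,1,1c,ctrl,pool1,ACGTAC
cell_002,1,A,2,1c,treated,pool1,TTGACA
cell_003,1,B,1,0c,ctrl,pool2,ACGTAC
"""
count_files = ["run7_pool1.counts.tsv", "run7_pool2.counts.tsv"]

rows = read_targets(targets_text, pool_based=True)
pool_files = match_files(rows, count_files, key="pool")
renamed = barcode_to_sample(rows, "pool1")
print(pool_files)
print(renamed)
```

For designs without pools, calling `match_files` with the default `key="sample"` applies the same unique-substring check to the sample names instead of the pool IDs.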

Programs required

FastQC
STAR
Samtools
Bedtools
Subread
Picard
UCSC utilities
RSeQC
UMI-tools
R

Resources

QC: the scater package.
Normalization: the scran package.
Trajectory analysis (pseudotime): the monocle package.
A tutorial from Hemberg lab
Luecken and Theis 2019 Current best practices in single‐cell RNA‐seq analysis: a tutorial

Version History

Version 1 (earliest) Created 7th Oct 2020 at 08:46 by Sergi Sayols
