R workflow for RNA-seq analysis in unexplained recurrent pregnancy loss

Workflow Type: R markdown
Stable

The bioinformatic workflow presented here enables the analysis of RNA sequencing data obtained from human reproductive tissues in unexplained recurrent pregnancy loss (uRPL) research. The pipeline requires two inputs: a sample sheet containing the sample information (example_input_data.csv) and gene expression matrices generated with Salmon as part of the nf-core/rnaseq bioinformatics pipeline (example_count_data.csv). For more information on how to use the nf-core/rnaseq pipeline, including the required inputs and expected outputs, please refer to the nf-core/rnaseq documentation. The processes used to download publicly available high-throughput RNA-seq datasets and generate the Salmon gene expression matrices (e.g. counts files) are described in our GitHub repository (also available as a file through WorkflowHub, Data_Preparation.md), alongside documentation showing the expected outputs from this pipeline.
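As a rough illustration of how the two inputs relate, the sketch below builds a small sample sheet and count table in memory and checks that they line up before any analysis. The column names (`sample`, `condition`, `gene_id`), the values, and the helper `align_inputs()` are assumptions made for illustration; the actual layout of example_input_data.csv and example_count_data.csv may differ, and in practice both tables would be read with `read.csv()`. Rounding the Salmon estimates to integers is one common way to pass them to count-based tools; it is shown here as an assumption, not as the method used in the workflow script.

```r
# Hypothetical sample sheet; the real column names may differ.
sample_sheet <- data.frame(
  sample    = c("S1", "S2", "S3", "S4"),
  condition = c("control", "control", "uRPL", "uRPL"),
  stringsAsFactors = FALSE
)

# Hypothetical gene-level counts: one row per gene, one column per sample.
# The sample columns are deliberately out of order.
counts <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  S2 = c(10.4, 0, 250.7),
  S1 = c(12.1, 3.2, 198.0),
  S4 = c(30.9, 1.0, 120.3),
  S3 = c(28.5, 0, 101.8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns an integer matrix whose columns follow the
# row order of the sample sheet, and stops if any sample is missing.
align_inputs <- function(sample_sheet, counts) {
  mat <- as.matrix(counts[, setdiff(names(counts), "gene_id"), drop = FALSE])
  rownames(mat) <- counts$gene_id

  missing <- setdiff(sample_sheet$sample, colnames(mat))
  if (length(missing) > 0) {
    stop("Samples missing from count matrix: ", paste(missing, collapse = ", "))
  }

  # Reorder columns to match the sample sheet, then round to integers.
  mat <- round(mat[, sample_sheet$sample, drop = FALSE])
  storage.mode(mat) <- "integer"
  mat
}

count_matrix <- align_inputs(sample_sheet, counts)
```

Checking the sample order up front matters because downstream tools generally match the columns of the count matrix to the rows of the sample sheet by position.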

The workflow was designed to compare datasets generated using different RNA sequencing methods by looking for concordance in differential expression analysis results, including differentially expressed genes and enriched functional pathways. It can be accessed and used by others to help improve the standardisation and reproducibility of RNA-seq analyses through consistent analysis methods and documentation.
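One simple way to think about concordance between two differential expression results is to compare which genes are significant in both and whether they change in the same direction. The sketch below does this in base R on made-up tables. The column names `log2FoldChange` and `padj` follow the DESeq2 results format; the gene names, values, thresholds (`alpha`, `min_abs_lfc`) and the helper `significant_genes()` are assumptions for illustration and are not taken from the workflow script.

```r
# Made-up differential expression results for two datasets.
res_a <- data.frame(
  gene = c("g1", "g2", "g3", "g4", "g5"),
  log2FoldChange = c(2.1, -1.8, 0.2, 1.5, -2.4),
  padj = c(0.001, 0.01, 0.60, 0.03, 0.20),
  stringsAsFactors = FALSE
)
res_b <- data.frame(
  gene = c("g1", "g2", "g3", "g4", "g6"),
  log2FoldChange = c(1.7, 1.2, -0.1, 1.9, -3.0),
  padj = c(0.004, 0.02, 0.80, 0.04, 0.001),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns the direction of change (+1 or -1) for each
# gene passing the (assumed) significance and effect-size thresholds.
significant_genes <- function(res, alpha = 0.05, min_abs_lfc = 1) {
  keep <- !is.na(res$padj) &
    res$padj < alpha &
    abs(res$log2FoldChange) >= min_abs_lfc
  setNames(sign(res$log2FoldChange[keep]), res$gene[keep])
}

sig_a <- significant_genes(res_a)
sig_b <- significant_genes(res_b)

# Genes significant in both datasets, split by agreement in direction.
shared     <- intersect(names(sig_a), names(sig_b))
concordant <- shared[sig_a[shared] == sig_b[shared]]
discordant <- setdiff(shared, concordant)
```

Here `g2` is significant in both tables but changes in opposite directions, so it is counted as shared but not concordant.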

This workflow is split into the following sections, with the main packages used listed in parentheses (tool versions are available in the attached R script):

Section 1: Initialising the environment and loading required packages and files

Section 2: Principal Component Analysis (PCAtools)

  • Section 2.1: Generating PCA objects to be used in Sections 2.2-2.4 (PCAtools)

  • Section 2.2: Principal Component Retention (PCAtools)

  • Section 2.3: Confounding factor identification using Eigencor plots and Pearson's Correlation coefficients (PCAtools)

  • Section 2.4: Generate PCA plots with arrows representing confounding numeric variables (ggplot2)

Section 3: Differential Expression Analysis (DESeq2)

  • Assess concordance in differential expression between datasets (ggVenn)

Section 4: Functional Annotation of KEGG pathways (clusterProfiler, pathview)
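The idea behind Section 2.3, correlating principal components with sample metadata to flag possible confounders, can be sketched without any Bioconductor dependencies. The example below uses base R (`prcomp()` and `cor()`) in place of PCAtools, on simulated data with an artificial batch effect. The variable names (`batch`, `age`), the simulated values and the choice of five components are all assumptions for illustration; the workflow itself uses PCAtools for these steps.

```r
set.seed(1)
n_samples <- 12
n_genes   <- 200

# Simulated numeric metadata; `batch` and `age` are hypothetical variables.
metadata <- data.frame(
  row.names = paste0("S", seq_len(n_samples)),
  batch = rep(c(0, 1), each = 6),
  age   = round(runif(n_samples, 25, 40))
)

# Simulated log-scale expression matrix (genes x samples).
expr <- matrix(
  rnorm(n_genes * n_samples),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), rownames(metadata))
)

# Add an artificial batch effect to the first 50 genes.
expr[1:50, ] <- expr[1:50, ] + rep(3 * metadata$batch, each = 50)

# PCA on samples: prcomp() expects samples in rows, so transpose.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Pearson correlation of the first five components with each metadata
# variable; large absolute values point to a possible confounder.
pc_scores <- pca$x[, 1:5]
cor_table <- cor(pc_scores, metadata, method = "pearson")
```

In this simulation the first component tracks `batch` closely, which is the kind of pattern an eigencor plot is meant to surface.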

For the most up-to-date version of the workflow script, please check our GitHub page.

Version History

Version 3 (latest) Created 10th Oct 2025 at 08:28 by Isabella Brown

No revision comments

Frozen Version-3 a562786

Version 2 Created 6th Oct 2025 at 03:51 by Isabella Brown

Update of workflow script to reflect changes to object manipulation within RStudio.

Frozen Version-2 a562786

Version 1 (earliest) Created 2nd Oct 2025 at 00:48 by Isabella Brown

Initial commit. Since uploading Version 1 of our workflow, a change has been made to how certain object types can be manipulated in RStudio. Please see the most up-to-date version of our workflow on our GitHub page.

Frozen Version-1 a562786
Creators and Submitter
Creators
  • Isabella M Brown
  • Paul Whatmore
  • Kylie Munyard
Submitter
Activity

Views: 173   Downloads: 33

Created: 2nd Oct 2025 at 00:48

Last updated: 6th Oct 2025 at 03:47

Attributions

None

Total size: 17 MB