## Introduction
**nf-core/denovotranscript** is a bioinformatics pipeline for de novo transcriptome assembly of paired-end short reads from bulk RNA-seq. It takes a samplesheet and FASTQ files as input and performs quality control (QC), trimming, assembly, redundancy reduction, pseudoalignment, and quantification. It outputs a transcriptome assembly FASTA file, a transcript abundance TSV file, and a MultiQC report with assembly quality and read QC metrics.

1. Read QC of raw reads ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
2. Adapter and quality trimming ([`fastp`](https://github.com/OpenGene/fastp))
3. Read QC of trimmed reads ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
4. Remove rRNA or mitochondrial DNA (optional) ([`SortMeRNA`](https://hpc.nih.gov/apps/sortmeRNA.html))
5. Transcriptome assembly using any combination of the following:
   - [`Trinity`](https://github.com/trinityrnaseq/trinityrnaseq/wiki) with normalised reads (default=True)
   - [`Trinity`](https://github.com/trinityrnaseq/trinityrnaseq/wiki) with non-normalised reads
   - [`rnaSPAdes`](https://ablab.github.io/spades/rna.html), outputting medium-filtered transcripts (default=True)
   - [`rnaSPAdes`](https://ablab.github.io/spades/rna.html), outputting soft-filtered transcripts
   - [`rnaSPAdes`](https://ablab.github.io/spades/rna.html), outputting hard-filtered transcripts
6. Redundancy reduction with [`Evidential Gene tr2aacds`](http://arthropods.eugenes.org/EvidentialGene/). A transcript to gene mapping is produced from Evidential Gene's outputs using [`gawk`](https://www.gnu.org/software/gawk/).
7. Assembly completeness QC ([`BUSCO`](https://busco.ezlab.org/))
8. Other assembly quality metrics ([`rnaQUAST`](https://github.com/ablab/rnaquast))
9. Transcriptome quality assessment with [`TransRate`](https://hibberdlab.com/transrate/), including the use of reads for assembly evaluation. This step is skipped if the profile is set to `conda` or `mamba`.
10. Pseudo-alignment and quantification ([`Salmon`](https://combine-lab.github.io/salmon/))
11. HTML report for raw reads, trimmed reads, BUSCO, and Salmon ([`MultiQC`](http://multiqc.info/))
## Usage
> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.

First, prepare a samplesheet listing your input data, as follows:

`samplesheet.csv`:

```csv
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
```

Each row represents one pair of FASTQ files (paired-end reads).
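
For a directory with many read pairs, the samplesheet can be generated rather than typed by hand. The following is a minimal sketch, not part of the pipeline: `make_samplesheet` is a hypothetical helper, and it assumes that mates are named `<prefix>_R1_001.fastq.gz` and `<prefix>_R2_001.fastq.gz` (as in the example above) and that the prefix is an acceptable sample name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not shipped with the pipeline): print a samplesheet
# for every FASTQ pair found in a directory.
# Assumption: mates are named <prefix>_R1_001.fastq.gz / <prefix>_R2_001.fastq.gz.
make_samplesheet() {
    local fastq_dir="$1"
    local r1 r2 sample

    # Header row, matching the columns shown in the example samplesheet.
    echo "sample,fastq_1,fastq_2"

    for r1 in "$fastq_dir"/*_R1_001.fastq.gz; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$r1" ] || continue

        # Derive the expected R2 file name from the R1 file name.
        r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "warning: no R2 mate for $r1, skipping" >&2
            continue
        fi

        # Use the shared file name prefix as the sample name (an assumption).
        sample="$(basename "$r1" _R1_001.fastq.gz)"
        echo "${sample},${r1},${r2}"
    done
}
```

After defining the function, `make_samplesheet /path/to/fastq > samplesheet.csv` writes the file. Unpaired R1 files are reported on standard error and left out of the samplesheet.
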
Now, you can run the pipeline using:
```bash
nextflow run nf-core/denovotranscript \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR>
```
> [!WARNING]
> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).
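
As a minimal sketch of the `-params-file` route, the two parameters used above (`input` and `outdir`) can be stored in a YAML file. The file name `params.yaml` and the output directory `results` are illustrative choices, and `<PROFILE>` is a placeholder for whichever profile you use.

```bash
# Write the pipeline parameters to a YAML file.
# "params.yaml" and "results" are illustrative names, not requirements.
cat > params.yaml <<'EOF'
input: "samplesheet.csv"
outdir: "results"
EOF

# Then point Nextflow at the file instead of passing --input/--outdir:
#   nextflow run nf-core/denovotranscript -profile <PROFILE> -params-file params.yaml
```

Keeping parameters in a file makes a run easier to repeat and to share than a long command line.
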

For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/denovotranscript/usage) and the [parameter documentation](https://nf-co.re/denovotranscript/parameters).
## Pipeline output
To see the results of an example test run with a full-size dataset, refer to the [results](https://nf-co.re/denovotranscript/results) tab on the nf-core website pipeline page.
For more details about the output files and reports, please refer to the
[output documentation](https://nf-co.re/denovotranscript/output).
## Credits
nf-core/denovotranscript was written by Avani Bhojwani ([@avani-bhojwani](https://github.com/avani-bhojwani/)) and Timothy Little ([@timslittle](https://github.com/timslittle/)).
## Contributions and Support
If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).
For further information or help, don't hesitate to get in touch on the [Slack `#denovotranscript` channel](https://nfcore.slack.com/channels/denovotranscript) (you can join with [this invite](https://nf-co.re/join/slack)).
## Citations
If you use nf-core/denovotranscript for your analysis, please cite it using the following DOI: [10.5281/zenodo.13324371](https://doi.org/10.5281/zenodo.13324371)
An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
You can cite the `nf-core` publication as follows:
> **The nf-core framework for community-curated bioinformatics pipelines.**
>
> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
>
> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).