# nf-core/denovotranscript

[![GitHub Actions CI Status](https://github.com/nf-core/denovotranscript/actions/workflows/ci.yml/badge.svg)](https://github.com/nf-core/denovotranscript/actions/workflows/ci.yml)
[![GitHub Actions Linting Status](https://github.com/nf-core/denovotranscript/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/denovotranscript/actions/workflows/linting.yml)
[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/denovotranscript/results)
[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.13324371-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.13324371)
[![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)

[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A524.04.2-23aa62.svg)](https://www.nextflow.io/)
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/nf-core/denovotranscript)

[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23denovotranscript-4A154B?labelColor=000000&logo=slack)](https://nfcore.slack.com/channels/denovotranscript)
[![Follow on Twitter](http://img.shields.io/badge/twitter-%40nf__core-1DA1F2?labelColor=000000&logo=twitter)](https://twitter.com/nf_core)
[![Follow on Mastodon](https://img.shields.io/badge/mastodon-nf__core-6364ff?labelColor=FFFFFF&logo=mastodon)](https://mstdn.science/@nf_core)
[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core)

## Introduction

**nf-core/denovotranscript** is a bioinformatics pipeline for de novo transcriptome assembly of paired-end short reads from bulk RNA-seq. It takes a samplesheet and FASTQ files as input and performs quality control (QC), trimming, assembly, redundancy reduction, pseudoalignment, and quantification. It outputs a transcriptome assembly FASTA file, a transcript abundance TSV file, and a MultiQC report with assembly quality and read QC metrics.

![nf-core/denovotranscript metro map](docs/images/denovotranscript_metro_map.drawio.svg)

1. Read QC of raw reads ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
2. Adapter and quality trimming ([`fastp`](https://github.com/OpenGene/fastp))
3. Read QC of trimmed reads ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
4. Remove rRNA or mitochondrial DNA (optional) ([`SortMeRNA`](https://hpc.nih.gov/apps/sortmeRNA.html))
5. Transcriptome assembly using any combination of the following:
   - [`Trinity`](https://github.com/trinityrnaseq/trinityrnaseq/wiki) with normalised reads (default=True)
   - [`Trinity`](https://github.com/trinityrnaseq/trinityrnaseq/wiki) with non-normalised reads
   - [`rnaSPAdes`](https://ablab.github.io/spades/rna.html) with medium filtered transcripts as output (default=True)
   - [`rnaSPAdes`](https://ablab.github.io/spades/rna.html) with soft filtered transcripts as output
   - [`rnaSPAdes`](https://ablab.github.io/spades/rna.html) with hard filtered transcripts as output
6. Redundancy reduction with [`Evidential Gene tr2aacds`](http://arthropods.eugenes.org/EvidentialGene/). A transcript-to-gene mapping is produced from Evidential Gene's outputs using [`gawk`](https://www.gnu.org/software/gawk/).
7. Assembly completeness QC ([`BUSCO`](https://busco.ezlab.org/))
8. Other assembly quality metrics ([`rnaQUAST`](https://github.com/ablab/rnaquast))
9. Transcriptome quality assessment with [`TransRate`](https://hibberdlab.com/transrate/), including the use of reads for assembly evaluation. This step is not performed if the profile is set to `conda` or `mamba`.
10. Pseudo-alignment and quantification ([`Salmon`](https://combine-lab.github.io/salmon/))
11. HTML report for raw reads, trimmed reads, BUSCO, and Salmon ([`MultiQC`](http://multiqc.info/))

## Usage

> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.

First, prepare a samplesheet with your input data that looks as follows:

`samplesheet.csv`:

```csv
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
```

Each row represents a pair of FASTQ files (paired end).

Now, you can run the pipeline using:

```bash
nextflow run nf-core/denovotranscript \
   -profile <PROFILE> \
   --input samplesheet.csv \
   --outdir <OUTDIR>
```

> [!WARNING]
> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files, including those provided by the `-c` Nextflow option, can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).

For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/denovotranscript/usage) and the [parameter documentation](https://nf-co.re/denovotranscript/parameters).

## Pipeline output

To see the results of an example test run with a full size dataset, refer to the [results](https://nf-co.re/denovotranscript/results) tab on the nf-core website pipeline page.
For more details about the output files and reports, please refer to the [output documentation](https://nf-co.re/denovotranscript/output).

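Returning to the samplesheet format described under Usage: for larger projects it can be tedious to write the three-column CSV by hand. The following is a minimal sketch, not part of the pipeline, of how such a file might be generated from a directory of paired FASTQ files. It assumes Illumina-style file names ending in `_R1_001.fastq.gz` and `_R2_001.fastq.gz`, and it uses the shared file name prefix as the `sample` value, which may not match the sample names you want (the example above uses `CONTROL_REP1`). The function names `build_rows` and `write_samplesheet` are hypothetical helpers introduced only for illustration.

```python
import csv
import re
import sys
from pathlib import Path

# Assumption: reads follow the Illumina-style naming
# <prefix>_R1_001.fastq.gz / <prefix>_R2_001.fastq.gz.
R1_PATTERN = re.compile(r"^(?P<sample>.+)_R1_001\.fastq\.gz$")


def build_rows(fastq_dir):
    """Return (sample, fastq_1, fastq_2) tuples for every complete R1/R2 pair."""
    rows = []
    for r1 in sorted(Path(fastq_dir).glob("*_R1_001.fastq.gz")):
        match = R1_PATTERN.match(r1.name)
        if match is None:
            continue
        sample = match.group("sample")
        r2 = r1.with_name(sample + "_R2_001.fastq.gz")
        if not r2.exists():
            # The pipeline expects paired-end reads, so unpaired files are skipped.
            print(f"skipping {r1.name}: no matching R2 file", file=sys.stderr)
            continue
        # Assumption: the file name prefix is an acceptable sample name.
        rows.append((sample, str(r1), str(r2)))
    return rows


def write_samplesheet(rows, out_path):
    """Write rows under the header used in the README example."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample", "fastq_1", "fastq_2"])
        writer.writerows(rows)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python make_samplesheet.py <fastq_dir> <samplesheet.csv>
    write_samplesheet(build_rows(sys.argv[1]), sys.argv[2])
```

Saved under a name of your choice (for example `make_samplesheet.py`, a hypothetical file name), the script could be run as `python make_samplesheet.py fastq/ samplesheet.csv`, and the resulting file passed to the pipeline via `--input`. Check the generated sample names before running, and consult the usage documentation for the authoritative samplesheet requirements.
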
## Credits

nf-core/denovotranscript was written by Avani Bhojwani ([@avani-bhojwani](https://github.com/avani-bhojwani/)) and Timothy Little ([@timslittle](https://github.com/timslittle/)).

## Contributions and Support

If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).

For further information or help, don't hesitate to get in touch on the [Slack `#denovotranscript` channel](https://nfcore.slack.com/channels/denovotranscript) (you can join with [this invite](https://nf-co.re/join/slack)).

## Citations

If you use nf-core/denovotranscript for your analysis, please cite it using the following DOI: [10.5281/zenodo.13324371](https://doi.org/10.5281/zenodo.13324371)

An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.

You can cite the `nf-core` publication as follows:

> **The nf-core framework for community-curated bioinformatics pipelines.**
>
> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
>
> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).

## License

MIT
