Workflows
What is a Workflow?Filters
This is part of a series of workflows to annotate a genome, tagged with TSI-annotation
.
These workflows are based on command-line code by Luke Silver, converted into Galaxy Australia workflows.
The workflows can be run in this order:
- Repeat masking
- RNAseq QC and read trimming
- Find transcripts
- Combine transcripts
- Extract transcripts
- Convert formats
- Fgenesh annotation
Inputs required: assembled-genome.fasta, hard-repeat-masked-genome.fasta, and (because this workflow maps known mRNA ...
The aim of this workflow is to handle the routine part of shotgun metagenomics data processing. The workflow is using the tools Kraken2 and Bracken for taxonomy classification and the KrakenTools to evaluate diversity metrics. This workflow was tested on Galaxy Australia. A How-to guide for the workflow can be found at: https://github.com/vmurigneu/kraken_howto_ga_workflows
Type: Galaxy
Creators: Valentine Murigneux, QCIF/Biocommons, mike thang
Submitter: Valentine Murigneux
Genome assembly workflow for nanopore reads, for TSI
Input:
- Nanopore reads (can be in format: fastq, fastq.gz, fastqsanger, or fastqsanger.gz)
Optional settings to specify when the workflow is run:
- [1] how many input files to split the original input into (to speed up the workflow). default = 0. example: set to 2000 to split a 60 GB read file into 2000 files of ~ 30 MB.
- [2] filtering: min average read quality score. default = 10
- [3] filtering: min read length. default = 200
- [4] ...
Post-genome assembly quality control workflow using Quast, BUSCO, Meryl, Merqury and Fasta Statistics. Updates November 2023.
- Inputs: reads as fastqsanger.gz (not fastq.gz), and assembly.fasta. (To change format: click on the pencil icon next to the file in the Galaxy history, then "Datatypes", then set "New type" as fastqsanger.gz).
- New default settings for BUSCO: lineage = eukaryota; for Quast: lineage = eukaryotes, genome = large.
- Reports assembly stats into a table called metrics.tsv, ...
Scaffolding using HiC data with YAHS
This workflow has been created from a Vertebrate Genomes Project (VGP) scaffolding workflow.
- For more information about the VGP project see https://galaxyproject.org/projects/vgp/.
- The scaffolding workflow is at https://dockstore.org/workflows/github.com/iwc-workflows/Scaffolding-HiC-VGP8/main:main?tab=info
- Please see that link for the workflow diagram.
Some minor changes have been made to better fit with TSI project data:
- optional inputs of SAK info ...
This is part of a series of workflows to annotate a genome, tagged with TSI-annotation
.
These workflows are based on command-line code by Luke Silver, converted into Galaxy Australia workflows.
The workflows can be run in this order:
- Repeat masking
- RNAseq QC and read trimming
- Find transcripts
- Combine transcripts
- Extract transcripts
- Convert formats
- Fgenesh annotation
Workflow information:
- Input = genome.fasta.
- Outputs = soft_masked_genome.fasta, hard_masked_genome.fasta, ...
From the R1 and R2 fastq files of a single samples, make a scRNAseq counts matrix, and perform basic QC with scanpy. Then, do further processing by making a UMAP and clustering. Produces a processed AnnData Depreciated: use individual workflows insead for multiple samples
Takes fastqs and reference data, to produce a single cell counts matrix into and save in annData format - adding a column called sample with the sample name.
Take a scRNAseq counts matrix from a single sample, and perform basic QC with scanpy. Then, do further processing by making a UMAP and clustering. Produces a processed AnnData object.
Depreciated: use individual workflows insead for multiple samples
From the R1 and R2 fastq files of a single samples, make a scRNAseq counts matrix, and perform basic QC with scanpy. Then, do further processing by making a UMAP and clustering. Produces a processed AnnData
Depreciated: use individual workflows insead for multiple samples