Workflows
Genome assembly workflow for nanopore reads, for TSI
Input:
- Nanopore reads (can be in format: fastq, fastq.gz, fastqsanger, or fastqsanger.gz)
Optional settings to specify when the workflow is run:
- [1] how many input files to split the original input into (to speed up the workflow). default = 0. example: set to 2000 to split a 60 GB read file into 2000 files of ~ 30 MB.
- [2] filtering: min average read quality score. default = 10
- [3] filtering: min read length. default = 200
- [4] ...
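As a rough illustration of settings [1] to [3] above, the following is a minimal Python sketch, not the workflow's own implementation. It assumes Sanger-encoded qualities (Phred offset 33) and a plain arithmetic mean of the per-base scores; the filtering tool used by the workflow may average qualities differently. The function names (`parse_fastq`, `mean_quality`, `keep_read`, `chunk_size_mb`) are hypothetical and chosen only for this example.

```python
import io


def parse_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_quality(qual, offset=33):
    """Arithmetic mean of Phred scores (assumption: Sanger encoding, offset 33)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def keep_read(seq, qual, min_quality=10, min_length=200):
    """Apply the two filters with the defaults listed above."""
    return len(seq) >= min_length and mean_quality(qual) >= min_quality


def chunk_size_mb(total_gb, n_files):
    """Approximate size of each split file in MB (taking 1 GB = 1000 MB)."""
    return total_gb * 1000 / n_files


# Three toy reads: '5' encodes Phred 20 and '#' encodes Phred 2.
demo = (
    "@r1\n" + "A" * 250 + "\n+\n" + "5" * 250 + "\n"   # long, good quality
    "@r2\n" + "A" * 250 + "\n+\n" + "#" * 250 + "\n"   # long, poor quality
    "@r3\n" + "A" * 50 + "\n+\n" + "5" * 50 + "\n"     # too short
)

kept = [h for h, s, q in parse_fastq(io.StringIO(demo)) if keep_read(s, q)]
print(kept)
print(chunk_size_mb(60, 2000))
```

Only the first toy read passes both thresholds, and the split calculation reproduces the example in [1]: a 60 GB file split into 2000 parts gives files of about 30 MB.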
Post-genome assembly quality control workflow using Quast, BUSCO, Meryl, Merqury and Fasta Statistics. Updates November 2023.
- Inputs: reads as fastqsanger.gz (not fastq.gz), and assembly.fasta. (To change format: click on the pencil icon next to the file in the Galaxy history, then "Datatypes", then set "New type" as fastqsanger.gz).
- New default settings for BUSCO: lineage = eukaryota; for Quast: lineage = eukaryotes, genome = large.
- Reports assembly stats into a table called metrics.tsv, ...
Scaffolding using HiC data with YAHS
This workflow has been created from a Vertebrate Genomes Project (VGP) scaffolding workflow.
- For more information about the VGP project see https://galaxyproject.org/projects/vgp/.
- The scaffolding workflow is at https://dockstore.org/workflows/github.com/iwc-workflows/Scaffolding-HiC-VGP8/main:main?tab=info
- Please see that link for the workflow diagram.
Some minor changes have been made to better fit with TSI project data:
- optional inputs of SAK info ...
This is part of a series of workflows to annotate a genome, tagged with TSI-annotation.
These workflows are based on command-line code by Luke Silver, converted into Galaxy Australia workflows.
The workflows can be run in this order:
- Repeat masking
- RNAseq QC and read trimming
- Find transcripts
- Combine transcripts
- Extract transcripts
- Convert formats
- Fgenesh annotation
Workflow information:
- Input = genome.fasta.
- Outputs = soft_masked_genome.fasta, hard_masked_genome.fasta, ...
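The two masked outputs differ only in how repeats are marked: by common convention, a soft-masked genome writes repeat bases in lowercase, while a hard-masked genome replaces them with N. The sketch below illustrates that relationship in Python. It is not how the workflow produces its outputs, and the function names are hypothetical.

```python
def hard_mask(sequence, mask_char="N"):
    """Replace soft-masked (lowercase) bases with mask_char."""
    return "".join(mask_char if base.islower() else base for base in sequence)


def masked_fraction(sequence):
    """Fraction of bases that are soft-masked (lowercase)."""
    if not sequence:
        return 0.0
    return sum(base.islower() for base in sequence) / len(sequence)


def hard_mask_fasta(lines):
    """Yield FASTA lines with sequence lines hard-masked and headers untouched."""
    for line in lines:
        yield line if line.startswith(">") else hard_mask(line)


soft = [">chr1 demo", "ACGTacgtACGT"]
hard = list(hard_mask_fasta(soft))
print(hard)
print(masked_fraction(soft[1]))
```

Soft masking keeps the underlying sequence available to downstream tools, whereas hard masking discards it, which is why both files can be useful in later annotation steps.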
This is part of a series of workflows to annotate a genome, tagged with TSI-annotation.
These workflows are based on command-line code by Luke Silver, converted into Galaxy Australia workflows.
The workflows can be run in this order:
- Repeat masking
- RNAseq QC and read trimming
- Find transcripts
- Combine transcripts
- Extract transcripts
- Convert formats
- Fgenesh annotation
For this workflow:
Inputs:
- assembled-genome.fasta
- hard-repeat-masked-genome.fasta
- If using the mRNAs option, ...
From the R1 and R2 fastq files of a single sample, make a scRNAseq counts matrix and perform basic QC with scanpy. Then do further processing by making a UMAP and clustering. Produces a processed AnnData object.
Deprecated: use individual workflows instead for multiple samples.
Takes fastqs and reference data, produces a single-cell counts matrix, and saves it in AnnData format, adding a column called sample with the sample name.
Take a scRNAseq counts matrix from a single sample, and perform basic QC with scanpy. Then, do further processing by making a UMAP and clustering. Produces a processed AnnData object.
Deprecated: use individual workflows instead for multiple samples.
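To make "basic QC" on a counts matrix concrete, here is a minimal pure-Python sketch of the kind of per-cell metrics such a step typically computes. It is not the workflow's scanpy code. The gene names, the "MT-" mitochondrial prefix, and both thresholds are assumptions made for this example only.

```python
def qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Per-cell total counts, genes detected, and percent mitochondrial counts.

    counts is a list of rows (one per cell), each a list of per-gene counts.
    """
    mito_idx = [i for i, g in enumerate(gene_names) if g.startswith(mito_prefix)]
    metrics = []
    for row in counts:
        total = sum(row)
        n_genes = sum(1 for c in row if c > 0)
        mito = sum(row[i] for i in mito_idx)
        pct_mito = 100.0 * mito / total if total else 0.0
        metrics.append(
            {"total_counts": total, "n_genes": n_genes, "pct_mito": pct_mito}
        )
    return metrics


def passes_qc(m, min_genes=2, max_pct_mito=20.0):
    """Hypothetical thresholds; real values depend on the dataset."""
    return m["n_genes"] >= min_genes and m["pct_mito"] <= max_pct_mito


genes = ["GeneA", "GeneB", "MT-CO1"]
counts = [
    [5, 3, 1],  # healthy-looking cell
    [0, 0, 4],  # almost all mitochondrial counts
    [2, 0, 0],  # too few genes detected
]

metrics = qc_metrics(counts, genes)
kept_cells = [i for i, m in enumerate(metrics) if passes_qc(m)]
print(kept_cells)
```

In this toy matrix only the first cell passes both filters; the second fails on mitochondrial fraction and the third on the number of detected genes.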
From the R1 and R2 fastq files of a single sample, make a scRNAseq counts matrix and perform basic QC with scanpy. Then do further processing by making a UMAP and clustering. Produces a processed AnnData object.
Deprecated: use individual workflows instead for multiple samples.
Basic processing of a QC-filtered AnnData object: UMAP, clustering, etc.