Workflows
This workflow takes as input a collection of paired FASTQ files. It removes adapters with cutadapt and maps pairs with bowtie2, allowing dovetailing. It keeps concordant pairs with MAPQ 30 or higher, converts BAM to BED, and runs MACS2 with "ATAC" parameters.
This workflow takes as input a collection of paired FASTQ files. It uses HiCUP to go from FASTQ to a validPair file. The pairs are filtered for MAPQ and sorted by cooler to generate a tabix dataset. Cooler is then used to generate a balanced cool file at the desired resolution.
This workflow takes as input a collection of paired FASTQ files. It removes low-quality bases and adapters with cutadapt and maps reads with Bowtie2 in end-to-end mode. It removes reads on MT, non-concordant pairs, pairs with mapping quality below 30, and PCR duplicates. It computes the pile-up on 5' ± 100 bp, calls peaks, and counts the number of reads falling in the 1 kb region centered on the summit. It also plots the number of reads for each fragment length.
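The read-level filters described above (reads on MT, non-concordant pairs, mapping quality below 30, PCR duplicates) can be sketched as a predicate over SAM records. This is only an illustration of the logic, not the tools the workflow actually runs: the function name `keep_record`, the contig names `chrM` and `MT`, and the example records are assumptions, and the duplicate check assumes duplicates were already flagged by an upstream step.

```python
# SAM flag bits (from the SAM specification)
PROPER_PAIR = 0x2    # read mapped in a proper (concordant) pair
UNMAPPED = 0x4       # read is unmapped
DUPLICATE = 0x400    # read marked as PCR or optical duplicate


def keep_record(sam_line, min_mapq=30, excluded_contigs=("chrM", "MT")):
    """Return True if a tab-separated SAM alignment line passes all filters.

    Hypothetical helper: contig names and the default threshold are
    assumptions chosen to mirror the description above.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])    # column 2: bitwise FLAG
    contig = fields[2]       # column 3: reference sequence name
    mapq = int(fields[4])    # column 5: mapping quality
    if flag & UNMAPPED or flag & DUPLICATE:
        return False
    if not flag & PROPER_PAIR:
        return False
    if contig in excluded_contigs:
        return False
    return mapq >= min_mapq


# Made-up records, one per filter
records = [
    "r1\t99\tchr1\t100\t42\t50M\t=\t200\t150\t*\t*",    # passes
    "r2\t99\tchrM\t100\t42\t50M\t=\t200\t150\t*\t*",    # mitochondrial
    "r3\t99\tchr1\t100\t10\t50M\t=\t200\t150\t*\t*",    # low MAPQ
    "r4\t97\tchr1\t100\t42\t50M\t=\t200\t150\t*\t*",    # not a proper pair
    "r5\t1123\tchr1\t100\t42\t50M\t=\t200\t150\t*\t*",  # flagged duplicate
]
kept = [line.split("\t")[0] for line in records if keep_record(line)]
print(kept)
```

In practice the same filtering is normally done on BAM files with dedicated tools; the sketch only makes the combination of criteria explicit.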
This workflow takes as input a list of single-read FASTQ files. Adapters and low-quality bases are removed with cutadapt. Reads are mapped with STAR using ENCODE parameters, and genes are counted simultaneously. The counts are reprocessed to be similar to HTSeq-count output. FPKM values are computed with cufflinks. Coverage (per million mapped reads) is computed with bedtools on uniquely mapped reads.
This workflow takes as input a list of paired-end FASTQ files. Adapters and low-quality bases are removed with cutadapt. Reads are mapped with STAR using ENCODE parameters, and genes are counted simultaneously. The counts are reprocessed to be similar to HTSeq-count output. FPKM values are computed with cufflinks. Coverage (per million mapped reads) is computed with bedtools on uniquely mapped reads (with R2 orientation inverted).
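Coverage "per million mapped reads" amounts to multiplying raw depth by one million divided by the number of reads used. The sketch below shows that arithmetic on bedGraph-like rows. The function names and example values are hypothetical, and which read count the workflow uses as the denominator is an assumption here (uniquely mapped reads).

```python
def per_million_scale(n_mapped_reads):
    """Scale factor that converts raw depth to depth per million reads."""
    if n_mapped_reads <= 0:
        raise ValueError("need at least one mapped read")
    return 1_000_000 / n_mapped_reads


def normalize_bedgraph(rows, n_mapped_reads):
    """Apply the scale factor to (chrom, start, end, depth) tuples."""
    scale = per_million_scale(n_mapped_reads)
    return [(chrom, start, end, depth * scale)
            for chrom, start, end, depth in rows]


# Made-up example: 2 million uniquely mapped reads, so depth is halved
raw = [("chr1", 0, 100, 4.0), ("chr1", 100, 250, 10.0)]
normalized = normalize_bedgraph(raw, 2_000_000)
print(normalized)
```

A factor of this kind is what one would typically hand to a scaling option such as the `-scale` argument of `bedtools genomecov`; the exact invocation used by the workflow is not shown in this listing.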
COVID-19: consensus construction
This workflow aims at generating reliable consensus sequences from variant calls according to transparent criteria that capture at least some of the complexity of variant calling.
It takes a collection of VCFs (with DP and DP4 INFO fields) and a collection of the corresponding aligned reads (for the purpose of calculating genome-wide coverage) such as produced by any of the variant calling workflows in ...
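As an illustration of why the DP4 INFO field matters here, the sketch below derives the fraction of reads supporting the alternate allele from it. DP4 conventionally holds four counts: reference forward, reference reverse, alternate forward, alternate reverse. The helper names and the example INFO string are hypothetical, and the criteria the workflow applies to these values are not reproduced here.

```python
def parse_info(info):
    """Parse a VCF INFO string into a dict; flag-only keys map to True."""
    parsed = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            parsed[key] = value
        elif item:
            parsed[item] = True
    return parsed


def alt_fraction(info):
    """Fraction of DP4 reads supporting the alternate allele, or None."""
    fields = parse_info(info)
    ref_fwd, ref_rev, alt_fwd, alt_rev = (
        int(x) for x in fields["DP4"].split(",")
    )
    total = ref_fwd + ref_rev + alt_fwd + alt_rev
    if total == 0:
        return None  # no supporting reads, fraction undefined
    return (alt_fwd + alt_rev) / total


# Made-up INFO string: 20 reference reads, 80 alternate reads
example_info = "DP=100;DP4=10,10,40,40"
print(alt_fraction(example_info))
```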
ChIP-seq paired-end Workflow
Input dataset
- The workflow needs a single input: a list of paired datasets in fastqsanger format.
Input values
- adapters sequences: this depends on the library preparation. If you don't know, use FastQC to determine whether it is TruSeq or Nextera.
- reference_genome: this field will be adapted to the genomes available for bowtie2.
- effective_genome_size: this is used by MACS2 and may be entered manually (indications are provided for heavily used genomes).
...
VGP Workflow #1
This workflow produces a Meryl database and GenomeScope outputs that will be used to determine parameters for the following workflows and to assess the quality of genome assemblies. Specifically, it provides information about genomic complexity, such as genome size, level of heterozygosity, and repeat content, as well as about data quality.
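To make the idea of a k-mer count database and the profile derived from it concrete, here is a toy sketch in plain Python. It is not how Meryl or GenomeScope are implemented: the function names are hypothetical, counting canonical k-mers (a k-mer and its reverse complement treated as one) is an assumption, and real tools handle far larger inputs.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k):
    """Count canonical k-mers across reads, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def multiplicity_histogram(counts):
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    return Counter(counts.values())


# Made-up read; with k=3 it yields ACG, CGT, GTA, TAC
toy_counts = count_kmers(["ACGTAC"], 3)
toy_histogram = multiplicity_histogram(toy_counts)
print(dict(toy_counts), dict(toy_histogram))
```

The histogram of multiplicities is the kind of profile from which genome size, heterozygosity, and repeat content are then estimated by model fitting.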
Inputs
- Collection of HiFi long reads in FASTQ format
Outputs
- Meryl database of k-mer counts
- GenomeScope
- Linear plot
...
VGP Workflow #1
This workflow collects metrics on the properties of the genome under consideration by analyzing k-mer frequencies. It provides information about genomic complexity, such as genome size, level of heterozygosity, and repeat content, as well as about data quality. It uses reads from two parental genomes to partition long reads from the offspring into haplotype-specific k-mer databases.
Inputs
- Collection of HiFi long reads in FASTQ format
- Paternal short-read ...
Generic variation analysis on WGS PE data
This workflow performs paired-end read mapping with bwa-mem, followed by sensitive variant calling across a wide range of allele frequencies (AFs) with lofreq and variant annotation with snpEff. The reference genome can be provided as a GenBank file.