Expertise: Machine Learning, R, Scientific workflow development, Workflows, Agronomy, Biostatistics
Expertise: Bioinformatics, Metagenomics, NGS, Scientific workflow development, Software Engineering
Tools: Conda, Jupyter notebook, Python, R, Single Cell analysis, Snakemake
Expertise: Genomics, Metagenomics, NGS, Python, evolution
Tools: Genomics, Python, Snakemake, Transcriptomics
Expertise: phylogenomics, phylogenetics, evolution, Microbiology, numerical methods
Hi! I'm Russell.
I'm a microbiologist who uses graph theory and machine learning to study the relationships that bacteria and archaea form with their host organisms, and lately giant viruses and their hosts. Or, I'm a computer scientist who builds software that uses concepts from evolution to extract knowledge about ecology from large datasets. Or, I'm a data scientist who uses Python to explore biological systems. Or, I'm a physicist who went rogue and defected to the squishy side of science. ...
Expertise: Bioinformatics, Computer Science, Data Management, Genetics, Genomics, Machine Learning, Metagenomics, NGS, Scientific workflow development, Software Engineering
Tools: Databases, Galaxy, Genomics, Jupyter notebook, Machine Learning, Nextflow, nf-core, PCR, Perl, Python, R, rtPCR, Snakemake, Transcriptomics, Virology, Web, Web services, Workflows
Dad, husband and PhD. Scientist, technologist and engineer. Bibliophile. Philomath. Passionate about science, medicine, research, computing and all things geeky!
cfDNA UniFlow is a unified, standardized, and ready-to-use workflow for processing whole genome sequencing (WGS) cfDNA samples from liquid biopsies. It includes essential steps for pre-processing raw cfDNA samples, quality control and reporting. Additionally, several optional utility functions like GC bias correction and estimation of copy number state are included. Finally, we provide specialized methods for extracting coverage derived signals and visualizations comparing cases and controls. ...
HiC contact map generation
Snakemake pipeline for generating .pretext and .mcool files for visualising HiC contact maps with PretextView and HiGlass, respectively.
Prerequisites
This pipeline has been tested using Snakemake v7.32.4 and requires conda to install the required tools. To run the pipeline, use the command:
snakemake --use-conda
A set of configuration and run scripts is provided for execution on a Slurm queueing system. After configuring ...
GERONIMO
Introduction
GERONIMO is a bioinformatics pipeline designed to conduct high-throughput homology searches of structural genes using covariance models. These models are based on the alignment of sequences and the consensus of secondary structures. The pipeline is built using Snakemake, a workflow management tool that allows for the reproducible execution of analyses on various computational platforms.
The idea for developing GERONIMO emerged from a comprehensive search for [telomerase ...
Purge dups
This Snakemake pipeline takes a contig-level genome assembly and PacBio reads as input. It has been tested with Snakemake v7.32.4. The raw long-read sequencing files and the input contig genome assembly must be specified in the config.yaml file. To execute the workflow, run:
snakemake --use-conda --cores N
Or configure the cluster.json and run using the ./run_cluster command.
The Regulatory Mendelian Mutation (ReMM) score was created for relevance prediction of non-coding variations (SNVs and small InDels) in the human genome (GRCh37) in terms of Mendelian diseases. This project updates the ReMM score for the genome build GRCh38 and combines GRCh37 and GRCh38 into one workflow.
Pre-requirements
Conda
We use Conda as our software and dependency management tool. Conda installation guidelines can be found here:
https://conda.io/projects/conda/en/latest/user-guide/install/index.html ...
polya_liftover - sc/snRNAseq Snakemake Workflow
A [Snakemake][sm] workflow for using PolyA_DB and UCSC Liftover with Cellranger.
Some genes are not accurately annotated in the reference genome. Here, we use information provided by [PolyA_DB v3.2][polya] to update the coordinates, then the [UCSC Liftover][liftover] tool to update to a more recent genome. Next, we use [Cellranger][cr] to create the reference and count matrix. Finally, by taking advantage of the integrated [Conda][conda] and ...