Expertise: Bioinformatics
Expertise: Bioinformatics
Tools: CWL, Genomics, Python, R, Transcriptomics, Jupyter notebook
Hiroshima University, Graduate School of Integrated Sciences for Life, Laboratory of Genome Informatics, Ph.D. student. GitHub: https://github.com/yonesora56
Expertise: Bioinformatics, Biochemistry
Tools: Python
Teams: NBIS, ERGA Assembly
Organizations: NBIS – National Bioinformatics Infrastructure Sweden
https://orcid.org/0000-0003-1675-0677
Expertise: Bioinformatics, Genomics, Scientific workflow development, Workflows
I'm a bioinformatician at the National Bioinformatics Infrastructure Sweden (NBIS). I specialise in de novo genome assembly and workflow development with Nextflow. I'm also a Nextflow ambassador and nf-core maintainer.
Teams: Protein-protein and protein-nucleic acid binding site prediction research
Organizations: Shandong University
https://orcid.org/0009-0003-5182-3533
Expertise: Bioinformatics
Expertise: Bioinformatics, Metagenomics, NGS, Scientific workflow development, Software Engineering
Tools: Conda, Jupyter notebook, Python, R, Single Cell analysis, Snakemake
Expertise: Bioinformatics
Expertise: Bioinformatics, Cheminformatics
Computational Biologist @ UT Southwestern
Expertise: Bioinformatics
Teams: RECETOX SpecDatRI, RECETOX, usegalaxy-eu, ELIXIR Metabolomics
Organizations: Masaryk University, RECETOX
https://orcid.org/0000-0001-6744-996X
Expertise: Bioinformatics, Cheminformatics, Metabolomics, Python, R, Software Engineering, Workflows
Tools: Metabolomics, Python, R, Workflows, Mass spectrometry, Chromatography
Teams: Genome Data Compression Team
Organizations: Shenzhen University
https://orcid.org/0009-0007-9672-6728
Expertise: Bioinformatics
Tools: Workflows
Research Interest: Bioinformatics | Deep Learning | DevOps | Generative AI | Knowledge Graphs. A highly communicative, task-oriented, responsive, approachable, solution-seeking and initiative-taking professional working across a wide variety of topics in bioinformatics, involving genomes, transcriptomes, metagenomes and metatranscriptomes, with a focus on datasets from plant, bacterial and fungal genomes (Illumina MiSeq, NextSeq, NovaSeq, PacBio, Oxford ...
Teams: ERGA Annotation, Bioinformatics Laboratory for Genomics and Biodiversity (LBGB)
Organizations: Genoscope
https://orcid.org/0000-0002-6621-9908
Expertise: Bioinformatics
Tools: Nextflow, Python, R, Genetic analysis, Single Cell analysis
Expertise: Bioinformatics
Tools: R, Transcriptomics
Teams: Cimorgh IT solutions
Organizations: cimorgh IT
Expertise: Bioinformatics, Genomics, Metagenomics, Microbiology, NGS, Python, R, bash, WDL
Tools: Mathematical Modelling, R, WDL
Expertise: Bioinformatics, Metabarcoding, Metagenomics, Microbiology
Teams: EOSC-Life WP3 OC Team, cross RI project, EOSC-Life WP3, Euro-BioImaging
Organizations: EOSC-Life, Euro-BioImaging
Expertise: Bioengineering, Bioinformatics, Computer Science, Data Management
Tools: Databases, Jupyter notebook, Python
Biomedical engineer working on a preclinical image dataset repository and on cross-RI research.
Expertise: Bioinformatics, Genomics, Scientific workflow development
Expertise: Bioinformatics, Genomics, Machine Learning
Tools: Python, R, Machine Learning
I am a Ph.D. student in the Gong lab. I am interested in cancer genomics, including the mining of genetic risk determinants in cancer, functional prediction of genetic variants, tumor-associated molecular epidemiology, large-scale data integration, analysis, and mining, as well as the construction of bioinformatics data platforms.
Expertise: Bioinformatics
Expertise: Bioinformatics, Cheminformatics, Software Engineering, Metabolomics, Lipidomics
Expertise: Bioinformatics, Genomics, Metagenomics, Data Management
Tools: CWL, Jupyter notebook, Nextflow, Molecular Biology, Workflows, Microbiology, Transcriptomics, Perl, Python, R
Teams: EU-Openscreen
Organizations: Fraunhofer Institute for Translational Medicine and Pharmacology ITMP
https://orcid.org/0000-0002-8080-9170
Expertise: Bioinformatics, Cheminformatics, Machine Learning
Tools: Workflows
I am a bioinformatician and phylogeneticist. I really love working on problems at the intersection of high-performance computing and scientific workflows applied to omics.
Expertise: Bioinformatics, Computer Science, Data Management, Genetics, Genomics, Machine Learning, Metagenomics, NGS, Scientific workflow development, Software Engineering
Tools: Databases, Galaxy, Genomics, Jupyter notebook, Machine Learning, Nextflow, nf-core, PCR, Perl, Python, R, rtPCR, Snakemake, Transcriptomics, Virology, Web, Web services, Workflows
Dad, husband and PhD. Scientist, technologist and engineer. Bibliophile. Philomath. Passionate about science, medicine, research, computing and all things geeky!
Expertise: Bioinformatics, Molecular Biology, Computer Science, NGS, Software Engineering
Teams: EU-Openscreen, OME
Organizations: Fraunhofer Institute for Translational Medicine and Pharmacology ITMP
https://orcid.org/0000-0002-1740-8390
Expertise: Cheminformatics, Bioinformatics
Teams: Bioinformatics Innovation Lab
Organizations: Pondicherry University
https://orcid.org/0000-0003-4854-8238
Expertise: Bioinformatics, Systems Biology, Machine Learning
Tools: Galaxy, Cytoscape, Databases, Jupyter notebook, R, Python
Ph.D. Student at Department of Bioinformatics, Pondicherry University
Teams: MAB - ATGC
Organizations: Centre National de la Recherche Scientifique (CNRS)
https://orcid.org/0000-0003-3791-3973
Expertise: Bioinformatics, Genomics, algorithm, Machine Learning, Metagenomics, NGS, Computer Science
Tools: Transcriptomics, Genomics, Python, C/C++, Web services, Workflows
Expertise: Bioinformatics, Biostatistics, Metabarcoding, Metagenomics
Teams: Harkany Lab
Organizations: Medical University of Vienna
https://orcid.org/0000-0001-5920-2190
Expertise: Systems Biology, Bioengineering, Bioinformatics, Neuroscience
Tools: Workflows, Machine Learning, Transcriptomics
Teams: GalaxyProject SARS-CoV-2, nf-core viralrecon, EOSC-Life - Demonstrator 7: Rare Diseases, iPC: individualizedPaediatricCure, EJPRD WP13 case-studies workflows, TransBioNet, OpenEBench, ELIXIR Proteomics
Organizations: Barcelona Supercomputing Center (BSC-CNS), ELIXIR
https://orcid.org/0000-0003-4929-1219
Expertise: Bioinformatics, Computer Science, AI, Machine Learning
Computer Engineer at the Barcelona Supercomputing Center (BSC)
Expertise: Bioinformatics
Bioinformatician in Stockholm, Sweden. Lead for nf-core and MultiQC projects.
Teams: GalaxyProject SARS-CoV-2
Organizations: Earlham Institute
https://orcid.org/0000-0003-3627-5340
Expertise: Bioinformatics
Tools: Galaxy
Teams: V-Pipe
Organizations: SIB - Swiss Institute of Bioinformatics
https://orcid.org/0000-0002-7561-0810
Expertise: Bioinformatics, Software Engineering
Medical doctor and bioinformatician
Developer from the Swiss Institute of Bioinformatics (SIB), working at the Computational Biology Group (CBG) of ETH Zurich.
Diplom in Medicine. MSc in Bioinformatics and Proteomics.
I am also a ski teacher as a hobby.
Research Director @ INRAe
Teams: IBISBA Workflows
Organizations: Unspecified
Expertise: Bioinformatics
Tools: Workflows, Web services, Python
Teams: GalaxyProject SARS-CoV-2
Organizations: BC Centre for Disease Control
https://orcid.org/0000-0002-6178-3585
Expertise: Bioinformatics, Data Management, Molecular Biology
Tools: Databases, PCR, Workflows, Web services
The Bioinformatics Core helps researchers identify and interpret patterns in RNA and DNA by placing sequencing data into a biologically meaningful context. This encompasses assisting with experimental design, developing reproducible workflows, analyzing next-generation sequencing data, and supporting manuscript development/publication
Space: University of Michigan BRCF Bioinformatics Core
Public web page: https://medresearch.umich.edu/office-research/about-office-research/biomedical-research-core-facilities/bioinformatics-core
Organisms: Not specified
Toward data-driven genome breeding (digital breeding), we are developing the data analysis infrastructure technology essential for genome editing, focusing on gene function analysis using bioinformatics, an approach we call BioDX.
Space: Hiroshima workflow community
Public web page: https://bonohu.hiroshima-u.ac.jp/index_en.html
Organisms: Not specified
Space: Independent Teams
Public web page: Not specified
Organisms: Not specified
Author: Yasmmin Martins
Date Published: 28th Sep 2023
Publication Type: Journal
DOI: 10.1101/2023.09.27.23296213
Citation: medRxiv 2023.09.27.23296213v1 [Preprint]
Authors: Yasmmin Côrtes Martins, Ronaldo Francisco da Silva
Date Published: 27th Sep 2023
Publication Type: Journal
DOI: 10.1101/2023.09.26.559599
Citation: bioRxiv 2023.09.26.559599v1 [Preprint]
Authors: Yasmmin Martins, Ronaldo Francisco da Silva
Date Published: 22nd Jun 2023
Publication Type: Journal
DOI: 10.1101/2023.06.22.546079
Citation: bioRxiv 2023.06.22.546079v1 [Preprint]
Author: Yasmmin C Martins
Date Published: 7th Jun 2023
Publication Type: Journal
DOI: 10.1101/2023.06.05.543725
Citation: bioRxiv 2023.06.05.543725v1 [Preprint]
Authors: Cristina S. Ferreira, Yasmmin C. Martins, Rangel Celso Souza, Ana Tereza R. Vasconcelos
Date Published: 2021
Publication Type: Journal
DOI: 10.7717/peerj.12548
Citation: PeerJ 9:e12548
ONT Artificial Deletion Filter-Delter
A tool to filter short artificial deletion variants introduced by Oxford Nanopore Technologies (ONT) R9 and R10 flow cells and chemistries.
Requirements
The tool has been tested on Ubuntu 20.04 with 256 GB RAM, 64 CPU cores and an NVIDIA GPU with 48 GB RAM. The minimal requirements should be >= 64 GB RAM and an NVIDIA GPU with >= 8 GB RAM. Other operating systems like Windows or Mac were not tested.
ONT software such as Guppy, ...
Nextflow Pipeline for DeepVariant
This repository contains a Nextflow pipeline for Google’s DeepVariant, optimised for execution on NCI Gadi.
Quickstart Guide
- Edit the pipeline_params.yml file to include:
- samples: a list of samples, where each sample includes the sample name, the BAM file path (ensure the corresponding .bai is in the same directory), the path to an optional regions-of-interest BED file (set to '' if not required), and the model type.
- ref: path to the reference FASTA (ensure ...
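For orientation, a minimal sketch of what such a params file could contain is shown below; the key names (name, bam, bed, model) and paths are illustrative assumptions based on the description above, not the pipeline's documented schema.
# Sketch: write a minimal pipeline_params.yml with a heredoc.
# Key names and paths are illustrative assumptions, not the documented schema.
cat > pipeline_params.yml <<'EOF'
samples:
  - name: sample01
    bam: /data/sample01.bam   # sample01.bam.bai must sit in the same directory
    bed: ''                   # optional regions-of-interest BED file
    model: WGS                # model type
ref: /data/reference/GRCh38.fasta
EOF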
GALOP - Genome Assembly using Long reads Pipeline
This repository contains an exact copy of the standard Genoscope long reads assembly pipeline.
At the moment, this is not intended for users to download as it uses grid submission commands that will only work at Genoscope. As time goes on, we intend to make this pipeline available to a broader audience. However, genome assembly and polishing commands are accessible in the lib/assembly.py and lib/polishing.py files.
galop.py -h
Mandatory
...
skim2mito
skim2mito is a snakemake pipeline for the batch assembly, annotation, and phylogenetic analysis of mitochondrial genomes from low coverage genome skims. The pipeline was designed to work with sequence data from museum collections. However, it should also work with genome skims from recently collected samples.
Contents
- Setup
- Example data
- Input
- Output
- Filtering contaminants
- [Assembly and ...
Workflow for converting (genome) annotation tool output into a GBOL RDF file (TTL/HDT) using SAPP
Current formats / tools:
- EMBL format
- InterProScan (JSON/TSV)
- eggNOG-mapper (TSV)
- KoFamScan (TSV)
git: https://gitlab.com/m-unlock/cwl
SAPP (Semantic Annotation Platform with Provenance):
https://gitlab.com/sapp
https://academic.oup.com/bioinformatics/article/34/8/1401/4653704
Workflow for microbial (meta-)genome annotation
Input is a (meta)genome sequence in fasta format.
- bakta
- KoFamScan (optional)
- InterProScan (optional)
- eggNOG mapper (optional)
- To RDF conversion with SAPP (optional, default on) --> SAPP conversion Workflow in WorkflowHub
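Since the repository is a collection of CWL workflows (https://gitlab.com/m-unlock/cwl), a run would typically go through a CWL runner such as cwltool; the workflow and job file names in this sketch are placeholders, not the actual file names from the repository.
# Sketch: running a CWL annotation workflow with cwltool.
# Workflow and job file names are placeholders.
cwltool --outdir results/ annotation_workflow.cwl annotation_job.yml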
Stratum corneum nanotexture feature detection using deep learning and spatial analysis: a non-invasive tool for skin barrier assessment
This repository presents an objective, quantifiable method for assessing atopic dermatitis (AD) severity. The program integrates deep learning object detection with spatial analysis algorithms to accurately calculate the density of circular nano-size objects (CNOs), termed the Effective Corneocyte Topographical Index (ECTI). The ECTI demonstrates remarkable ...
Article-GADES
This repository contains the code for generating and benchmarking the results of the GADES package for distance matrix calculation.
Installation
git lfs install
git clone https://github.com/lab-medvedeva/Article-GADES.git
cd Article-GADES
Put the Real datasets in the MEX format into the folder Datasets/Real.
Running benchmark using Docker Deployment
docker run --gpus all \
-v $PWD/Datasets:/workspace/Article-GADES/Datasets
...
Swedish Earth Biogenome Project - Genome Assembly Workflow
The primary genome assembly workflow for the Earth Biogenome Project at NBIS.
Workflow overview
General aim:
flowchart LR
hifi[/ HiFi reads /] --> data_inspection
ont[/ ONT reads /] --> data_inspection
hic[/ Hi-C reads /] --> data_inspection
data_inspection[[ Data inspection ]] --> preprocessing
preprocessing[[ Preprocessing ]] --> assemble
assemble[[ Assemble ]] --> validation
validation[[ Assembly
...
GraphRBF is a state-of-the-art protein-protein/nucleic acid interaction site prediction model built from enhanced graph neural networks and prioritized radial basis function neural networks. This project lets users run our software to directly predict protein binding sites or train our model on a new database. Identification of protein-protein and protein-nucleic acid binding sites provides insights into biological processes related to protein functions and technical guidance for disease ...
cfDNA UniFlow is a unified, standardized, and ready-to-use workflow for processing whole genome sequencing (WGS) cfDNA samples from liquid biopsies. It includes essential steps for pre-processing raw cfDNA samples, quality control and reporting. Additionally, several optional utility functions like GC bias correction and estimation of copy number state are included. Finally, we provide specialized methods for extracting coverage derived signals and visualizations comparing cases and controls. ...
deepconsensus 1.2 snakemake pipeline
This snakemake-based workflow takes in a subreads.bam and results in a deepconsensus.fastq
- no methylation calls!
The metadata ID of the subreads file needs to be: "m[numeric][numeric][numeric].subreads.bam"
Chunking (how many subjobs) and the ccs minimum quality filter can be adjusted in the config.yaml.
The checkpoint model for deepconsensus 1.2 should be accessible like this: gsutil cp -r gs://brain-genomics-public/research/deepconsensus/models/v1.2/model_checkpoint/* ...
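A sketch of fetching that checkpoint before running the pipeline; the local destination directory is an assumption and should match whatever path your config.yaml expects.
# Sketch: download the DeepConsensus v1.2 model checkpoint.
# The destination directory is an assumption; point it at the path used in config.yaml.
mkdir -p model_checkpoint
gsutil cp -r gs://brain-genomics-public/research/deepconsensus/models/v1.2/model_checkpoint/* model_checkpoint/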
Variant Interpretation Pipeline (VIP) that annotates, filters and reports prioritized causal variants in humans, see https://github.com/molgenis/vip for more information.
Workflow for gene set enrichment analysis (GSEA) and co-expression analysis (WGCNA) on transcriptomics data to analyze pathways affected in Porto-Sinusoidal Vascular Disease.
Type: Common Workflow Language
Creators: Aishwarya Iyer, Friederike Ehrhart
Submitter: Aishwarya Iyer
Type: Nextflow
Creators: Arnau Soler Costa, Amy Curwin, Jordi Rambla, all the Sarek team, the nf-core community and people in the IMPaCT-Data project.
Submitter: Arnau Soler Costa
Galaxy Workflow Documentation: MS Finder Pipeline
This document outlines an MSFinder Galaxy workflow designed for peak annotation. The workflow consists of several steps aimed at preprocessing MS data, filtering, enhancing, and running MSFinder.
Step 1: Data Collection and Preprocessing
Check whether the InChI and SMILES are missing from the dataset, and subsequently filter out the spectra that are missing InChI and SMILES.
1.1 MSMetaEnhancer: Collect InChi, Isomeric_smiles, and Nominal_mass
...
Type: Galaxy
Creators: Zargham Ahmad, Helge Hecht, Elliott J. Price, Research Infrastructure RECETOX RI (No LM2018121) financed by the Ministry of Education, Youth and Sports, and Operational Programme Research, Development and Innovation - project CETOCOEN EXCELLENCE (No CZ.02.1.01/0.0/0.0/17_043/0009632).
Submitters: Helge Hecht, Zargham Ahmad
GSC (Genotype Sparse Compression)
Genotype Sparse Compression (GSC) is an advanced tool for lossless compression of VCF files, designed to efficiently store and manage VCF files in a compressed format. It accepts VCF/BCF files as input and utilizes advanced compression techniques to significantly reduce storage requirements while ensuring fast query capabilities. In our study, we successfully compressed the VCF files from the 1000 Genomes Project (1000Gpip3), consisting of 2504 samples and 80 ...
GBMatch_CNN
Work in progress... Predicting transcriptional subtypes (TS) and risk from glioblastoma whole slide images
Reference
Upcoming paper: stay tuned...
Dependencies
python 3.7.7
randaugment by Khrystyna Faryna: https://github.com/tovaroe/pathology-he-auto-augment
tensorflow 2.1.0
scikit-survival 0.13.1
pandas 1.0.3
lifelines 0.25.0
Description
The pipeline implemented here predicts transcriptional subtypes and survival of glioblastoma patients based on H&E stained whole slide scans. Sample data is ...
JAX NGS Operations Nextflow DSL2 Pipelines
This repository contains production bioinformatic analysis pipelines for a variety of bulk 'omics data analysis. Please see the Wiki documentation associated with this repository for all documentation and available analysis workflows.
Type: Nextflow
Creators: Michael Lloyd, Brian Sanderson, Barry Guglielmo, Sai Lek, Peter Fields, Harshpreet Chandok, Carolyn Paisie, Gabriel Rech, Ardian Ferraj, Anuj Srivastava
Submitter: Michael Lloyd
ProGFASTAGen
The ProGFASTAGen (Protein-Graph-FASTA-Generator or ProtGraph-FASTA-Generator) repository contains workflows to generate so-called precursor-specific-FASTAs (using the precursors from MGF-files) including feature-peptides, like VARIANTs or CONFLICTs if desired, or global-FASTAs (as described in ProtGraph). The single workflow scripts have been implemented with Nextflow-DSL-2 ...
Parabricks-Genomics-nf is a GPU-enabled pipeline for alignment and germline short variant calling for short read sequencing data. The pipeline utilises NVIDIA's Clara Parabricks toolkit to dramatically speed up the execution of best practice bioinformatics tools. Currently, this pipeline is configured specifically for NCI's Gadi HPC.
NVIDIA's Clara Parabricks can deliver a significant ...
HiC contact map generation
Snakemake pipeline for the generation of .pretext and .mcool files for visualisation of HiC contact maps with the software PretextView and HiGlass, respectively.
Prerequisites
This pipeline has been tested using Snakemake v7.32.4 and requires conda for installation of the required tools. To run the pipeline use the command:
snakemake --use-conda
A set of configuration and running scripts is provided for execution on a Slurm queueing system. After configuring ...
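Where the provided Slurm scripts are not used, a generic Snakemake v7 cluster submission could look like the sketch below; the job count and resource strings are illustrative assumptions, not values from the repository.
# Sketch: generic Slurm submission with Snakemake v7; values are illustrative.
snakemake --use-conda --jobs 50 \
  --cluster "sbatch --cpus-per-task={threads} --mem={resources.mem_mb} --time=12:00:00"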
Framework for construction of phylogenetic networks on High Performance Computing (HPC) environment
Introduction
Phylogeny refers to the evolutionary history and relationship between biological lineages related by common descent. Reticulate evolution refers to the origination of lineages through the complete or partial merging of ancestor lineages. Networks may be used to represent lineage independence events in non-treelike phylogenetic processes.
The methodology for reconstructing networks ...
This is a Nextflow implementation of the GATK Somatic Short Variant Calling workflow. This workflow can be used to discover somatic short variants (SNVs and indels) from tumour and matched normal BAM files following GATK's Best Practices workflow. The workflow is currently optimised to run efficiently and at scale on the National Compute Infrastructure, Gadi.
Type: Nextflow
Creators: Nandan Deshpande, Tracy Chew, Cali Willet, Georgina Samaha
Submitter: Georgina Samaha
ONTViSc (ONT-based Viral Screening for Biosecurity)
Introduction
eresearchqut/ontvisc is a Nextflow-based bioinformatics pipeline designed to support the diagnosis of virus and viroid pathogens for biosecurity. It takes fastq files generated from either amplicon or whole-genome sequencing using Oxford Nanopore Technologies as input.
The pipeline can either: 1) perform a direct search on the sequenced reads, 2) generate clusters, 3) assemble the reads to generate longer contigs or 4) directly ...
Type: Nextflow
Creators: Marie-Emilie Gauthier, Craig Windell, Magdalena Antczak, Roberto Barrero
Submitter: Magdalena Antczak
Workflow for Creating a large disease network from various datasets and databases for IBM, and applying the active subnetwork identification method MOGAMUN.
Type: Common Workflow Language
Creators: Daphne Wijnbergen, Mridul Johari
Submitter: Daphne Wijnbergen
ANNOTATO - Annotation workflow To Annotate Them All
Correlation between Phenotypic and In Silico Detection of Antimicrobial Resistance in Salmonella enterica in Canada Using Staramr.
DOI: 10.3390/microorganisms10020292

| tool | version | license |
|---|---|---|
| staramr | 0.8.0 | Apache-2.0 license |

With this Galaxy pipeline you can use Salmonella sp. next-generation sequencing results to predict bacterial AMR phenotypes and compare the results against gold-standard Salmonella sp. phenotypes obtained from food.
This pipeline is based on the work of the National Food Agency of Canada. DOI: 10.3389/fmicb.2020.00549

| tool | version | license |
|---|---|---|
| SeqSero2 | 1.2.1 | GNU GPL v2.0 |
| ... |
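Outside Galaxy, the staramr step itself can also be run on the command line; in this sketch the input FASTA paths and output directory are placeholders.
# Sketch: running staramr directly on assembled Salmonella genomes.
# Input paths and output directory are placeholders; see staramr search --help.
staramr search -o staramr_results/ assemblies/*.fasta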
Summary
The data preparation pipeline contains tasks for two distinct scenarios: leukaemia, with microarray data for 119 patients, and ovarian cancer, with next-generation sequencing data for 380 patients.
The disease outcome prediction pipeline offers two strategies for this task:
Graph kernel method: It starts generating personalized networks for ...
Summary
The PPI information aggregation pipeline starts by retrieving all the datasets in the GEO database whose material was generated using expression profiling by high-throughput sequencing. From each dataset's identifiers, it extracts the supplementary files that contain the counts table. Once the download step finishes, it identifies those that were normalized or had the raw counts to normalize. It also identifies and maps the gene IDs to UniProt (the IDs found usually ...
Summary
The proposed validation process has two pipelines for filtering PPIs predicted by some in silico detection method; both pipelines can be executed separately. The first pipeline (i) filters according to association rules of cellular locations extracted from the HINT database. The second pipeline (ii) filters according to scientific papers where both proteins in the PPIs appear in an interaction context in the sentences.
The pipeline (i) starts by extracting cellular component annotations from ...
Summary
PredPrIn is a scientific workflow to predict Protein-Protein Interactions (PPIs) using machine learning to combine multiple PPI detection methods of proteins according to three categories: structural, based on the primary amino acid sequence, and functional annotations.
PredPrIn contains three main steps: (i) acquirement and treatment of protein information, (ii) feature generation, and (iii) classification and analysis.
(i) The first step builds a knowledge base with the available annotations ...
Summary
HPPIDiscovery is a scientific workflow to augment, predict and perform an in silico curation of host-pathogen Protein-Protein Interactions (PPIs), using graph theory to build new candidate PPIs and machine learning to predict and evaluate them by combining multiple PPI detection methods of proteins according to three categories: structural, based on the primary amino acid sequence, and functional annotations.
HPPIDiscovery contains three main steps: (i) acquirement of pathogen and host proteins ...
PAIRED-END workflow. Align reads on a fasta reference/assembly using bwa mem, then get a consensus, variants, and mutation explanations.
IMPORTANT:
- For the "bcftools call" consensus step, the --ploidy file is in "Données partagées" (Shared Data) and must be imported into your history to use the workflow; providing this file tells bcftools to perform haploid variant calling.
- SELECT THE MOST APPROPRIATE VADR MODEL for annotation (see vadr parameters).
This workflow represents the Default ML Pipeline for AutoML feature from MLme. Machine Learning Made Easy (MLme) is a novel tool that simplifies machine learning (ML) for researchers. By integrating four essential functionalities, namely data exploration, AutoML, CustomML, and visualization, MLme fulfills the diverse requirements of researchers while eliminating the need for extensive coding efforts. MLme serves as a valuable resource that empowers researchers of all technical levels to leverage ...
CLAWS (CNAG's Long-read Assembly Workflow in Snakemake)
Snakemake Pipeline used for de novo genome assembly @CNAG. It has been developed for Snakemake v6.0.5.
It accepts Oxford Nanopore Technologies (ONT) reads, PacBio HiFi reads, Illumina paired-end data, Illumina 10X data and Hi-C reads. It does the preprocessing of the reads, assembly, polishing, purge_dups, scaffolding and different evaluation steps. By default it will preprocess the reads, run Flye + Hypo + purge_dups + yahs and evaluate ...
Type: Snakemake
Creators: Jessica Gomez-Garrido, Fernando Cruz (CNAG), Francisco Camara (CNAG), Tyler Alioto (CNAG)
Submitter: Jessica Gomez-Garrido
About SnakeMAGs
SnakeMAGs is a workflow to reconstruct prokaryotic genomes from metagenomes. The main purpose of SnakeMAGs is to process Illumina data from raw reads to metagenome-assembled genomes (MAGs). SnakeMAGs is efficient, easy to handle and flexible to different projects. The workflow is CeCILL licensed, implemented in Snakemake (run on multiple cores) and available ...
GERONIMO
Introduction
GERONIMO is a bioinformatics pipeline designed to conduct high-throughput homology searches of structural genes using covariance models. These models are based on the alignment of sequences and the consensus of secondary structures. The pipeline is built using Snakemake, a workflow management tool that allows for the reproducible execution of analyses on various computational platforms.
The idea for developing GERONIMO emerged from a comprehensive search for [telomerase ...
prepareChIPs
This is a simple snakemake workflow template for preparing single-end ChIP-Seq data.
The steps implemented are (an illustrative command-line sketch of these steps follows below):
- Download raw fastq files from SRA
- Trim and Filter raw fastq files using AdapterRemoval
- Align to the supplied genome using bowtie2
- Deduplicate Alignments using Picard MarkDuplicates
- Call Macs2 Peaks using macs2
A pdf of the rulegraph is available here
Full details for each step are given below. Any additional ...
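For orientation, the steps listed above correspond roughly to the following command-line calls; this is an illustrative sketch with placeholder accession, file and index names, not the workflow's actual rules or parameters.
# Illustrative sketch of the steps above; all names are placeholders.
prefetch SRR0000001 && fasterq-dump SRR0000001             # download from SRA
AdapterRemoval --file1 SRR0000001.fastq --basename sample \
  --gzip --trimns --trimqualities                          # trim and filter
bowtie2 -x genome_index -U sample.truncated.gz | \
  samtools sort -o sample.sorted.bam -                     # align and sort
picard MarkDuplicates I=sample.sorted.bam \
  O=sample.dedup.bam M=sample.dup_metrics.txt              # deduplicate
macs2 callpeak -t sample.dedup.bam -f BAM -g hs -n sample  # call peaks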
SINGLE-END workflow. Align reads on a fasta reference/assembly using bwa mem, then get a consensus, variants, and mutation explanations.
IMPORTANT:
- For the "bcftools call" consensus step, the --ploidy file is in "Données partagées" (Shared Data) and must be imported into your history to use the workflow; providing this file tells bcftools to perform haploid variant calling.
- SELECT THE MOST APPROPRIATE VADR MODEL for annotation (see vadr parameters).
This repository hosts the Metabolome Annotation Workflow (MAW). The workflow takes MS2 .mzML format data files as input in R. It performs spectral database dereplication using the R package Spectra and compound database dereplication using SIRIUS or MetFrag. Final candidate selection is done in Python using RDKit and PubChemPy.
Type: Common Workflow Language
Creators: Mahnoor Zulfiqar, Michael R. Crusoe, Luiz Gadelha, Christoph Steinbeck, Maria Sorokina, Kristian Peters
Submitter: Mahnoor Zulfiqar
Purge dups
This snakemake pipeline is designed to be run using a contig-level genome assembly and PacBio reads as input. This pipeline has been tested with snakemake v7.32.4. Raw long-read sequencing files and the input contig genome assembly must be given in the config.yaml file. To execute the workflow run:
snakemake --use-conda --cores N
Or configure the cluster.json and run using the ./run_cluster command
MGnify genomes catalogue pipeline
A pipeline to perform taxonomic and functional annotation and to generate a catalogue from a set of isolate and/or metagenome-assembled genomes (MAGs), using the workflow described in the following publication:
Gurbich TA, Almeida A, Beracochea M, Burdett T, Burgin J, Cochrane G, Raj S, Richardson L, Rogers AB, Sakharova E, Salazar GA and Finn RD. (2023) [MGnify Genomes: A Resource for Biome-specific Microbial Genome ...
Type: Nextflow
Creators: Ekaterina Sakharova, Tatiana Gurbich, Martin Beracochea
Submitter: Martin Beracochea
GRAVI: Gene Regulatory Analysis using Variable Inputs
This is a snakemake workflow for:
- Performing sample QC
- Calling ChIP peaks
- Performing Differential Binding Analysis
- Comparing results across ChIP targets
The minimum required input is one ChIP target with two conditions.
Full documentation can be found here
Snakemake Implementation
The basic workflow is written in snakemake, requiring at least v7.7, and can be called using the following
...
GermlineStructuralV-nf is a pipeline for identifying structural variant events in human Illumina short read whole genome sequence data. GermlineStructuralV-nf identifies structural variant and copy number events from BAM files using Manta, Smoove, and TIDDIT. Variants are then merged using SURVIVOR, ...
Type: Nextflow
Creators: Georgina Samaha, Marina Kennerson, Tracy Chew, Sarah Beecroft
Submitter: Georgina Samaha
Type: Nextflow
Creators: Pablo Riesgo Ferreiro, Thomas Bukur, Patrick Sorn
Submitter: Pablo Riesgo Ferreiro
IndexReferenceFasta-nf
===========
Interactive Jupyter Notebooks in combination with Conda environments can be used to generate FAIR (Findable, Accessible, Interoperable and Reusable/Reproducible) biomolecular simulation workflows. The interactive programming code accompanied by documentation, and the possibility to inspect intermediate results with versatile graphical charts and data visualization is very helpful, especially in iterative processes, where parameters might be adjusted to a particular system of interest. This work ...
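As a minimal illustration of that setup, the notebook's software environment is typically captured in a Conda environment file and Jupyter launched from it; the environment file, environment name and notebook name here are placeholders.
# Sketch: recreate the notebook environment with Conda and launch Jupyter.
# Environment file, environment name and notebook name are placeholders.
conda env create -f environment.yml -n biosim-workflow
conda activate biosim-workflow
jupyter notebook biomolecular_simulation_setup.ipynb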
Collection of de-novo genome assembly workflows written for implementation in Galaxy
Input data should be Oxford Nanopore raw reads plus Illumina WGS reads and Illumina 3-dimensional Chromatin Conformation Capture (HiC) reads
Executing all workflows will output one scaffolded collapsed assembly and the complete QC analyses
Please run the workflows in order: WF0 (there are two, one for ONT, and another one for Illumina that can be used independently for the WGS and HiC reads), WF1, WF2, WF3, WF4
Maintainers: Diego De Panis
Number of items: 6
Tags: Assembly, Bioinformatics, Galaxy, Genomics, Genome assembly, ONT, illumina, Hi-C
Collection of de-novo genome assembly workflows written for implementation in Galaxy
Input data should be PacBio HiFi reads and Illumina 3-dimensional Chromatin Conformation Capture (HiC) reads
Executing all workflows will output two scaffolded haplotype assemblies and the complete QC analyses
Please run the workflows in order: WF0 (there are two, one for HiFi and one for Illumina HiC), WF1, WF2, WF3, WF4
Maintainers: Tom Brown, Diego De Panis
Number of items: 6
Tags: Assembly, Bioinformatics, Galaxy, Genomics, Genome assembly, HiFi, Hi-C