HiC scaffolding pipeline
A Snakemake pipeline for scaffolding a genome assembly with HiC reads, using yahs.
Prerequisites
This pipeline has been tested with Snakemake v7.32.4 and requires conda to install the required tools. To run the pipeline, use the command:
snakemake --use-conda --cores N
where N is the number of cores to use. A set of configuration and run scripts is provided for execution on a Slurm queueing system. After configuring the cluster.json file, run:
./run_cluster
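For a local run, it can help to preview the planned jobs before committing cores to them. The following is a minimal sketch, not part of the pipeline: `run_scaffolding` is a hypothetical helper name, and it assumes `snakemake` is on the PATH and that the command is run from the pipeline directory. It uses Snakemake's `-n` (dry run) option, then starts the real run only if the dry run succeeds.

```shell
# Hypothetical helper (not shipped with the pipeline): dry run first,
# then the real run with the same options.
run_scaffolding() {
    # First argument is the core count; otherwise fall back to what
    # nproc reports, or 1 if nproc is unavailable.
    cores="${1:-$(nproc 2>/dev/null || echo 1)}"

    # -n lists the jobs Snakemake would run without executing them.
    snakemake --use-conda --cores "$cores" -n || return 1

    # Same command as in the text above, with N filled in.
    snakemake --use-conda --cores "$cores"
}
```

Calling `run_scaffolding 16` would then correspond to `snakemake --use-conda --cores 16`, preceded by a dry run.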
Before starting
You need to create a temporary folder and specify its path in the config.yaml file. The folder must be able to hold the temporary files created when sorting the .pairsam file (hundreds of GB, or even many TB).
The path to the genome assembly must be given in the config.yaml file.
The HiC reads should be paired and named as follows: Library_1.fastq.gz and Library_2.fastq.gz. The pipeline can accept any number of paired HiC read files, but the naming must be consistent. The folder containing these files must be provided in the config.yaml file.
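The three requirements above can be checked before launching a long run. The sketch below is an illustration, not part of the pipeline: the function names are hypothetical, the paths are passed as arguments rather than read from config.yaml (whose key names are not shown here), and the default free-space threshold of 500 GB is an arbitrary assumption that should be adjusted to the size of the data.

```shell
# Hypothetical pre-flight checks; none of these functions ship with the pipeline.

# Succeed if the temporary folder exists, is writable, and has at least
# the requested number of GB free (second argument, default 500: an assumption).
check_tmp_dir() {
    tmp_dir="$1"
    min_gb="${2:-500}"
    if [ ! -d "$tmp_dir" ] || [ ! -w "$tmp_dir" ]; then
        echo "temporary folder missing or not writable: $tmp_dir" >&2
        return 1
    fi
    # df -Pk prints one line per filesystem in 1024-byte blocks;
    # column 4 of the second line is the available space.
    free_kb=$(df -Pk "$tmp_dir" | awk 'NR==2 {print $4}')
    if [ "$free_kb" -lt $((min_gb * 1024 * 1024)) ]; then
        echo "less than ${min_gb} GB free in $tmp_dir" >&2
        return 1
    fi
}

# Succeed if the assembly file exists and is not empty.
check_assembly() {
    if [ ! -s "$1" ]; then
        echo "assembly file missing or empty: $1" >&2
        return 1
    fi
}

# Succeed if every *_1.fastq.gz in the reads folder has a matching
# *_2.fastq.gz and vice versa, and at least one pair is present.
check_read_pairs() {
    reads_dir="$1"
    pair_status=0
    pair_found=0
    for r1 in "$reads_dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue          # glob matched nothing
        pair_found=1
        r2="${r1%_1.fastq.gz}_2.fastq.gz"
        [ -e "$r2" ] || { echo "missing mate for $r1" >&2; pair_status=1; }
    done
    for r2 in "$reads_dir"/*_2.fastq.gz; do
        [ -e "$r2" ] || continue
        r1="${r2%_2.fastq.gz}_1.fastq.gz"
        [ -e "$r1" ] || { echo "missing mate for $r2" >&2; pair_status=1; }
    done
    if [ "$pair_found" -ne 1 ]; then
        echo "no *_1.fastq.gz files in $reads_dir" >&2
        pair_status=1
    fi
    return "$pair_status"
}
```

With placeholder paths, a check before starting might look like `check_tmp_dir /path/to/tmp 500 && check_assembly /path/to/assembly.fa && check_read_pairs /path/to/hic_reads`. The checks only confirm that the files are present and consistently named; they do not validate file contents.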
Version History
Version 2 (latest), created 21st Jun 2024 at 10:42 by Tom Brown
Add cluster json for execution on slurm
Frozen: Version-2 (efc9e4b)
Version 1 (earliest), created 16th Mar 2024 at 09:01 by Tom Brown
Initial commit
Frozen: Version-1 (cd486a3)
 https://orcid.org/0000-0001-8293-4816