A workflow for marine Genomic Observatories data analysis
An EOSC-Life project
The workflows developed in the framework of this project are based on pipeline-v5 of the MGnify resource.
This branch is a child of the pipeline_5.1 branch
that contains a part of the CWL descriptions of the MGnify pipeline version 5.1.
The following comes from the initial repo and describes how to get the databases required.

Dependencies
To run metaGOflow, first make sure the following are set up in your computing environment:
- python3 [v 3.8+]
- Docker [v 19.+] or Singularity [v 3.7.+]/Apptainer [v 1.+]
- cwltool [v 3.+]
- rdflib [v 6.+]
- rdflib-jsonld [v 0.6.2]
- ro-crate-py [v 0.7.0]
- pyyaml [v 6.0]
- Node.js [v 10.24.0+]
- Available storage ~235GB for databases
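A quick way to see which of the command-line dependencies above are already on your PATH is sketched below. The function name check_tool is hypothetical, and the sketch only tests that each executable exists; it does not compare versions or look at the Python packages (rdflib, rdflib-jsonld, ro-crate-py, pyyaml).

```shell
# Report whether a command is available on PATH.
# check_tool is a hypothetical helper, not part of metaGOflow.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Command-line tools from the dependency list.
for tool in python3 cwltool node; do
    check_tool "$tool" || true
done

# Any one of the three container engines is enough.
if check_tool docker >/dev/null \
    || check_tool singularity >/dev/null \
    || check_tool apptainer >/dev/null; then
    echo "found:   a container engine"
else
    echo "missing: docker, singularity or apptainer"
fi
```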
Storage while running
Disk requirements while running vary with the analysis. For indicative figures, see the computing resources reported for various cases in the metaGOflow publication.
Installation
Get the EOSC-Life marine GOs workflow
git clone https://github.com/emo-bon/MetaGOflow
cd MetaGOflow
Download necessary databases (~235GB)
You can download databases for the EOSC-Life GOs workflow by running the
download_dbs.sh script under the Installation folder.
bash Installation/download_dbs.sh -f [Output Directory e.g. ref-dbs] 
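Since the databases need roughly 235GB, it can help to check the free space on the target filesystem before starting the download. The sketch below is one way to do that; free_gb is a hypothetical helper, the target directory is an assumption, and the figure it prints is in units of 1024^3 bytes, so treat it as approximate.

```shell
# Print the free space, in whole GB, on the filesystem holding a directory.
# free_gb is a hypothetical helper, not part of metaGOflow.
free_gb() {
    # df -P gives POSIX single-line records; -k reports 1024-byte blocks.
    # Column 4 of the data line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

required_gb=235     # database size quoted in this README
target_dir="."      # assumption: the databases will live under the current directory

available_gb="$(free_gb "$target_dir")"
if [ "$available_gb" -ge "$required_gb" ]; then
    echo "OK: ${available_gb} GB free"
else
    echo "Not enough space: ${available_gb} GB free, about ${required_gb} GB needed"
fi
```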
If you already have one or more of these databases on your system, create a symbolic link pointing
at the ref-dbs folder or at the corresponding subfolder/file instead of downloading it again.
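One reading of the instruction above is to link each existing database into ref-dbs under its expected name. The sketch below assumes that reading; link_db is a hypothetical helper and /data/shared/silva_ssu is a made-up location, so adapt both to your system.

```shell
# Link an existing database into a ref-dbs directory under its own name.
# link_db is a hypothetical helper, not part of metaGOflow.
link_db() {
    src="$1"        # use an absolute path so the link resolves from anywhere
    dest_dir="$2"
    mkdir -p "$dest_dir"
    ln -s "$src" "$dest_dir/$(basename "$src")"
}

# Example with a hypothetical location; uncomment and adapt the path:
# link_db /data/shared/silva_ssu ref-dbs
```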
The final structure of the DB directory should be like the following:
user@server:~/MetaGOflow$ ls ref-dbs/
db_kofam/  diamond/  eggnog/  GO-slim/  interproscan-5.57-90.0/  kegg_pathways/  kofam_ko_desc.tsv  Rfam/  silva_lsu/  silva_ssu/
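To compare a local database directory against the listing above, something like the following could be used. It is only a sketch: missing_dbs is a hypothetical helper, and it checks that each entry exists without verifying its contents.

```shell
# Print every expected entry that is absent from the given directory.
# missing_dbs is a hypothetical helper, not part of metaGOflow.
missing_dbs() {
    for entry in db_kofam diamond eggnog GO-slim interproscan-5.57-90.0 \
                 kegg_pathways kofam_ko_desc.tsv Rfam silva_lsu silva_ssu; do
        # -e follows symbolic links, so linked databases count as present.
        [ -e "$1/$entry" ] || echo "$entry"
    done
}

# Report missing entries only when a ref-dbs directory is present here.
if [ -d ref-dbs ]; then
    missing_dbs ref-dbs
fi
```

No output means every expected entry was found.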
How to run
Ensure that Node.js is installed on your system before running metaGOflow.
If you have root access on your system, you can run the commands below to install it:
DEBIAN/UBUNTU
sudo apt-get update -y
sudo apt-get install -y nodejs
RH/CentOS
sudo yum install rh-nodejs<version> (e.g. rh-nodejs10)
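After installing, you may want to confirm that the Node.js on your PATH meets the v10.24.0+ requirement listed under Dependencies. The sketch below is one way to do so; version_ge is a hypothetical helper and relies on sort -V, which is available in GNU coreutils but not in every sort implementation.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# version_ge is a hypothetical helper; it assumes a sort that supports -V.
version_ge() {
    # The smaller of the two versions sorts first; if that is $2, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v node >/dev/null 2>&1; then
    # node --version prints e.g. "v10.24.0"; strip the leading "v".
    node_version="$(node --version | sed 's/^v//')"
    if version_ge "$node_version" "10.24.0"; then
        echo "Node.js $node_version is recent enough"
    else
        echo "Node.js $node_version is older than 10.24.0"
    fi
else
    echo "Node.js not found on PATH"
fi
```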
Set up the environment
Run once - Setup environment
- conda create -n EOSC-CWL python=3.8
- conda activate EOSC-CWL
- pip install cwlref-runner cwltool[all] rdflib-jsonld rocrate pyyaml
Run every time
conda activate EOSC-CWL
Run the workflow
- Edit the config.yml file to set the parameter values of your choice. To select all the steps, set the variables in lines [2-6] to true.
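To review the step switches before a run, you can print just those lines of the file. This is a small sketch; show_step_flags is a hypothetical helper, and it assumes the switches still sit on lines 2-6 of config.yml in your copy of the repo.

```shell
# Print lines 2-6 of a config file so the step switches can be reviewed.
# show_step_flags is a hypothetical helper, not part of metaGOflow.
show_step_flags() {
    sed -n '2,6p' "$1"
}

# Only run when a config.yml is present in the current directory.
if [ -f config.yml ]; then
    show_step_flags config.yml
fi
```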
Using Singularity
Standalone
- run:
./run_wf.sh -s -n osd-short -d short-test-case -f test_input/wgs-paired-SRR1620013_1.fastq.gz -r test_input/wgs-paired-SRR1620013_2.fastq.gz
Using a cluster with a queueing system (e.g. SLURM)
- Create a job file (e.g., SBATCH file)
- Enable Singularity, e.g. module load Singularity & all other dependencies
- Add the run line to the job file
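The three steps above can be sketched as a small script that writes a SLURM job file. Everything specific here is an assumption to adapt: the file name metagoflow.sbatch, the resource values, the module name (which is site-specific), and the use of conda inside the job. The run line is the Singularity test command from this README.

```shell
# Write a minimal SLURM job file. The quoted 'EOF' keeps the shell from
# expanding anything inside the heredoc, so the file is written verbatim.
cat > metagoflow.sbatch <<'EOF'
#!/bin/bash
# Resource values below are placeholders; size them for your own analysis.
#SBATCH --job-name=metagoflow
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=24:00:00
#SBATCH --output=metagoflow_%j.log

# Enable Singularity; the module name differs between clusters.
module load Singularity

# Make "conda activate" usable in a non-interactive shell, then activate
# the environment created during setup.
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate EOSC-CWL

# Run line taken from the standalone Singularity example.
./run_wf.sh -s -n osd-short -d short-test-case -f test_input/wgs-paired-SRR1620013_1.fastq.gz -r test_input/wgs-paired-SRR1620013_2.fastq.gz
EOF

echo "Wrote metagoflow.sbatch; submit it with: sbatch metagoflow.sbatch"
```

Submit the job from the MetaGOflow directory so that the relative paths in the run line resolve.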
Using Docker
Standalone
- run:
./run_wf.sh -n osd-short -d short-test-case -f test_input/wgs-paired-SRR1620013_1.fastq.gz -r test_input/wgs-paired-SRR1620013_2.fastq.gz
HINT: If you are using Docker, you may need to run the command without the -s flag, as in the line above.
Testing samples
The samples are available in the test_input folder.
We provide metaGOflow with partial samples from the Human Metagenome Project (SRR1620013 and SRR1620014). They are partial in that only a small part of their sequences has been kept, so that the pipeline can be tested quickly.
Hints and tips
- In case you are using Docker, it is strongly recommended to avoid installing it through snap.
- RuntimeError: slurm currently does not support shared caching, because it does not support cleaning up a worker after the last job finishes. Set the --disableCaching flag if you want to use this batch system.
- In case you are having errors like:
cwltool.errors.WorkflowException: Singularity is not available for this tool
You may run the following command:
singularity pull --force --name debian:stable-slim.sif docker://debian:stable-sli
Contribution
To make contributing to the project a bit easier, all the MGnify conditionals and subworkflows under
the workflows/ directory that are not used in the metaGOflow framework have been removed.
However, all the MGnify tools/ and utils/ are available in this repo, even if they are not invoked in the current
version of metaGOflow.
This way, we hope to encourage people to implement their own conditionals and/or subworkflows by exploiting the
currently supported tools and utils as well as by developing new tools and/or utils.
Version History
eosc-life-gos @ deb5427 (latest) Created 16th May 2023 at 21:41 by Haris Zafeiropoulos
Merge pull request #38 from emo-bon/fix-bugs
Fixes logical expression to keep tmp.
eosc-life-gos @ deb5427 Created 16th May 2023 at 21:38 by Haris Zafeiropoulos
Merge pull request #38 from emo-bon/fix-bugs
Fixes logical expression to keep tmp.
eosc-life-gos @ 28122db (earliest) Created 19th Sep 2022 at 19:00 by Haris Zafeiropoulos
running version with workaround in conditionals
Created: 19th Sep 2022 at 19:00
Last updated: 16th May 2023 at 23:00
 https://orcid.org/0000-0003-3472-3736


