NanoporeDB_workflow

1. Overview

This repository contains the integrated computational workflow for the large-scale mining, multimeric structure prediction, and quality filtering of protein nanopores. This pipeline enables the discovery of novel nanopore candidates from massive metagenomic and genomic databases. The structural models, pore geometry analysis, and membrane orientation predictions generated by this workflow are hosted at our public database: NanoporeDB (https://db.genomics.cn/nanopore/).

2. Workflow Diagram

  • Figure 1: Overview of the nanopore mining workflow.

3. Prerequisites & Installation

3.1 Conda Environment

We recommend using Conda to manage dependencies. To replicate the environment:

conda env create -f environment.yml

conda activate Foldseek

3.2 External Tools

Ensure the following tools are installed and accessible in your $PATH:

MMseqs2 (commit 0b27c9d7d7757f9530f2efab14d246d268849925)

Foldseek (v9.427df8a)

US-align (v20241108)

AlphaFold-Multimer (local installation) and access to the AlphaFold3 Server (web service, used in Step 4)
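
Before starting a long run, it can help to confirm that the command-line tools are actually resolvable. The sketch below is one way to do that in Python. The executable names (`mmseqs`, `foldseek`, `USalign`) are the names these tools usually install under and are assumptions here, so adjust them to your installation. `find_missing_tools` is a hypothetical helper, not part of this repository.

```python
import shutil

# Assumed executable names; adjust if your installation uses different ones.
REQUIRED_TOOLS = ["mmseqs", "foldseek", "USalign"]

def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on $PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on $PATH: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```

AlphaFold-Multimer and the AlphaFold3 Server are left out of this check because the first is typically launched through its own scripts and the second is a web service.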

4. Database Preparation

Before running the pipeline, download and index the required databases:

4.1 Pre-generated Foldseek Databases for AFDB

mkdir -p Database && cd Database

wget https://foldseek.steineggerlab.workers.dev/afdb.tar.gz

tar -xzf afdb.tar.gz
  • The path to this directory is passed to the script in Step 2.

4.2 Sequence Databases (UniRef90 & MGnify90)

cd Database   # skip if you are still inside Database after 4.1

Download:

UniRef90 (Release 2024_05)
MGnify90 (Release 2024_04)

Pre-processing (extract full-length sequences, i.e. records whose header contains FL=1):

zcat mgy_clusters.fa.gz | perl -ne 'if(/^>/){$keep = /FL=1/} print if $keep' > MGnify90FL.fa
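
For readers less familiar with Perl, the filter above keeps every FASTA record whose header line contains `FL=1` and drops the rest, including the sequence lines of rejected records. A rough Python equivalent is sketched below. `filter_full_length` and the script name in the usage comment are hypothetical and shown for illustration only; the one-liner above remains the command used by this workflow.

```python
import gzip
import sys

def filter_full_length(lines):
    """Yield the lines of FASTA records whose header contains 'FL=1'."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # A header decides the fate of the record that follows it.
            keep = "FL=1" in line
        if keep:
            yield line

if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python filter_fl.py mgy_clusters.fa.gz > MGnify90FL.fa
    with gzip.open(sys.argv[1], "rt") as handle:
        sys.stdout.writelines(filter_full_length(handle))
```

Because the `keep` flag only changes on header lines, multi-line sequences are kept or dropped together with their header.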

Indexing:

mmseqs createdb MGnify90FL.fa MGnify90FL

mmseqs createdb uniref90.fasta uniref90

5. Step-by-Step Guide

Step 1: Candidate Retrieval (Manual/Web)

PDB Search: Search RCSB PDB for the keywords "nanopore" and "porin". Save the oligomeric structures to 1nanopore_query/PDB_nanopore/.

AFDB Search: Search AlphaFold DB by keyword. Save the monomers to 1nanopore_query/AFDB_nanopore/.

Refer to 1nanopore_query/search_keywords.txt for the detailed query logic.

Step 2: Structure-based Mining

Compare monomeric seed structures against AFDB using Foldseek:

perl bin/2_structure_search.pl 1nanopore_query/PDB_nanopore 1nanopore_query/AFDB_nanopore Database [threads]

Step 3: Sequence-based Expansion

Expand candidates by searching against UniRef90 and MGnify90FL:

perl bin/3_sequence_search.pl Database/uniref90 Database/MGnify90FL Database/uniref90.fasta Database/MGnify90FL.fa [threads]

Step 4: Multimeric Structure Prediction

AFM: Predict locally using AlphaFold-Multimer. Save to 4Multimer_prediction/nanopore_AFM/.

AF3: Submit to AlphaFold3 Server. Save to 4Multimer_prediction/nanopore_AF3/.

Consistency Check:

python bin/4_check.py 
  • Checks that model IDs match between the AFM (.pdb) and AF3 (.cif) outputs.
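
The idea behind such a check can be sketched as a comparison of file stems between the two output directories. The block below is an illustration only and is not the code in bin/4_check.py. It assumes that each model is stored as `<ID>.pdb` or `<ID>.cif` directly inside its directory, which may not match the real naming scheme; `model_ids` and `compare_ids` are hypothetical names.

```python
from pathlib import Path

def model_ids(directory, suffix):
    """Collect file stems (assumed to be the model IDs) with the given suffix."""
    return {path.stem for path in Path(directory).glob(f"*{suffix}")}

def compare_ids(afm_dir, af3_dir):
    """Return (IDs only in AFM, IDs only in AF3), each as a sorted list."""
    afm = model_ids(afm_dir, ".pdb")
    af3 = model_ids(af3_dir, ".cif")
    return sorted(afm - af3), sorted(af3 - afm)

if __name__ == "__main__":
    only_afm, only_af3 = compare_ids(
        "4Multimer_prediction/nanopore_AFM",
        "4Multimer_prediction/nanopore_AF3",
    )
    print("Missing from AF3:", only_afm)
    print("Missing from AFM:", only_af3)
```

Two empty lists mean that every candidate has both an AFM and an AF3 model under this naming assumption.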

Step 5: Quality Filtering & Merging

perl bin/5_structure_filter.pl [threads]
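
Steps 2, 3 and 5 are the scripted parts of the pipeline, while Steps 1 and 4 are manual, so a small launcher that runs one scripted step at a time can be convenient. The sketch below only assembles the command lines given in this guide and runs the requested one. `build_commands` and the script name in the usage comment are hypothetical, the thread count is always passed explicitly even though the guide shows it in brackets, and the script is assumed to be started from the repository root.

```python
import subprocess
import sys

def build_commands(threads):
    """Map each scripted step to its command line, as given in this guide."""
    t = str(threads)
    return {
        "2": ["perl", "bin/2_structure_search.pl",
              "1nanopore_query/PDB_nanopore", "1nanopore_query/AFDB_nanopore",
              "Database", t],
        "3": ["perl", "bin/3_sequence_search.pl",
              "Database/uniref90", "Database/MGnify90FL",
              "Database/uniref90.fasta", "Database/MGnify90FL.fa", t],
        "5": ["perl", "bin/5_structure_filter.pl", t],
    }

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python run_step.py <step: 2|3|5> <threads>
    step, threads = sys.argv[1], int(sys.argv[2])
    # check=True stops with an error if the Perl script exits non-zero.
    subprocess.run(build_commands(threads)[step], check=True)
```

Running one step per invocation keeps the manual Step 4 between Steps 3 and 5 explicit rather than hiding it inside a single end-to-end script.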

6. Citation

If you use this workflow or NanoporeDB, please cite: Liu et al. NanoporeDB: A Structural Resource Of Multimeric Protein Nanopores For Single-Molecule Sensing. GigaScience, 2025. DOI: https://doi.org/10.1101/2025.11.25.690617

Version History

V1.0.0 (latest) Created 17th May 2026 at 15:34 by Yuqian Liu

Add files via upload


Frozen V1.0.0 1c278af

main @ 8179c93 (earliest) Created 13th May 2026 at 10:53 by Yuqian Liu

Update README.md


Frozen main 8179c93
Citation
Liu, Y. (2026). NanoporeDB_workflow. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.2172.2