IsoAnnot
IsoAnnot is a new tool for generating functional and structural annotation at the isoform level. It collects and integrates information from different databases to categorize and describe each isoform, covering both the transcript and the protein.
⚠️⚠️ IsoAnnot is currently under beta-testing. Please see the latest release and download IsoAnnot from the release branch.
Requirements
Computational Requirements
The computational requirements to run IsoAnnot may vary depending on the organism of interest and the size of the transcriptome you want to annotate.
Reference benchmark (Human transcriptome):
- Transcriptome size: 252,205 isoforms
- CPU cores: 8 cores
- Memory: 12 GB RAM
- Disk space: 14 GB
- Execution time: ~20 hours
The number of cores can be modified by editing the --cores parameter in the last line of IsoAnnot/isoannot.sh (default is 8 cores).
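Before launching a long run, it can help to compare the host against the reference benchmark above. The following is a minimal sketch, not part of IsoAnnot: the function names are hypothetical, the thresholds are simply the human benchmark figures, and the memory query assumes a GNU/Linux host.

```python
import os
import shutil

# Figures from the human reference benchmark; treat them as a rough guide,
# since real requirements depend on the organism and transcriptome size.
BENCHMARK = {"cores": 8, "memory_gb": 12, "disk_gb": 14}


def available_resources(path="."):
    """Return CPU cores, total RAM (GB) and free disk space (GB) for this host."""
    cores = os.cpu_count() or 1
    # SC_PAGE_SIZE * SC_PHYS_PAGES gives total physical memory on GNU/Linux.
    memory_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    # Free space on the filesystem that contains `path`.
    disk_bytes = shutil.disk_usage(path).free
    return {
        "cores": cores,
        "memory_gb": memory_bytes / 1024**3,
        "disk_gb": disk_bytes / 1024**3,
    }


def shortfalls(available, required=BENCHMARK):
    """List the resource names whose available amount is below the required one."""
    return [name for name, needed in required.items() if available[name] < needed]
```

Calling `shortfalls(available_resources())` returns an empty list when the host meets all three benchmark figures. A host with fewer cores can still run IsoAnnot by lowering the --cores value, at the cost of a longer execution time.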
Software Prerequisites
IsoAnnot requires the following software to be installed before use:
- Operating System: GNU/Linux (tested and supported)
- Python: Python 3 (managed automatically by conda)
- Conda: For dependency management
- Snakemake: Workflow management system (version 7.x recommended)
Installation
IsoAnnot is distributed as a compressed file containing the proper directory structure.
Installation steps:
- Extract the package to your desired installation folder:
  tar -xzf IsoAnnot.tar.gz
  cd IsoAnnot
- Ensure all prerequisites are installed (see Installation Prerequisites)
- Install external software (see External Software)
- Activate the snakemake conda environment:
  conda activate snakemake
You're now ready to run IsoAnnot!
Configuration Files
Configuration files control how IsoAnnot processes data for each species and database combination. Snakemake configuration files in IsoAnnot use the YAML file format and are organized on a per-species basis.
Where to Find Config Files
Configuration files are organized in a hierarchical directory structure:
IsoAnnot/config/
├── ensembl/
│   ├── hsapiens/
│   │   ├── config.yaml
│   │   └── Snakefile.smk
│   ├── mmusculus/
│   │   ├── config.yaml
│   │   └── Snakefile.smk
│   └── ...
├── refseq/
│   ├── hsapiens/
│   │   ├── config.yaml
│   │   └── Snakefile.smk
│   └── ...
├── mytranscripts/
│   ├── hsapiens/
│   │   ├── config.yaml
│   │   └── Snakefile.smk
│   └── ...
└── generic/
    ├── config.yaml              # Generic settings
    ├── Snakefile.smk            # Main workflow
    ├── Snakefile_ensembl.smk
    ├── Snakefile_refseq.smk
    └── Snakefile_mytranscripts.smk
Path structure: config/<database>/<species>/config.yaml
Examples:
- Human Ensembl: config/ensembl/hsapiens/config.yaml
- Mouse RefSeq: config/refseq/mmusculus/config.yaml
- Custom human transcripts: config/mytranscripts/hsapiens/config.yaml
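The path layout above can be checked programmatically before starting a run. This is a small sketch under the assumption that each database/species folder holds a config.yaml and a Snakefile.smk, as in the directory tree; the function names are hypothetical and not part of IsoAnnot.

```python
from pathlib import Path


def config_paths(root, database, species):
    """Build the expected config.yaml and Snakefile.smk paths for one run."""
    base = Path(root) / "config" / database / species
    return base / "config.yaml", base / "Snakefile.smk"


def missing_config_files(root, database, species):
    """Return the expected files that are not present on disk."""
    return [path for path in config_paths(root, database, species) if not path.is_file()]
```

For example, `missing_config_files("IsoAnnot", "ensembl", "hsapiens")` returns an empty list when both files exist, and otherwise lists the paths that need to be created.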
How to Modify Config Files
To modify an existing configuration:
- Navigate to the config file:
  cd IsoAnnot/config/<database>/<species>/
  nano config.yaml
- Edit parameters as needed (see Configuration Parameters Explained)
- Save the file
- Run IsoAnnot with the updated configuration:
  cd IsoAnnot
  ./isoannot.sh --database <database> --species <species>
Common modifications:
- Update database URLs to newer releases
- Change file paths for custom data
- Adjust species-specific parameters
- Modify the transcript_versioned flag
Generic Configuration
The generic configuration file (config/generic/config.yaml) contains global settings used across all species:
interproscan_path: "software/interproscan/interproscan.sh"
pfam_clan_url: ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.clans.tsv.gz
dir_sqanti: "scripts/sqanti3/"
Key parameters:
- interproscan_path: Path to InterProScan executable
- pfam_clan_url: URL for Pfam clan database
- dir_sqanti: Directory containing SQANTI3 scripts
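A quick sanity check of the generic configuration might look like the sketch below. It assumes the file contains only flat `key: value` lines like the three shown above, and uses a deliberately simple line parser as a stand-in for a real YAML library; the helper names are hypothetical.

```python
REQUIRED_KEYS = ("interproscan_path", "pfam_clan_url", "dir_sqanti")


def read_flat_yaml(text):
    """Parse top-level `key: value` lines (no nesting, no inline comments)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so URLs such as ftp://... stay intact.
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def missing_keys(settings, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not settings.get(key)]


EXAMPLE = """
interproscan_path: "software/interproscan/interproscan.sh"
pfam_clan_url: ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.clans.tsv.gz
dir_sqanti: "scripts/sqanti3/"
"""

settings = read_flat_yaml(EXAMPLE)
print(missing_keys(settings))  # []
```

For configuration files with nested sections, a full YAML parser would be the safer choice; this sketch only covers the flat layout shown here.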
Output
Output Structure
IsoAnnot writes its output to a structured directory hierarchy inside the directory supplied by the user. If none is given, the current working directory is used. The structure of the /data/ folder is as follows:
/data/
└── <Prefix>/                                                      # e.g., Hsapiens/
    ├── <common_name>_tappas_<database>_annotation_file.gff3       # Main output
    ├── <common_name>_tappas_<database>_annotation_file.gff3_mod   # Modified GFF3
    ├── config/                                                    # Downloaded config files
    │   ├── ensembl/
    │   ├── refseq/
    │   └── global/
    ├── output/
    │   └── <database>/                                            # Database-specific outputs
    │       ├── layers/                                            # Annotation layers
    │       │   ├── go.gtf
    │       │   ├── interpro.gtf
    │       │   ├── reactome.gtf
    │       │   └── ...
    │       ├── transcripts/                                       # Transcript files
    │       ├── proteins/                                          # Protein sequences
    │       └── ...
    └── tmp/                                                       # Temporary processing files
Directory naming:
- <Prefix>: Capitalized species prefix from config (e.g., Hsapiens, Mmusculus, Stuberosum)
- <common_name>: Lowercase common name from config (e.g., human, mouse, potato)
- <database>: Database used (e.g., ensembl, refseq, mytranscripts)
Main Output Files
Primary Annotation File
File: <common_name>_tappas_<database>_annotation_file.gff3
This is the main output file containing comprehensive isoform-level annotations.
Example: human_tappas_ensembl_annotation_file.gff3
Location: IsoAnnot/data/<Prefix>/
Content: GFF3-formatted annotation with:
- Gene and transcript structures
- Protein-coding predictions
- Functional annotations from multiple databases
- Structural features
- Post-translational modifications
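To locate the annotation file from a script, the naming scheme can be assembled from its parts. The sketch below assumes the pattern suggested by the example human_tappas_ensembl_annotation_file.gff3 (common name, then "tappas", then database) and a species folder such as Hsapiens under data/; the function and its parameter names are hypothetical.

```python
from pathlib import Path


def annotation_file(root, prefix, common_name, database, modified=False):
    """Build the expected path of the main (or modified) GFF3 annotation file."""
    name = f"{common_name}_tappas_{database}_annotation_file.gff3"
    if modified:
        # Assumption: the modified file only differs by the trailing "_mod".
        name += "_mod"
    return Path(root) / "data" / prefix / name


print(annotation_file("IsoAnnot", "Hsapiens", "human", "ensembl").name)
# human_tappas_ensembl_annotation_file.gff3
```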
Modified Annotation File
File: <common_name>_tappas_<database>_annotation_file.gff3_mod
A modified version of the main GFF3 file optimized for downstream analysis tools.
Understanding the GFF3 Annotation File
The output GFF3 file integrates information from multiple sources:
Structural information:
- Gene and transcript coordinates
- Exon/intron structure
- CDS (coding sequence) regions
- UTR regions (5' and 3')
Functional annotations (in attributes column):
- Gene Ontology (GO): Biological process, molecular function, cellular component
- InterPro: Protein domains, families, and functional sites
- Pfam: Protein family classifications
- Reactome: Pathway associations
- UniProt: Protein function descriptions
Post-translational modifications:
- Phosphorylation sites
- Other PTMs from PhosphoSitePlus
Example GFF3 attributes:
gene_id=ENSG00000000003;transcript_id=ENST00000000003;GO=GO:0005515,GO:0003824;
InterPro=IPR001478,IPR015421;Reactome=R-HSA-112316;UniProt=P12345
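An attribute string in this form can be split into a dictionary for programmatic use. The sketch below assumes the `key=value;key=value` layout with comma-separated values shown in the example above; real files may use additional keys or escaping that this parser does not handle.

```python
def parse_attributes(column):
    """Split a `key=value;key=value` attribute string into a dict of lists."""
    attributes = {}
    for field in column.strip().split(";"):
        field = field.strip()
        if not field:
            continue  # tolerate a trailing semicolon
        # Split on the first "=" only; GO identifiers contain ":" but no "=".
        key, _, value = field.partition("=")
        attributes[key] = value.split(",") if value else []
    return attributes


# The example attributes from above, joined into a single string.
EXAMPLE = (
    "gene_id=ENSG00000000003;transcript_id=ENST00000000003;"
    "GO=GO:0005515,GO:0003824;"
    "InterPro=IPR001478,IPR015421;Reactome=R-HSA-112316;UniProt=P12345"
)

parsed = parse_attributes(EXAMPLE)
print(parsed["GO"])  # ['GO:0005515', 'GO:0003824']
```

Every value is returned as a list, so single-valued keys such as gene_id and multi-valued keys such as GO can be handled the same way.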
Using the output:
- Import into genome browsers (IGV, UCSC Genome Browser)
- Use with tappAS for isoform-level functional analysis
- Parse programmatically for custom analyses
- Filter by specific annotation types
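Filtering by annotation type can be sketched as a generator over the rows of a GFF3 file. This example assumes standard nine-column, tab-separated rows with attributes laid out as in the example above; the sample rows are made up for illustration and do not come from a real IsoAnnot run.

```python
def records_with(lines, key):
    """Yield (seqid, type, start, end, values) for rows whose attributes include `key`."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a standard nine-column GFF3 row
        for field in columns[8].split(";"):
            name, _, value = field.strip().partition("=")
            if name == key and value:
                yield columns[0], columns[2], int(columns[3]), int(columns[4]), value.split(",")
                break


# Made-up rows for illustration only.
SAMPLE = [
    "##gff-version 3\n",
    "chrX\texample\ttranscript\t100\t900\t.\t+\t.\ttranscript_id=T1;GO=GO:0005515,GO:0003824\n",
    "chrX\texample\ttranscript\t1500\t2400\t.\t-\t.\ttranscript_id=T2;Reactome=R-HSA-112316\n",
]

go_rows = list(records_with(SAMPLE, "GO"))
print(len(go_rows))  # 1
```

Because the function accepts any iterable of lines, an open file handle can be passed in place of SAMPLE, which keeps memory use low for large annotation files.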
Troubleshooting
Problem: "The snakefile or configfile requested do not exist"
- Solution: Ensure config files exist for your species at config/<database>/<species>/
Problem: InterProScan not found
- Solution: Run ./InterproScan_install.sh or verify interproscan_path in config/generic/config.yaml
Problem: Out of memory errors
- Solution: Increase available RAM or reduce the number of cores used
Problem: Download errors for database files
- Solution: Check internet connection and verify URLs in config file are current
Problem: Snakemake directory locked
- Solution: Use the --unlock option (see Unlocking the Working Directory)
Support
For issues, questions, or contributions:
- GitHub Issues: https://github.com/ConesaLab/IsoAnnot/issues
- Documentation: This README
License
[License information to be added]
Version History
0.9.0b1 @ 0572562 (earliest) Created 16th Jun 2026 at 09:25 by Fabián Robledo
Merge pull request #9 from ConesaLab/dev
Dev updates
Last updated: 16th Jun 2026 at 09:28