pynnotate: a flexible tool for retrieving and processing GenBank data in molecular evolution research and education

Fernanda Caron
Felipe Magalhães
Matheus Salles
Fabricius Domingos

Abstract

Pynnotate is a Python-based tool for the automated retrieval, parsing, and extraction of annotated gene sequences from GenBank records. It addresses common challenges researchers face when working with GenBank data, including inconsistent gene nomenclature, redundant sequences, and the need for standardised gene extraction across multiple taxa. Pynnotate offers both a graphical user interface and a command-line interface, which makes it accessible to users with varying levels of bioinformatics experience. Sequences can be retrieved from a manually defined list of accession numbers or from NCBI query terms. Three filtering modes are available: unconstrained (all sequences), strict (one sequence per species, prioritising gene completeness), and flexible (multiple sequences per species when they contribute different genes). Key features include synonym resolution for gene names, customisable sequence headers, metadata tracking, and automated extraction of each gene into a separate file. Built-in synonym dictionaries cover animal and plant mitochondrial DNA, chloroplast DNA, and ribosomal DNA, and users can also supply their own custom synonym dictionaries. The tool produces structured output, including FASTA files, metadata matrices, and detailed logs, that integrates readily with downstream analyses. Designed for speed and scalability, pynnotate handles large datasets efficiently, allowing annotated sequences to be retrieved and extracted quickly across many taxa. Finally, pynnotate is a valuable resource for both research and education, particularly for educators running bioinformatics analyses with students who have limited command-line experience.
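The abstract names synonym resolution, three filtering modes, and per-gene extraction, but does not spell out the exact selection rules. The following is a minimal, stdlib-only sketch of one plausible reading, not pynnotate's code or API. Everything in it is an assumption for illustration: the `Record` structure, the `SYNONYMS` entries, the helper names (`canonical`, `filter_records`, `extract_by_gene`), treating "gene completeness" as the number of requested genes a record carries, letting flexible mode keep a record only if it adds at least one requested gene not yet covered for that species, and the `>Genus_species_accession` header format.

```python
from dataclasses import dataclass

# Hypothetical record model for illustration only; NOT pynnotate's internal API.
@dataclass
class Record:
    accession: str
    species: str
    genes: dict  # annotated gene name (as written in the record) -> sequence

# Illustrative synonym table: these entries are assumptions, not pynnotate's
# built-in dictionaries. Keys are lower-cased variant names.
SYNONYMS = {
    "cox1": "COX1", "coi": "COX1", "co1": "COX1",
    "cytb": "CYTB", "cob": "CYTB",
    "rrnl": "16S", "16s rrna": "16S", "l-rrna": "16S",
}

def canonical(name):
    """Resolve a gene name to its canonical form; unknown names are upper-cased."""
    key = name.strip().lower()
    return SYNONYMS.get(key, name.strip().upper())

def target_genes(rec, targets):
    """Canonical names of the requested genes annotated in one record."""
    return {canonical(g) for g in rec.genes} & targets

def filter_records(records, targets, mode="unconstrained"):
    """Apply one of three filtering modes (the selection rules are assumptions)."""
    if mode == "unconstrained":
        return list(records)  # keep every record
    if mode not in ("strict", "flexible"):
        raise ValueError(f"unknown mode: {mode!r}")
    by_species = {}
    for rec in records:
        by_species.setdefault(rec.species, []).append(rec)
    kept = []
    for recs in by_species.values():
        # Most complete records first; the sort is stable, so ties keep input order.
        ranked = sorted(recs, key=lambda r: len(target_genes(r, targets)), reverse=True)
        if mode == "strict":
            kept.append(ranked[0])  # exactly one record per species
        else:
            covered = set()
            for rec in ranked:
                new = target_genes(rec, targets) - covered
                if new:  # keep only records that add at least one new gene
                    kept.append(rec)
                    covered |= new
    return kept

def extract_by_gene(records, targets):
    """Build one FASTA string per canonical gene, headed '>Genus_species_accession'."""
    per_gene = {}
    for rec in records:
        for name, seq in rec.genes.items():
            gene = canonical(name)
            if gene in targets:
                header = ">" + rec.species.replace(" ", "_") + "_" + rec.accession
                per_gene.setdefault(gene, []).append(header + "\n" + seq)
    return {gene: "\n".join(entries) + "\n" for gene, entries in per_gene.items()}

# Usage with toy records: accessions and sequences are placeholders, not real data.
recs = [
    Record("ACC001", "Homo sapiens", {"COI": "ATGTTC", "cytb": "ATGACC"}),
    Record("ACC002", "Homo sapiens", {"CO1": "ATGTTT"}),
    Record("ACC003", "Homo sapiens", {"16S rRNA": "CGCCTG"}),
    Record("ACC004", "Pan troglodytes", {"cox1": "ATGTTA"}),
]
targets = {"COX1", "CYTB", "16S"}
for mode in ("unconstrained", "strict", "flexible"):
    print(mode, [r.accession for r in filter_records(recs, targets, mode)])
# unconstrained ['ACC001', 'ACC002', 'ACC003', 'ACC004']
# strict ['ACC001', 'ACC004']
# flexible ['ACC001', 'ACC003', 'ACC004']
```

In this sketch, synonym resolution is what makes "COI", "CO1", and "cox1" count as the same gene. As a result, flexible mode drops ACC002 (its COX1 is already covered by ACC001) but keeps ACC003, which is the only source of 16S for that species. Ranking by the count of requested genes is only the simplest proxy for completeness; sequence length or annotation quality would be reasonable alternative tie-breakers.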

Version published to 10.32942/x2294v
Feb 26, 2026

Related articles

A Browser-Based Curation Tool for Expert Review of DNA Barcode Records from BOLD Systems

This article has 14 authors:
1. Stephan Kühbandner
2. Fabian Deister
3. Axel Hausmann
4. Michael Raupach
5. Ben Price
6. Torbjørn Ekrem
7. Elisabeth Stur
8. Brent Emerson
9. Peter Hollingsworth
10. Rutger Vos
11. Leonardo Dapporto
12. Adele Bordoni
13. Claudia Bruschini
14. Sónia Ferreira
This article has no evaluations. Latest version: Mar 30, 2026

Benchmarking protein sequence and structure search methods for remote homology detection

This article has 6 authors:
1. Yuan Liu
2. Yingquan Zhou
3. Yan Huang
4. Hongyi Xin
5. Xiaoyong Pan
6. Hong-Bin Shen
This article has no evaluations. Latest version: Feb 24, 2026

Sequenoscope: A Modular Tool for Nanopore Adaptive Sequencing Analytics and Beyond

This article has 9 authors:
1. Abdallah Meknas
2. Kyrylo Bessonov
3. Shannon H.C. Eagle
4. Christy-Lynn Peterson
5. James Robertson
6. Nicole Ricker
7. Tara Signorelli
8. John Nash
9. Aleisha Reimer
Reviewed by Access Microbiology

This article has 8 evaluations. Latest version: Feb 27, 2026. Latest activity: Feb 27, 2026