Evolutionary and methodological considerations when interpreting gene presence-absence variation in pangenomes

Tomáš Brůna
Avinash Sreedasyam
Avril M. Harder
John T. Lovell

Abstract

While graph-based pangenomes have become a standard and interoperable foundation for comparisons across multiple reference genomes, integrating protein-coding gene annotations across pangenomes into a single ‘pangene set’ remains challenging because of both methodological inconsistency and biological presence-absence variation (PAV). Here, we review and experimentally evaluate the root causes of genome annotation and pangene set inconsistency using two polyploid plant pangenomes, cotton and soybean, which were chosen because of their diverse, high-quality genomic resources and the known importance of gene presence-absence variation in their respective breeding programs. We first demonstrate that building pangene sets across different genome resources is highly error-prone: PAV calculated directly from the genome annotations hosted on public repositories recapitulates structure in annotation methods rather than biological sequence differences. Re-annotating all genomes with a single, identical pipeline largely resolves the broadest-stroke issues; however, substantial challenges remain, including a surprisingly common case in which exactly identical sequences receive different gene model structural annotations. Combined, these results show that pangenome gene model annotations must be carefully integrated before any biological inference can be made regarding sequence evolution, gene copy number, or presence-absence variation.

Version published to 10.1101/2025.08.14.670405 on bioRxiv
Aug 14, 2025

Related articles

META-DIFF: a k-mer-based pipeline that detects differentially abundant sequences in metagenomics whole genome sequencing

This article has 8 authors:
1. Louis-Maël Guéguen
2. Alban Mathieu
3. Simon Pelletier
4. Anthony Woo
5. Namita Misra
6. Magali Moreau
7. Olivier Perin
8. Arnaud Droit
This article has no evaluations
Latest version Jan 29, 2026

Shotgun metagenomics: a deep insight into the composition and function of the complex microbial world

This article has 7 authors:
1. Grazia Visci
2. Elisabetta Notario
3. Giuseppe Defazio
4. Mariano Francesco Caratozzolo
5. Bruno Fosso
6. Marinella Marzano
7. Graziano Pesole
This article has no evaluations
Latest version Jan 30, 2026

A Benchmarking Framework to Catalyze Individual Human Genome Projects

This article has 3 authors:
1. Manjushri Kalpande
2. Apoorva Ganesh
3. Subhashini Srinivasan
This article has no evaluations
Latest version Dec 17, 2025
