Decoding Positive Selection in Mycobacterium tuberculosis with Phylogeny-Guided Graph Attention Models

Abstract

Motivation: Positive selection is a key evolutionary force in Mycobacterium tuberculosis, driving the emergence of adaptive mutations that influence drug resistance, transmissibility, and virulence. Phylogenetic trees capture the hierarchical evolutionary relationships among isolates, making them a natural framework for detecting such adaptive signals. Here, we present a phylogeny-guided graph attention network approach, coupled with a novel method for converting SNP-annotated phylogenetic trees into graph structures suitable for graph neural network processing.

Results: Using a dataset of 1,000 M. tuberculosis isolates representing the four main lineages, and 249 single-nucleotide variants (84 resistance-associated and 165 neutral) spanning 61 drug-resistance genes, we constructed graphs in which nodes represented individual isolates and edges reflected phylogenetic distances. To reduce noise and highlight local evolutionary structure, we pruned edges between isolates separated by more than seven internal nodes. Node features were encoded as binary indicators of SNP presence or absence. The graph attention network (GAT) architecture comprised two attention layers with a residual connection, followed by global attention pooling and a multilayer perceptron classifier. The model achieved an accuracy of 0.81 on the held-out test set. Applied to 146 WHO-classified "uncertain" variants, it identified 28 high-confidence candidates with convergent occurrence across multiple lineages, consistent with adaptive evolution. These included eis c.-37G>T (kanamycin, amikacin), embA c.-12C>T (ethambutol) and rpoA Thr187Ala (rifampicin). These findings demonstrate the feasibility of transforming phylogenetic trees into graph neural network-compatible structures and the utility of attention-based models for detecting signals of positive selection, supporting genomic surveillance and prioritising candidate variants for experimental validation.
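The tree-to-graph step described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how internal nodes are counted, so the sketch assumes the count is the number of internal nodes on the tree path between two isolates. The toy tree, the isolate and SNP names, and the helper functions (`internal_nodes_between`, `tree_to_graph`, `encode_features`) are all hypothetical.

```python
from itertools import combinations


def internal_nodes_between(parent, a, b):
    """Count internal nodes on the tree path between two distinct leaves.

    `parent` maps each node to its parent; the root has no entry.
    Assumes a single rooted tree (an assumption of this sketch).
    """
    # Record how many steps up each ancestor of `a` lies.
    steps_from_a = {}
    node, steps = a, 0
    while node is not None:
        steps_from_a[node] = steps
        node = parent.get(node)
        steps += 1
    # Walk up from `b` until we reach the lowest common ancestor.
    node, steps = b, 0
    while node not in steps_from_a:
        node = parent[node]
        steps += 1
    # Edges on the path minus one gives the nodes strictly between a and b.
    return steps_from_a[node] + steps - 1


def tree_to_graph(parent, max_internal=7):
    """Connect isolates (leaves) separated by at most `max_internal` internal nodes."""
    leaves = sorted(set(parent) - set(parent.values()))
    edges = []
    for a, b in combinations(leaves, 2):
        k = internal_nodes_between(parent, a, b)
        if k <= max_internal:  # prune pairs separated by more than the threshold
            edges.append((a, b, k))
    return leaves, edges


def encode_features(isolate_snps, snp_panel):
    """Binary presence/absence vector per isolate over a fixed SNP panel."""
    return {iso: [int(s in snps) for s in snp_panel]
            for iso, snps in isolate_snps.items()}


# Hypothetical toy tree: R is the root, I1..I3 are internal nodes, A..E are isolates.
toy_parent = {"A": "I1", "B": "I1", "I1": "R", "I2": "R",
              "E": "I2", "I3": "I2", "C": "I3", "D": "I3"}

# A small threshold is used here only so that pruning is visible on a tiny tree.
toy_leaves, toy_edges = tree_to_graph(toy_parent, max_internal=2)

# Hypothetical SNP calls for two of the isolates.
toy_features = encode_features({"A": {"snp1"}, "C": {"snp1", "snp2"}},
                               ["snp1", "snp2"])
```

With the threshold of seven stated in the abstract, every pair in this toy tree would stay connected; the lower value in the usage lines only makes the pruning visible. The resulting edge list and binary feature vectors are the kind of input a graph attention network would consume.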
