Alignment-free phylogenetic inference via hyperbolic protein language models
Abstract
Conventional phylogenetic methods rely on multiple sequence alignments, which are computationally intensive and often fail for highly divergent lineages. Here, we introduce LucaPhylo, an alignment-free framework that infers evolutionary relationships directly from unaligned sequences. Through a cascaded learning strategy, LucaPhylo integrates protein language models with hyperbolic geometry, a representation space naturally suited to hierarchical branching, to capture deep evolutionary constraints without explicit homology matching. Using the highly divergent RNA virosphere as a test case, LucaPhylo places unaligned sequences into phylogenetic trees with accuracy comparable to that of leading alignment-based tree construction tools, while retaining divergent sequences that conventional pipelines frequently discard. It further enables the integration of divergent viral lineages into phylogenetic trees, thereby expanding the evolutionary landscape of RNA viruses. Together, these results establish an AI-driven, alignment-free paradigm for phylogenetic inference and provide a robust computational foundation for resolving deep evolutionary relationships among RNA viruses and other biological systems.
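To illustrate why hyperbolic space suits hierarchical branching, the sketch below computes geodesic distances in the Poincaré ball, one common model of hyperbolic geometry. The abstract does not state which hyperbolic model or distance LucaPhylo uses, so the choice of model, the function name `poincare_distance`, and the toy coordinates are all assumptions made for illustration, not the authors' implementation.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball.

    Illustrative helper (hypothetical name); not taken from LucaPhylo.
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v))
    return math.acosh(arg)

# Toy coordinates (assumed): a "root" at the origin and two "leaves" near the boundary.
root = (0.0, 0.0)
leaf_a = (0.9, 0.0)
leaf_b = (0.0, 0.9)

d_root_a = poincare_distance(root, leaf_a)
d_root_b = poincare_distance(root, leaf_b)
d_a_b = poincare_distance(leaf_a, leaf_b)

# In a tree, the leaf-to-leaf path passes through the common ancestor.
# The hyperbolic ratio sits closer to 1 than the Euclidean ratio for the same points.
hyperbolic_ratio = d_a_b / (d_root_a + d_root_b)
euclidean_ratio = math.dist(leaf_a, leaf_b) / (math.dist(root, leaf_a) + math.dist(root, leaf_b))
print(f"hyperbolic ratio: {hyperbolic_ratio:.3f}, Euclidean ratio: {euclidean_ratio:.3f}")
```

For points near the boundary, the direct leaf-to-leaf distance approaches the length of the path through the origin, which is the behavior of path lengths in a tree. This is the sense in which hyperbolic embeddings can represent branching hierarchies with low distortion.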