tMHG-Finder: Tree-guided Maximal Homologous Group Finder for Bacterial Genomes

Abstract

A maximal homologous group (MHG), defined as a group of sequences with a shared evolutionary ancestry, shifts the focus of comparative genomic studies from a gene-centric view to a homology-centric view. Each MHG is formed by identifying and grouping all homologous sequences, which ensures that evolutionary events such as horizontal gene transfer, gene duplication and loss, or de novo sequence evolution are encapsulated within the same MHG. However, the current MHG computation tool, MHG-Finder, scales poorly to large datasets and cannot provide detailed insights into intermediate MHGs involving subsets of the input genomes. We present tMHG-Finder (https://github.com/yongze-yin/tMHG-Finder), a new method that improves on our previous method, MHG-Finder, by using a guide tree to significantly improve scalability and provide more informative biological results. We also introduce a new measure, fractionalization (available at https://github.com/yongze-yin/Fract-Calculator), to assess the accuracy of delineated MHGs against ground truth data. Our results show that tMHG-Finder scales linearly with the number of taxa and requires a small fraction of the computational time of MHG-Finder. Furthermore, according to the fractionalization measure, tMHG-Finder outperforms four state-of-the-art whole-genome aligners on simulated data. Applying tMHG-Finder to a phylum of extreme-environment-resistant bacteria, we validated our results through the encapsulation of 16S rRNA sequences within MHGs. We further investigated how evolutionary rates change with phylogenetic distance and explored the functional roles of genes captured by conserved MHGs, demonstrating the broader utility of tMHG-Finder in uncovering evolutionary insights beyond MHG delineation and phylogenetic relationships.
