Robust genome-based delineation of bacterial genera
Abstract
Background
Genomic analysis has become essential in bacterial taxonomy, enabling fast classification of bacteria. However, quantifiable measures for making informed decisions on the taxonomic placement of bacterial taxa above the species level are rare, which hampers the stability of knowledge in databases and articles. In this work, we focused on bacterial classification at the genus level and revisited the concept of Percentage Of Conserved Proteins (POCP). Whilst POCP is broadly used as an overall genome relatedness index, the underlying tools and methods of calculation differ between studies, resulting in disparate implementations and misnomers that are unfit for the increasing wealth of publicly available genomes.
Results
We evaluated 10 protein alignment methods and, based on 2,358,466 pairwise comparisons of 4,767 genomes across 35 families, found evidence for a scalable yet robust alternative to BLASTP for POCP calculation. However, we showed that certain combinations of tools and parameters are not suited for POCP calculations owing to drastic over- or underestimation. Therefore, we put forward a clearer definition of POCP that uses only unique matches, termed POCPu, which showed better genus delineation than POCP. We suggested tentative family-specific thresholds where the standard 50% threshold did not provide sufficient resolution.
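To illustrate why counting only unique matches matters, the following minimal Python sketch computes a POCP-style value for one genome pair in two ways: counting every qualifying alignment, and counting each query protein at most once. It assumes the commonly cited POCP formulation, 100 × (C1 + C2) / (T1 + T2), with a protein considered conserved when its alignment has an e-value below 1e-5, identity above 40%, and more than 50% of the query aligned. The abstract does not state these details, so the thresholds, the `Hit` record, and all function names are illustrative assumptions rather than the authors' implementation.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One alignment row (hypothetical record, not a tool's native format)."""
    query: str        # query protein identifier
    identity: float   # percent identity of the alignment
    query_cov: float  # percent of the query length that is aligned
    evalue: float     # alignment e-value


def is_conserved(hit: Hit, min_identity: float = 40.0,
                 min_query_cov: float = 50.0, max_evalue: float = 1e-5) -> bool:
    # Assumed thresholds from the commonly cited POCP definition.
    return (hit.identity > min_identity
            and hit.query_cov > min_query_cov
            and hit.evalue < max_evalue)


def count_conserved(hits: Iterable[Hit], unique: bool = True) -> int:
    passing = [h.query for h in hits if is_conserved(h)]
    # unique=True counts each query protein once, however many subjects it hits.
    return len(set(passing)) if unique else len(passing)


def pocp(hits_a_vs_b: Iterable[Hit], hits_b_vs_a: Iterable[Hit],
         total_a: int, total_b: int, unique: bool = True) -> float:
    c1 = count_conserved(hits_a_vs_b, unique)
    c2 = count_conserved(hits_b_vs_a, unique)
    return 100.0 * (c1 + c2) / (total_a + total_b)


# Toy data: two genomes with 4 proteins each. Protein a1 has two qualifying hits.
a_vs_b = [
    Hit("a1", 80.0, 90.0, 1e-30),
    Hit("a1", 55.0, 70.0, 1e-12),  # second match for the same query
    Hit("a2", 62.0, 85.0, 1e-20),
    Hit("a3", 35.0, 95.0, 1e-8),   # fails the identity threshold
]
b_vs_a = [
    Hit("b1", 80.0, 92.0, 1e-30),
    Hit("b2", 55.0, 75.0, 1e-12),
    Hit("b3", 62.0, 80.0, 1e-20),
]

all_matches = pocp(a_vs_b, b_vs_a, total_a=4, total_b=4, unique=False)
unique_matches = pocp(a_vs_b, b_vs_a, total_a=4, total_b=4, unique=True)
print(all_matches, unique_matches)
```

In this toy case, counting every alignment gives a higher value than counting unique query proteins, which is one way in which an implementation could overestimate POCP.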
Conclusions
We propose that, for faster bacterial genus delineation, POCP should be calculated with DIAMOND using very-sensitive settings and only unique matches (POCPu). We advise microbiologists to carefully assess whether the tools used for genus assignment match their biological assumptions, to avoid misguided inferences.
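As a rough sketch of what such a run could look like, the Python helper below assembles a DIAMOND `blastp` command line in very-sensitive mode. The file paths, the helper name, the output columns, and the filtering thresholds (identity 40%, query cover 50%, e-value 1e-5) are assumptions for illustration; the abstract does not specify the exact parameters used. Note that DIAMOND's `--id` and `--query-cover` options set minimum values, so strict inequalities would need to be enforced when parsing the output.

```python
from typing import List


def diamond_blastp_command(query_faa: str, db_path: str, out_tsv: str,
                           min_identity: float = 40,
                           min_query_cov: float = 50,
                           max_evalue: float = 1e-5) -> List[str]:
    """Build (but do not run) a DIAMOND blastp command as an argument list."""
    return [
        "diamond", "blastp",
        "--query", query_faa,          # proteins of the first genome (FASTA)
        "--db", db_path,               # database built with `diamond makedb`
        "--out", out_tsv,
        # Tabular output with the columns needed for a POCP-style filter.
        "--outfmt", "6", "qseqid", "sseqid", "pident", "length", "qlen", "evalue",
        "--very-sensitive",            # sensitivity mode named in the text
        "--id", str(min_identity),     # assumed minimum percent identity
        "--query-cover", str(min_query_cov),  # assumed minimum query cover
        "--evalue", str(max_evalue),   # assumed maximum e-value
    ]


# Hypothetical file names, for illustration only.
cmd = diamond_blastp_command("genomeA.faa", "genomeB.dmnd", "A_vs_B.tsv")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once DIAMOND is installed and the database has been built; running the comparison in both directions yields the two hit tables needed for a pairwise value.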