What can the current global bacterial taxonomy knowledge reveal about the actual (past and present) ecological success of each lineage?

Abstract

The up-to-date sources of currently known microbiology are publicly available. The most complete repository is the Genome Taxonomy Database (GTDB, https://gtdb.ecogenomic.org/), which at one end encompasses 196 phyla and at the other end 107,235 currently defined bacterial species. The intermediate ranks comprise 541 classes, 1,863 orders, 4,896 families, and 23,112 genera. This dataset is used here not as a source of DNA sequences but as the handiest and presently most exhaustive checklist matrix of our current taxonomic notions. Under the mathematical 'zero assumption' of an equal chance for each taxon throughout a parallel evolution, a 'perfectly even' community featuring those values would display today, within each of the 196 phyla, the mean values obtained by dividing the observed totals by the number of phyla. Such a virtual assemblage would therefore have, in each phylum, 2.76 classes, 9.51 orders, 24.98 families, 117.92 genera, and 547.12 species. The deltas between these null-hypothesis set points and the numbers actually observed and described for each lineage member in the database (species per genus, genera per family, families per order, orders per class, and classes per phylum) can reveal how much, and in which lineages, evolution departed from the mean of each rank. Comparing the actual data also allows one to answer a series of questions: 1) How uneven have the dynamics of evolutionary differentiation actually been across each lineage? 2) Which taxa are, at present, the 'winners' of the evolutionary race, and have they always led the run, or when did they overtake their formerly faster competitors? 3) Are there chronological bottlenecks that violate the assumption of either a constant or a progressive speed of evolution (i.e. is speciation into each deeper, more 'modern' rank consistently faster than at the preceding step, or is there evidence of evolutionary congestion 'jams')? 4) Which past step turns out to have been the most decisive in shaping the modern assemblages? Which of the taxa dominating today's diversity have risen only recently (e.g. by a massive radiation from genera), and which had instead started or accomplished, in remote times, the critical expansion that still keeps them as leaders (e.g. by class or order radiations)? 5) Which patterns characterize, in this respect, the lineages of the most relevant bacterial agents in food, soil, industrial, clinical and environmental microbiology? Corresponding queries are addressed for the 5,869 species of Archaea. The methods and results of this database-mined in silico analysis are presented and discussed here. The dataset provided as supplementary material allows the reader to address the above aspects for any taxon of interest within the 107,235 species covered by this analysis.
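
The per-phylum null values quoted in the abstract can be reproduced with a few lines of Python. This is only a minimal sketch, not the authors' method: the rank totals are taken from the abstract, while the names `RANK_TOTALS`, `per_phylum_means` and `delta_from_null`, and the 1,000-species phylum used at the end, are hypothetical and serve illustration only.

```python
# Rank totals for Bacteria as quoted in the abstract (GTDB).
RANK_TOTALS = {
    "phylum": 196,
    "class": 541,
    "order": 1863,
    "family": 4896,
    "genus": 23112,
    "species": 107235,
}


def per_phylum_means(totals):
    """Null-model expectation: each rank total divided by the number of phyla."""
    n_phyla = totals["phylum"]
    return {
        rank: count / n_phyla
        for rank, count in totals.items()
        if rank != "phylum"
    }


def delta_from_null(observed, null_mean):
    """Signed departure of an observed count from the even-community mean."""
    return observed - null_mean


means = per_phylum_means(RANK_TOTALS)
for rank, mean in means.items():
    print(f"{rank}: {mean:.2f}")

# Hypothetical example (not a value from the database): a phylum holding
# 1,000 species would sit above the null expectation of about 547 species.
example_delta = delta_from_null(1000, means["species"])
print(f"hypothetical delta: {example_delta:+.2f}")
```

A positive delta marks a lineage richer than the even-community expectation at that rank, and a negative one marks a poorer lineage. The same subtraction can be applied to the nested ratios named in the abstract (species per genus, genera per family, and so on) once the per-lineage counts are read from the supplementary dataset.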
