A Time Machine for Taxonomy
Abstract
The NCBI Taxonomy Database is the primary resource for linking genomic information to taxonomic relationships. It is widely used across scientific disciplines and is critically important to bioinformatics. The database changes continuously as researchers discover and refine taxonomic relationships, yet tracking and comparing past taxonomic states is difficult: changes are frequent, and answering a historical question means sifting through numerous archived snapshots. To address this, we developed the Taxonomy Time Machine, a database that stores many snapshots of a taxonomic tree in a space-efficient manner. We also created a web interface and a programmatic API to make these data more accessible. The tool accurately reconstructs taxonomic lineages at any point in the history of the NCBI Taxonomy Database. We demonstrate that it is both perfectly accurate and significantly more efficient than loading and querying individual taxonomy snapshots, which enables its use on desktop computers as well as commodity web servers. The tool is available on the web (https://taxonomy.onecodex.com) and as open source under the MIT license (https://github.com/onecodex/taxonomy-time-machine).
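To make the idea of space-efficient snapshot storage concrete, the following is a minimal Python sketch of one way it could work. It is not the Taxonomy Time Machine's actual schema or API. It assumes that each node keeps only the versions at which its parent or name changed, and that a lineage is rebuilt by looking up, for each ancestor, the latest version at or before the requested date. The class `VersionedTaxonomy`, its methods, and the toy taxon IDs and names are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


class VersionedTaxonomy:
    """Hypothetical change-only store for taxonomy snapshots (illustration only)."""

    def __init__(self):
        # For each tax_id: parallel lists of change dates and records.
        # A record is (parent_id, name), or None if the node was removed.
        self._dates = defaultdict(list)
        self._records = defaultdict(list)

    def add_snapshot(self, date, nodes):
        """Add a full snapshot {tax_id: (parent_id, name)}; call in date order."""
        for tax_id, record in nodes.items():
            records = self._records[tax_id]
            # Store a new version only if the node is new or has changed.
            if not records or records[-1] != record:
                self._dates[tax_id].append(date)
                records.append(record)
        # Mark nodes that disappeared from this snapshot as removed.
        for tax_id, records in self._records.items():
            if tax_id not in nodes and records[-1] is not None:
                self._dates[tax_id].append(date)
                records.append(None)

    def node_at(self, tax_id, date):
        """Return the node's record as of `date`, or None if it did not exist."""
        dates = self._dates.get(tax_id, [])
        i = bisect_right(dates, date)  # number of versions dated <= date
        return self._records[tax_id][i - 1] if i else None

    def lineage_at(self, tax_id, date):
        """Return [(tax_id, name), ...] from the root down to `tax_id` as of `date`."""
        lineage, seen, current = [], set(), tax_id
        while current is not None and current not in seen:
            record = self.node_at(current, date)
            if record is None:
                break
            seen.add(current)
            parent_id, name = record
            lineage.append((current, name))
            # Assume the root is its own parent; stop there.
            current = parent_id if parent_id != current else None
        return lineage[::-1]

    def stored_versions(self):
        """Total number of stored node versions across all snapshots."""
        return sum(len(records) for records in self._records.values())


# Toy data (invented IDs and names, not real NCBI records).
tax = VersionedTaxonomy()
tax.add_snapshot("2020-01-01", {
    1: (1, "root"), 10: (1, "Genus A"), 20: (1, "Genus B"), 100: (10, "Species x"),
})
# In the later snapshot, "Species x" has been moved from Genus A to Genus B.
tax.add_snapshot("2021-01-01", {
    1: (1, "root"), 10: (1, "Genus A"), 20: (1, "Genus B"), 100: (20, "Species x"),
})

before = tax.lineage_at(100, "2020-06-01")
after = tax.lineage_at(100, "2021-06-01")
print(before)
print(after)
print(tax.stored_versions())
```

In this toy case the two snapshots contain eight node records in total, but only five versions are stored, because three of the four nodes did not change between snapshots. ISO-formatted date strings are used so that plain string comparison orders them chronologically.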