Integrating theory and machine learning to reveal determinants of plasmid copy number

Abstract

Plasmids are extrachromosomal mobile genetic elements whose copy numbers (PCNs) critically influence microbial evolution, antibiotic resistance and pathogenicity. Despite their importance and immense diversity, the ecological, evolutionary and molecular factors that determine PCN remain poorly understood. Here, we present a theoretical model to explain the empirical power-law relationship between plasmid size and copy number, one of the fundamental quantitative principles governing PCN control. On its own, however, this relationship has limited predictive power. To improve PCN prediction, we introduce a data-driven approach that incorporates diverse features. Trained on >10,000 plasmids, our machine learning model achieves significantly enhanced accuracy, with plasmid-encoded protein domains emerging as key predictors. Applying this framework, we conduct the first comprehensive analysis of PCN distributions across hundreds of thousands of metagenomic plasmids (IMG/PR database) and tens of thousands of clinical isolates, uncovering niche-specific taxonomic PCN hotspots and ecological adaptations. These results provide critical insights into plasmid ecology and antibiotic resistance gene (ARG) surveillance, and shed light on the gut plasmidome, the “dark matter” of the human microbiome.
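
As a rough illustration of what a power-law relationship between plasmid size and copy number means in practice, the sketch below fits PCN = c · size^(−β) by linear regression in log-log space. It is not the authors' model or pipeline: the function name `fit_power_law`, the synthetic data, and the values chosen for `true_beta` and `true_c` are all assumptions made for this example and are not taken from the article.

```python
import numpy as np


def fit_power_law(sizes_bp, copy_numbers):
    """Fit PCN = c * size**(-beta) by least squares in log-log space.

    Returns (beta, c). Hypothetical helper, not from the article.
    """
    log_size = np.log10(np.asarray(sizes_bp, dtype=float))
    log_pcn = np.log10(np.asarray(copy_numbers, dtype=float))
    # A power law is a straight line in log-log space:
    # log10(PCN) = log10(c) - beta * log10(size)
    slope, intercept = np.polyfit(log_size, log_pcn, 1)
    return -slope, 10.0 ** intercept


# Synthetic data only: the exponent and prefactor are arbitrary
# placeholders, NOT values reported in the article.
rng = np.random.default_rng(0)
sizes = np.logspace(3, 5.5, 200)          # plasmid sizes in bp
true_beta, true_c = 0.8, 5000.0
noise = 10.0 ** rng.normal(0.0, 0.1, sizes.size)  # multiplicative scatter
pcn = true_c * sizes ** (-true_beta) * noise

beta_hat, c_hat = fit_power_law(sizes, pcn)
print(f"estimated beta = {beta_hat:.2f}, estimated c = {c_hat:.0f}")
```

A fit of this kind captures only the size dependence, which is consistent with the abstract's point that the power law alone has limited predictive power and that additional features are needed.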
