Computational exploration of the global microbiome for antibiotic discovery


Novel antibiotics are urgently needed to combat the antibiotic-resistance crisis. We present a machine learning-based approach to predict prokaryotic antimicrobial peptides (AMPs) by leveraging a vast dataset of 63,410 metagenomes and 87,920 microbial genomes. This led to the creation of AMPSphere, a comprehensive catalog comprising 863,498 non-redundant peptides, the majority of which were previously unknown. We observed that AMP production varies by habitat, with animal-associated samples displaying the highest proportion of AMPs compared to other habitats. Furthermore, within different human-associated microbiota, strain-level differences were evident. To validate our predictions, we synthesized and experimentally tested 50 AMPs, demonstrating their efficacy against clinically relevant drug-resistant pathogens both in vitro and in vivo. These AMPs exhibited antibacterial activity by targeting the bacterial membrane. Additionally, AMPSphere provides valuable insights into the evolutionary origins of peptides. In conclusion, our approach identified AMP sequences within prokaryotic microbiomes, opening up new avenues for the discovery of antibiotics.

Article activity feed

  1. The skin abscess infection

    Do you have any insight into whether the same anti-infective properties would be seen in a different disease model? For example, an internal infection rather than a skin infection (which is easier to access), so that the AMP would have to be administered orally to the mouse?

  2. Five lead AMPs from different sources

    How were the AMPs for screening chosen? You discovered thousands of AMPs, so how were specific ones prioritized for testing in the mouse model? Was it based on the in vitro assays, or on ease of synthesis?

  3. All the c_AMPs predicted here can be accessed at [...] can retrieve the peptide sequences, ORFs, and predicted biochemical properties of each c_AMP (e.g., molecular weight, isoelectric point, and charge at pH 7.0). We also provide the distribution across geographical regions, habitats, and microbial species for each c_AMP.

    Awesome resource!

  4. The large number of singletons suggests that most c_AMPs originated from processes other than diversification within families, which is the opposite of the supposed origin of full-length proteins, in which singleton families are rare [46]

    This is an interesting observation and implication!

  5. To further assess the gene predictions,

    It might help the reader if these checks were summarized in the results rather than only pointing to the corresponding methods.

  6. Analogously to Sberro et al. [36], we used a modified

    Again, since this is a pivotal part of your analysis, how it works should be explained in more detail and not just referenced.

  7. ProGenomes2 database

    From looking at this paper, it seems the current release of this database only includes high-quality genomes from isolates and does not include MAGs. This is possibly a limitation, since you screened metagenomes but not MAGs, and curated compendia of MAGs from GTDB/IMG are now easy to find.

  8. with Illumina instrument

    Why only Illumina metagenomes? Is it because of the error rates associated with metagenomes produced with Nanopore, for example? I don't think you would have this issue with PacBio HiFi datasets, but I am also unsure how many of these datasets were available in early 2020.

  9. predict and catalog the entire global microbiome

    This sentence seems incomplete. Do you mean to predict and catalog AMPs in global microbiomes? The abstract also focuses on animal-associated microbiomes. Did you focus on these, or did you include environmental microbiomes as well?

  10. Recently, proteome mining approaches have been developed to identify antimicrobials in extinct organisms

    This sentence feels a little abrupt after the previous one, especially since the significance of AMPs in ancient organisms is not expanded on.