SeroBA(v2.0) and SeroBAnk: a robust genome-based serotyping scheme and comprehensive atlas of capsular diversity in Streptococcus pneumoniae
Abstract
The unprecedented number of Streptococcus pneumoniae (the pneumococcus) genomes sequenced in recent years has accelerated the discovery of novel serotypes and highlighted the genetic diversity both between and within serotypes. A novel serotype should demonstrate a distinct capsular polysaccharide biosynthetic (cps) locus, capsular structure, and serological profile. In the past four years alone, nine new serotypes have been identified. Accurate and timely serotyping of pneumococcal isolates is key to understanding the global distribution and evolution of the pneumococcus, and the response of the bacterial population to vaccination. However, current bioinformatics serotyping tools are infrequently updated and struggle to keep pace with the rapid discovery of new serotypes. To address these limitations, we built a comprehensive, curated library (SeroBAnk) encompassing all known pneumococcal serotypes; this resource is presented as an atlas on a dedicated, publicly accessible webpage (https://www.pneumogen.net/gps/#/serobank). Building upon this resource, we developed SeroBA(v2.0), a tool with an easy-to-update database that can accurately identify 102 of the 107 known pneumococcal serotypes (all except serotypes 24B, 24C, 24F, 7D and 6H) and 18 genetic subtypes within serotypes 6A, 6B, 11A, 19A, 19F and 33F. We validated SeroBA(v2.0) on 26,306 genomes from the Global Pneumococcal Sequencing project, on reference isolates, and on simulated reads derived from the reference sequences of the cps locus, and showed that SeroBA(v2.0) can reliably detect the nine recently discovered serotypes. Additionally, we show that in silico serotypes inferred by SeroBA(v2.0) had high concordance with phenotypic serotypes determined by either Quellung or latex agglutination, both at the serotype level (88.9%; 15,945/17,933) and at the serogroup level (91.9%; 16,480/17,933).
Finally, we propose a community-contribution-based approach to ensure that SeroBA(v2.0) is maintained and updated as novel serotypes continue to be discovered. Members of the global community can submit putative novel serotypes through our public repository on GitHub (https://github.com/GlobalPneumoSeq/seroba/issues). Submitted putative novel serotypes will be curated by experts in the field on the basis of the genetic sequence of the cps region, the capsular structure and the serological profile. SeroBA(v2.0) can be accessed at https://github.com/GlobalPneumoSeq/seroba.
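The serotype-level and serogroup-level concordance figures reported above can be illustrated with a short Python sketch. It is not part of SeroBA(v2.0): the function names `serogroup` and `concordance`, the toy pairs, and the assumption that a serogroup is simply the leading numeric part of a serotype label (for example, 19 for 19F) are illustrative choices rather than anything taken from the tool. The final lines only recompute the reported percentages from the counts given in the abstract.

```python
import re


def serogroup(serotype: str) -> str:
    """Return the leading numeric part of a serotype label (e.g. '19F' -> '19').

    Assumption: labels look like '19F' or '14'. Labels without a leading
    number (hypothetical example: 'NT') are returned unchanged.
    """
    match = re.match(r"\d+", serotype)
    return match.group(0) if match else serotype


def concordance(pairs):
    """Compute (serotype-level, serogroup-level) concordance as fractions.

    `pairs` is a list of (phenotypic, in_silico) serotype labels.
    """
    total = len(pairs)
    same_type = sum(1 for pheno, geno in pairs if pheno == geno)
    same_group = sum(
        1 for pheno, geno in pairs if serogroup(pheno) == serogroup(geno)
    )
    return same_type / total, same_group / total


# Toy data, invented for illustration only (not taken from the study).
pairs = [("19F", "19F"), ("6A", "6B"), ("23F", "23F"), ("14", "3")]
type_rate, group_rate = concordance(pairs)
print(f"serotype level:  {type_rate:.1%}")   # 2 of 4 pairs match exactly
print(f"serogroup level: {group_rate:.1%}")  # 3 of 4 pairs share a serogroup

# Recompute the percentages reported in the abstract from the stated counts.
print(f"{15945 / 17933:.1%}")  # serotype-level concordance
print(f"{16480 / 17933:.1%}")  # serogroup-level concordance
```

In the toy data, the 6A/6B pair is discordant at the serotype level but concordant at the serogroup level. This is why the serogroup-level figure can never be lower than the serotype-level one.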
Data summary
Genome sequences are available in the European Nucleotide Archive (ENA) and, alongside metadata, in the Monocle Database at https://data.monocle.sanger.ac.uk/. The authors confirm that all supporting data, code and protocols have been provided within the article or through supplementary data files.
Impact Statement
The polysaccharide capsule has been an effective vaccine antigen against diseases caused by Streptococcus pneumoniae (the pneumococcus). The pneumococcal conjugate vaccine is estimated to have halved pneumococcal-related childhood mortality over 15 years (2000 to 2015). We collated the genetic locus and capsular structure of each known capsule type (serotype), alongside pneumococcal vaccine formulations and licensure history, into a single webpage (SeroBAnk), providing a valuable resource for basic research and vaccine development. With the increasing use of whole-genome sequencing in clinical and public health laboratories, we also provide a fast and accurate bioinformatics tool, SeroBA(v2.0), to identify 102 pneumococcal serotypes, together with a proposed system for expanding SeroBA(v2.0) to include new serotypes as they are discovered, ensuring that the tool remains valuable to the global research community in the long term.