Interpreting roles of mutations associated with the emergence of S. aureus USA300 strains using transcriptional regulatory network reconstruction

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    This study presents useful insights into core genome mutations that could have contributed to the emergence of the Staphylococcus aureus lineage USA300, a frequent cause of community-acquired infections. The solid approach used is innovative in combining genome-wide association studies and RNA-expression analyses, both applied to extensive publicly available datasets. This strategy reduces the rate of false positives attributed to high genome-wide linkage disequilibrium. It is noted that this method cannot be used for most phenotype-genotype studies, especially those requiring essential population structure correction, and it can therefore not be readily replicated in different datasets.

This article has been Reviewed by the following groups

Abstract

The Staphylococcus aureus clonal complex 8 (CC8) is made up of several subtypes with varying levels of clinical burden, from community-associated methicillin-resistant S. aureus (CA-MRSA) USA300 strains to hospital-associated (HA-MRSA) USA500 strains and basal methicillin-susceptible (MSSA) strains. This phenotypic distribution within a single clonal complex makes CC8 an ideal clade to study the emergence of mutations important for antibiotic resistance and community spread. Gene-level analyses comparing USA300 against MSSA and HA-MRSA strains have revealed key horizontally acquired genes important for its rapid spread in the community. However, efforts to define the contributions of point mutations and indels have been confounded by strong linkage disequilibrium resulting from clonal propagation. To break down this confounding effect, we combined genetic association testing with a model of the transcriptional regulatory network (TRN) to find candidate mutations that may have led to changes in gene regulation. First, we used a De Bruijn graph genome-wide association study (DBGWAS) to enrich mutations unique to the USA300 lineages within CC8. Next, we reconstructed the TRN by using independent component analysis on 670 RNA sequencing samples from USA300 and non-USA300 CC8 strains, which predicted several genes with strain-specific altered expression patterns. Examination of the regulatory region of one of the genes enriched by both approaches, isdH, revealed a 38 base pair deletion containing a Fur binding site and a conserved SNP, which likely led to the altered expression levels in USA300 strains. Taken together, our results demonstrate the utility of reconstructed TRNs to address the limits of genetic approaches when studying emerging pathogenic strains.

Article activity feed

  2. Reviewer #1 (Public Review):

    Summary:
    This is a large-scale genomics and transcriptomics study of the epidemic community-acquired methicillin-resistant S. aureus clone USA300, designed to identify core genome mutations that drove the emergence of the clone. It used publicly available datasets and a combination of genome-wide association studies (GWAS) and independent component analysis (ICA) of RNA-seq profiles to compare USA300 versus non-USA300 strains within clonal complex 8. By overlapping the analyses, the authors identified a 38 bp deletion upstream of the iron-scavenging surface-protein gene isdH that was significantly associated both with the USA300 lineage and with decreased transcription of the gene.

    Strengths:
    Several genomic studies have investigated genomic factors driving the emergence of successful S. aureus clones, in particular USA300. These studies have often focussed on acquisition of key accessory genes or on a small number of strains. This study makes smart use of publicly available repositories to increase the sample size of the analysis and identify new genomic markers of USA300 success.
    The approach of combining large-scale genomics and transcriptomics analysis is powerful, as it allows some inferences to be made about the impact of the mutations. This is particularly important for mutations in intergenic regions, whose functional impact is often uncertain.
    The statistical genomics approaches are elegant and state-of-the-art and can be easily applied to other contexts or pathogens.

    Weaknesses:
    The main weakness of this work is that these data don't allow a causal inference on the role of isdH in driving the emergence of USA300. It is of course impossible to prove which mutation or gene drove the success of the clone; however, experimental data would have strengthened the conclusions of the authors in my opinion.
    Another limitation is that the approach taken here doesn't allow any conclusions to be drawn about the adaptive role of the isdH mutation. In other words, it is still possible that the mutation is just a marker of USA300 success, which may instead be due to other factors such as PVL, ACME or SCCmecIVa. This is because by its nature this analysis is heavily influenced by population structure. Usually, GWAS is applied to find genetic loci that are associated with a phenotype and are independent of the underlying population structure. Here, the authors are using GWAS to find loci that are associated with a lineage. In other words, they are simply running a univariate analysis (likely a logistic regression) between genetic loci and the lineage without any correction for population structure, since population structure is the outcome. Therefore, this approach can't be applied to most phenotype-genotype studies, where correction for population structure is critical.
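To make the point about an uncorrected univariate test concrete, here is a minimal sketch using Fisher's exact test on a 2x2 table of variant presence versus lineage. It is not the DBGWAS procedure used by the authors, and all strain counts and variant names are hypothetical toy data. It shows that two variants in perfect linkage on the USA300 background receive exactly the same p-value, so the test alone cannot tell a driver from a hitchhiker.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x carriers in row 1 with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

def table(variant, lineage):
    """Count strains by (in lineage, carries variant) into a, b, c, d."""
    a = sum(1 for v, l in zip(variant, lineage) if l and v)
    b = sum(1 for v, l in zip(variant, lineage) if l and not v)
    c = sum(1 for v, l in zip(variant, lineage) if not l and v)
    d = sum(1 for v, l in zip(variant, lineage) if not l and not v)
    return a, b, c, d

# Hypothetical toy data: 20 USA300 strains followed by 20 non-USA300 strains.
is_usa300 = [1] * 20 + [0] * 20
driver = [1] * 19 + [0] + [1] + [0] * 19   # enriched in USA300
hitchhiker = list(driver)                  # perfectly linked to the driver
unlinked = [1, 0] * 20                     # evenly spread across both groups

p_driver = fisher_exact_two_sided(*table(driver, is_usa300))
p_hitchhiker = fisher_exact_two_sided(*table(hitchhiker, is_usa300))
p_unlinked = fisher_exact_two_sided(*table(unlinked, is_usa300))
print(p_driver, p_hitchhiker, p_unlinked)
```

Because the lineage label is the outcome, every lineage-defining variant looks equally significant here, which is why a second line of evidence such as expression data is needed to prioritise candidates.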
    Finally, the approach used is complex and not easily reproduced in another dataset. Although I like DBGWAS and find the network analysis elegant, I would be interested in seeing how a simpler GWAS tool like Pyseer would perform.

  3. Reviewer #2 (Public Review):

    Summary:

    The work of Poudel et al. identified potential causal mutations related to the successful emergence of the virulent USA300 community-associated MRSA clone within clonal complex 8. To achieve this, the authors employed a methodology that combines genome-wide association studies (GWAS) with the inference of a transcriptional regulatory network (TRN) through independent component analysis (ICA) of publicly available transcriptomic data. Thus, they identified genes with altered expression in the iModulons calculated by ICA and enriched mutations obtained from the De Bruijn graph genome-wide association study (DBGWAS) in the USA300 strains versus non-USA300 strains. The results revealed a deletion of 38 base pairs, containing a binding site for the Fur repressor, and an A→T mutation, both occurring in the upstream region of the isdH gene, whose expression level in USA300 strains exhibited a general increase compared to the other group. The isdH gene encodes the iron-regulated surface determinant protein H, which plays a crucial role in iron acquisition from heme and immune system evasion - two essential processes for the pathogenicity of S. aureus.
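For readers unfamiliar with the ICA step mentioned above, the decomposition behind iModulons is commonly written as a matrix factorisation. The notation below is the general formulation used in iModulon analyses and is an assumption about notation; the symbols X, M, A, g, s and k do not appear in the reviews.

```latex
\mathbf{X} \approx \mathbf{M}\,\mathbf{A},
\qquad
\mathbf{X} \in \mathbb{R}^{g \times s},\quad
\mathbf{M} \in \mathbb{R}^{g \times k},\quad
\mathbf{A} \in \mathbb{R}^{k \times s}
```

Here X is the expression matrix for g genes across s RNA-seq samples (670 in this study), each column of M holds the gene weights of one of k independent components, and each row of A gives that component's activity across samples. Genes with large-magnitude weights in a column of M make up the corresponding iModulon.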

    Strengths:

    The clonal complex 8 (CC8), one of the most prevalent among S. aureus, encompasses strains responsible for both community-associated MRSA infections (CA-MRSA) and healthcare-associated (HA) infections (HA-MRSA and HA-MSSA). Within the CC8, one of the most prominent lineages is USA300, which emerged in the early 2000s and has since become a leading cause of CA-MRSA infections in the United States. The key genetic traits that characterize USA300 strains include the presence of the Panton-Valentine leukocidin (PVL) encoded by the genes lukF-PV and lukS-PV, the staphylococcal chromosomal cassette mec IVa (SCCmecIVa), and the arginine catabolic mobile element (ACME). Investigating the phenotypic impact of individual mutations on the success of epidemic strains through GWAS poses a challenge due to two main confounding factors: genome-wide linkage disequilibrium (LD) and population structure. The genome-wide LD is associated with false positives, where linked non-causal mutations are mistakenly identified as causal due to the same genomic backgrounds. Therefore, the strength of this work lies in the use of publicly available transcriptomic data to construct a TRN based on ICA. This approach validates the mutations enriched by GWAS and reduces the occurrence of false positives attributed to high genome-wide LD. By integrating various 'omics' data sources, this method enhances the reliability of the results and has successfully identified new potential genetic markers specific to USA300 strains. Furthermore, it revealed mutations within core genes and intergenic regulatory regions, findings that can be validated through experimental data.
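The filtering logic described above, keeping only GWAS-enriched mutations that also coincide with a gene showing altered expression, can be sketched as a simple intersection. This is an illustration under stated assumptions, not the authors' pipeline: the variant identifiers and all gene names other than isdH are hypothetical placeholders.

```python
# Hypothetical inputs: only isdH comes from the study; other names are placeholders.
gwas_hits = {
    # variant id -> gene whose coding or upstream region the variant falls in
    "del38_upstream": "isdH",
    "snp_0412": "geneA",
    "snp_1187": "geneB",
    "snp_2051": "geneC",
}
# Genes flagged by the expression analysis as differing between strain groups.
ica_shifted_genes = {"isdH", "geneC", "geneD"}

def overlap_candidates(hits, shifted_genes):
    """Keep lineage-associated variants located in or near a gene with altered expression."""
    return {variant: gene for variant, gene in hits.items() if gene in shifted_genes}

candidates = overlap_candidates(gwas_hits, ica_shifted_genes)
print(candidates)
```

Variants that are lineage-associated only through linkage, with no matching expression change, drop out of the candidate list, which is the sense in which the expression data reduce LD-driven false positives.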

    Weaknesses:

    GWAS aims to identify statistically significant associations that suggest a causal link between genotype and the specific phenotype of interest while simultaneously filtering out spurious associations caused by confounding factors. While the method described in this study minimizes the impact of genome-wide linkage disequilibrium (LD), it does not extend to addressing population structure. This is because the objective was precisely to identify mutations associated with the emergence of the USA300 clone. In this context, the confounding element arising from shared ancestry becomes the subject of analysis rather than an issue to be corrected. Therefore, it is essential to highlight that the method proposed in this work cannot be applied to genome-wide association studies in which correction for population structure is critical for distinguishing genuine causal associations from spurious ones. This correction is necessary for most phenotypes of interest.

    Another limitation is that, although the authors emphasize the mutation in the isdH gene, the analyses conducted in this study do not provide insight into any potential adaptive function associated with it. Like the other genes exhibiting distinct expression patterns associated with enriched mutations from DBGWAS in USA300 strains, isdH is one of several potential markers related to the success of the clone. This group includes well-established markers, such as ACME, which carries relevant genes like the arc operon and the speG gene that contribute to virulence and survival at infection sites.

    Finally, despite the availability of the code on GitHub, the analysis itself is not easily reproducible or adaptable to other datasets.