Comprehensive fitness landscape of SARS-CoV-2 Mpro reveals insights into viral resistance mechanisms
Curation statements for this article:
Curated by eLife
Evaluation Summary: The main protease of SARS-CoV-2 (Mpro) is a leading target for drug design due to its conserved and indispensable role in the viral life cycle. This study used cutting-edge molecular and statistical methods to generate a detailed genotype-phenotype map of this protein, also called a "fitness landscape." This fitness landscape is impressive in its detail and resolution and offers new insight that can be used to potentially generate new anti-viral drugs. This study was original in conception, precise in execution, and impressive in design. In sum, this constitutes a very meaningful addition to the scientific literature. (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #2 agreed to share their names with the authors.)
This article has been reviewed by the following groups
Listed in
- Evaluated articles (eLife)
- Evaluated articles (ScreenIT)
Abstract
With the continual evolution of new strains of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) that are more virulent, transmissible, and able to evade current vaccines, there is an urgent need for effective anti-viral drugs. The SARS-CoV-2 main protease (Mpro) is a leading target for drug design due to its conserved and indispensable role in the viral life cycle. Drugs targeting Mpro appear promising but will elicit selection pressure for resistance. To understand resistance potential in Mpro, we performed a comprehensive mutational scan of the protease that analyzed the function of all possible single amino acid changes. We developed three separate high-throughput assays of Mpro function in yeast, based on either the ability of Mpro variants to cleave at a defined cut-site or on the toxicity of their expression to yeast. We used deep sequencing to quantify the functional effects of each variant in each screen. The protein fitness landscapes from all three screens were strongly correlated, indicating that they captured the biophysical properties critical to Mpro function. The fitness landscapes revealed a non-active site location on the surface that is extremely sensitive to mutation, making it a favorable location to target with inhibitors. In addition, we found a network of critical amino acids that physically bridge the two active sites of the Mpro dimer. The clinical variants of Mpro were predominantly functional in our screens, indicating that Mpro is under strong selection pressure in the human population. Our results provide predictions of mutations that will be readily accessible to Mpro evolution and that are likely to contribute to drug resistance. This complete mutational guide of Mpro can be used in the design of inhibitors with reduced potential of evolving viral resistance.
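The abstract describes scanning "all possible single amino acid changes" of Mpro. As a rough sketch of that variant space (not the authors' library-design code), the hypothetical helper `single_substitutions` below enumerates every single-residue substitution of a protein sequence, optionally including stop codons, which the screens use as the null-function reference. It assumes the 20 standard amino acids and a 306-residue protein for the Mpro-sized example.

```python
# Minimal sketch (hypothetical helper, not from the paper): enumerate the
# single-amino-acid variant space that a comprehensive mutational scan covers.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
STOP = "*"  # nonsense (stop) variant, used as a null-function reference


def single_substitutions(protein: str, include_stop: bool = True) -> list[str]:
    """Return labels like 'K2A': wild-type residue, 1-based position, new residue."""
    targets = AMINO_ACIDS + (STOP if include_stop else "")
    variants = []
    for pos, wt in enumerate(protein, start=1):
        for aa in targets:
            if aa != wt:  # skip the wild-type residue itself
                variants.append(f"{wt}{pos}{aa}")
    return variants


if __name__ == "__main__":
    # Placeholder sequence of Mpro's length; substitute the real sequence.
    placeholder = "A" * 306
    print(len(single_substitutions(placeholder, include_stop=False)))  # 5814
    print(len(single_substitutions(placeholder)))  # 6120
```

For a 306-residue protein this gives 306 × 19 = 5,814 missense variants, or 6,120 when stops are included; synonymous codon variants and the WT reference are not counted.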
Article activity feed
- 
Author Response
Reviewer #1 (Public Review):
Major points:
- The "collateral" fitness effects of mutations on proteins, i.e. those which change overall fitness by means other than directly altering the specific function, are also important sources of abundance changes in DMS experiments [https://www.pnas.org/doi/10.1073/pnas.1918680117]. In this instance, given that expression of Mpro is toxic at high levels, it seems plausible that the fluorescence-based fitness measurements could be influenced by these effects. In these experiments, libraries are first grown for many generations, and in some instances are also expanded after selection before sequencing. The experimental design of the FACS experiments limits the impact of these collateral effects, but these may significantly shape the pre-selection library abundances. Given that the competitive growth experiment involves a T=0 sequencing sample, would it be possible to analyze this pre-selection variant distribution for potential bias?
To analyze enrichment/depletion of variants pre-selection, we compared each variant count in the plasmid library to that of the t=0 sample (immediately prior to induction of Mpro expression). Sequencing counts of the plasmid library correlate with counts in the t=0 sample, indicating minimal selection before the induction of Mpro expression. Variants at low frequency showed a wider variance, consistent with lower sampling. We added panels f and g to Figure 1 – figure supplement 1 and the following text to the Results section: “Variant counts analyzed by sequencing before and after the pre-selection amplification step were correlated, consistent with minimal to no selection prior to induction with β-estradiol (Figure 1 – figure supplement 1f and Figure 1 – figure supplement 1g).”
- The comparisons with the clinically observed mutants are impressive. They note that "only nine having a score below that of the WT distribution". Are these seen with other mutations? As the authors note, their study is confined to single-site mutations and does not directly consider epistatic effects. Are there potential second-site mutations that could explain these 9 outliers?
Of the clinical isolates sequenced to date, fewer than 0.4% have more than one mutation in the Mpro gene. We therefore did not take epistatic effects into account when performing this analysis. The following has been added to the text: “The vast majority of the clinical isolates that have been sequenced to date have either 0 or 1 Mpro mutations with fewer than 0.4% having 2 or greater mutations and thus we did not account for epistasis in our analysis.”
- We have some additional questions about the data analysis. In all three experiments, variant proportions are mapped to [0,1], where average stop codon proportions are set as 0 and WT codons as 1. The use of this mapping should not influence the results, but it obscures the overall power. That is, the magnitudes of the changes in abundance underlying, e.g., an apparent loss of function mutation are unclear. The details of the unnormalized abundance changes should be included to improve reproducibility as well as interpretability. We also wonder whether the growth experiments should be fit to selection coefficients as well as normalized fitness scores.
The raw counts, unnormalized scores and normalized scores are reported for each screen in Figure 2 – source data 1. Additionally, the growth experiments have been fit to selection coefficients (slope of log2(variant counts/WT counts)), and these are also reported in Figure 2 – source data 1. For the growth screen we chose to report the functional score rather than the selection coefficients in the paper so that they would be directly comparable to the TF and FRET functional scores. The following was added to the Materials and Methods: “The functional scores were normalized setting the score for the average WT Mpro barcode as 1 and the average stop codon as 0. Both the unnormalized and normalized scores are reported in Figure 2 – source data 1. For comparison, the counts for the growth-based screen were fit to selection coefficients (slope of log2(variant/WT counts)). We chose to report the functional scores as opposed to the selection coefficients in this paper so they would be directly comparable to the TF and FRET functional scores.”
- Finally, we would request that the analysis scripts be made available. Given the current lack of standardization for DMS experiments and the difficulty with these types of analysis, this is both necessary for reproducibility and as a benefit for the community.
 All custom analysis scripts have been deposited at https://github.com/JuliaFlynn. This information has been added to the Key Resource Table. 
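The response above describes two scoring schemes: functional scores rescaled so that the average WT barcode is 1 and the average stop codon is 0, and, for the growth screen, selection coefficients defined as the slope of log2(variant/WT counts) over time. A minimal sketch of both follows. It assumes a linear rescaling, an ordinary least-squares slope, time points in arbitrary units, and a small pseudocount against zero counts; the function names and the pseudocount are hypothetical and are not taken from the authors' deposited scripts.

```python
import math
from statistics import mean


def normalize_scores(raw: dict[str, float], wt_scores: list[float],
                     stop_scores: list[float]) -> dict[str, float]:
    """Linearly rescale raw scores: mean WT barcode -> 1, mean stop codon -> 0."""
    wt, stop = mean(wt_scores), mean(stop_scores)
    if wt == stop:
        raise ValueError("WT and stop-codon means coincide; cannot normalize")
    return {variant: (s - stop) / (wt - stop) for variant, s in raw.items()}


def selection_coefficient(times: list[float], variant_counts: list[float],
                          wt_counts: list[float], pseudocount: float = 0.5) -> float:
    """Least-squares slope of log2(variant/WT counts) versus time."""
    # The pseudocount (an assumption) keeps log2 defined when a count is zero.
    y = [math.log2((v + pseudocount) / (w + pseudocount))
         for v, w in zip(variant_counts, wt_counts)]
    t_bar, y_bar = mean(times), mean(y)
    num = sum((t - t_bar) * (yi - y_bar) for t, yi in zip(times, y))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


if __name__ == "__main__":
    raw = {"WT-like": 10.0, "null": 2.0, "partial": 6.0}  # made-up raw scores
    print(normalize_scores(raw, wt_scores=[9.0, 11.0], stop_scores=[1.0, 3.0]))
    # A variant whose ratio to WT halves every time unit has slope -1.
    print(selection_coefficient([0, 1, 2, 3], [800, 400, 200, 100],
                                [100, 100, 100, 100], pseudocount=0))  # -1.0
```

Because the rescaling is linear, it preserves the ranking and relative spacing of variants; reporting the raw values alongside it, as the authors now do, is what lets readers recover the underlying magnitudes.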
- 
      Reviewer #2 (Public Review): This study reports a "deep mutational scan" of the Mpro gene of SARS-CoV-2. The authors describe a complex new experimental system, together with all necessary controls to validate results. They focus in particular on mutations that disrupt function, with an eye toward guiding evolution-proof antiviral compounds. I find no serious weaknesses in the work or the manuscript. Rigorous connections to functional implications for other coronaviruses are also presented, and this work will be of broad interest to virologists working in this area as well as to a broader audience of evolutionary geneticists. 
- 
      Reviewer #1 (Public Review): Along with vaccines and non-pharmaceutical interventions, antivirals are essential tools in the ongoing SARS-CoV-2 pandemic. The viral main protease (Mpro) has several features making it an excellent candidate: it plays an essential role in replication and infection, is conserved in coronaviruses, and lacks human homologs. No inhibitors have been approved, but several are in clinical trials. As a result, the potential for resistance mutations to arise is largely unknown. In this manuscript, Flynn et al. use Deep Mutational Scanning (DMS) in three yeast-based systems to investigate the sensitivity of Mpro function to single-site mutations as an indicator of this potential. The major strength of this work is the discovery of protein regions which are intolerant of substitutions. Medicinal chemistry efforts focusing on these regions may provide candidates which are more robust towards resistance mutations, as functional constraints may not allow alterations to modify ligand binding in those locations.
An additional strength of this paper is the novelty of combining multiple independent readouts of protein function (in this case the three yeast screens). DMS often uses a single measure of fitness to infer protein function. If this is organismal growth, the readout is a very coarse-grained phenotype which can be difficult to interpret. Having an additional, specific measurement of protein function provides more direct insight into what is driving fitness changes. This is unusual in DMS studies, and comparisons between these distinct functional readouts make this study of particular interest for DMS practitioners. While the consistency between the methods is analyzed in some detail, there is potential to mine interesting data from the highest-magnitude and most significant outliers (if they exist), especially those that differ between the live/die screen and the two fluorescence-based screens. The live/die screen may report on changes in substrate specificity or gains of catalytic function differently than the two fluorescence-based screens.
One limitation of this work concerns the benchmarking of the functional scores. It is unclear how the fitness scores would map to overall viral fitness. The authors observe a bimodal distribution of fitness scores in all screens, with almost all mutations being either WT-like or non-functional and very few intermediate phenotypes. This is perhaps to be expected for a highly adapted protein lying near the top of a fitness landscape, but if some of the variants with null fitness scores are viable, this might undermine some of the overall conclusions.
Overall, this is careful and impressive work. It will likely be useful to guide anti-viral design and will also interest the DMS field.
Major points:
 1. The "collateral" fitness effects of mutations on proteins, i.e. those which change overall fitness by means other than directly altering the specific function, are also important sources of abundance changes in DMS experiments [https://www.pnas.org/doi/10.1073/pnas.1918680117]. In this instance, given that expression of Mpro is toxic at high levels, it seems plausible that the fluorescence-based fitness measurements could be influenced by these effects. In these experiments, libraries are first grown for many generations, and in some instances are also expanded after selection before sequencing. The experimental design of the FACS experiments limits the impact of these collateral effects, but these may significantly shape the pre-selection library abundances. Given that the competitive growth experiment involves a T=0 sequencing sample, would it be possible to analyze this pre-selection variant distribution for potential bias?
 2. The comparisons with the clinically observed mutants are impressive. They note that "only nine having a score below that of the WT distribution". Are these seen with other mutations? As the authors note, their study is confined to single-site mutations and does not directly consider epistatic effects. Are there potential second-site mutations that could explain these 9 outliers?
 3. We have some additional questions about the data analysis. In all three experiments, variant proportions are mapped to [0,1], where average stop codon proportions are set as 0 and WT codons as 1. The use of this mapping should not influence the results, but it obscures the overall power. That is, the magnitudes of the changes in abundance underlying, e.g., an apparent loss-of-function mutation are unclear. The details of the unnormalized abundance changes should be included to improve reproducibility as well as interpretability. We also wonder whether the growth experiments should be fit to selection coefficients as well as normalized fitness scores.
 4. Finally, we would request that the analysis scripts be made available. Given the current lack of standardization for DMS experiments and the difficulty of these types of analyses, this is both necessary for reproducibility and a benefit to the community.
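Point 1 asks whether the T=0 sample can reveal pre-selection bias, and the authors answer by comparing per-variant counts in the plasmid library with the t=0 sample. One way to sketch that comparison is shown below. The choices are all hypothetical: pseudocounted log10 frequencies, a Pearson correlation across shared variants, and an arbitrary 4-fold change threshold for flagging individual variants that drifted before induction.

```python
import math
from statistics import mean


def log_freqs(counts: dict[str, int], pseudocount: float = 1.0) -> dict[str, float]:
    """log10 relative frequency of each variant, with a pseudocount for zero counts."""
    total = sum(counts.values()) + pseudocount * len(counts)
    return {v: math.log10((c + pseudocount) / total) for v, c in counts.items()}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


def preselection_check(plasmid: dict[str, int], t0: dict[str, int],
                       fold_threshold: float = 4.0) -> tuple[float, list[str]]:
    """Correlate plasmid-library and t=0 frequencies; flag strongly shifted variants."""
    shared = sorted(plasmid.keys() & t0.keys())
    fp = log_freqs({v: plasmid[v] for v in shared})
    ft = log_freqs({v: t0[v] for v in shared})
    r = pearson([fp[v] for v in shared], [ft[v] for v in shared])
    cutoff = math.log10(fold_threshold)  # fold change expressed on the log10 scale
    flagged = [v for v in shared if abs(ft[v] - fp[v]) > cutoff]
    return r, flagged


if __name__ == "__main__":
    plasmid = {"A": 100, "B": 200, "C": 300, "D": 400}  # made-up counts
    t0 = {"A": 100, "B": 200, "C": 300, "D": 10}  # D depleted before induction
    r, flagged = preselection_check(plasmid, t0)
    print(round(r, 3), flagged)
```

Low-count variants are noisier under this kind of check, which matches the wider variance the authors report for variants at low frequency.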
- 
      SciScore for 10.1101/2022.01.26.477860: Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
- Ethics: not detected.
- Sex as a biological variable: not detected.
- Randomization: A pool of DNA constructs containing a randomized 18 bp barcode sequence (N18) was cloned into the NotI and AscI sites upstream of the LexA promoter sequence via restriction digestion, ligation and transformation into chemically competent E. coli.
- Blinding: not detected.
- Power Analysis: not detected.
- Cell Line Authentication: not detected.
Table 2: Resources (each sentence is followed by the resource it mentions and SciScore's suggested or detected identifier)
Antibodies
- 15 µg of total cellular protein was resolved by SDS-PAGE, transferred to a PVDF membrane, and probed using an anti-his antibody (ref). Resource: anti-his; suggested: None.
Experimental Models: Cell Lines
- W303 cells were transformed with the p416LexA-UbMpro(C145A)-his construct and the resulting yeast cells were grown to exponential phase in SD-ura media at 30°C. 125 nM β-estradiol was added when indicated and cells were grown for an additional eight hours. Resource: W303; suggested: Coriell Cat# GM03889, RRID:CVCL_W303.
Experimental Models: Organisms/Strains
- The Gal4, Gal80 and Pdr5 genes were disrupted to create the following strain: W303 HO::Gal1-GFP-v5-His3; gal4::trp1; gal80::leu2 pdr5::natMX. Resources: Gal4, Gal80; suggested: RRID:BDSC_51280. HO::Gal1-GFP-v5-His3; gal4::trp1; gal80::leu2 pdr5::natMX; suggested: None.
Recombinant DNA
- Construction of WT Ub-Mpro vector (p416LexA_UbMpro(WT)_B112): The Ubiquitin-Mpro gene fusion was constructed using overlapping PCR of the yeast ubiquitin gene and SARS-CoV-2 Mpro gene (Jin, Du et al. 2020) and was inserted into the pRS416 vector after digestion with SpeI and BamHI. Resource: pRS416; suggested: RRID:Addgene_158144.
- … (Addgene plasmid # 58434; http://n2t.net/addgene:58434; RRID:Addgene_58434)) (Ottoz, Rudolf et al. 2014) and inserted between the SacI and SpeI sites upstream of the ubiquitin-Mpro gene. Detected: RRID:Addgene_58434.
- … Addgene plasmid # 58437; http://n2t.net/addgene:58437; RRID:Addgene_58437)) (Ottoz, Rudolf et al. 2014) and inserted into the KpnI site. Detected: RRID:Addgene_58437.
- Barcode association: To associate barcodes with Mpro variants, we digested the p416-UbMpro(lib)-B112 plasmid upstream of the N18 sequence and downstream of the Mpro sequence with NotI and SalI enzymes (NEB). Resource: p416-UbMpro(lib)-B112; suggested: None.
- The randomized 18 bp barcode sequence (N18) was cloned between the NotI and AscI sites upstream of the LexA promoter sequence in the p416LexA-Ub-Mpro(WT)-B112 vector with the goal of the WT sequence being represented by approximately 100 barcodes. Resource: p416LexA-Ub-Mpro(WT)-B112; suggested: None.
- The resulting DBD-MproCS-AD fusion gene was inserted between the EcoRI and SacI sites downstream of the CUP promoter in the integrative bidirectional pDK-ATC plasmid (kindly provided by D. Kaganovich) (Amen and Kaganovich 2017). Resource: pDK-ATC; suggested: None.
- The mCherry gene was subsequently cloned into the XhoI/BamHI sites downstream of the TEF promoter in the opposite orientation to create the plasmid pDK-CUP-DBD-MproCS-AD-TEF-mCherry. Resource: pDK-CUP-DBD-MproCS-AD-TEF-mCherry; suggested: None.
- Bulk Split transcription factor (TF) competition experiment: Barcoded WT UbMpro (p416LexA-UbMpro(WT)-N18) plasmid was mixed with the barcoded UbMpro library (p416LexA-UbMpro(lib)-N18) at a ratio of 20-fold WT to the average library variant. Resource: p416LexA-UbMpro(WT)-N18; suggested: None.
- The CyPet gene was amplified by PCR from the pCyPet-His vector (pCyPet-His was a gift from Patrick Daugherty (Addgene plasmid # 14030; http://n2t.net/addgene:14030; RRID:Addgene_14030)) with a forward primer containing the BamHI site (BamHI_F) and a reverse primer containing the extending MproCS overhang sequence. Detected: RRID:Addgene_14030.
- The YPet gene was amplified by PCR from the pYPet-His vector (pYPet-His was a gift from Patrick Daugherty (Addgene plasmid # 14031; http://n2t.net/addgene:14031; RRID:Addgene_14031)) with a forward primer containing the extending MproCS overhang sequence and a reverse primer containing the XhoI site (XhoI_R). Detected: RRID:Addgene_14031.
- The resulting CyPet-MproCS-YPet gene was inserted between the BamHI and XhoI sites downstream of the TEF promoter in the integrative bidirectional pDK-ATG plasmid (kindly provided by D. Kaganovich) (Amen and Kaganovich 2017). Resource: pDK-ATG; suggested: None.
- Analysis of Mpro expression and Ubiquitin removal by Western Blot: To facilitate analysis of expression levels of Mpro and examine effective removal of Ubiquitin, a his tag was fused to the C-terminus of Mpro to create the plasmid p416LexA-UbMpro-his6-B112. Resource: p416LexA-UbMpro-his6-B112; suggested: None.
- W303 cells were transformed with the p416LexA-UbMpro(C145A)-his construct and the resulting yeast cells were grown to exponential phase in SD-ura media at 30°C. 125 nM β-estradiol was added when indicated and cells were grown for an additional eight hours. Resource: p416LexA-UbMpro(C145A)-his; suggested: None.
Software and Algorithms
- PacBio circular consensus sequences (CCS) were generated from the raw reads using SMRTLink v. Resource: SMRTLink; suggested: None.
- Illumina sequence reads were filtered for Phred scores > 10 and strict matching of the sequence to the expected template and identifier sequence. Resource: Phred; suggested: Phred, RRID:SCR_001017.
- The figures were generated using Matplotlib (Hunter 2007), PyMOL and GraphPad Prism version 9.3.1. Resources: Matplotlib (suggested: MatPlotLib, RRID:SCR_008624); PyMOL (suggested: PyMOL, RRID:SCR_000305); GraphPad Prism (suggested: GraphPad Prism, RRID:SCR_002798).
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
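The Illumina read-filtering step listed under Software and Algorithms (Phred scores > 10 and strict matching to the expected template and identifier sequence) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes Phred+33 quality encoding, requires every base to exceed Q10 (the report does not say whether the cutoff is per base or averaged), and uses a hypothetical template in which N marks barcode positions. The paper's barcode is 18 nt (N18); the toy template uses 4.

```python
PHRED_OFFSET = 33  # Sanger / Illumina 1.8+ ASCII encoding (assumed)


def phred_scores(quality: str, offset: int = PHRED_OFFSET) -> list[int]:
    """Decode a FASTQ quality string into per-base Phred scores."""
    return [ord(ch) - offset for ch in quality]


def matches_template(seq: str, template: str) -> bool:
    """Strict match: fixed template bases must agree; 'N' accepts any of A/C/G/T."""
    if len(seq) != len(template):
        return False
    return all(s in "ACGT" if t == "N" else s == t for s, t in zip(seq, template))


def extract_barcode(seq: str, template: str) -> str:
    """Collect the bases at the template's 'N' (barcode) positions."""
    return "".join(s for s, t in zip(seq, template) if t == "N")


def filter_reads(reads: list[tuple[str, str]], template: str, min_q: int = 10) -> list[str]:
    """Keep reads with every base above min_q that match the template; return barcodes."""
    barcodes = []
    for seq, qual in reads:
        if all(q > min_q for q in phred_scores(qual)) and matches_template(seq, template):
            barcodes.append(extract_barcode(seq, template))
    return barcodes


if __name__ == "__main__":
    template = "ACG" + "N" * 4 + "TT"  # hypothetical: fixed flank, barcode, fixed flank
    reads = [
        ("ACGGATCTT", "IIIIIIIII"),  # Q40 everywhere and matches: kept
        ("ACGGATCTT", "IIII+IIII"),  # one base at Q10 (needs > 10): dropped
        ("ACGGATCTA", "IIIIIIIII"),  # last fixed base mismatches: dropped
    ]
    print(filter_reads(reads, template))  # ['GATC']
```

Requiring an exact match at every fixed position, rather than allowing mismatches, trades some read loss for confidence that each barcode was read correctly.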
 Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
However, there are a couple of caveats that should be kept in mind when utilizing these data sets. For example, we do not fully understand how Mpro’s biochemical function relates to viral fitness. Having some Mpro function is essential to the virus, so mutations that destroy Mpro function will form non-functional viruses. Function-fitness relationships tend to be non-linear (Heinrich and Rapoport 1974, Kacser and Fell 1995, Jiang, Mishra et al. 2013) and it may be likely that Mpro function must be decreased by a large amount in order to cause measurable changes in viral replication efficiency. This relationship between Mpro function and SARS-CoV-2 fitness would need to be determined in order to translate our functional scores to fitness scores. Additionally, our TF and FRET screens quantify cleavage at one defined site (Nsp4/5) and it may be important to analyze all sites in order to fully understand the selection pressures acting on Mpro. Another important caveat is that our fitness landscape captures single amino acid changes and therefore does not provide information on the potential interdependence or epistasis between double and higher order mutations. Information regarding epistasis will be important for accurately predicting the impacts of multiple mutations on fitness. Despite these caveats, the similarity in fitness landscapes for the TF and FRET screens with the yeast growth screen suggests that all three capture fundamental and general aspects of Mpro selection. In...
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
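The caveat quoted above notes that function-fitness relationships tend to be non-linear, so Mpro function may have to fall a long way before viral replication changes measurably. A toy illustration follows. It assumes a hyperbolic, saturating curve (a common textbook shape for pathway flux versus enzyme activity) and an arbitrary constant `k`; neither the functional form nor the value of `k` comes from the paper, and the functions are hypothetical.

```python
def relative_fitness(f: float, k: float = 0.05) -> float:
    """Hypothetical saturating map from relative Mpro function f (WT = 1) to fitness (WT = 1).

    Hyperbolic form w = f(1 + k) / (f + k); a smaller (made-up) k means more buffering.
    """
    if f < 0:
        raise ValueError("relative function must be non-negative")
    return f * (1 + k) / (f + k)


def function_needed(w: float, k: float = 0.05) -> float:
    """Invert relative_fitness: the relative function that yields fitness w (0 <= w < 1 + k)."""
    return w * k / (1 + k - w)


if __name__ == "__main__":
    for f in (1.0, 0.5, 0.3, 0.1, 0.0):
        print(f"function {f:.1f} -> fitness {relative_fitness(f):.3f}")
    # With k = 0.05, fitness only falls to 0.9 once function drops to ~0.3 of WT.
    print(round(function_needed(0.9), 3))
```

Under this made-up curve, halving Mpro function costs less than 5% fitness, which is why the authors caution that functional scores cannot be read directly as viral fitness until the real relationship is measured.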
 Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
 Results from scite Reference Check: We found no unreliable references. 
