Implementation of a multiplex PCR amplification system combined with next-generation genome sequencing to decipher the circulation of Human coronavirus 229E lineages in Southern France

Abstract

Coronaviruses are known to evolve rapidly and be prone to new virus emergence. Human coronavirus-229E is one of the seven coronaviruses currently known to cause respiratory symptoms in humans. Genomic data are very scarce for this virus. Here, we implemented an in-house multiplex PCR strategy to amplify HCoV-229E genomes from nasopharyngeal samples that tested positive for this virus's RNA in Southeastern France. Then, next-generation sequencing was performed using Nanopore or Illumina technologies on GridION or NovaSeq 6000 instruments, respectively. HCoV-229E genomes were assembled and analyzed using the MAFFT, MEGA, iTOL, Nextstrain and Nextclade software. A total of 31 PCR primer pairs were designed to amplify overlapping fragments of the HCoV-229E genome. They yielded 123 genomes, which were classified in an emerging HCoV-229E lineage first reported in China, with two sublineages being delineated. Relative to genome NC_002645.1, which dates back to 1962, the viral genomes obtained here carried 1,167 nucleotide substitutions, 72 insertions and 34 deletions, and 415 amino acid substitutions, 39 deletions and 14 insertions. The genes with the greatest diversity were S (which encodes the spike protein), followed by Nsp3. Signature mutations were identified for each of the two sublineages. In summary, we almost doubled the set of HCoV-229E genomes available worldwide and provide the first genomes from France. Further studies are needed to strengthen knowledge of the phylogenomics and evolutionary dynamics of this neglected respiratory virus and of its specific features, which may also provide clues that improve knowledge of all other human coronaviruses.

Article activity feed

  1. This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/15279061.

    Overall Assessment

    This manuscript contributes to understanding the evolution and circulation of Human coronavirus 229E (HCoV-229E). The authors developed a multiplex PCR system for whole-genome sequencing of HCoV-229E, successfully generating 123 genomes from clinical samples collected in Southern France. This work nearly doubles the global number of available HCoV-229E genomes and provides France's first genomic data for this virus. The findings regarding lineage circulation and evolutionary patterns are significant and merit publication after addressing some concerns.

    Major Strengths

    1. The study addresses a significant knowledge gap by providing the first HCoV-229E genomes from France and demonstrating that a new lineage from China has spread to Europe, revealing previously unknown global circulation patterns. The gap was not merely about having more genomes, but about understanding the geographic spread, evolutionary dynamics, and region-specific mutations of this emerging lineage, including discovering two co-circulating sublineages with distinct mutation signatures.

    2. The methodology is well-described, with detailed PCR primer designs that will be valuable to other researchers.

    3. The analysis reveals important information about the circulation of an emerging HCoV-229E lineage in France, with evidence of two distinct sublineages.

    4. Identification of signature mutations for each sublineage provides useful markers for future surveillance.

    Major Concerns

    1. Lineage/sublineage nomenclature: While the authors identify two sublineages (a and b) within the emerging lineage, they create unnecessary confusion by not aligning their classification with previously reported 7a/7b sublineages. The authors should adopt the established 7a/7b nomenclature or provide a detailed justification for their alternative classification system, including a clear comparison showing how their sublineages relate to existing classifications. 

    2. Sample bias: While the authors note that most genomes were obtained from samples collected between 2019 and 2022, coinciding with increased respiratory virus testing during the COVID-19 pandemic, they should discuss how this temporal bias might affect the interpretation of their findings. Specifically, they should address potential biases such as: 1) samples may disproportionately represent co-infected patients with SARS-CoV-2 and HCoV-229E, 2) the intensive testing period may have captured more severe or symptomatic cases than would be detected in routine surveillance, and 3) the predominance of pandemic-era samples could skew the apparent evolutionary patterns and genetic diversity observed in their study.

    3. Functional implications: The manuscript identifies numerous mutations, particularly in the spike protein, but provides limited discussion of potential functional consequences of these mutations, particularly those in the receptor-binding domain.

    Minor Concerns

    1. Figure 2 would benefit from improved labeling and a clearer indication of previously described genotypes/lineages.

    2. The authors should clarify PCR success rates: 123 genomes were obtained from 195 available samples, but what determined successful amplification besides Ct values? Additionally, they should discuss whether this 63% success rate introduces any systematic bias: did the successfully sequenced samples differ from the unsuccessful ones in meaningful ways (e.g., higher viral loads, specific lineages, clinical severity, or collection timing)? Such an analysis would help determine whether the genomic dataset represents the circulating viral population or whether technical limitations may have skewed the findings toward specific viral variants or patient characteristics.

    3. Table 2 formatting is difficult to read in its current form. A more accessible presentation of the spike protein mutations would be helpful.

    4. The timeline in Figure 1 should be more clearly labeled to show the relationship between sample collection dates and genome sequencing success.
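
    The 63% figure in Minor Concern 2 follows from the two counts quoted there (123 genomes from 195 samples). As a minimal sketch, the arithmetic can be reproduced together with a rough uncertainty range. The Wilson score interval and the function name `wilson_interval` are illustrative assumptions chosen here, not something reported in the manuscript or the review.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%).

    Hypothetical helper for illustration only; the manuscript does not
    report a confidence interval for its sequencing success rate.
    """
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half

# Counts quoted in the review: 123 genomes from 195 available samples.
sequenced, available = 123, 195
rate = sequenced / available
low, high = wilson_interval(sequenced, available)

print(f"success rate: {rate:.1%} (approx. 95% interval {low:.1%} to {high:.1%})")
```

    An interval of this kind only describes sampling uncertainty in the success rate. It does not answer the bias question raised above, which would require comparing Ct values, collection dates, or lineages between sequenced and unsequenced samples.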

    Recommendations 

    1. Provide a clearer resolution of the discrepancies between the sublineages identified in this work and those described in previous studies. Consider proposing a unified nomenclature if appropriate.

    2. Expand the discussion on the functional implications of the observed mutations, particularly those in the spike protein receptor-binding domains that might affect viral fitness or transmissibility.

    3. Include an analysis of selection pressures acting on different viral genes, strengthening the evolutionary analysis.

    4. If such data are available, provide more context about the clinical characteristics of the different sublineages.

    5. To strengthen the global perspective, consider including a more detailed comparison with the other recently published HCoV-229E genomic datasets (Musaeva et al. and McClure et al.).

    Conclusion

    This manuscript contributes meaningfully to coronavirus genomic surveillance, providing critical new data on HCoV-229E evolution and circulation in France. The methodological approach is sound, and the findings are significant. With revisions addressing the concerns above, this work will be a valuable addition to the scientific literature and facilitate future studies on coronavirus evolution.

    Competing interests

    The authors declare that they have no competing interests.

    Use of Artificial Intelligence (AI)

    The authors declare that they used generative AI to come up with new ideas for their review.