Using symptom-based case predictions to identify host genetic factors that contribute to COVID-19 susceptibility
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Epidemiological and genetic studies on COVID-19 are currently hindered by inconsistent and limited testing policies to confirm SARS-CoV-2 infection. Recently, it was shown that it is possible to predict COVID-19 cases using cross-sectional self-reported disease-related symptoms. Here, we demonstrate that this COVID-19 prediction model has reasonable and consistent performance across multiple independent cohorts and that our attempt to improve upon this model did not result in improved predictions. Using the existing COVID-19 prediction model, we then conducted a GWAS on the predicted phenotype using a total of 1,865 predicted cases and 29,174 controls. While we did not find any common, large-effect variants that reached genome-wide significance, we do observe suggestive genetic associations at two SNPs (rs11844522, p = 1.9 × 10⁻⁷; rs5798227, p = 2.2 × 10⁻⁷). Explorative analyses furthermore suggest that genetic variants associated with other viral infectious diseases do not overlap with COVID-19 susceptibility and that severity of COVID-19 may have a different genetic architecture compared to COVID-19 susceptibility. This study represents a first effort that uses a symptom-based predicted phenotype as a proxy for COVID-19 in our pursuit of understanding the genetic susceptibility of the disease. We conclude that the inclusion of symptom-based predicted cases could be a useful strategy in a scenario of limited testing, either during the current COVID-19 pandemic or any future viral outbreak.
Article activity feed
SciScore for 10.1101/2020.08.21.20177246:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
- Sentence: Hereafter, we added RSIDs where both the genomic location and alleles matched to a variant from dbSNP (https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.25.gz, retrieved on June 30th).
  Resource: dbSNP, suggested: (dbSNP, RRID:SCR_002338)
- Sentence: Variants in the D1 meta-analysis were filtered on MAF>0.01 (all_meta_AF column), after which we performed p-value informed LD pruning, also called clumping, using PLINK (v1.90b6.10 64-bit, --clump) and the European population from the 1000 Genomes Project (phase 3) as a reference panel.
  Resource: PLINK, suggested: (PLINK, RRID:SCR_001757)
  Resource: 1000 Genomes Project, suggested: (1000 Genomes Project and AWS, RRID:SCR_008801)
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
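The MAF filter and p-value informed clumping quoted in the Resources table above can be illustrated with a minimal sketch. This is not PLINK's implementation: it assumes that pairwise LD (r²) is already available through a lookup function, whereas PLINK estimates it from the reference panel genotypes, and it omits PLINK's secondary p-value threshold. The `Variant` class, the `clump` function, the variant identifiers, and all threshold values other than MAF > 0.01 are hypothetical and chosen only for illustration; the source does not state which clumping thresholds were used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    rsid: str    # variant identifier (hypothetical IDs in the demo below)
    pos: int     # base-pair position; one chromosome assumed for simplicity
    maf: float   # minor allele frequency
    p: float     # association p-value


def clump(variants, r2, maf_min=0.01, p_index=1e-4, r2_min=0.5, window_bp=250_000):
    """Greedy p-value informed clumping (simplified sketch).

    r2 is a callable (rsid_a, rsid_b) -> LD r^2, assumed to come from a
    reference panel. Returns {index_rsid: [rsids clumped under it]}.
    """
    # Step 1: frequency filter, then order by increasing p-value.
    kept = sorted((v for v in variants if v.maf > maf_min), key=lambda v: v.p)
    assigned = set()
    clumps = {}
    for index in kept:
        # A variant already absorbed into a clump cannot start a new one,
        # and only sufficiently significant variants may act as index.
        if index.rsid in assigned or index.p > p_index:
            continue
        assigned.add(index.rsid)
        members = []
        for other in kept:
            if other.rsid in assigned:
                continue
            close = abs(other.pos - index.pos) <= window_bp
            if close and r2(index.rsid, other.rsid) >= r2_min:
                members.append(other.rsid)
                assigned.add(other.rsid)
        clumps[index.rsid] = members
    return clumps


# Hypothetical toy data: rsD fails the MAF filter, rsB is in LD with rsA.
demo_variants = [
    Variant("rsA", 1_000, 0.20, 1e-7),
    Variant("rsB", 1_500, 0.30, 1e-5),
    Variant("rsC", 900_000, 0.10, 5e-6),
    Variant("rsD", 1_200, 0.005, 1e-9),
    Variant("rsE", 2_000, 0.25, 0.2),
]
demo_ld = {frozenset(("rsA", "rsB")): 0.8, frozenset(("rsA", "rsE")): 0.1}


def demo_r2(a, b):
    # Pairs missing from the toy table are treated as unlinked.
    return demo_ld.get(frozenset((a, b)), 0.0)


demo_clumps = clump(demo_variants, demo_r2)
print(demo_clumps)
```

Sorting by p-value first is what makes the procedure "p-value informed": the most significant variant in each LD block becomes the index variant, and correlated neighbours are folded into its clump rather than reported as separate signals.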
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
Limitations: There are multiple limitations of using predicted COVID-19 cases that we need to consider. Firstly, the training data might not be fully representative of the whole spectrum of COVID-19 symptoms since testing of putative cases in the early months of the pandemic was mostly restricted to patients with a more severe phenotype. Individuals with essential occupations, for example healthcare professionals, were also more frequently tested at the beginning of the pandemic. Secondly, some symptoms are also present in common chronic diseases, for example “loss of smell and taste” is frequent among patients with a neurological disorder. Indeed, a preliminary analysis of the Lifelines data showed enrichment of patients with preexisting conditions in the predicted COVID-19 cases as compared to controls but no enrichment in the confirmed COVID-19 cases compared to confirmed negative cases, indicating that these individuals might be incorrectly predicted as COVID-19 cases by the Menni COVID-19 prediction model based on their symptoms (Figure S1). Thirdly, the prevalence of COVID-19 might be different among different populations and cohorts. The false positive rates of the prediction models are likely to be larger if the prevalence of COVID-19 is small compared to other infectious diseases that often have similar symptoms.
Conclusions: We show that it is possible to conduct a GWAS on predicted COVID-19. As GWAS of COVID-19 will benefit from larger samples, predicted COVID-19 c...
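The third limitation quoted above, that predicted cases become less reliable as prevalence falls, follows from Bayes' rule. The sketch below computes the positive predictive value (the share of predicted cases that are true cases) for a classifier with fixed sensitivity and specificity. The sensitivity, specificity, and prevalence values are hypothetical placeholders, not figures reported for the Menni model or for any cohort in the study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of predicted cases that are true cases (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        raise ValueError("no positive predictions are possible with these inputs")
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point; only the prevalence is varied.
SENSITIVITY = 0.65
SPECIFICITY = 0.80

for prevalence in (0.20, 0.05, 0.01):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  "
          f"false share of predicted cases={1.0 - ppv:.3f}")
```

With sensitivity and specificity held constant, the false share among predicted cases rises as prevalence drops, which is why the same symptom model can perform differently across cohorts.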
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.