COVID-19 Phenotypes and Comorbidity: A Data-Driven, Pattern Recognition Approach Using National Representative Data from the United States

Abstract

The aim of our study was to determine COVID-19 syndromic phenotypes in a data-driven manner, using survey results from Carnegie Mellon University’s Delphi Group. Monthly survey results (>1 million responders per month; 320,326 responders with a certain COVID-19 test status and disease duration <30 days were included in this study) were used sequentially to identify and validate COVID-19 syndromic phenotypes. Logistic Regression-weighted multiple correspondence analysis (LRW-MCA) was used as a preprocessing procedure to weigh the symptoms recorded by the survey and transform them to eigenspace coordinates, capturing a total variance of >75%. These scores, along with symptom duration, were subsequently used by the Two Step Clustering algorithm to produce symptom clusters. Post-hoc logistic regression models adjusting for age, gender, and comorbidities, and confirmatory linear principal components analyses, were used to further explore the data. The model, created from August’s 66,165 included responders, was subsequently validated in data from March–December 2020. Five validated COVID-19 syndromes were identified in August: 1. Afebrile (0%), Non-Coughing (0%), Oligosymptomatic (ANCOS); 2. Febrile (100%) Multisymptomatic (FMS); 3. Afebrile (0%) Coughing (100%) Oligosymptomatic (ACOS); 4. Oligosymptomatic with additional self-described symptoms (100%; OSDS); 5. Olfaction/Gustatory Impairment Predominant (100%; OGIP). Our findings indicate that the COVID-19 spectrum may be undetectable when applying current disease definitions that focus on respiratory symptoms alone.
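As a rough illustration of the preprocessing step described in the abstract, the sketch below runs a plain, unweighted multiple correspondence analysis on a binary symptom table and keeps as many dimensions as are needed to reach 75% of the total inertia. It is not the authors' LRW-MCA: the logistic regression weighting and the Two Step Clustering stage are not reproduced, and the function name `mca_coordinates`, the simulated symptom data, and the present/absent indicator coding are all assumptions made for the example.

```python
import numpy as np


def mca_coordinates(indicator, variance_target=0.75):
    """Unweighted MCA via correspondence analysis of an indicator matrix.

    `indicator` has one row per responder and one 0/1 column per
    category (here: symptom present, symptom absent). Hypothetical helper,
    not the authors' LRW-MCA.
    """
    Z = np.asarray(indicator, dtype=float)
    P = Z / Z.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    U, sing, _ = np.linalg.svd(S, full_matrices=False)
    inertia = sing ** 2                   # principal inertias (eigenvalues)
    explained = np.cumsum(inertia) / inertia.sum()
    # Smallest number of dimensions whose cumulative share reaches the target.
    k = min(int(np.searchsorted(explained, variance_target)) + 1, len(sing))
    # Row principal coordinates: the per-responder scores.
    coords = (U[:, :k] * sing[:k]) / np.sqrt(r)[:, None]
    return coords, explained[:k]


# Simulated answers (assumption): 200 responders, 6 yes/no symptoms.
rng = np.random.default_rng(0)
symptoms = rng.random((200, 6)) < 0.3
indicator = np.hstack([symptoms, ~symptoms]).astype(float)

coords, explained = mca_coordinates(indicator)
print(coords.shape, explained[-1])
```

In a pipeline like the one described, per-responder scores of this kind would be passed, together with symptom duration, to a clustering step.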

SciScore for 10.1101/2021.04.30.21256219:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

Ethics	not detected.
Sex as a biological variable	not detected.
Randomization	Initially, Facebook selects a random sample among its users in the United States.
Blinding	not detected.
Power Analysis	not detected.

Table 2: Resources

No key resources detected.

Results from OddPub: Thank you for sharing your code.

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

Limitations and Strengths: The results of our study should be interpreted within the context of their limitations. As a survey administered via Facebook, our source data incur the corresponding selection bias. This, however, is potentially balanced by the large sample size of the final cohort, which makes this the single largest study of its kind. Survivor bias is also inherently present in our study, considering that responders are unlikely to have severe COVID-19 at the time of survey administration. The lack of follow-up data correspondingly means that phenotype shifts (e.g. ANCOS or OSDS to FMS) cannot be explored. Another important consideration is that OSDS inevitably absorbs symptoms not originally covered by the initial study iterations and is correspondingly decomposed when these symptoms are identified and added. A prime example of this is headache as a symptom; when left to the discretion of the responder, it might not be evaluated properly as a feature (38). This becomes evident in the discordance between headache identified by text-mining (April–November data, 10% of OSDS) and headache asked about directly (i.e. 30% in all phenotypes and decomposition of OSDS as a “pure” phenotype). Comorbidities, reported by the majority as broad categories, cannot be safely interpreted strictly as to their associations with phenotypes. Finally, as gender categories beyond male/female are underrepresented in the monthly samples, they cannot be safely used to extrapolate their contribution o...

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Results from scite Reference Check: We found no unreliable references.

Related articles

Development and Deployment of a Machine Learning–Based Predictive Model for COVID-19 Infection Using Patient Demographic and Symptom Data in Nigeria

A Preliminary Prognostic Model for Predicting Poor Prognosis in COVID-19 Integrating Lung Epithelial Injury (KL-6) with Routine Care Markers

Machine Learning Analysis of COVID19 Transmission Dynamics Demographic Risk and Contact Tracing Outcomes in Nigeria