Subphenotyping of Mexican Patients With COVID-19 at Preadmission To Anticipate Severity Stratification: Age-Sex Unbiased Meta-Clustering Technique

Abstract

The COVID-19 pandemic has led to an unprecedented global health care challenge for both medical institutions and researchers. Recognizing different COVID-19 subphenotypes—the division of populations of patients into more meaningful subgroups driven by clinical features—and their severity characterization may assist clinicians during the clinical course, the vaccination process, research efforts, the surveillance system, and the allocation of limited resources.

Objective

We aimed to discover age-sex unbiased COVID-19 patient subphenotypes based on phenotypical data that are easily available before admission, such as pre-existing comorbidities, lifestyle habits, and demographic features, and to study the potential early severity stratification capabilities of the discovered subgroups by characterizing their severity patterns, including prognostic, intensive care unit (ICU), and morbimortality outcomes.

Methods

We used the Mexican Government COVID-19 open data, comprising 778,692 population-based, patient-level SARS-CoV-2 records as of September 2020. We applied a meta-clustering technique consisting of a 2-stage clustering approach that combines dimensionality reduction (ie, principal component analysis and multiple correspondence analysis) with hierarchical clustering using the Ward minimum variance method with squared Euclidean distance.
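As a rough illustration of the 2-stage idea, the following Python sketch clusters each age-sex group separately with Ward's criterion, averages the features of each resulting cluster, and then clusters those averages into meta-clusters. It is not the authors' code. The function names (`ward_labels`, `centroids`, `meta_cluster`), the group names, and the toy data are hypothetical. The sketch assumes the inputs are already dimensionality-reduced coordinates (the PCA/MCA step is omitted) and that each first-stage cluster counts as a single unweighted point in the second stage.

```python
from itertools import combinations


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return a["size"] * b["size"] / (a["size"] + b["size"]) * sq_dist


def ward_labels(points, n_clusters):
    """Naive agglomerative clustering with Ward's minimum variance criterion.

    Repeatedly merges the pair of clusters with the smallest ward_cost
    until n_clusters remain. Returns one cluster label per input point.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [{"members": [i], "size": 1, "centroid": list(p)}
                for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        size = a["size"] + b["size"]
        merged = {
            "members": a["members"] + b["members"],
            "size": size,
            # Size-weighted mean of the two centroids.
            "centroid": [(a["size"] * x + b["size"] * y) / size
                         for x, y in zip(a["centroid"], b["centroid"])],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    labels = [0] * len(points)
    for label, cluster in enumerate(clusters):
        for member in cluster["members"]:
            labels[member] = label
    return labels


def centroids(points, labels):
    """Average the feature vectors of the points that share each label."""
    rows_by_label = {}
    for point, label in zip(points, labels):
        rows_by_label.setdefault(label, []).append(point)
    return {label: [sum(col) / len(col) for col in zip(*rows)]
            for label, rows in sorted(rows_by_label.items())}


def meta_cluster(groups, k_per_group, n_meta):
    """Stage 1: cluster each group. Stage 2: cluster the stage-1 centroids.

    Assumption: every stage-1 cluster is one unweighted point in stage 2.
    Returns (group, stage1_label, centroid, meta_label) tuples.
    """
    stage1 = []
    for name, points in groups.items():
        labels = ward_labels(points, k_per_group)
        for label, centroid in centroids(points, labels).items():
            stage1.append((name, label, centroid))
    meta_labels = ward_labels([c for _, _, c in stage1], n_meta)
    return [(name, label, centroid, meta)
            for (name, label, centroid), meta in zip(stage1, meta_labels)]


# Hypothetical toy data: 2D coordinates standing in for reduced features.
toy_groups = {
    "female_18-49": [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]],
    "male_18-49": [[0.0, 0.1], [0.0, 0.0], [1.0, 0.9], [1.0, 1.0]],
}
result = meta_cluster(toy_groups, k_per_group=2, n_meta=2)
for row in result:
    print(row)
```

In this toy run, the low-valued clusters from both groups fall into one meta-cluster and the high-valued clusters into the other, which is the sense in which meta-clusters group similar profiles across age-sex strata. The pairwise search is cubic in the number of points, so a real analysis at the scale described here would rely on an optimized library implementation instead.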

Results

In the independent age-sex clustering analyses, 56 clusters supported 11 clinically distinguishable meta-clusters (MCs). MCs 1-3 showed high recovery rates (90.27%-95.22%), including healthy patients of all ages, children with comorbidities and priority in receiving medical resources (ie, higher rates of hospitalization, intubation, and ICU admission) compared with other adult subgroups that have similar conditions, and young obese smokers. MCs 4-5 showed moderate recovery rates (81.30%-82.81%), including patients with hypertension or diabetes of all ages and obese patients with pneumonia, hypertension, and diabetes. MCs 6-11 showed low recovery rates (53.96%-66.94%), including immunosuppressed patients with high comorbidity rates, patients with chronic kidney disease with a poor survival length and probability of recovery, older smokers with chronic obstructive pulmonary disease, older adults with severe diabetes and hypertension, and the oldest obese smokers with chronic obstructive pulmonary disease and mild cardiovascular disease. Group outcomes conformed to the recent literature on dedicated age-sex groups. Mexican states and several types of clinical institutions showed relevant heterogeneity regarding severity, potentially linked to socioeconomic or health inequalities.

Conclusions

The proposed 2-stage cluster analysis methodology produced a discriminative characterization of the sample and explainability over age and sex. These results can potentially help in understanding patients' clinical profiles and in stratifying them for automated early triage, both before further tests and laboratory results are available and in locations where additional tests are not available. They may also help decide resource allocation among vulnerable subgroups, for example when prioritizing vaccination or treatments.

Article activity feed

SciScore for 10.1101/2021.02.21.21252132:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

Institutional Review Board Statement	not detected.
Randomization	not detected.
Blinding	not detected.
Power Analysis	not detected.
Sex as a biological variable	Once the proper number of clusters was obtained for each study group, namely for ages <18, 18-49, 50-64, and >64 and for males and females, we averaged the values of the clinical and habit features for each cluster across all groups.

Table 2: Resources

Software and Algorithms
Sentences	Resources
MCA, PCA and clustering analyses were performed using RStudio (version 3.6).	RStudio suggested: (RStudio, RRID:SCR_000432)
Data processing and additional statistical analyses were performed using Python (version 3.8).	Python suggested: (IPython, RRID:SCR_001658)

Results from OddPub: Thank you for sharing your code and data.

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

4.5 Limitations: As a possible limitation, we excluded patients confirmed after September 30 to avoid possible distortion of the analysis of patient death outcomes. This approach prevented us from using the most recent data, whose epidemiological characteristics could have changed to some degree. Patients' real clinical pictures comprise many other characteristics, such as discharge, cough, fever, and dyspnea, which were not available in the data; it would be interesting to include these characteristics in future experiments to explore heterogeneity patterns. Furthermore, the dataset did not include any further information about patients who were discharged or about readmissions, which is another interesting topic that is rarely reported currently. Thus, further study of severity patterns among discharged patients who received post-surveillance is highly needed.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.
