Identifying COVID-19 phenotypes using cluster analysis and assessing their clinical outcomes
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Multiple clinical phenotypes have been proposed for COVID-19, but few have stemmed from data-driven methods. We aimed to identify distinct phenotypes in patients admitted with COVID-19 using cluster analysis, and compare their respective characteristics and clinical outcomes.
We analyzed the data from 547 patients hospitalized with COVID-19 in a Canadian academic hospital from January 1, 2020, to January 30, 2021. We compared four clustering algorithms: K-means, PAM (partition around medoids), divisive and agglomerative hierarchical clustering. We used imaging data and 34 clinical variables collected within the first 24 hours of admission to train our algorithm. We then conducted survival analysis to compare clinical outcomes across phenotypes and trained a classification and regression tree (CART) to facilitate phenotype interpretation and phenotype assignment.
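As an illustration of how candidate clustering algorithms can be compared on the same data, here is a minimal, standard-library-only sketch. It is not the authors' pipeline: the abstract does not state the comparison criterion, so the mean silhouette score used here is an assumption, as are the function names (`kmeans`, `pam`, `silhouette`), the deterministic initializations, and the synthetic two-dimensional points standing in for the clinical variables. Only two of the four algorithms named above are shown.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda c: dist(p, centers[c]))

def kmeans(points, k, iters=100):
    """Lloyd's K-means with a deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous center if a cluster ends up empty.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels

def pam(points, k):
    """Greedy swap-based PAM (k-medoids), starting from the first k points."""
    medoids = list(range(k))

    def cost(meds):
        return sum(min(dist(p, points[m]) for m in meds) for p in points)

    best, improved = cost(medoids), True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                trial_cost = cost(trial)
                if trial_cost < best - 1e-12:
                    medoids, best, improved = trial, trial_cost, True
    return [nearest(p, [points[m] for m in medoids]) for p in points]

def silhouette(points, labels):
    """Mean silhouette width; points in singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.get(labels[i])
        others = [sum(d) / len(d) for c, d in by_cluster.items() if c != labels[i]]
        if not own or not others:
            scores.append(0.0)
            continue
        a, b = sum(own) / len(own), min(others)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Synthetic data (NOT study data): three well-separated 3x3 grids of points.
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
offsets = (-0.5, 0.0, 0.5)
points = [(cx + dx, cy + dy) for cx, cy in blob_centers
          for dx in offsets for dy in offsets]

results = {"K-means": kmeans(points, 3), "PAM": pam(points, 3)}
scores = {name: silhouette(points, labels) for name, labels in results.items()}
for name, score in scores.items():
    print(f"{name}: mean silhouette = {score:.3f}")
```

On real clinical data the algorithms would generally disagree, and the same scoring function could be applied to each candidate labeling to choose between them.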
We identified three clinical phenotypes, with 91 patients (17%) in Cluster 1, 221 patients (40%) in Cluster 2 and 235 patients (43%) in Cluster 3. Cluster 2 and Cluster 3 were both characterized by a low-risk respiratory and inflammatory profile, but differed in terms of demographics. Compared with Cluster 3, Cluster 2 comprised older patients with more comorbidities. Cluster 1 represented the group with the most severe clinical presentation, as inferred by the highest rate of hypoxemia and the highest radiological burden. Mortality, mechanical ventilation and ICU admission risk were all significantly different across phenotypes.
We conducted a phenotypic analysis of adult inpatients with COVID-19 and identified three distinct phenotypes associated with different clinical outcomes. Further research is needed to determine how to properly incorporate those phenotypes in the management of patients with COVID-19.
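Comparing clinical outcomes across phenotypes with survival analysis can be sketched as follows. The abstract does not name the estimator used, so the Kaplan-Meier product-limit estimator here is an assumption, and the function name `kaplan_meier`, the group names and all follow-up times are hypothetical toy values, not results from the study.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up duration for each patient (e.g., days since admission)
    events -- 1 if the outcome occurred at that time, 0 if the patient was censored
    """
    survival, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        occurred = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1 - occurred / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical toy follow-up data for two groups (NOT study data).
groups = {
    "group_a": ([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]),
    "group_b": ([4, 6, 7, 9, 10], [0, 1, 0, 0, 0]),
}
curves = {name: kaplan_meier(t, e) for name, (t, e) in groups.items()}
for name, curve in curves.items():
    print(name, [(t, round(s, 2)) for t, s in curve])
```

A formal comparison between groups would additionally require a test such as the log-rank test, which is omitted here for brevity.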
Article activity feed
SciScore for 10.1101/2022.05.27.22275708:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Ethics
- IRB: The Institutional Review Board of the CHUM (Centre Hospitalier de l’Université de Montréal) approved the study and informed consent was waived because of its low risk and retrospective nature.
- Consent: The Institutional Review Board of the CHUM (Centre Hospitalier de l’Université de Montréal) approved the study and informed consent was waived because of its low risk and retrospective nature.
Sex as a biological variable: not detected.
Randomization: not detected.
Blinding: not detected.
Power Analysis: not detected.
Table 2: Resources
Software and Algorithms
- Sentence: The raw data was managed using SQLite 3, and further data processing was conducted using Python version 3.7 and R version 4.0.3.
- Resource: Python, suggested: (IPython, RRID:SCR_001658)
Results from OddPub: Thank you for sharing your code and data.
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study: Our study presents some limitations. Multiple variables could not be included because they were either not captured in our electronic health record (e.g., time from onset of symptoms, mechanical ventilation parameters, in-hospital complications) or excluded from our study because of missingness. However, missing values are common in clinical practice, and investigating risk stratification while considering the inherent characteristics of real-world data is of importance at the bedside (54). In addition, this enhances the applicability of our phenotypes, as they are based only on the most common variables available for patients admitted with COVID-19 (55). This differs from studies that have included flow cytometry and CD4+/CD8+ count in their algorithm (34). Besides, those omitted variables do not seem to have had a significant impact on our results, as the three clusters obtained were consistent in number with previous work (33–39). Additionally, our study included patients admitted between January 1, 2020, and January 31, 2021, which was before the approval of the majority of targeted therapies against COVID-19 or vaccination. We therefore did not assess the effect of vaccination, treatments and the type of variant on phenotypes. Accordingly, this puts our algorithm at risk of temporal dataset shift (56), and calibrating our clustering algorithm will be necessary before using it in the clinical setting. Finally, because race-based data is not recorded in the Quebec healthcare sys...
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.