Accurately Differentiating Between Patients With COVID-19, Patients With Other Viral Infections, and Healthy Individuals: Multimodal Late Fusion Learning Approach
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Effectively identifying patients with COVID-19 using non–polymerase chain reaction biomedical data is critical for achieving optimal clinical outcomes. Currently, there is a lack of comprehensive understanding of the various biomedical features and the appropriate analytical approaches that enable the early detection and effective diagnosis of patients with COVID-19.
Objective
We aimed to combine low-dimensional clinical and lab testing data, as well as high-dimensional computed tomography (CT) imaging data, to accurately differentiate between healthy individuals, patients with COVID-19, and patients with non-COVID viral pneumonia, especially at the early stage of infection.
Methods
In this study, we recruited 214 patients with nonsevere COVID-19, 148 patients with severe COVID-19, 198 noninfected healthy participants, and 129 patients with non-COVID viral pneumonia. The participants’ clinical information (ie, 23 features), lab testing results (ie, 10 features), and CT scans upon admission were acquired and used as 3 input feature modalities. To enable the late fusion of multimodal features, we constructed a deep learning model to extract a 10-feature high-level representation of CT scans. We then developed 3 machine learning models (ie, k-nearest neighbor, random forest, and support vector machine models) based on the combined 43 features from all 3 modalities to differentiate between the following 4 classes: nonsevere, severe, healthy, and viral pneumonia.
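The late fusion step described above (concatenating the 23 clinical, 10 lab testing, and 10 CNN-derived CT features into a single 43-feature vector, then training the 3 classifiers) can be sketched as follows. This is a minimal illustration using scikit-learn with randomly generated placeholder data; the feature values, default hyperparameters, and 5-fold cross-validation setup are assumptions for demonstration, not the study's actual pipeline.

```python
# Minimal late-fusion sketch with placeholder data (not the study's data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 214 + 148 + 198 + 129  # participant counts from the Methods section

# Placeholder feature matrices for the 3 modalities
X_clinical = rng.normal(size=(n, 23))  # clinical information
X_lab = rng.normal(size=(n, 10))       # lab testing results
X_ct = rng.normal(size=(n, 10))        # high-level CNN representation of CT scans

# Late fusion: simple feature concatenation -> 43 combined features
X = np.hstack([X_clinical, X_lab, X_ct])
y = rng.integers(0, 4, size=n)  # 4 classes: nonsevere, severe, healthy, viral pneumonia

# Train the 3 machine learning models on the fused features
for name, clf in [("kNN", KNeighborsClassifier()),
                  ("RF", RandomForestClassifier(random_state=0)),
                  ("SVM", SVC())]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```

With random labels the accuracies hover near chance (~25%); the sketch only shows the fusion-then-classify structure, not the reported performance.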
Results
Multimodal features provided a substantial performance gain over any single feature modality. All 3 machine learning models achieved high overall prediction accuracy (95.4%-97.7%) and high class-specific prediction accuracy (90.6%-99.9%).
Conclusions
Compared to existing binary classification benchmarks, which often focus on a single feature modality, this study's hybrid deep learning-machine learning framework provided a novel and effective approach for clinical applications. Our findings, which are based on a relatively large sample, and our analytical workflow will supplement and assist with clinical decision support for current COVID-19 diagnostic methods and for other clinical applications with high-dimensional multimodal biomedical features.
Article activity feed
SciScore for 10.1101/2020.08.18.20176776:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
- Sentence: "Other RF hyperparameters in this study included Gini impurity score to determine tree split, at least 2 samples to split an internal tree, and at least 1 sample at a leaf node (all default hyperparameter settings from scikit-learn library in Python; default hyperparameters for SVM and RF as well)." Resource: scikit-learn (RRID:SCR_002577)
- Sentence: "The deep learning CNN and late fusion machine learning codes were developed in Python with various supporting packages such as scikit-learn." Resource: Python (suggested: IPython, RRID:SCR_001658)

Results from OddPub: Thank you for sharing your code and data.
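The RF hyperparameters quoted above are scikit-learn's defaults; a minimal sketch writing them out explicitly (the `random_state` is an added assumption, not mentioned in the source):

```python
# RF configuration per the resources table: all scikit-learn defaults,
# spelled out explicitly for illustration.
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    criterion="gini",     # Gini impurity score to determine tree splits
    min_samples_split=2,  # at least 2 samples to split an internal node
    min_samples_leaf=1,   # at least 1 sample at a leaf node
    random_state=0,       # assumed seed for reproducibility
)
```

Because these match the library defaults, `RandomForestClassifier()` with no arguments yields the same configuration.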
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
There are some limitations to this study. First, in order to perform multinomial classification across the four classes, we had to discard many features, especially in the lab testing modality. The non-COVID viral pneumonia (V) class used a different electronic health record (EHR) system that collected different lab testing features from participants in Wuhan (COVID and healthy classes). Many lab testing features, such as hsTNI, D-dimer, and LDH, were demonstrated to accurately differentiate severe and nonsevere COVID-19 in our previous investigation. However, these features were not present (or were largely missing) in the V class. Eventually, only 10 lab testing features were included, compared to an average of 20-30 features available in different EHR systems. This is probably why the lab testing feature modality alone was not able to make accurate classifications (highest accuracy 67.7%, using RF) across all four classes in this specific study. In addition, although we had a reasonably large participant pool of 638 individuals, more participants are needed to further validate the findings of this study. Another potential pitfall is that not all feature modalities may be readily available at the same time for feature fusion and multimodal classification. Among single-modality features, CT is the best performer for generating accurate predictions. However, CT is usually performed in the radiology department. Lab testing may be outsourced and also takes time for the results t...
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.