Accurately Differentiating Between Patients With COVID-19, Patients With Other Viral Infections, and Healthy Individuals: Multimodal Late Fusion Learning Approach

Abstract

Effectively identifying patients with COVID-19 using non–polymerase chain reaction biomedical data is critical for achieving optimal clinical outcomes. Currently, there is a lack of comprehensive understanding of the various biomedical features and of the analytical approaches appropriate for enabling the early detection and effective diagnosis of patients with COVID-19.

Objective

We aimed to combine low-dimensional clinical and lab testing data, as well as high-dimensional computed tomography (CT) imaging data, to accurately differentiate between healthy individuals, patients with COVID-19, and patients with non-COVID viral pneumonia, especially at the early stage of infection.

Methods

In this study, we recruited 214 patients with nonsevere COVID-19, 148 patients with severe COVID-19, 198 noninfected healthy participants, and 129 patients with non-COVID viral pneumonia. The participants’ clinical information (ie, 23 features), lab testing results (ie, 10 features), and CT scans upon admission were acquired and used as 3 input feature modalities. To enable the late fusion of multimodal features, we constructed a deep learning model to extract a 10-feature high-level representation of the CT scans. We then developed 3 machine learning models (ie, k-nearest neighbor [KNN], random forest [RF], and support vector machine [SVM] models) based on the combined 43 features from all 3 modalities to differentiate between the following 4 classes: nonsevere COVID-19, severe COVID-19, healthy, and non-COVID viral pneumonia.
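
To make the workflow concrete, here is a minimal late-fusion sketch in Python with scikit-learn. The synthetic arrays stand in for the 3 feature modalities; in the actual study, the 10 CT features come from a trained deep learning model rather than random values, and all data, sizes, and default classifier settings below are illustrative assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 638  # total number of participants reported in the study

# Stand-in modalities (synthetic; the real CT features would come from the
# CNN's high-level representation, not from random numbers):
clinical = rng.normal(size=(n, 23))  # 23 clinical features
lab = rng.normal(size=(n, 10))       # 10 lab testing features
ct_repr = rng.normal(size=(n, 10))   # 10 CNN-derived CT features

# Late fusion: concatenate per-modality features into one 43-column matrix.
X = np.hstack([clinical, lab, ct_repr])
# 4 classes: nonsevere COVID-19, severe COVID-19, healthy, viral pneumonia.
y = rng.integers(0, 4, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

for name, clf in [
    ("KNN", KNeighborsClassifier()),
    ("RF", RandomForestClassifier(random_state=0)),
    ("SVM", SVC()),
]:
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))
```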

Results

Multimodal features provided a substantial performance gain over any single feature modality. All 3 machine learning models achieved high overall prediction accuracy (95.4%-97.7%) and high class-specific prediction accuracy (90.6%-99.9%).

Conclusions

Compared to existing binary classification benchmarks, which often focus on a single feature modality, this study's hybrid deep learning-machine learning framework provided a novel and effective approach for clinical applications. Our findings, which are based on a relatively large sample size, and our analytical workflow can supplement current COVID-19 diagnostic methods and support clinical decision making for other applications with high-dimensional multimodal biomedical features.

Article activity feed

  1. SciScore for 10.1101/2020.08.18.20176776:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms

    | Sentences | Resources |
    | --- | --- |
    | "Other RF hyperparameters in this study included the Gini impurity score to determine tree splits, at least 2 samples to split an internal node, and at least 1 sample at a leaf node (all default hyperparameter settings from the scikit-learn library in Python; default hyperparameters were used for SVM and RF as well)." | scikit-learn (suggested: scikit-learn, RRID:SCR_002577) |
    | "The deep learning CNN and late fusion machine learning codes were developed in Python with various supporting packages such as scikit-learn." | Python (suggested: IPython, RRID:SCR_001658) |
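
    The scikit-learn defaults named in the quoted sentence can be written out explicitly. The following minimal sketch restates those defaults as keyword arguments; it is illustrative only and adds no tuning beyond what the sentence describes.

    ```python
    from sklearn.ensemble import RandomForestClassifier

    # scikit-learn defaults cited above, written out explicitly:
    rf = RandomForestClassifier(
        criterion="gini",     # Gini impurity score to determine tree splits
        min_samples_split=2,  # at least 2 samples to split an internal node
        min_samples_leaf=1,   # at least 1 sample required at a leaf node
    )
    ```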

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    There are some limitations to this study. First, in order to perform multinomial classification across the four classes, we had to discard many features, especially in the lab testing modality. The non-COVID viral pneumonia (V) class used a different electronic health record (EHR) system that collected different lab testing features from the participants in Wuhan (COVID and healthy classes). Many lab testing features, such as hsTNI, d-dimer, and LDH, were demonstrated to accurately differentiate severe and non-severe COVID-19 in our previous investigation. However, these features were not present (or were largely missing) in the V class. Eventually, only 10 lab testing features were included, compared to an average of 20–30 features available in different EHR systems. This is probably the reason why the lab testing feature modality alone was not able to make accurate classifications (highest accuracy 67.7% using RF) across all four classes in this specific study. In addition, although we had a reasonably large participant pool of 638 individuals, more participants would be needed to further validate the findings of this study. Another potential pitfall is that not all feature modalities may be readily available at the same time for feature fusion and multimodal classification. For single-modality features, CT is the best performer for generating accurate predictions. However, CT is usually performed in the radiology department. Lab testing may be outsourced and also takes time for the results t...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers) and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of the rigor criteria and the tools shown here, including references cited, please follow this link.