Accurately Differentiating Between Patients With COVID-19, Patients With Other Viral Infections, and Healthy Individuals: Multimodal Late Fusion Learning Approach

Abstract

Effectively identifying patients with COVID-19 using non–polymerase chain reaction biomedical data is critical for achieving optimal clinical outcomes. Currently, there is a lack of comprehensive understanding of the various biomedical features and of the analytical approaches appropriate for enabling the early detection and effective diagnosis of patients with COVID-19.

Objective

We aimed to combine low-dimensional clinical and lab testing data, as well as high-dimensional computed tomography (CT) imaging data, to accurately differentiate between healthy individuals, patients with COVID-19, and patients with non-COVID viral pneumonia, especially at the early stage of infection.

Methods

In this study, we recruited 214 patients with nonsevere COVID-19, 148 patients with severe COVID-19, 198 noninfected healthy participants, and 129 patients with non-COVID viral pneumonia. The participants’ clinical information (ie, 23 features), lab testing results (ie, 10 features), and CT scans upon admission were acquired and used as 3 input feature modalities. To enable the late fusion of multimodal features, we constructed a deep learning model to extract a 10-feature high-level representation of the CT scans. We then developed 3 machine learning models (ie, k-nearest neighbor [KNN], random forest [RF], and support vector machine [SVM] models) based on the combined 43 features from all 3 modalities to differentiate between the following 4 classes: nonsevere COVID-19, severe COVID-19, healthy, and non-COVID viral pneumonia.
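
To make the workflow concrete, here is a minimal late-fusion sketch in Python with scikit-learn. The synthetic arrays stand in for the 3 feature modalities; in the actual study, the 10 CT features come from a trained deep learning model rather than random values, and all data, sizes, and default classifier settings below are illustrative assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 638  # total number of participants reported in the study

# Stand-in modalities (synthetic; the real CT features would come from the
# CNN's high-level representation, not from random numbers):
clinical = rng.normal(size=(n, 23))  # 23 clinical features
lab = rng.normal(size=(n, 10))       # 10 lab testing features
ct_repr = rng.normal(size=(n, 10))   # 10 CNN-derived CT features

# Late fusion: concatenate per-modality features into one 43-column matrix.
X = np.hstack([clinical, lab, ct_repr])
# 4 classes: nonsevere COVID-19, severe COVID-19, healthy, viral pneumonia.
y = rng.integers(0, 4, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

for name, clf in [
    ("KNN", KNeighborsClassifier()),
    ("RF", RandomForestClassifier(random_state=0)),
    ("SVM", SVC()),
]:
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))
```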

Results

Multimodal features provided a substantial performance gain over any single feature modality. All 3 machine learning models achieved high overall prediction accuracy (95.4%-97.7%) and high class-specific prediction accuracy (90.6%-99.9%).

Conclusions

Compared to existing binary classification benchmarks, which often focus on a single feature modality, this study's hybrid deep learning-machine learning framework provided a novel and effective approach for clinical applications. Our findings, which are based on a relatively large sample size, and our analytical workflow can supplement current COVID-19 diagnostic methods and support clinical decision making for other applications with high-dimensional multimodal biomedical features.

Article activity feed

  1. SciScore for 10.1101/2020.08.18.20176776:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms

    | Sentences | Resources |
    | --- | --- |
    | "Other RF hyperparameters in this study included the Gini impurity score to determine tree splits, at least 2 samples to split an internal node, and at least 1 sample at a leaf node (all default hyperparameter settings from the scikit-learn library in Python; default hyperparameters were used for SVM and RF as well)." | scikit-learn (suggested: scikit-learn, RRID:SCR_002577) |
    | "The deep learning CNN and late fusion machine learning codes were developed in Python with various supporting packages such as scikit-learn." | Python (suggested: IPython, RRID:SCR_001658) |
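
    The scikit-learn defaults named in the quoted sentence can be written out explicitly. The following minimal sketch restates those defaults as keyword arguments; it is illustrative only and adds no tuning beyond what the sentence describes.

    ```python
    from sklearn.ensemble import RandomForestClassifier

    # scikit-learn defaults cited above, written out explicitly:
    rf = RandomForestClassifier(
        criterion="gini",     # Gini impurity score to determine tree splits
        min_samples_split=2,  # at least 2 samples to split an internal node
        min_samples_leaf=1,   # at least 1 sample required at a leaf node
    )
    ```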

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    There are some limitations to this study. First, in order to perform multinomial classification across the four classes, we had to discard many features, especially in the lab testing modality. The non-COVID viral pneumonia (V) class used a different electronic health record (EHR) system that collected different lab testing features from the participants in Wuhan (COVID and healthy classes). Many lab testing features, such as hsTNI, d-dimer, and LDH, were demonstrated to accurately differentiate severe and non-severe COVID-19 in our previous investigation. However, these features were not present (or were largely missing) in the V class. Eventually, only 10 lab testing features were included, compared to an average of 20–30 features available in different EHR systems. This is probably the reason why the lab testing feature modality alone was not able to make accurate classifications (highest accuracy 67.7% using RF) across all four classes in this specific study. In addition, although we had a reasonably large participant pool of 638 individuals, more participants would be needed to further validate the findings of this study. Another potential pitfall is that not all feature modalities may be readily available at the same time for feature fusion and multimodal classification. For single-modality features, CT is the best performer for generating accurate predictions. However, CT is usually performed in the radiology department. Lab testing may be outsourced and also takes time for the results t...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers) and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of the rigor criteria and the tools shown here, including references cited, please follow this link.