A Machine Learning Approach to Differentiate Between COVID-19 and Influenza Infection Using Synthetic Infection and Immune Response Data

Abstract

Data analysis is widely used to generate new insights into human disease mechanisms and to improve treatment methods. In this work, we used mechanistic models of viral infection to generate synthetic data for influenza and COVID-19 patients. We then developed and validated a supervised machine learning model that can distinguish between the two infections. Influenza and COVID-19 are contagious respiratory illnesses that are caused by different viruses but present with similar initial symptoms. Despite sharing the same primary signs, COVID-19 can produce more severe symptoms and illness, and higher mortality. The performance of the predictive model was externally evaluated with the ROC AUC metric (area under the receiver operating characteristic curve) on 100 virtual patients from each cohort, and our multiclass classifier achieved an AUC of at least 91%. The current investigation highlights the ability of machine learning models to accurately distinguish two different diseases based on major components of viral infection and immune response. Through the feature selection process, the model indicated a dominant role for viral load and productively infected cells.
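
For readers unfamiliar with the metric, the ROC AUC mentioned above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following is a minimal sketch of that definition in plain Python. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration; this is not the authors' evaluation code, and no data from the study is used.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier scores, higher means "more positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores, not study data).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two cohorts. The pairwise loop is quadratic in the number of patients, which is acceptable for cohorts of around 100 each; a rank-based formulation would be preferable for larger data.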

Article activity feed

  1. SciScore for 10.1101/2022.01.27.22269978:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: The t-value for a 95% confidence interval from a sample size of N was then obtained in Microsoft Excel using the tinv function (i.e. tinv(1 − 0.95, N − 1)).
    Resource: Microsoft Excel
    Suggested: (Microsoft Excel, RRID:SCR_016137)
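
The Excel calculation quoted above can be reproduced without a spreadsheet. Excel's TINV(probability, degrees of freedom) returns the two-tailed critical value of Student's t distribution, so tinv(1 − 0.95, N − 1) is the value t for which P(|T| > t) = 0.05. The sketch below approximates it with the standard library only, using Simpson's rule for the CDF and bisection for the inverse. The function names and the example sample are assumptions for illustration, not part of the manuscript.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|] plus symmetry (steps must be even)."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def tinv_two_tailed(p, df):
    """Two-tailed critical value t with P(|T| > t) = p, like Excel's TINV."""
    target = 1 - p / 2
    hi = 1.0
    while t_cdf(hi, df) < target:  # grow the bracket until it contains t
        hi *= 2
    lo = 0.0
    for _ in range(60):  # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_interval(sample, level=0.95):
    """Mean +/- t * s / sqrt(N) for a sample of size N."""
    n = len(sample)
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)
    half_width = tinv_two_tailed(1 - level, n - 1) * sem
    return mean - half_width, mean + half_width


# Hypothetical sample, for illustration only.
low, high = confidence_interval([1, 2, 3, 4, 5])
```

Where SciPy is available, `scipy.stats.t.ppf(1 - p / 2, df)` gives the same critical value directly.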

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    This is interpreted as a limitation of our model, and a future extension could develop dynamic models that take more immune entities into account and result in a better classifier. Our model was trained and successfully evaluated on synthetic data. The model, however, could be applied to animal or human clinical data. This could be useful, for example, if a clinical trial is complicated by the existence of an infectious disease with similar infection characteristics. The model could be applied as a low-cost classification system that would not require expensive virus typing procedures and could rely solely on viral load and interferon measurements. We note that studies like [9] that focus analysis on demographic and observational data can be cheaper to conduct, but these data can also be subject to inconsistencies and bias, affecting classification outcomes. In a future study, we will expand our analysis to a model of in-host measurements and observational data to determine whether the specific combinations of in-host and observational data that best classify influenza and COVID-19 infections differ. Our machine learning model was developed in the Lasso framework. Ridge regression could also be employed, and including it would require only small changes to our method. We find that the model demonstrated satisfactory performance using a Ridge regression classifier: ROC AUC = 95% for the main infection period and ROC AUC = 89% for the early days of infection.
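
The practical difference between the Lasso and Ridge penalties mentioned above can be seen in a simplified setting. Assuming a least-squares problem with orthonormal features (an idealization chosen for this sketch, not the setup described in the manuscript), the Lasso solution soft-thresholds each unpenalized coefficient, setting small ones exactly to zero, while the Ridge solution shrinks every coefficient by the same factor and never zeroes one out. The function names, the penalty value, and the coefficients are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthonormal(ols_coefs, lam):
    """Minimizer of 0.5*||y - Xb||^2 + lam*||b||_1 when X^T X = I."""
    return [soft_threshold(z, lam) for z in ols_coefs]


def ridge_orthonormal(ols_coefs, lam):
    """Minimizer of 0.5*||y - Xb||^2 + 0.5*lam*||b||^2 when X^T X = I."""
    return [z / (1.0 + lam) for z in ols_coefs]


# Hypothetical unpenalized coefficients for four features.
ols = [2.0, -0.3, 0.1, -1.5]
lasso_coefs = lasso_orthonormal(ols, lam=0.5)
ridge_coefs = ridge_orthonormal(ols, lam=0.5)
```

In this toy case the Lasso keeps only the two large coefficients, which is the mechanism behind Lasso-based feature selection, whereas Ridge retains all four at reduced magnitude.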

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.