Comparative performance of two automated machine learning platforms for COVID-19 detection by MALDI-TOF-MS


Abstract

The 2019 novel coronavirus infectious disease (COVID-19) pandemic has resulted in an unsustainable need for diagnostic tests. Currently, molecular tests are the accepted standard for the detection of SARS-CoV-2. Mass spectrometry (MS) enhanced by machine learning (ML) has recently been postulated to serve as a rapid, high-throughput, and low-cost alternative to molecular methods. Automated ML is a novel approach that could move mass spectrometry techniques beyond the confines of traditional laboratory settings. However, it remains unknown how different automated ML platforms perform for COVID-19 MS analysis. To this end, the goal of our study is to compare algorithms produced by two commercial automated ML platforms (Platforms A and B). Our study consisted of MS data derived from 361 subjects with molecular confirmation of COVID-19 status, including SARS-CoV-2 variants. The top optimized ML model with respect to positive percent agreement (PPA) exhibited an accuracy of 94.9%, PPA of 100%, and negative percent agreement (NPA) of 93% for Platform A, and an accuracy of 91.8%, PPA of 100%, and NPA of 89% for Platform B. These results illustrate the MS method’s robustness against SARS-CoV-2 variants and highlight similarities and differences in automated ML platforms in producing optimal predictive algorithms for a given dataset.
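
For context, the agreement metrics quoted above are simple ratios over a binary confusion matrix. The minimal sketch below uses hypothetical counts (not the study's actual results) to show how accuracy, PPA, and NPA are computed.

```python
# Hypothetical confusion-matrix counts for illustration only;
# these are NOT the study's actual per-class results.
tp, fn = 45, 5    # PCR-positive subjects called positive / missed
tn, fp = 90, 10   # PCR-negative subjects called negative / falsely flagged

ppa = tp / (tp + fn)                        # positive percent agreement (sensitivity)
npa = tn / (tn + fp)                        # negative percent agreement (specificity)
accuracy = (tp + tn) / (tp + fn + tn + fp)  # overall agreement with the molecular result

print(f"PPA={ppa:.1%}  NPA={npa:.1%}  accuracy={accuracy:.1%}")
```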

Article activity feed

  1. SciScore for 10.1101/2022.02.02.22270298:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms

    Sentences: Platform A (Machine Intelligence Learning Optimizer): Briefly, Platform A was also used in the original study, as the AutoML software combines various unsupervised and supervised methods to ultimately acquire the optimized ML model.
    Resources: AutoML (suggested: None)

    Sentences: .NET language, through Python with NimbusML, with a graphical user interface as part of Visual Studio 2019 using Model Builder, or through a command line interface.
    Resources: Python (suggested: IPython, RRID:SCR_001658)
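
    As a rough illustration of the kind of supervised workflow these AutoML tools automate (model selection, training, and held-out evaluation of spectral features), here is a minimal sketch in plain scikit-learn, used as a stand-in since neither Platform A's nor ML.NET's actual API is reproduced here; the random feature matrix, labels, and logistic-regression model are placeholders.

```python
# Generic stand-in sketch; not Platform A's (MILO) or ML.NET's actual API.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Placeholder MALDI-TOF feature matrix: rows = subjects, columns = m/z peak intensities.
rng = np.random.default_rng(0)
X = rng.random((361, 200))
y = rng.integers(0, 2, size=361)  # 1 = PCR-positive, 0 = PCR-negative (placeholder labels)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test)).ravel()
print(f"PPA={tp / (tp + fn):.1%}  NPA={tn / (tn + fp):.1%}")
```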

    Results from OddPub: Thank you for sharing your code.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    The use of AutoML with MALDI-TOF-MS for COVID-19 screening is an innovative, rapid, and high-throughput approach to overcoming limitations inherent in molecular- or antigen-based approaches (Tran 2021). However, ML algorithms are only as good as the quality of the data and the programmers.3,8,10,21-23 Quality of data can be controlled by study design; however, the validity of ML algorithms must be rigorously scrutinized, especially for health care applications.8,9 Traditional approaches to developing ML models rely on expert data scientists to program and on lengthy experimentation cycles to select optimal features for training and testing. Due to the laborious nature of programming, data scientists may select features and limit model development to ML techniques based on their “expertise” and “familiarity” – creating a potential source of bias. Automated ML platforms such as MILO and Microsoft ML.NET provide means to accelerate development of ML algorithms. These AutoML platforms can evaluate a much higher number of combinations of features across a range of ML techniques in a matter of hours or days. However, different AutoML platforms employ different functions, which can influence model development. To this end, comparison of AutoML platforms is necessary to verify performance, especially for medical applications. To our knowledge, this is the first study comparing AutoML platform performance for COVID-19 screening. The study observed differences in models produced by Platforms A v...
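
    To make the "many combinations of features across a range of ML techniques" point concrete, the toy sketch below performs an exhaustive feature-subset and model-family search with cross-validation; the estimators, subset sizes, and random data are arbitrary placeholders, and real AutoML platforms such as MILO or ML.NET use far more sophisticated search strategies.

```python
# Toy exhaustive search over feature subsets and model families;
# illustrative only, not how MILO or ML.NET actually search.
from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.random((200, 8))           # placeholder feature matrix (e.g., selected m/z peaks)
y = rng.integers(0, 2, size=200)   # placeholder binary labels

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=25, random_state=0),
    "knn": KNeighborsClassifier(),
}

best_name, best_cols, best_score = None, None, -np.inf
for k in (2, 3):                                   # candidate feature-subset sizes
    for cols in combinations(range(X.shape[1]), k):
        for name, est in models.items():
            score = cross_val_score(est, X[:, list(cols)], y, cv=5).mean()
            if score > best_score:
                best_name, best_cols, best_score = name, cols, score

print(f"best model={best_name}  features={best_cols}  CV accuracy={best_score:.3f}")
```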

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers) and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.