Combining Mass Spectrometry with Machine Learning to Identify Novel Protein Signatures: The Example of Multisystem Inflammatory Syndrome in Children

Abstract

Objectives We demonstrate an approach that integrates biomarker analysis with machine learning to identify protein signatures, using the example of SARS-CoV-2-induced Multisystem Inflammatory Syndrome in Children (MIS-C).
Methods We used plasma samples collected from subjects diagnosed with MIS-C and compared them first to controls with asymptomatic/mild SARS-CoV-2 infection and then to controls with pneumonia or Kawasaki disease. We used mass spectrometry to identify proteins. Support vector machine (SVM) algorithm-based classification schemes were used to analyze protein pathways. We assessed diagnostic accuracy using internal and external cross-validation.
Results Proteomic analysis of a training dataset containing MIS-C samples (N=17) and asymptomatic/mild SARS-CoV-2 infected control samples (N=20) identified 643 proteins, of which 101 were differentially expressed. Plasma proteins associated with inflammation and coagulation increased, and those associated with lipid metabolism decreased, in MIS-C relative to controls. The SVM algorithm identified a three-protein model (ORM1, AZGP1, SERPINA3) that achieved 90.0% specificity, 88.2% sensitivity, and 93.5% area under the curve (AUC) in distinguishing MIS-C from controls in the training set. Performance was retained in the validation dataset of MIS-C samples (N=17) and asymptomatic/mild SARS-CoV-2 infected control samples (N=10) (90.0% specificity, 84.2% sensitivity, 87.4% AUC). We next replicated our approach to compare MIS-C with similarly presenting syndromes, namely pneumonia (N=17) and Kawasaki disease (N=13), and found a distinct three-protein signature (VWF, SERPINA3, and FCGBP) that accurately distinguished MIS-C from the other conditions (97.5% specificity, 89.5% sensitivity, 95.6% AUC). We also developed a software tool that may be used to evaluate other protein pathway signatures using our data.
Conclusions We used MIS-C, a novel hyperinflammatory illness, to demonstrate that the use of mass spectrometry to identify candidate plasma proteins followed by machine learning, specifically SVM, is an efficient strategy for identifying and evaluating biomarker signatures for disease classification.
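
The specificity, sensitivity, and AUC figures quoted above follow from standard definitions, which the minimal sketch below spells out. It is not the authors' software tool. The function names are hypothetical, and the confusion counts in the usage note are illustrative values chosen only to be consistent with the reported training-set percentages; the abstract does not give the underlying counts.

```python
from typing import Sequence


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of true cases (e.g. MIS-C) that the classifier flags as cases."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of true controls that the classifier labels as controls."""
    return true_neg / (true_neg + false_pos)


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney interpretation:
    the probability that a randomly chosen case receives a higher
    classifier score than a randomly chosen control (ties count half)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

As an illustration under those assumptions, 15 of 17 cases correctly flagged gives `sensitivity(15, 2)`, roughly 0.882, and 18 of 20 controls correctly cleared gives `specificity(18, 2)`, which is 0.9. The `auc` function would take the classifier's decision scores (for an SVM, typically the signed distance from the separating hyperplane) for each group.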
