AImmune: a new blood-based machine learning approach to improving immune profiling analysis on COVID-19 patients

Abstract

A massive number of transcriptomic profiles of blood samples from COVID-19 patients have been produced since the COVID-19 pandemic began; however, these big data from primary studies have not been well integrated by machine learning approaches. Taking advantage of modern machine learning algorithms, we collected and integrated single-cell RNA-seq (scRNA-seq) data from three independent studies, identified genes potentially useful for interpreting disease severity, and developed a high-performance deep learning-based deconvolution model, AImmune, that can predict the proportions of seven different immune cell types from bulk RNA-seq results of human peripheral blood mononuclear cells (PBMCs). This novel approach can be used for clinical blood testing of COVID-19, on the grounds that previous research shows that mRNA alterations in blood-derived PBMCs may serve as a severity indicator. Assessed on real-world data sets, the AImmune model outperformed the most recognized immune profiling model, CIBERSORTx. The present study showed that results obtained by the true scRNA-seq route can be consistently reproduced by the new approach, AImmune, indicating its potential to replace the costly scRNA-seq technique for the analysis of circulating blood cells for both clinical and research purposes.
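
To make the deconvolution task concrete, the sketch below shows the two quantities such a model relates: a bulk-like expression profile (input) and cell-type proportions (target). It is not the AImmune model and not necessarily how the authors built their training data. It only illustrates, on invented toy counts, how labeled single cells could be summed into a pseudo-bulk profile and how the matching ground-truth proportions could be counted. The helper names `cell_type_proportions` and `pseudo_bulk`, the three-gene count vectors, and the choice of labels are all assumptions made for illustration.

```python
from collections import Counter


def cell_type_proportions(labels):
    """Return the fraction of cells carrying each cell-type label."""
    counts = Counter(labels)
    total = len(labels)
    return {cell_type: n / total for cell_type, n in counts.items()}


def pseudo_bulk(expression_rows):
    """Sum per-cell count vectors gene by gene into one bulk-like profile."""
    return [sum(gene_counts) for gene_counts in zip(*expression_rows)]


# Toy data (invented): four cells, three genes each.
cells = [
    ("Monocyte", [5, 0, 1]),
    ("Monocyte", [4, 1, 0]),
    ("DC", [0, 7, 2]),
    ("T cell", [1, 0, 6]),
]

labels = [label for label, _ in cells]
proportions = cell_type_proportions(labels)          # target of deconvolution
bulk = pseudo_bulk([counts for _, counts in cells])  # input to deconvolution
```

A deconvolution model, whether deep learning-based or signature-based, would take a profile like `bulk` as input and try to recover values like `proportions`.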

Article activity feed

  1. SciScore for 10.1101/2021.11.26.21266883:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: The human PBMC scRNA-seq datasets were downloaded from Gene Expression Omnibus (GEO) with accession IDs GSE150728, GSE166489, and GSE163668, respectively (Figure 1).
    Resource: Gene Expression Omnibus
    Suggested: (Gene Expression Omnibus (GEO), RRID:SCR_005012)

    Sentence: After loading and integrating these three datasets, the aggregate dataset was processed using the Python package Scanpy (v. 1.2.2) [38], following Scanpy’s reimplementation of Seurat’s clustering tutorial for 3k PBMCs from 10x Genomics workflow [39].
    Resource: Python
    Suggested: (IPython, RRID:SCR_001658)

    Sentence: All tests and comparisons are done using the built-in functions of the R package edgeR.
    Resource: edgeR
    Suggested: (edgeR, RRID:SCR_012802)

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    This research has limitations:
    1) Although our integrated study offers a more general understanding than any of the single studies from which the datasets were obtained, the subsequent machine learning process requires datasets with severity labels. This greatly limits the available data sources, resulting in a final cohort of 45 samples, which is still not enough to provide very generalizable results.
    2) Since the datasets come from different studies, there are great inconsistencies in sample acquisition, processing and sequencing procedures. These differences in technical details, combined with each dataset's own batch effects, leave us unable to make fine distinctions between cell subpopulations. This wastes one of the advantages of single-cell sequencing, which is to provide a finer understanding of immune cell subpopulations than currently exists [36], and COVID-19 patients do have changes in some specific subpopulations of immune cells [37].
    3) Although AImmune has been tested to have high prediction accuracy and better performance than the best model of its kind, CIBERSORTx, its prediction on DC cells is relatively poor (R² = 0.58, while R² of predictions on other cell types are all above 0.80). Interestingly, the same situation can be observed with CIBERSORTx (R² = -0.07, while R² of predictions on other cell types are all above 0.10). We hypothesize that this is because DC cells and Monocytes have close expression prof...
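
A negative value such as the one quoted for CIBERSORTx on DC cells is possible because the coefficient of determination (R²) compares a model's squared error with the error of always predicting the mean of the true values. A model that does worse than that baseline scores below zero. The sketch below works through this in plain Python. The proportions are invented for illustration and are not data from the study, and `r_squared` is a hypothetical helper, not a function from the authors' code.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Invented DC proportions for four samples (illustration only).
true_dc = [0.02, 0.04, 0.06, 0.08]

# Predictions that track the truth closely give R^2 near 1.
close_pred = [0.025, 0.035, 0.065, 0.075]

# A constant, biased prediction is worse than predicting the mean,
# so its R^2 falls below zero.
biased_pred = [0.10, 0.10, 0.10, 0.10]

r2_close = r_squared(true_dc, close_pred)
r2_biased = r_squared(true_dc, biased_pred)
```

Because DC cells are typically a small fraction of PBMCs, the variance of the true proportions (the denominator) is small, so even modest absolute errors can pull R² down sharply.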

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.