AImmune: a new blood-based machine learning approach to improving immune profiling analysis on COVID-19 patients

Abstract

A massive number of transcriptomic profiles of blood samples from COVID-19 patients has been produced since the COVID-19 pandemic began; however, these big data from primary studies have not been well integrated by machine learning approaches. Taking advantage of modern machine learning algorithms, we collected and integrated single-cell RNA-seq (scRNA-seq) data from three independent studies, identified genes potentially useful for interpreting severity, and developed a high-performance deep learning-based deconvolution model, AImmune, that can predict the proportions of seven different immune cell types from bulk RNA-seq results of human peripheral blood mononuclear cells (PBMCs). This novel approach could be used for clinical blood testing of COVID-19, given that previous research shows that mRNA alterations in blood-derived PBMCs may serve as a severity indicator. Assessed on real-world data sets, the AImmune model outperformed the most widely recognized immune profiling model, CIBERSORTx. The study showed that results obtained by the true scRNA-seq route can be consistently reproduced by AImmune, indicating its potential to replace the costly scRNA-seq technique for the analysis of circulating blood cells for both clinical and research purposes.
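To make the deconvolution task concrete, the sketch below shows how labeled single-cell profiles from one sample define both a bulk-like input (summed gene counts) and a target (cell-type proportions). It is only an illustration: the counts are invented, the helper names `pseudo_bulk` and `cell_type_proportions` are hypothetical, and it does not reproduce the AImmune architecture or training procedure, which the abstract does not describe.

```python
from collections import Counter


def cell_type_proportions(labels):
    """Return the fraction of cells belonging to each cell type."""
    counts = Counter(labels)
    total = len(labels)
    return {cell_type: n / total for cell_type, n in counts.items()}


def pseudo_bulk(expression):
    """Sum per-cell gene counts into one bulk-like expression profile."""
    n_genes = len(expression[0])
    return [sum(cell[g] for cell in expression) for g in range(n_genes)]


# Toy sample: 4 cells x 3 genes (invented numbers, for illustration only).
expression = [
    [5, 0, 1],  # T cell
    [4, 1, 0],  # T cell
    [0, 7, 2],  # monocyte
    [1, 0, 9],  # B cell
]
labels = ["T", "T", "Mono", "B"]

bulk = pseudo_bulk(expression)               # what a bulk assay would see
proportions = cell_type_proportions(labels)  # what a deconvolution model predicts

print(bulk)
print(proportions)
```

A deconvolution model takes a vector like `bulk` as input and is scored on how closely its output matches `proportions`, which in this setting come from the scRNA-seq annotation.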

SciScore for 10.1101/2021.11.26.21266883:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
The human PBMC scRNA-seq datasets were downloaded from Gene Expression Omnibus (GEO) with accession IDs GSE150728, GSE166489, and GSE163668, respectively (Figure 1).	Gene Expression Omnibus suggested: (Gene Expression Omnibus (GEO, RRID:SCR_005012)
After loading and integrating these three datasets, the aggregate dataset was processed using the Python package Scanpy (v. 1.2.2) [38], following Scanpy’s reimplementation of Seurat’s clustering tutorial for 3k PBMCs from 10x Genomics workflow[39].	Python suggested: (IPython, RRID:SCR_001658)
All tests and comparisons are done using the built-in functions of the R package edgeR.	edgeR suggested: (edgeR, RRID:SCR_012802)

Results from OddPub: Thank you for sharing your code and data.

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

This research has limitations: 1) Although our integrated study could offer a more general understanding than each single study from which the data were obtained, the subsequent machine learning process requires datasets with severity labels. This greatly limits our available data sources, resulting in a final cohort of 45 samples, which is still not enough to provide very generalizable results. 2) Since the datasets come from different studies, there are great inconsistencies in sample acquisition, processing and sequencing procedures. These differences in technical details, combined with their own batch effects, leave us unable to distinguish cell subpopulations too finely, wasting one of the advantages of single-cell sequencing techniques, which is to provide a finer understanding of immune cell subpopulations than currently exists [36], and COVID-19 patients do have changes in some specific subpopulations of immune cells [37]. 3) Although AImmune has been tested to have high prediction accuracy and better performance than the best model of its kind, CIBERSORTx, its prediction on DC cells is relatively poor (R2 = 0.58, while R2 of predictions on other cell types are all above 0.80). Interestingly, the same situation can be observed with CIBERSORTx (R2 = -0.07, while R2 of predictions on other cell types are all above 0.10). We hypothesize that this is because DC cells and monocytes have close expression prof...
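The negative R2 reported for CIBERSORTx on DC cells can look surprising. Assuming R2 here is the standard coefficient of determination (the excerpt does not define it), the sketch below shows how the score is computed for one cell type and why it drops below zero when predictions are worse than simply predicting the mean proportion. All numbers are invented for illustration and are not taken from the study.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented proportions of a single cell type across four samples.
true_props = [0.10, 0.20, 0.30, 0.40]
good_pred = [0.12, 0.18, 0.31, 0.39]  # tracks the truth closely
poor_pred = [0.40, 0.10, 0.45, 0.05]  # worse than guessing the mean

r2_good = r2_score(true_props, good_pred)
r2_poor = r2_score(true_props, poor_pred)
r2_mean = r2_score(true_props, [0.25] * 4)  # constant mean predictor

print(r2_good, r2_poor, r2_mean)
```

A constant prediction equal to the mean scores approximately zero, so any negative value indicates that the model's estimates for that cell type carry less information than the average proportion alone.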

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Results from scite Reference Check: We found no unreliable references.
