RNA Host Response-Based Diagnosis of Tick-Borne Infections using Machine Learning and Generative Artificial Intelligence

Abstract

Accurate diagnosis of tick-borne infections remains challenging due to overlapping clinical presentations and limitations in existing laboratory tests. Here we combined RNA sequencing (RNA-Seq) of 461 whole blood samples with machine learning to identify distinct host gene expression signatures for early Lyme disease (n=79), babesiosis (n=103), and anaplasmosis (n=6), including patients with other infectious and/or inflammatory conditions (n=102) and uninfected donors (n=148) as controls. Gene classification models (classifiers) for Lyme disease, babesiosis, and anaplasmosis were 93%, 94%, and 98% accurate, respectively. The construction of a functional anaplasmosis classifier required the use of generative adversarial networks (GANs), a generative artificial intelligence technique, to augment the 6 available samples with synthetic RNA-Seq count data created by two neural networks. Incorporation of GANs increased the sensitivity of the anaplasmosis classifier from 33% to 83% while maintaining high specificity (~98%). Parallel GAN versus no-GAN analyses of Lyme disease after downsampling from 79 samples to 6 demonstrated comparable performance. The final classifiers yielded genes specific to Lyme disease (cytokine signaling and adaptive immunity), babesiosis (macrophage and red blood cell signaling), and anaplasmosis (granulopoiesis and phagocytosis). This study identifies key host genes and pathways for differential diagnosis of tick-borne infections and highlights the utility of artificial intelligence methods for advancing diagnostic test development by harnessing the sparse datasets often encountered in clinical research.
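
To make the reported anaplasmosis sensitivities concrete, the short sketch below works through the arithmetic. It assumes, purely for illustration, that sensitivity was scored once over each of the 6 real anaplasmosis samples; the abstract does not state the evaluation scheme, and the function name is hypothetical. Under that assumption, 33% corresponds to 2 of 6 cases detected and 83% to 5 of 6.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly infected samples that the classifier flags."""
    return true_pos / (true_pos + false_neg)


# Assumption (not stated in the abstract): each of the 6 real
# anaplasmosis samples is scored exactly once.
n_cases = 6

for detected in (2, 5):
    missed = n_cases - detected
    pct = 100 * sensitivity(detected, missed)
    print(f"{detected}/{n_cases} detected -> sensitivity {pct:.0f}%")
```

With so few positive samples, each additional case detected moves sensitivity by roughly 17 percentage points, which is why estimates from sparse datasets of this kind carry wide uncertainty.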
