Machine Learning Enables Viral Genome-Agnostic Classification of RNA Virus Infections from Host Transcriptomes
Abstract
Targeted PCR diagnosis of RNA viruses is sequence dependent: the accuracy of the assay depends on the viral sequence it targets. As a result, sequence-targeted assays can miss novel or divergent viruses. We test whether the host transcriptome alone can classify RNA virus infections without using viral sequences. Using publicly available data from Huh7 and Calu-3 cells experimentally infected with diverse negative-sense RNA viruses, we evaluate two host-derived feature sets (differential expression profiles of identified genes and alignment-free nucleotide k-mer spectra) and compare hierarchical clustering with Random Forests. Across datasets and timepoints (12 to 24 hours post-infection, hpi), Random Forests accurately distinguished cells infected with different viruses. Control analyses using label permutation and read shuffling across experimental conditions confirmed that this performance was not due to chance. Influenza A virus infections produced the strongest and most distinct signatures, whereas Ebola and Lassa virus responses were subtler yet still classifiable. These results show that host-only transcriptomes encode complex, virus-specific signals that machine learning can exploit, enabling genome-agnostic classification of infection status. This approach could aid early outbreak triage when primer-based detection fails or viral genomes are unknown, and it complements sequence-based discovery.
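The abstract does not specify the k-mer length, the normalization, or how the Random Forests and permutation controls were implemented. As a minimal sketch, assuming reads are plain nucleotide strings and k-mer counts are normalized to relative frequencies, the Python below shows how an alignment-free k-mer spectrum can be built from host reads and how a label-permutation control turns a classifier score into an empirical p-value. A leave-one-out nearest-centroid classifier is swapped in for the Random Forest only to keep the example dependency-free. `kmer_spectrum`, `loo_nearest_centroid_accuracy`, and `permutation_p_value` are hypothetical helpers, not the authors' code.

```python
import random
from collections import Counter
from itertools import product


def kmer_spectrum(reads, k=3):
    """Relative frequencies of every ACGT k-mer across one sample's reads.

    Returns a fixed-length vector (4**k entries, lexicographic order) so that
    samples can be compared feature-by-feature without any alignment.
    """
    alphabet = "ACGT"
    valid = set(alphabet)
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= valid:  # skip k-mers containing N or other symbols
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[km] / total for km in vocab]


def _sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def loo_nearest_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Hypothetical stand-in for a Random Forest, used only to keep this sketch
    free of third-party dependencies.
    """
    correct = 0
    for i in range(len(X)):
        # Hold out sample i and build one centroid per remaining label.
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        centroids = {}
        for lab in {other for _, other in train}:
            members = [x for x, other in train if other == lab]
            centroids[lab] = [sum(col) / len(members) for col in zip(*members)]
        # Predict the label whose centroid is closest to the held-out sample.
        pred = min(centroids, key=lambda lab: _sq_dist(X[i], centroids[lab]))
        correct += pred == y[i]
    return correct / len(X)


def permutation_p_value(score_fn, X, y, n_perm=200, seed=0):
    """Compare the observed score with scores obtained on shuffled labels.

    Returns (observed_score, empirical p-value); the +1 terms keep p > 0.
    """
    rng = random.Random(seed)
    observed = score_fn(X, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if score_fn(X, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

If labels carry real information, the observed score should sit far above the shuffled-label scores and the p-value should be small; under random labels it should not. In a production pipeline one would more likely use scikit-learn's `RandomForestClassifier` together with `sklearn.model_selection.permutation_test_score` instead of these hand-written helpers.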