Hardware-Aware RNA-seq Diagnostics: Plant Virus Detection via Cloud and AI
Abstract
Background: High-throughput sequencing (HTS) has transformed plant pathogen diagnostics by enabling comprehensive analysis of RNA-seq data. However, the size and complexity of HTS datasets often exceed the computational resources available in phytopathology laboratories, particularly in low- and middle-income regions. Efficient processing of HTS data therefore remains a major bottleneck for timely and accessible virus detection.
Results: To address these challenges, we introduce two complementary strategies that mitigate hardware constraints in virus diagnostics, using as a case study the detection and discovery of viruses in cassava, a major food crop in tropical and subtropical regions worldwide. First, we deployed PhytoPipe, a state-of-the-art phytosanitary pipeline, as a cloud-based service on Amazon Web Services (AWS), allowing laboratories to run sophisticated workflows without high-performance computing infrastructure. PhytoPipe integrates unguided virus detection methods (read classification, assembly-based annotation, and reference-based mapping) and was validated in this work using RNA-seq data from virus-infected cassava samples. Second, we developed an artificial neural network (ANN) classifier based on multi-head attention mechanisms, trained to distinguish viral reads from plant and bacterial sequences in unassembled HTS data (raw reads). By encoding k-mers at the protein level, the model achieves accurate classification while significantly reducing computational overhead. The ANN’s filtering capability improves the efficiency of downstream assembly and annotation tools, enabling virus detection on resource-constrained systems.
Conclusion: Our findings demonstrate that cloud-based bioinformatics pipelines and machine learning classifiers can substantially lower the hardware demands of HTS-based pathogen diagnostics. These strategies expand access to advanced viral detection workflows, making them more scalable and applicable in diverse research and diagnostic settings, including laboratories with limited computational infrastructure.
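To illustrate what encoding k-mers at the protein level might look like for a raw read, the following is a minimal Python sketch and not the implementation used in this work. It assumes six-frame translation with the standard genetic code, overlapping amino-acid k-mers with k = 3, and a simple base-22 integer tokenization. The function names (`six_frame_translations`, `kmer_tokens`, `encode_read`) and all of these choices are hypothetical and introduced only for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the abstract does not state which translation table is used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# 20 amino acids plus stop (*) and unknown (X); hypothetical token alphabet.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY*X"
AA_INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(read):
    """Translate a read in three forward and three reverse frames."""
    read = read.upper()
    frames = []
    for strand in (read, reverse_complement(read)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


def kmer_tokens(protein, k=3):
    """Map each overlapping amino-acid k-mer to an integer in [0, 22**k)."""
    base = len(ALPHABET)
    tokens = []
    for i in range(len(protein) - k + 1):
        token = 0
        for aa in protein[i:i + k]:
            token = token * base + AA_INDEX[aa]
        tokens.append(token)
    return tokens


def encode_read(read, k=3):
    """Return one token list per reading frame (six lists in total)."""
    return [kmer_tokens(frame, k) for frame in six_frame_translations(read)]


if __name__ == "__main__":
    for frame_tokens in encode_read("ATGGCCTAAGGTACCGTA"):
        print(frame_tokens)
```

Token sequences of this kind could then be fed to an embedding layer ahead of an attention-based classifier. One plausible motivation for working at the protein level is that a translated read yields roughly one third as many positions per frame as the nucleotide sequence, although the vocabulary, frame handling, and value of k used by the actual model are not specified here.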