Contextual Science and Genome Analysis for Air-Gapped AI Research
Abstract
We present a study of large language models (LLMs) and retrieval-augmented generation (RAG) frameworks deployed in air-gapped environments for genome research on small grain crops. We developed two main applications: (1) a RAG-based system for contextual analysis of scientific literature, built on a collection of over 5,000 PDFs on wheat pathogens, and (2) Genoma, a GFF3 file analysis tool that enables exploration of genome annotations through an interactive interface. Using the open-source framework Ollama, we compared the performance of multiple LLMs, including Llama3.1, Deepseek-r1, and Qwen2.5, for biological data analysis. A LightRAG approach provided semantic visualization of document relationships, while the Genoma tool offered chromosome-level insights into genome annotations. These tools demonstrate the viability of powerful AI assistance in sensitive research environments, with potential applications in pangenome analysis and gene discovery. The code provides researchers with practical solutions for implementing AI in secure settings without sacrificing analytical power.
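As a rough illustration of what a chromosome-level view of a GFF3 annotation might involve, the sketch below counts feature types per sequence ID using only the Python standard library. It is not the Genoma implementation: the function name `summarize_gff3` and the example records are hypothetical, and the only assumption about the input is the standard nine-column, tab-separated GFF3 layout.

```python
from collections import Counter, defaultdict


def summarize_gff3(lines):
    """Count features of each type per sequence ID (e.g. chromosome).

    Hypothetical helper for illustration; assumes standard 9-column GFF3 rows.
    """
    counts = defaultdict(Counter)
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence data follows; no more annotation rows
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # ignore malformed rows rather than guessing
        seqid, feature_type = fields[0], fields[2]
        counts[seqid][feature_type] += 1
    return counts


# Made-up records, used only to exercise the function.
example = """##gff-version 3
chr1A\texample\tgene\t100\t900\t.\t+\t.\tID=gene1
chr1A\texample\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1
chr1A\texample\texon\t100\t400\t.\t+\t.\tID=exon1;Parent=mrna1
chr2B\texample\tgene\t50\t700\t.\t-\t.\tID=gene2
""".splitlines()

summary = summarize_gff3(example)
for seqid in sorted(summary):
    print(seqid, dict(summary[seqid]))
```

Because the sketch relies only on the standard library, it would run without network access, which suits an air-gapped setting. A per-chromosome table like this could also be passed to a locally hosted model as context, although how Genoma actually does this is not described in the abstract.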