A Domain-Specific Language Model Approach for Identifying Immunogenic Epitopes from Publicly Available Data: EpitopeMiner
Abstract
Personalized cancer vaccines represent a transformative approach in immunotherapy, leveraging tumor-specific antigens, such as neoantigens, to stimulate durable and targeted immune responses. However, identifying immunogenic neoantigen epitopes through the conventional approach, in which in silico predictions are followed by experimental validation, remains a significant challenge because of the limitations of computational tools and the scattered nature of experimental data. To address these challenges, we introduce EpitopeMiner, a domain-specific large language model (LLM) enhanced with a Retrieval-Augmented Generation (RAG) framework and designed to identify immunogenic epitopes among the candidates predicted by existing computational tools. EpitopeMiner uses a custom database of MHC-I-associated epitope literature to supply domain-specific knowledge, enhancing the precision and relevance of LLM-generated responses. EpitopeMiner offers three key features. First, it identifies epitopes with similar sequences and potentially similar immunogenic effects, which is especially valuable for neoantigens that are patient-specific and rarely found in public datasets. Second, it supports multiple-epitope searches with structured outputs, enhancing scalability. Third, it returns the original text chunks and paper identifiers, significantly simplifying validation and further exploration of the retrieved knowledge. Applied to well-characterized MHC class I epitopes, EpitopeMiner consistently retrieves relevant papers and efficiently provides targeted insights on T-cell response and immunogenicity, outperforming a commercial AI-powered literature tool. Notably, when applied to lymphoma patient-derived neoantigens, EpitopeMiner successfully retrieves immune response information for a few similar epitopes, despite operating with a smaller database than the benchmark tool.
Overall, EpitopeMiner bridges the gap between computational prediction and experimental validation, providing a scalable solution for extracting knowledge from public data, fostering cross-study synergies, and accelerating the development of personalized cancer vaccines.
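The three features described in the abstract (similar-sequence lookup, multi-epitope search with structured output, and returning the source chunk with its paper identifier) can be illustrated with a small toy sketch. This is not EpitopeMiner's implementation: the abstract does not specify how similarity is computed, so the sketch swaps in a plain Hamming-distance match over peptides found by a regular expression, with no LLM or vector retrieval involved. The names `Hit`, `find_similar`, and `CHUNKS`, the `DOC-n` identifiers, and the chunk texts are all hypothetical placeholders.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for a literature database: (paper identifier, text chunk).
# Identifiers and texts are placeholders, not real papers or findings.
CHUNKS = [
    ("DOC-1", "Placeholder chunk mentioning the peptide GILGFVFTL."),
    ("DOC-2", "Placeholder chunk mentioning the peptide GILGFVFTV."),
    ("DOC-3", "Placeholder chunk mentioning the peptide NLVPMVATV."),
]

# Upper-case runs of 8 to 11 standard amino-acid letters (typical MHC-I peptide lengths).
PEPTIDE = re.compile(r"\b[ACDEFGHIKLMNPQRSTVWY]{8,11}\b")


@dataclass
class Hit:
    """One structured result: query, matched peptide, and its provenance."""
    query: str
    matched_epitope: str
    mismatches: int
    paper_id: str
    chunk: str


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_similar(queries, chunks, max_mismatches=1):
    """Return a Hit for every same-length peptide within max_mismatches of a query."""
    hits = []
    for query in queries:
        for paper_id, text in chunks:
            for candidate in PEPTIDE.findall(text):
                if len(candidate) != len(query):
                    continue  # Hamming distance is only defined for equal lengths
                distance = hamming(query, candidate)
                if distance <= max_mismatches:
                    hits.append(Hit(query, candidate, distance, paper_id, text))
    return hits


if __name__ == "__main__":
    for hit in find_similar(["GILGFVFTL", "NLVPMVATV"], CHUNKS):
        print(hit.query, hit.matched_epitope, hit.mismatches, hit.paper_id)
```

Each result carries the original chunk and its identifier, so a reader can check the supporting text directly. A real system would likely need a biologically informed similarity measure and a retrieval step over a much larger corpus; the Hamming threshold here is only an assumption chosen for brevity.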