Efficient Retrieval Augmented Generation Based QA Chatbot Builder Using LLaMA 3.2B with LoRA

Abstract

The emergence of large language models (LLMs) has enabled advanced conversational systems; however, challenges such as hallucination, limited domain adaptation, and high fine-tuning costs persist. To overcome these limitations, this work presents an efficient Retrieval-Augmented Generation (RAG) based QA chatbot builder leveraging LLaMA 3.2B with Low-Rank Adaptation (LoRA). The proposed framework integrates retrieval mechanisms with generative modeling, enabling the chatbot to ground its responses in domain-specific, dynamically retrieved knowledge sources. This approach improves factual accuracy, reduces hallucinations, and ensures adaptability across diverse domains. To further improve efficiency, LoRA is employed as a parameter-efficient fine-tuning method, significantly lowering computational requirements by updating only a small subset of the model's parameters. This allows LLaMA 3.2B, a lightweight yet powerful LLM, to be fine-tuned effectively even in resource-constrained environments, making deployment practical for organizations lacking large-scale infrastructure. The synergy of RAG and LoRA yields responses that are not only contextually relevant and verifiable but also computationally efficient and scalable. The resulting chatbot builder empowers users to create customizable, reliable, and transparent QA systems tailored for enterprise, education, healthcare, and research applications. Overall, this study contributes to advancing conversational AI by balancing accuracy, efficiency, and real-world applicability.
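The two mechanisms named above can be illustrated with a minimal, self-contained sketch. This is not the paper's implementation: the toy corpus, the bag-of-words retriever, and the layer dimensions and rank are hypothetical stand-ins. The first part shows RAG's core idea of grounding a question in retrieved context before generation; the second shows why LoRA is parameter-efficient, counting the trainable parameters of one low-rank adapter against a full update of the same layer.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for a domain knowledge base.
DOCS = [
    "LoRA updates only low-rank adapter matrices during fine-tuning.",
    "Retrieval-augmented generation grounds answers in retrieved passages.",
    "LLaMA models are decoder-only transformer language models.",
]

def bag(text: str) -> Counter:
    """Bag-of-words term counts (a simple stand-in for a real embedder)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(query: str, docs=DOCS, k: int = 1) -> list:
    """Return the k documents most similar to the query."""
    q = bag(query)
    return sorted(docs, key=lambda d: cosine(q, bag(d)), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Ground the question in retrieved context before generation."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

def lora_param_counts(d_out: int, d_in: int, r: int):
    """(full fine-tune params, LoRA params) for one d_out x d_in linear
    layer: LoRA trains only B (d_out x r) and A (r x d_in), r << d_in."""
    return d_out * d_in, d_out * r + r * d_in

# Example: a rank-8 adapter on one hypothetical 2048x2048 projection.
full, lora = lora_param_counts(2048, 2048, 8)
print(f"LoRA trains {lora / full:.2%} of the layer's parameters")
```

In a full system, the retrieved context would be prepended to the user's question and passed to the LoRA-adapted LLaMA model for generation, so the answer stays tied to verifiable source passages.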
