Efficient Retrieval Augmented Generation Based QA Chatbot Builder Using LLaMA 3.2B with LoRA
Abstract
The emergence of large language models (LLMs) has enabled advanced conversational systems; however, challenges such as hallucination, limited domain adaptation, and high fine-tuning costs persist. To overcome these limitations, this work presents an Efficient Retrieval-Augmented Generation (RAG) based QA Chatbot Builder leveraging LLaMA 3.2B with Low-Rank Adaptation (LoRA). The proposed framework integrates retrieval mechanisms with generative modeling, enabling the chatbot to ground its responses in domain-specific, dynamically retrieved knowledge sources. This approach improves factual accuracy, reduces hallucinations, and ensures adaptability across diverse domains. To further improve efficiency, LoRA is employed as a parameter-efficient fine-tuning method, significantly lowering computational requirements by updating only a small subset of the model's parameters. This allows LLaMA 3.2B, a lightweight yet powerful LLM, to be fine-tuned effectively even in resource-constrained environments, making deployment practical for organizations lacking large-scale infrastructure. The synergy of RAG and LoRA yields responses that are not only contextually relevant and verifiable but also computationally efficient and scalable. The resulting chatbot builder empowers users to create customizable, reliable, and transparent QA systems tailored for enterprise, education, healthcare, and research applications. Overall, this study contributes to advancing conversational AI by balancing accuracy, efficiency, and real-world applicability.
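The claim that LoRA trains "only a small subset of the model's parameters" can be made concrete with a minimal sketch of the low-rank update itself. This is an illustrative NumPy mock-up, not the paper's implementation: the layer size, rank, and scaling factor below are assumed values chosen to show the parameter savings, and a real setup would use a framework such as Hugging Face PEFT on the actual LLaMA weights.

```python
import numpy as np

# Illustrative LoRA sketch (assumed dimensions, not the paper's code):
# instead of updating the full frozen weight W (d_out x d_in), train only two
# low-rank factors B (d_out x r) and A (r x d_in); the effective weight is
# W + (alpha / r) * B @ A.

d_out, d_in, r, alpha = 4096, 4096, 8, 16  # hypothetical layer size, rank 8

W = np.zeros((d_out, d_in))          # frozen pretrained weight (placeholder)
A = np.random.randn(r, d_in) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))             # zero-initialized so the update starts at 0

def lora_forward(x):
    """Adapted forward pass: (W + (alpha / r) * B @ A) @ x, without forming B @ A."""
    return W @ x + (alpha / r) * (B @ (A @ x))

full_params = W.size                 # parameters touched by full fine-tuning
lora_params = A.size + B.size        # parameters LoRA actually trains
print(f"full fine-tuning params: {full_params:,}")
print(f"LoRA trainable params:   {lora_params:,} "
      f"({100 * lora_params / full_params:.2f}% of full)")
```

For this assumed 4096 by 4096 layer at rank 8, LoRA trains well under 1% of the layer's parameters, which is the mechanism behind the reduced computational requirements the abstract describes.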