BHRE-RAG: A Benchmark and Retrieval-Augmented Framework for Advancing Comprehension-Based Question Answering in Bangla


Abstract

Large language models excel in English but struggle with low-resource languages such as Bengali due to limited training data and complex linguistic structures. This paper addresses that gap with a system that applies state-of-the-art natural language processing techniques to analyze subject-specific chapters and generate questions with corresponding answers of varying lengths, through two key contributions. First, we introduce the Bangla Holistic Reasoning Evaluation (BHRE), a comprehensive zero-shot and few-shot assessment of LLMs (GPT-4, Llama-3.1, Mixtral-8x, Qwen2.5, Mistral, Gemma) on the challenging BanglaRQA dataset. Second, we propose a Retrieval-Augmented Generation (RAG) framework built on BHRE that improves LLM performance by retrieving precise, contextual evidence before generating answers. Using the BanglaRQA question-answering dataset, which comprises 3,000 context passages and 14,889 question-answer pairs, we benchmark these LLMs with Exact Match (EM) and F1 metrics against BanglaT5, a fine-tuned state-of-the-art model. Our results show that Llama-3 is the top-performing model on both F1 and EM, and that our RAG-based approach raises its performance well above the zero-shot and few-shot settings, surpassing the previous fine-tuned SOTA (BanglaT5). This work demonstrates that prompt engineering alone can make LLMs rival fine-tuned models, shows the effectiveness of RAG systems for low-resource languages, and provides a reproducible framework for future research on enhancing the capabilities of language models.
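The abstract evaluates answers with EM and F1. For readers unfamiliar with these metrics, the following is a minimal sketch of how they are conventionally computed for extractive QA (SQuAD-style token-level F1 with whitespace tokenization); the paper's exact normalization rules for Bangla text are not stated here, so this illustrates the metric definitions only.

```python
from collections import Counter

def exact_match(prediction: str, reference: str) -> int:
    # EM: 1 if the (lightly normalized) strings match exactly, else 0.
    return int(prediction.strip() == reference.strip())

def token_f1(prediction: str, reference: str) -> float:
    # Token-level F1: harmonic mean of precision and recall over
    # the multiset overlap of predicted and reference tokens.
    pred_tokens = prediction.strip().split()
    ref_tokens = reference.strip().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A prediction sharing two of three tokens with the reference, for example, scores an F1 of about 0.67 while its EM is 0, which is why F1 is the more forgiving of the two metrics for long-form answers.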
