SumLLM: Performance Evaluation and the Judgment of Large Language Models in Bengali Abstractive News Summarization
Abstract
Bengali abstractive summarization has long been hindered by noisy, low-quality reference datasets and limited evaluation methods. Prior benchmarks reported apparently strong performance, yet relied on small-scale human studies and reference-based metrics, both of which underestimate the generative capacity of modern LLMs. In this paper, we revisit Bangla summarization under zero-shot conditions, evaluating six recent large language models: GPT-4, Llama-3.1-8B, Mixtral-8x22B-Instruct-v0.1, Gemma-2-27B, DeepSeek-R1, and Qwen3-30B-A3B, on the Bengali Abstractive News Summarization (BANS) dataset. To overcome the issue of weak reference quality, we propose a robust evaluation framework using LLMs-as-Judges, in which multiple calibrated LLMs independently assess outputs for faithfulness, coherence, and relevance. Our results demonstrate that modern LLMs can rival, and in many cases surpass, human-written references in readability and informativeness, though humans still retain advantages in certain nuanced cases. This work establishes zero-shot LLM reasoning combined with reference-free evaluation as a new paradigm for high-quality Bangla summarization, providing a scalable and robust framework for future low-resource language research.
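The abstract describes the LLMs-as-Judges protocol only at a high level (multiple judges scoring each summary independently on faithfulness, coherence, and relevance). The Python sketch below is a minimal illustration of how such a reference-free, multi-judge evaluation could be wired up; the prompt wording, the 1-5 rating scale, the parsing, and the averaging are assumptions made for this example, not the paper's actual configuration, and the stand-in judge callables would in practice be calls to the listed LLMs.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

# The three dimensions named in the abstract.
DIMENSIONS = ("faithfulness", "coherence", "relevance")

# Hypothetical rubric prompt; the paper's exact wording is not given here.
JUDGE_PROMPT = (
    "You are grading a Bengali news summary. Given the article and the "
    "candidate summary, rate the summary from 1 (worst) to 5 (best) on "
    "faithfulness, coherence, and relevance. Reply with three integers "
    "separated by spaces, e.g. '4 5 3'."
)


@dataclass
class JudgeResult:
    judge_name: str
    scores: Dict[str, int]


def parse_scores(reply: str) -> Dict[str, int]:
    """Parse a judge reply such as '4 5 3' into a dimension -> score mapping."""
    values = [int(tok) for tok in reply.split()[: len(DIMENSIONS)]]
    return dict(zip(DIMENSIONS, values))


def evaluate_summary(
    article: str,
    summary: str,
    judges: Dict[str, Callable[[str], str]],
) -> Dict[str, float]:
    """Ask every judge LLM independently, then average per-dimension scores.

    Reference-free: only the source article and the candidate summary are
    shown to the judges; no human-written reference summary is needed.
    """
    results: List[JudgeResult] = []
    for name, ask in judges.items():
        prompt = f"{JUDGE_PROMPT}\n\nArticle:\n{article}\n\nSummary:\n{summary}"
        results.append(JudgeResult(name, parse_scores(ask(prompt))))
    return {dim: mean(r.scores[dim] for r in results) for dim in DIMENSIONS}


if __name__ == "__main__":
    # Stand-in judges for demonstration; real judges would call LLM APIs.
    fake_judges = {
        "judge-a": lambda prompt: "4 5 4",
        "judge-b": lambda prompt: "5 4 4",
    }
    print(evaluate_summary("<Bengali article text>", "<candidate summary>", fake_judges))
```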