Pre-Processing Techniques on Abstractive Text Summarization for Gujarati Language

Vipul Tailor
Purna Tanna


Abstract

In an era of information overload, text summarization has become a crucial tool for extracting key information from large volumes of text. Abstractive summarization, which generates concise summaries while preserving the core meaning, is particularly challenging for languages such as Gujarati because of their distinctive linguistic characteristics. This paper focuses on the pre-processing phase of abstractive text summarization for the Gujarati language and explores five techniques: noise removal, tokenization, stop-word removal, stemming/lemmatization, and sentence segmentation. By measuring the impact of each technique on the quality of the generated summaries with several evaluation scores, we aim to identify the most effective pre-processing methods for Gujarati text summarization. Our findings highlight the significance of pre-processing in improving summarization quality and suggest directions for future research in this domain.
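To make the pre-processing steps named in the abstract concrete, the following is a minimal Python sketch of such a pipeline. It is not the authors' implementation: the stop-word set, the suffix list, the sentence delimiters, and all function names are illustrative assumptions, and the suffix stripping is a naive stand-in for a real Gujarati stemmer or lemmatizer.

```python
import re

# Gujarati Unicode block (U+0A80..U+0AFF); a token is a run of these characters.
GUJARATI_WORD = re.compile(r"[\u0A80-\u0AFF]+")

# Assumption: a tiny illustrative stop-word set, not the list used in the paper.
STOP_WORDS = {"અને", "છે", "તે", "આ", "માં", "પણ", "કે"}

# Assumption: a hypothetical suffix list for naive suffix stripping.
SUFFIXES = ("માં", "ની", "નો", "નું", "ના", "ને", "થી", "ઓ")


def remove_noise(text):
    """Drop HTML tags and URLs, then collapse runs of whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Split after '.', '?', '!' or the danda (U+0964) when followed by whitespace."""
    return [part for part in re.split(r"(?<=[.?!\u0964])\s+", text) if part]


def tokenize(sentence):
    """Return runs of Gujarati characters; punctuation and Latin text are dropped."""
    return GUJARATI_WORD.findall(sentence)


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip the longest matching suffix, keeping at least two code points."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if token.endswith(suffix) and len(token) > len(suffix) + 1:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run all steps; returns one list of stemmed tokens per sentence."""
    clean = remove_noise(text)
    return [
        [stem(t) for t in remove_stop_words(tokenize(sentence))]
        for sentence in split_sentences(clean)
    ]


if __name__ == "__main__":
    sample = "<p>ગુજરાત ભારતમાં છે. આ રાજ્ય સુંદર છે! વધુ: https://example.com</p>"
    for sentence_tokens in preprocess(sample):
        print(sentence_tokens)
```

Each step is a separate function so that individual techniques can be switched on or off when comparing their effect on summary quality, which mirrors the kind of comparison the abstract describes. A production pipeline would also need to handle Unicode normalization and zero-width joiners, which this sketch ignores.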

Version published to 10.21203/rs.3.rs-7293942/v1 on Research Square
Oct 22, 2025

Related articles

Grammar-Driven Text Segmentation for Context Understanding of Myanmar Language

This article has 3 authors:
1. Myo Thida
2. Nu Wei Thet
3. Thein Kyaw Lwin
This article has no evaluations. Latest version Jan 23, 2026

Advancing Sentiment Analysis in Gujarati: Performance Enhancement through a Hybrid Annotation Framework

This article has 2 authors:
1. Neha Shah
2. Preeti Baser
This article has no evaluations. Latest version Jan 6, 2026

A Comprehensive Evaluation of Llama 3 for Text Classification Tasks

This article has 4 authors:
1. AmirAhmad Amjadi
2. Shiva TaghipourEivazi
3. Bahman Arasteh
4. Huseyin Kusetogullari
This article has no evaluations. Latest version Dec 23, 2025
