A Hybrid TF–IDF and SBERT Approach for Enhanced Text Classification Performance
Abstract
Automated text-similarity and plagiarism detection remain essential for academic integrity and content moderation. This paper presents a reproducible study that evaluates classical TF-IDF feature representations combined with standard classifiers (Logistic Regression, Random Forest, Multinomial Naïve Bayes, and linear Support Vector Machine) and introduces a hybrid TF-IDF + Sentence-BERT (SBERT) feature fusion to address paraphrase-driven cases that lexical features alone miss. Experiments using an 80/20 stratified split on a labeled pairwise corpus show that a linear SVM trained on TF-IDF provides a strong baseline (F1 = 0.871). The proposed hybrid (TF-IDF reduced via TruncatedSVD, concatenated with SBERT embeddings) improves semantic detection and achieves F1 = 0.903 in our controlled experiments. We include implementation details, hyperparameters, an ablation study, explainability examples (SHAP), and reproducibility notes. The results indicate that hybrid sparse + dense feature pipelines can produce substantial gains with modest additional computation compared to full Transformer fine-tuning.
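The fusion step described above can be sketched as follows. This is a minimal, illustrative pipeline, not the authors' released code: the toy document pairs, the `[SEP]` pair-joining convention, and the dimensionalities are invented for the example, and SBERT embeddings are replaced by random vectors so the snippet runs without the `sentence-transformers` dependency (a real run would substitute `SentenceTransformer(...).encode(docs)`).

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.svm import LinearSVC

# Toy pairwise corpus: each string is a text pair joined with "[SEP]";
# labels mark paraphrase/plagiarism (1) vs. unrelated (0). Illustrative only.
docs = [
    "the cat sat on the mat [SEP] a cat was sitting on a mat",
    "stocks fell sharply today [SEP] the cat sat on the mat",
    "he finished the report late [SEP] the report was completed late by him",
    "rain is expected tomorrow [SEP] he finished the report late",
]
labels = np.array([1, 0, 1, 0])

# Sparse lexical features: TF-IDF, reduced to a low-dimensional dense
# space with TruncatedSVD (dimensions here are arbitrary for the demo).
tfidf = TfidfVectorizer().fit_transform(docs)
svd = TruncatedSVD(n_components=3, random_state=0)
tfidf_dense = svd.fit_transform(tfidf)

# Stand-in for SBERT sentence embeddings; random vectors keep the
# sketch self-contained (replace with a real SBERT encode() in practice).
rng = np.random.default_rng(0)
sbert_like = rng.normal(size=(len(docs), 8))

# Feature fusion: concatenate reduced TF-IDF with the dense embeddings,
# then train a linear SVM on the fused representation.
fused = np.hstack([tfidf_dense, sbert_like])
clf = LinearSVC().fit(fused, labels)
print(fused.shape)  # (4, 11): 3 SVD components + 8 embedding dimensions
```

In the paper's actual pipeline the SVD rank and embedding model are hyperparameters; the point of the sketch is only that the sparse and dense views are reduced to compatible dense matrices and concatenated before classification.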