Comparative Analysis of Linguistic and Semantic Features for Text Classification Using NLTK and spaCy

Rizwan Ayazuddin

Abstract

Text classification remains one of the most common NLP tasks, with applications in spam detection, sentiment analysis, and document categorization. This paper presents a lightweight comparative study of feature extraction techniques using two widely adopted NLP toolkits, NLTK and spaCy, applied to a benchmark dataset from the UCI Machine Learning Repository. By integrating traditional linguistic features (token counts, POS tagging, stopword filtering) with semantic embeddings, we evaluate the effectiveness of each toolkit in building a baseline classification system. Experimental results provide insights into the trade-offs between linguistic preprocessing and modern vectorization methods, offering practical recommendations for small-scale text mining projects.
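
As a rough illustration of the traditional linguistic features the abstract mentions (token counts and stopword filtering), the sketch below computes them with the Python standard library only. It is not the paper's pipeline: the regex tokenizer, the tiny `STOPWORDS` set, the `linguistic_features` helper, and the sample sentence are all assumptions made for illustration. In practice the tokenizer and stopword list would come from NLTK or spaCy, and POS tags and embeddings would be added on top.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# a real project would use the list shipped with NLTK or spaCy instead.
STOPWORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "this", "for", "you"}


def tokenize(text):
    """Lowercase the text and extract runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def linguistic_features(text):
    """Return simple count-based features for one document."""
    tokens = tokenize(text)
    # Stopword filtering: keep only tokens that are not in the stopword set.
    content = [t for t in tokens if t not in STOPWORDS]
    n_tokens = len(tokens)
    return {
        "n_tokens": n_tokens,
        "n_content_tokens": len(content),
        # Share of tokens removed by the stopword filter (0.0 for empty input).
        "stopword_ratio": (n_tokens - len(content)) / n_tokens if n_tokens else 0.0,
    }


features = linguistic_features("This is the best offer for you, claim the prize now!")
print(features)
```

Feature dictionaries of this kind can be turned into numeric vectors and passed to any baseline classifier. Keeping the feature function separate from the tokenizer makes it straightforward to swap in a toolkit tokenizer later and compare the results.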

Version published to 10.20944/preprints202509.2581.v1 on Sep 30, 2025

Related articles

Sentiment Analysis of Restaurant Reviews Using Machine Learning Algorithms
Author: Kabir Kohli
Latest version: Oct 16, 2025

Parameter-Efficient Fine-Tuning (PEFT) Approaches for Large Language Models: A Comparative Analysis on AG News
Author: Asmaa Mohammed Shuibi
Latest version: Oct 10, 2025

Fusion of Local and Global Context in Large Language Models for Text Classification
Authors: Ran Hao, Xin Hu, Jiasen Zheng, Chong Peng, Junjiang Lin
Latest version: Sep 19, 2025
