Hierarchical Text Classification with LLMs via BERT-Based Semantic Modeling and Consistency Regularization
Abstract
This paper proposes a BERT-based method for hierarchical text classification that explicitly models the relationship between textual semantics and label hierarchies. Traditional flat classification methods often fail to ensure hierarchical consistency in their predictions when the label system is complex, and they perform poorly on long-tail and low-frequency categories. To address this challenge, the proposed method combines the contextual modeling ability of pre-trained language models with a hierarchical regularization mechanism. It captures both global and local semantic information during representation learning and introduces hierarchical constraints at the prediction stage to improve stability and robustness in multi-level classification. Specifically, after the text is encoded, predictions at each level are obtained through inner-product computation and hierarchical softmax, and a structure-aware regularization term is added to the loss function to enforce semantic consistency between parent and child categories. The method is evaluated on the Kaggle hierarchical text classification dataset, which covers first-, second-, and third-level categories. Results show that the proposed approach achieves higher accuracy and F1 scores than baseline models at all levels, with the largest gains in fine-grained category prediction. Furthermore, confusion-matrix and t-SNE visualizations confirm that the model maintains inter-class separation and intra-class compactness in the semantic space, demonstrating its effectiveness and reliability under complex label systems.
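To make the prediction and regularization steps concrete, the listing below is a minimal PyTorch sketch, not the authors' implementation, of a two-level hierarchical head: level-wise logits are inner products between the pooled text representation (e.g. BERT's [CLS] vector) and learnable label embeddings, a softmax is applied within each level, and a structure-aware penalty discourages a child category from receiving more probability mass than its parent. The child_to_parent mapping, the hinge-style form of the penalty, and the loss weight 0.1 are assumptions made for illustration only.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HierarchicalHead(nn.Module):
        """Sketch of a two-level hierarchical classification head.

        Assumes a pooled text representation and a label hierarchy given by
        child_to_parent, mapping each child-label index to its parent index.
        All names here are illustrative, not the paper's actual code.
        """

        def __init__(self, hidden_dim, num_parents, num_children, child_to_parent):
            super().__init__()
            # Label embeddings; level-wise logits are inner products with the text vector.
            self.parent_emb = nn.Parameter(torch.randn(num_parents, hidden_dim) * 0.02)
            self.child_emb = nn.Parameter(torch.randn(num_children, hidden_dim) * 0.02)
            self.register_buffer("child_to_parent", torch.as_tensor(child_to_parent))

        def forward(self, text_repr):
            # Inner-product logits, then a softmax within each level of the hierarchy.
            parent_logits = text_repr @ self.parent_emb.T   # (B, num_parents)
            child_logits = text_repr @ self.child_emb.T     # (B, num_children)
            parent_prob = F.softmax(parent_logits, dim=-1)
            child_prob = F.softmax(child_logits, dim=-1)
            return parent_logits, child_logits, parent_prob, child_prob

        def consistency_penalty(self, parent_prob, child_prob):
            # Structure-aware regularizer (one possible form): a child's probability
            # should not exceed its parent's, so penalize max(0, p_child - p_parent).
            parent_of_child = parent_prob[:, self.child_to_parent]   # (B, num_children)
            return F.relu(child_prob - parent_of_child).mean()

    # Usage sketch: level-wise cross-entropy plus the hierarchy consistency penalty.
    if __name__ == "__main__":
        head = HierarchicalHead(hidden_dim=768, num_parents=4, num_children=10,
                                child_to_parent=[0, 0, 1, 1, 1, 2, 2, 3, 3, 3])
        text_repr = torch.randn(8, 768)                  # stand-in for BERT pooled output
        parent_y = torch.randint(0, 4, (8,))
        child_y = torch.randint(0, 10, (8,))
        p_logits, c_logits, p_prob, c_prob = head(text_repr)
        loss = (F.cross_entropy(p_logits, parent_y)
                + F.cross_entropy(c_logits, child_y)
                + 0.1 * head.consistency_penalty(p_prob, c_prob))   # 0.1 is an assumed weight
        loss.backward()

In this sketch the per-level softmax stands in for the hierarchical softmax described above; the same consistency penalty extends directly to a third level by adding another mapping from third-level labels to their second-level parents.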