TMSFE: A Transformer-Based Multi-Label Semantic Feature Extraction Method
Abstract
Multi-label text classification is a critical task in natural language processing in which each document may belong to multiple categories. The setting is challenging because it involves complex label dependencies and requires extracting fine-grained semantic features for each label. We propose a novel Transformer-based algorithm, TMSFE (Transformer-based Multi-label Semantic Feature Extraction), which integrates label-specific query embeddings with a multi-head attention mechanism to extract discriminative features for each potential label. Unlike conventional single-label classifiers or flat multi-label methods, the proposed model employs a DeBERTaV3-based Transformer encoder to jointly model document and label semantics. In addition, a SimCSE-based latent semantic space module projects text and label representations into a shared latent space to improve the efficiency of feature extraction, and a sigmoid-based multi-label classification head is applied to the extracted features. Results show that TMSFE consistently outperforms baseline models, achieving lower Hamming loss and higher feature extraction accuracy.
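As a rough illustration of the pipeline the abstract describes, the following PyTorch sketch wires together a DeBERTaV3 encoder, per-label query embeddings attended over the token states, a projection into a shared latent space, and a sigmoid multi-label head. The layer sizes, head count, latent dimension, and class name are illustrative assumptions, not values reported by the paper, and the SimCSE-style contrastive training of the latent space is omitted.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class TMSFE(nn.Module):
    """Minimal sketch of the TMSFE architecture described in the abstract.

    Hyperparameters (latent_dim, num_heads) are illustrative assumptions,
    not values reported by the paper.
    """

    def __init__(self, num_labels: int, latent_dim: int = 256, num_heads: int = 8):
        super().__init__()
        # DeBERTaV3-based Transformer encoder for document tokens.
        self.encoder = AutoModel.from_pretrained("microsoft/deberta-v3-base")
        hidden = self.encoder.config.hidden_size

        # One learnable query embedding per potential label.
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden))

        # Multi-head attention: label queries attend over token states to
        # extract a discriminative, label-specific feature per label.
        self.label_attn = nn.MultiheadAttention(hidden, num_heads, batch_first=True)

        # Projection into a shared latent semantic space (the SimCSE-style
        # contrastive objective that shapes this space is not shown here).
        self.to_latent = nn.Linear(hidden, latent_dim)

        # Sigmoid-based multi-label head: one independent score per label.
        self.classifier = nn.Linear(latent_dim, 1)

    def forward(self, input_ids, attention_mask):
        tokens = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        batch = tokens.size(0)
        queries = self.label_queries.unsqueeze(0).expand(batch, -1, -1)
        # key_padding_mask expects True at positions to ignore (padding).
        feats, _ = self.label_attn(queries, tokens, tokens,
                                   key_padding_mask=attention_mask == 0)
        latent = self.to_latent(feats)                # (batch, num_labels, latent_dim)
        logits = self.classifier(latent).squeeze(-1)  # (batch, num_labels)
        return torch.sigmoid(logits)
```

In this reading, each label's query vector acts as a learned probe over the encoded document, so the model produces a separate feature vector per label rather than a single pooled representation shared by all labels.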