LLM-Guided Weighted Contrastive Learning with Topic-Aware Masking for Efficient Domain Adaptation: A Case Study on Pulp-Era Science Fiction
Abstract
Domain adaptation of pre-trained language models remains challenging, especially for specialized text collections with distinct vocabularies and semantic structures. Existing contrastive learning methods frequently rely on generic masking techniques and coarse-grained similarity measures, which limit their ability to capture fine-grained, domain-specific linguistic nuances. This paper proposes an enhanced domain adaptation framework that integrates weighted contrastive learning guided by large language model (LLM) feedback with a novel topic-aware masking strategy. Topic modeling is used to systematically identify semantically crucial domain-specific terms, enabling the creation of meaningful contrastive pairs through three targeted masking strategies: single-keyword, multiple-keyword, and partial-keyword masking. Each masked sentence undergoes LLM-guided reconstruction, accompanied by graduated similarity assessments that serve as continuous, fine-grained supervision signals. Experiments conducted on an early 20th-century science fiction corpus demonstrate that the proposed approach consistently outperforms existing baselines, such as SimCSE and DiffCSE, across multiple linguistic probing tasks within the newly introduced SF-ProbeEval benchmark. Furthermore, the proposed method achieves these performance improvements with significantly reduced computational requirements, highlighting its practical applicability for efficient and interpretable adaptation of language models to specialized domains.
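To make the two core ideas in the abstract concrete, the sketch below illustrates (a) the three topic-aware masking strategies and (b) a weighted InfoNCE-style contrastive loss in which each (anchor, LLM-reconstruction) pair is scaled by a continuous similarity grade. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (mask_sentence, weighted_contrastive_loss), the whole-token masking of keywords, the character-level "partial" masking, and the per-pair loss weighting are all hypothetical choices consistent with the abstract's description.

```python
import random
from typing import List, Set

import torch
import torch.nn.functional as F


def mask_sentence(tokens: List[str], keywords: Set[str],
                  strategy: str = "single", mask_token: str = "[MASK]") -> List[str]:
    """Topic-aware masking: replace occurrences of topic-model keywords
    in a tokenized sentence according to one of three strategies.
    (Illustrative; the paper's exact masking granularity may differ.)"""
    hits = [i for i, t in enumerate(tokens) if t.lower() in keywords]
    masked = list(tokens)
    if not hits:
        return masked
    if strategy == "single":        # mask one randomly chosen keyword
        masked[random.choice(hits)] = mask_token
    elif strategy == "multiple":    # mask every keyword occurrence
        for i in hits:
            masked[i] = mask_token
    elif strategy == "partial":     # mask only part of one keyword
        i = random.choice(hits)
        half = max(1, len(masked[i]) // 2)
        masked[i] = masked[i][:half] + mask_token
    return masked


def weighted_contrastive_loss(anchors: torch.Tensor, positives: torch.Tensor,
                              weights: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE-style loss where each (anchor, reconstruction) pair's term
    is scaled by an LLM-graded similarity weight in [0, 1], so poorly
    reconstructed positives contribute less supervision signal."""
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)
    logits = anchors @ positives.T / temperature          # (B, B) similarities
    targets = torch.arange(anchors.size(0), device=anchors.device)
    per_pair = F.cross_entropy(logits, targets, reduction="none")
    return (weights * per_pair).mean()                    # weight each pair
```

For example, masking the keywords {"rocket", "martian"} in "the rocket left the martian city" with strategy="multiple" yields "the [MASK] left the [MASK] city"; the LLM's reconstruction of that sentence, together with its graded similarity to the original, would then supply the positive embedding and weight passed to weighted_contrastive_loss.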