A Hybrid-Metric and Layer-Freezing Framework for Coreference Resolution
Abstract
Coreference resolution based on pretrained language models remains constrained in long-span contexts. This paper first provides a formal, mathematically grounded explanation of the root causes of performance degradation in long-span scenarios: the attention distribution exhibits magnitude contraction and directional noise accumulation as cross-sentence distance increases, leading to attention dilution. Building on this analysis, the paper proposes three improvements within the BERT framework: (1) a hybrid-similarity self-attention mechanism that balances magnitude sensitivity and directional stability between query and key vectors to suppress long-distance attention dilution; (2) a contrastive reformulation of cross-entropy that, in line with the task's binary nature and the modified representations, introduces a positive-negative information separation term to enhance inter-class separability and robustness to hard negatives; and (3) a layer-stable optimization strategy that, motivated by the semantic heterogeneity of attention heads, employs layer freezing and a three-stage pretraining, fine-tuning, and refining pipeline to preserve lower-layer lexical and syntactic cues while progressively strengthening discourse-level semantics and stabilizing higher-layer representations. Experiments on the Chinese and English portions of OntoNotes 5.0 show consistent gains over strong baselines, with F1 improved by 0.9 and 1.1 points, respectively, providing an interpretable and extensible solution for cross-lingual coreference modeling.
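To illustrate the first contribution, the sketch below shows one plausible way to combine a magnitude-sensitive scaled dot product with a direction-stable cosine term inside self-attention. The function name hybrid_similarity_attention and the mixing weight alpha are assumptions introduced here for illustration; the abstract does not specify the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def hybrid_similarity_attention(q, k, v, alpha=0.5, mask=None):
    """Minimal sketch of a hybrid-similarity attention score.

    Mixes a scaled dot product (magnitude-sensitive) with cosine
    similarity (direction-stable), which is one way to counteract
    attention dilution over long cross-sentence distances.
    q, k, v: tensors of shape (batch, heads, seq_len, head_dim).
    alpha: hypothetical mixing weight between the two score terms.
    """
    d = q.size(-1)
    # Standard scaled dot-product scores (sensitive to vector magnitude).
    dot_scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5
    # Cosine-similarity scores (insensitive to magnitude, direction only).
    cos_scores = torch.matmul(
        F.normalize(q, dim=-1),
        F.normalize(k, dim=-1).transpose(-2, -1),
    )
    scores = alpha * dot_scores + (1.0 - alpha) * cos_scores
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    attn = torch.softmax(scores, dim=-1)
    return torch.matmul(attn, v)
```

This is only a sketch under the assumptions stated above, not the authors' implementation; the cosine term is one common choice for a direction-stable similarity, and other weightings or normalizations would fit the same description.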