Deepening Citation Understanding in Scientific Literature via LLM-Powered Context Extraction
Abstract
Scientific progress relies on the complex, interconnected web of scholarly publications housed within digital libraries. While citations are the core mechanism for linking this knowledge, the reference alone fails to capture the context in which the cited work is discussed. This lack of context poses a significant challenge for digital library algorithms seeking to understand scholarly influence. Citation Context Extraction (CCE) has therefore become a foundational task for transforming raw citations into meaningful, semantically rich links that can power advanced bibliometric analysis. This paper presents a novel, two-fold methodology for enhancing the CCE process. First, we introduce an improved CCE method that leverages a richer set of textual features to identify citation context sentences more accurately, advancing beyond existing state-of-the-art techniques. Second, we propose an original approach that utilizes Large Language Models (LLMs) and advanced prompt engineering to perform a deeper, more nuanced, and explainable semantic interpretation of the extracted contexts. We validate our methods on two distinct scientific corpora: ACL-ARC, a specialized dataset from computational linguistics, and SDP-ACT, a more generic dataset spanning multiple disciplines. The results of our comparative analysis against state-of-the-art models demonstrate a significant improvement in CCE quality. Our contributions provide a crucial step toward building more intelligent and interpretable knowledge discovery systems, unlocking the full potential of digital libraries as platforms for understanding and mapping the intellectual lineage of scientific discourse.
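To make the second contribution concrete, the sketch below shows one way an LLM could be prompted to produce an explainable reading of an extracted citation context. It assumes an OpenAI-style chat API; the model name, prompt wording, and output structure are illustrative assumptions, not the authors' actual prompt-engineering scheme.

```python
# Hypothetical sketch of LLM-based interpretation of a citation context.
# Prompt wording, labels, and model choice are illustrative, not the paper's method.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_TEMPLATE = """You are analysing scholarly citations.
Citing sentence(s): {context}
Cited work: {cited_title}

1. State the citation's function (e.g. background, method reuse, comparison, critique).
2. Summarise in one sentence how the citing authors use the cited work.
3. Quote the span of the context that justifies your answer."""


def interpret_citation_context(context: str, cited_title: str) -> str:
    """Ask the LLM for a structured, explainable reading of one citation context."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{
            "role": "user",
            "content": PROMPT_TEMPLATE.format(context=context, cited_title=cited_title),
        }],
        temperature=0.0,  # deterministic output for reproducible analysis
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(interpret_citation_context(
        "We adopt the attention mechanism of [12] to align source and target tokens.",
        "Neural Machine Translation by Jointly Learning to Align and Translate",
    ))
```

Requesting the justifying span alongside the label is what makes the interpretation explainable: the output can be checked against the extracted context rather than taken on trust.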