Deepening Citation Understanding in Scientific Literature via LLM-Powered Context Extraction
Abstract
Scientific progress relies on the complex, interconnected web of scholarly publications housed within digital libraries. While citations are the core mechanism for linking this knowledge, the reference alone fails to capture the context in which the cited work is discussed. This lack of context poses a significant challenge for digital library algorithms seeking to understand scholarly influence. Citation Context Extraction (CCE) has therefore become a foundational task for transforming raw citations into meaningful, semantically rich links that can power advanced bibliometric analysis. This paper presents a novel, two-fold methodology for enhancing the CCE process. First, we introduce an improved CCE method that leverages a richer set of textual features to identify citation context sentences more accurately, advancing beyond existing state-of-the-art techniques. Second, we propose an original approach that utilizes Large Language Models (LLMs) and advanced prompt engineering to perform a deeper, more nuanced, and explainable semantic interpretation of the extracted contexts. We validate our methods on two distinct scientific corpora: ACL-ARC, a specialized dataset from computational linguistics, and SDP-ACT, a more generic dataset spanning multiple disciplines. The results of our comparative analysis against state-of-the-art models demonstrate a significant improvement in CCE quality. Our contributions provide a crucial step toward building more intelligent and interpretable knowledge discovery systems, unlocking the full potential of digital libraries as platforms for understanding and mapping the intellectual lineage of scientific discourse.
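To make the second contribution concrete, the sketch below shows one way an LLM could be prompted to produce an explainable reading of an extracted citation context. It assumes an OpenAI-style chat API; the model name, prompt wording, and output structure are illustrative assumptions, not the authors' actual prompt-engineering scheme.

```python
# Hypothetical sketch of LLM-based interpretation of a citation context.
# Prompt wording, labels, and model choice are illustrative, not the paper's method.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_TEMPLATE = """You are analysing scholarly citations.
Citing sentence(s): {context}
Cited work: {cited_title}

1. State the citation's function (e.g. background, method reuse, comparison, critique).
2. Summarise in one sentence how the citing authors use the cited work.
3. Quote the span of the context that justifies your answer."""


def interpret_citation_context(context: str, cited_title: str) -> str:
    """Ask the LLM for a structured, explainable reading of one citation context."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{
            "role": "user",
            "content": PROMPT_TEMPLATE.format(context=context, cited_title=cited_title),
        }],
        temperature=0.0,  # deterministic output for reproducible analysis
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(interpret_citation_context(
        "We adopt the attention mechanism of [12] to align source and target tokens.",
        "Neural Machine Translation by Jointly Learning to Align and Translate",
    ))
```

Requesting the justifying span alongside the label is what makes the interpretation explainable: the output can be checked against the extracted context rather than taken on trust.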