Mediating Similarity: An Information-Theoretic Principle of Reference Behavior
Abstract
Information theory, whose concept of entropy has roots in thermodynamics, is used in scientometrics to quantify the diversity and heterogeneity of knowledge combinations. This study analyzes a large-scale journal citation network to introduce and empirically validate a fundamental principle of reference behavior, which we term "Mediating Similarity." We posit that a journal's reference distribution (the knowledge it cites) acts as a cognitive bridge between its own citation distribution (its identity in the scientific landscape) and the overall scientific content distribution (the broader knowledge environment). This phenomenon is captured by a Kullback-Leibler (KL) divergence inequality: the divergence from a journal to its references plus the divergence from its references to the global landscape is smaller than the direct divergence from the journal to the global landscape. Our experimental findings provide robust, multi-level evidence for this principle. First, we demonstrate the universality of the phenomenon: for all 19,129 journals in our dataset, the mediated KL divergence path is shorter than the direct path. Second, we conducted two perturbation experiments on the top 500 journals ranked by the SJR indicator. Because a journal's actual references lie mostly among the journals closest to it in KL divergence, we built a high-relevance candidate pool for each journal, consisting of its actual references plus its closest un-cited journals, numbering twice its actual number of references. In a global resampling test, the actual reference portfolio had, on average, a lower "cognitive energy" (the sum of the KL divergences along the mediated path) than 99% of 1,000 portfolios randomly assembled from the candidate pool. This indicates that citation is a holistic process that selects for a synergistically optimal combination of references.
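The inequality and the global resampling test can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. We read "from P to Q" as D_KL(P ‖ Q), so the principle becomes D_KL(c ‖ r) + D_KL(r ‖ g) < D_KL(c ‖ g) for citation distribution c, reference distribution r and global distribution g. We further assume, hypothetically, that every journal has a profile vector over a shared support (the rows of `profiles`), that a portfolio's reference distribution is the count-weighted mixture of its cited journals' profiles, that "closest" means smallest D_KL(c ‖ profile), and that a random portfolio reuses the actual reference counts. The names `energy`, `mixture`, `candidate_pool` and `global_resampling` are illustrative, and the data in the usage lines are synthetic.

```python
import numpy as np

EPS = 1e-12  # tiny smoothing so zero entries never produce log(0) (assumption)


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    p = np.asarray(p, dtype=float) + EPS
    q = np.asarray(q, dtype=float) + EPS
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))


def energy(c, r, g):
    """'Cognitive energy' of the mediated path: D(c || r) + D(r || g)."""
    return kl(c, r) + kl(r, g)


def mediates(c, r, g):
    """Mediating-similarity check: is the path via r shorter than D(c || g)?"""
    return energy(c, r, g) < kl(c, g)


def mixture(profiles, idx, weights):
    """Hypothetical reference distribution: count-weighted mix of cited profiles."""
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * profiles[idx]).sum(axis=0) / w.sum()


def candidate_pool(c, profiles, ref_idx, factor=2):
    """Actual references plus the factor * len(ref_idx) un-cited journals
    whose profiles are closest to c in KL divergence."""
    cited = set(int(i) for i in ref_idx)
    uncited = [k for k in range(len(profiles)) if k not in cited]
    uncited.sort(key=lambda k: kl(c, profiles[k]))
    return np.array(sorted(cited) + uncited[: factor * len(ref_idx)])


def global_resampling(c, g, profiles, ref_idx, weights, n_samples=1000, seed=0):
    """Share of random same-size portfolios from the candidate pool whose
    energy is higher than that of the actual portfolio."""
    rng = np.random.default_rng(seed)
    pool = candidate_pool(c, profiles, ref_idx)
    actual = energy(c, mixture(profiles, ref_idx, weights), g)
    worse = 0
    for _ in range(n_samples):
        # Draw journals without replacement; the actual counts are reused in random order.
        idx = rng.choice(pool, size=len(ref_idx), replace=False)
        worse += energy(c, mixture(profiles, idx, weights), g) > actual
    return worse / n_samples


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    profiles = rng.dirichlet(np.ones(20), size=60)  # 60 synthetic journals, 20 support points
    g = profiles.mean(axis=0)                        # synthetic "global" distribution
    c = rng.dirichlet(np.ones(20))                   # synthetic focal citation distribution
    ref_idx = np.array([3, 7, 11, 19, 25])
    weights = np.array([10, 5, 3, 2, 1])
    print(mediates(c, mixture(profiles, ref_idx, weights), g))
    print(global_resampling(c, g, profiles, ref_idx, weights, n_samples=200))
```

`global_resampling` returns the share of random portfolios whose energy exceeds the actual one, so a value of 0.99 would correspond to the 99% figure in the abstract; on synthetic data the number has no empirical meaning.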
In a local perturbation test, in which 10% of the actual references were replaced, the actual portfolio still outperformed the majority of 1,000 perturbed variants. This suggests that real-world reference selection, while driven by an optimization principle, operates as a robust "satisficing" strategy within the constraints of scientific discovery. Collectively, these findings reveal that reference behavior is strategic: journals selectively curate references to construct an optimal cognitive path, efficiently shortening the distance between their field and the broader scientific environment.
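The local perturbation test can be sketched as follows, as a minimal illustration rather than the authors' procedure. It assumes, hypothetically, that journals have profile vectors over a shared support (the rows of `profiles`), that the reference distribution r is the count-weighted mixture of cited profiles, and that the energy is D_KL(c ‖ r) + D_KL(r ‖ g). The abstract does not say what the removed references are replaced with; drawing replacements from un-cited journals in a candidate pool, letting each replacement inherit the removed reference's count, and always swapping at least one reference are our assumptions, as are the function names.

```python
import numpy as np


def kl(p, q, eps=1e-12):
    """D(p || q) in nats, with tiny smoothing to avoid log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))


def energy(c, profiles, idx, weights, g):
    """D(c || r) + D(r || g), with r a count-weighted mix of cited profiles (assumption)."""
    w = np.asarray(weights, dtype=float)
    r = (w[:, None] * profiles[np.asarray(idx)]).sum(axis=0) / w.sum()
    return kl(c, r) + kl(r, g)


def local_perturbation(c, g, profiles, ref_idx, weights, pool,
                       frac=0.10, n_samples=1000, seed=0):
    """Replace `frac` of the actual references with un-cited pool journals and
    return the share of perturbed portfolios with higher energy than the actual one."""
    rng = np.random.default_rng(seed)
    ref_idx = np.asarray(ref_idx)
    outside = np.setdiff1d(pool, ref_idx)            # pool journals not actually cited
    k = max(1, int(round(frac * len(ref_idx))))      # swap at least one reference
    actual = energy(c, profiles, ref_idx, weights, g)
    worse = 0
    for _ in range(n_samples):
        idx = ref_idx.copy()
        out_pos = rng.choice(len(idx), size=k, replace=False)
        # Each replacement keeps the count (weight) of the reference it displaces.
        idx[out_pos] = rng.choice(outside, size=k, replace=False)
        worse += energy(c, profiles, idx, weights, g) > actual
    return worse / n_samples


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    profiles = rng.dirichlet(np.ones(6), size=30)    # synthetic journals
    c = rng.dirichlet(np.ones(6))
    share = local_perturbation(c, profiles.mean(axis=0), profiles,
                               list(range(10)), np.ones(10), list(range(30)),
                               n_samples=200)
    print(share)
```

A returned share above 0.5 means the actual portfolio had lower energy than most perturbed variants, which is how "outperformed the majority" is read here.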