Why Grounded Large Language Models Fail Without Domain-Specialized Retrieval: An Experimental Scientometric Study in Solar Physics
Abstract
Recent applications of large language models in scientometric analysis often assume that the underlying evidence space is neutral and given. In this study, we challenge this assumption by explicitly modeling information retrieval as a causal component that shapes model-based analytical outputs. We propose a three-phase experimental framework that separates the construction of the evidence space via semantic retrieval, the analytical evaluation of that space prior to text generation, and model-based analytical agency under controlled grounding and structural enforcement conditions. We develop and release SciBERT-SolarPhysics-Search, a domain-specialized semantic retriever trained through domain-adaptive pretraining and supervised contrastive fine-tuning, and compare generic and specialized retrieval strategies, showing that specialized retrieval increases domain semantic coverage from 48.2% to 71.6% and conceptual connectivity density from 0.19 to 0.37. We further observe that improvements in retrieval quality and grounding alone do not ensure coherent scientometric outputs. Only configurations combining specialized retrieval, explicit grounding, and structural enforcement reduce the proportion of unsupported analytical statements from 0.62 to 0.08 in the agent-level evaluation. These results indicate that reliable integration of language models into scientometrics depends on explicit control of retrieval infrastructure, evaluation criteria, and analytical constraints.