A Tale of Two Entity Resolution Models: A Classification Framework Based on Decision Timing, Human Involvement, and Explainability
Abstract
Entity resolution (ER) theory remains less systematized than ER practice. A shared vocabulary and a set of classification criteria would ease standardization and clarify the relationship between theoretical commitments and implementation choices. This paper examines how modeling assumptions shape ER behavior by contrasting two canonical models: the Fellegi-Sunter model (probabilistic matching) and the algebraic model (deterministic clustering). Using formal reconstructions and controlled demonstrations on benchmark datasets, we make three contributions. First, we establish a compact, model-agnostic vocabulary for describing ER models at the level of their theoretical commitments. Second, we derive four classification criteria (decision timing, human involvement, flexibility, and explainability) and demonstrate their consistent application across seven ER models. Third, we show how modeling assumptions determine which evaluation metrics are appropriate, linking model structure to measurement practice. Together, these contributions provide a principled basis for comparing existing ER approaches and situating new ones within a coherent design space.