A Tale of Two Entity Resolution Models: A Classification Framework Based on Decision Timing, Human Involvement, and Explainability

Abstract

Entity resolution (ER) theory remains less systematized than ER practice. A shared vocabulary and a set of classification criteria would ease standardization and clarify the relationship between theoretical commitments and implementation choices. This paper examines how modeling assumptions shape ER behavior by contrasting two canonical models: Fellegi-Sunter (probabilistic matching) and algebraic (deterministic clustering). Using formal reconstructions and controlled demonstrations on benchmark datasets, we make three contributions. First, we establish a compact, model-agnostic vocabulary for describing ER models at the level of their theoretical commitments. Second, we derive four classification criteria (decision timing, human involvement, flexibility, and explainability) and demonstrate their consistent application across seven ER models. Third, we show how modeling assumptions determine which evaluation metrics are appropriate, linking model structure to measurement practice. Together, these contributions provide a principled basis for comparing existing ER approaches and situating new ones within a coherent design space.
