An Incremental Entity Resolution Approach based on Deep Metric Learning

Abstract

Entity resolution (ER), a crucial component of data fusion, aims to identify and aggregate disparate entity descriptions that refer to the same real-world object. While existing approaches have achieved good performance in ER tasks, they fall short in scenarios with dynamic data sources, where an ER model must progressively resolve entities as new data arrives while maintaining high accuracy. To address this issue, we propose a novel dynamic data processing framework that uses the matching model from the previous time step to parse incoming incremental data, and we implement it as an Incremental Entity Resolution Approach based on Deep Metric Learning (IER-DML), designed to handle dynamic data and enhance matching capability. An uncertainty-metric-based data augmentation mechanism is proposed to mitigate calibration issues caused by suboptimal training data and to generate high-confidence labeled entity pairs. These labeled pairs are then used to iteratively optimize IER-DML, improving its ER capability. To model dynamic changes in the distribution of clustering structures among entity pairs, we propose a deep metric learning-based multi-center matcher. This matcher learns metric relationships between the deep semantic features of entity pairs and models their intrinsic structure using "soft centers". These centers capture multiple information clusters among entities and flexibly represent clustering structures. By integrating deep metric learning with a multi-center strategy, the matcher can dynamically adjust classification boundaries and metric standards in complex scenarios. Comprehensive experiments demonstrate that our approach outperforms state-of-the-art methods on all four dynamic benchmark datasets.
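
The abstract does not give formulas for the "soft centers" or the uncertainty metric, so the following is only an illustrative sketch of how the two ideas could fit together, not the IER-DML implementation. It assumes a SoftTriple-style relaxation in which each class (match or non-match) owns several centers and an entity-pair embedding is compared against a softmax-weighted blend of them, and it assumes predictive entropy as the uncertainty measure for accepting pseudo-labels. All function names, the parameters `gamma`, `scale`, and `max_entropy`, and the toy vectors are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_center_similarity(x, centers, gamma=10.0):
    # Assumed relaxation: weight each center by a softmax over its similarity
    # to x, so the closest center dominates without a hard argmax.
    sims = [dot(x, c) for c in centers]
    weights = softmax([gamma * s for s in sims])
    return sum(w * s for w, s in zip(weights, sims))

def class_probabilities(x, centers_by_class, scale=5.0):
    # One soft similarity per class, turned into a probability distribution.
    labels = sorted(centers_by_class)
    scores = [scale * soft_center_similarity(x, centers_by_class[label])
              for label in labels]
    return dict(zip(labels, softmax(scores)))

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def pseudo_label(x, centers_by_class, max_entropy=0.3):
    # Assumed uncertainty metric: keep a pair as a pseudo-label only when the
    # predictive entropy is low; otherwise return None.
    probs = class_probabilities(x, centers_by_class)
    if entropy(probs.values()) > max_entropy:
        return None
    return max(probs, key=probs.get)

# Toy 2-D embeddings of entity pairs; each class has two hypothetical centers.
centers = {
    "match": [[1.0, 0.0], [0.8, 0.6]],
    "non_match": [[-1.0, 0.0], [0.0, -1.0]],
}
print(pseudo_label([0.6, 0.8], centers))
```

In an incremental setting, pairs that receive a pseudo-label at one time step could be added to the training data for the next update of the matcher, while pairs that return `None` would be held back.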
