Duration

2009

Description

Object matching (entity resolution) is a critical data integration task that aims to identify semantically corresponding objects (records, instances) in one or several data sources. A typical example is the redundant and heterogeneous representation of customers in different enterprise databases. Finding corresponding customer representations is a key task, e.g., for customer relationship management or, more generally, master data management. On the web, finding matching objects is typically even more challenging due to the higher degree of heterogeneity (less structured data with many text attributes, more sources, more data quality problems, etc.).

MOMA, STEM, FEVER

We have been developing comprehensive prototypes for object matching since 2006. A key idea is to support the combination of several match techniques (matchers) to improve overall effectiveness in terms of precision and recall. The first prototype, MOMA, supports the construction of flexible workflows for object matching and the reuse of previous match results, which are represented as instance mappings. Furthermore, MOMA not only uses the similarity of attribute values but also incorporates a powerful context matcher called the neighborhood matcher.
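To illustrate the idea of combining matchers, the following minimal Python sketch merges two attribute matchers into one weighted score. It is not MOMA's API: the function names, the attribute names (`name`, `address`), the weights, and the threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def string_sim(a, b):
    """Character-based similarity in [0, 1] (hypothetical matcher 1)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a, b):
    """Token-set Jaccard similarity in [0, 1] (hypothetical matcher 2)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def combined_match(rec_a, rec_b, weights=(0.5, 0.5), threshold=0.7):
    """Combine both matchers by weighted average and apply a threshold.

    The weights and the threshold are illustrative assumptions, not
    values taken from MOMA, STEM, or FEVER.
    """
    name_sim = string_sim(rec_a["name"], rec_b["name"])
    addr_sim = token_jaccard(rec_a["address"], rec_b["address"])
    score = weights[0] * name_sim + weights[1] * addr_sim
    return score, score >= threshold
```

A pair of customer records is then passed as two dictionaries, e.g. `combined_match(customer_a, customer_b)`, which returns the combined score and the match decision. A weighted average is only one possible combination strategy; others, such as taking the minimum or maximum of the individual similarities, fit the same structure.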

The more recent frameworks STEM and FEVER support blocking and matching as well as the use of machine learning techniques. The machine learning approaches utilize a limited amount of training data (manually labeled correspondences) to semi-automatically find effective combinations of matchers. FEVER also supports the comparative evaluation of different match approaches for a given match task.
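As a rough sketch of these two steps, the Python code below first restricts comparisons to records sharing a blocking key and then picks a matcher combination from labeled training pairs. It does not reproduce STEM or FEVER: a plain grid search over one weight and one threshold stands in for the machine learning approaches, and all function names, parameters, and the data layout are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def block_by_key(records, key_func):
    """Group record ids by a blocking key (e.g., a zip code)."""
    blocks = defaultdict(list)
    for rid, rec in records.items():
        blocks[key_func(rec)].append(rid)
    return blocks


def candidate_pairs(blocks):
    """Only records within the same block are compared with each other."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def learn_combination(training, weight_grid, threshold_grid):
    """Grid search for the weight and threshold with the best F1 score.

    training: list of ((sim1, sim2), is_match), i.e. the similarities of
    two matchers for a manually labeled pair (hypothetical layout).
    Returns (best_f1, weight_of_sim1, threshold).
    """
    best = (-1.0, None, None)
    for w in weight_grid:
        for t in threshold_grid:
            tp = fp = fn = 0
            for (s1, s2), label in training:
                predicted = w * s1 + (1 - w) * s2 >= t
                if predicted and label:
                    tp += 1
                elif predicted and not label:
                    fp += 1
                elif label:
                    fn += 1
            f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
            if f1 > best[0]:
                best = (f1, w, t)
    return best
```

Blocking reduces the number of comparisons from all record pairs to the pairs inside each block, at the risk of missing matches whose blocking keys differ. The learned weight and threshold can then be applied to the remaining candidate pairs.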

Specific entity resolution approaches have been developed to categorize and match product offers and product descriptions of web shops.

Benchmarking existing entity resolution approaches:

In a VLDB 2010 paper, we used FEVER to comparatively evaluate existing entity resolution implementations. The datasets used in the evaluation can be downloaded here.

Please see also our recent work on cloud-based entity resolution and load balancing.

Publications (20)

Files Cover Description Year
2010 / 9
2010 / 9
2010 / 9
2010 / 7
2010 / 1
2009 / 8
2008 / 8
2007 / 3
2007 / 2
2007 / 1