Duration
Description
Object matching (entity resolution) is a critical data integration task that aims to identify semantically corresponding objects (records, instances) in one or several data sources. A typical example is the redundant and heterogeneous representation of customers in different enterprise databases. Finding corresponding customer representations is a key task, e.g., for customer relationship management or, more generally, master data management. On the web, finding matching objects is typically even more challenging due to the higher degree of heterogeneity (less structured data with many text attributes, more sources, more data quality problems, etc.).
MOMA, STEM, FEVER
We have been developing comprehensive prototypes for object matching since 2006. A key idea is to combine several match techniques (matchers) to improve overall effectiveness in terms of precision and recall. The first prototype, MOMA, supports the construction of flexible workflows for object matching and the reuse of previous match results, which are represented as instance mappings. Furthermore, MOMA uses not only the similarity of attribute values but also a powerful context matcher called the neighborhood matcher.
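The idea of combining several matchers can be illustrated with a minimal Python sketch. It is not MOMA's implementation: the function names, the weighted-average combination, and the threshold of 0.8 are assumptions chosen only for illustration, and the result is a simple list of correspondences standing in for an instance mapping.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Character-based similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of whitespace-separated tokens, in [0, 1]."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def combined_score(rec_a, rec_b, matchers):
    """Weighted average over (attribute, matcher, weight) triples (hypothetical scheme)."""
    total_weight = sum(weight for _, _, weight in matchers)
    weighted = sum(
        weight * matcher(rec_a[attr], rec_b[attr])
        for attr, matcher, weight in matchers
    )
    return weighted / total_weight


def match(source_a, source_b, matchers, threshold=0.8):
    """Return (id_a, id_b, score) for every pair scoring at or above the threshold."""
    mapping = []
    for id_a, rec_a in source_a.items():
        for id_b, rec_b in source_b.items():
            score = combined_score(rec_a, rec_b, matchers)
            if score >= threshold:
                mapping.append((id_a, id_b, score))
    return mapping
```

A call such as `match(customers_a, customers_b, [("name", string_similarity, 0.7), ("city", token_jaccard, 0.3)])` compares every pair of records. A context matcher like the neighborhood matcher would additionally use relationships between objects, which this sketch leaves out.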
The more recent frameworks STEM and FEVER support blocking and matching as well as the use of machine learning techniques. The machine learning approaches utilize a limited amount of training data (manually labeled correspondences) to semi-automatically find effective combinations of matchers. FEVER also supports the comparative evaluation of different match approaches for a given match task.
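As a rough illustration of blocking and of learning from a small set of labeled correspondences, consider the following Python sketch. It does not reproduce STEM or FEVER: the blocking key, the helper names, and the threshold learner (a deliberately simple stand-in for the machine learning techniques mentioned above) are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def block_by_key(records, key_fn):
    """Group record ids by a blocking key; only ids sharing a key are compared."""
    blocks = defaultdict(list)
    for rec_id, rec in records.items():
        blocks[key_fn(rec)].append(rec_id)
    return blocks


def candidate_pairs(blocks):
    """Yield each unordered pair of ids that falls into the same block."""
    for ids in blocks.values():
        yield from combinations(sorted(ids), 2)


def learn_threshold(scored_examples):
    """Pick the similarity threshold with the best F1 on labeled pairs.

    scored_examples: list of (score, is_match) tuples, where is_match comes
    from manual labeling (the training data).
    """
    best_threshold, best_f1 = 1.0, -1.0
    for t in sorted({score for score, _ in scored_examples}):
        tp = sum(1 for s, y in scored_examples if s >= t and y)
        fp = sum(1 for s, y in scored_examples if s >= t and not y)
        fn = sum(1 for s, y in scored_examples if s < t and y)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_threshold, best_f1 = t, f1
    return best_threshold
```

Blocking reduces the number of comparisons from all pairs to pairs within a block. The learned threshold would then be applied to the similarity scores of the remaining candidate pairs. A real learner would typically tune several matchers and their combination at once rather than a single threshold.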
Specific entity resolution approaches have been developed to categorize and match product offers and product descriptions of web shops.
Benchmarking existing entity resolution approaches:
In a VLDB 2010 paper we used FEVER to comparatively evaluate existing entity resolution implementations. The datasets used in the evaluation can be downloaded here.
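A comparative evaluation of this kind typically measures each match result against a manually verified reference mapping. The following Python sketch shows the usual precision, recall, and F-measure computation. The function name and the representation of mappings as sets of id pairs are assumptions, not FEVER's interface.

```python
def evaluate(predicted, reference):
    """Return (precision, recall, f1) of a predicted mapping.

    Both arguments are iterables of (id_a, id_b) correspondences;
    the reference mapping is assumed to be the manually verified result.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Running `evaluate` on the output of several match approaches for the same match task gives directly comparable figures.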
Please also see our recent work on cloud-based entity resolution and load balancing.
Project members
Publications (20)
Files | Cover | Description | Year |
---|---|---|---|
| | | Köpcke, H.; Thor, A.; Rahm, E. Proc. 36th Intl. Conference on Very Large Databases (VLDB) / Proceedings of the VLDB Endowment 3(1), 2010 | 2010 / 9 |
| | | Thor, A. Proc. GI-Workshop - Informationsintegration in Service-Architekturen, 2010 | 2010 / 9 |
| | | Kirsten, T.; Kolb, L.; Hartung, M.; Groß, A.; Köpcke, H.; Rahm, E. Proc. 8th Intl. Workshop on Quality in Databases (QDB), 2010 | 2010 / 9 |
| | | | 2010 / 7 |
| | | | 2010 / 1 |
| | | Köpcke, H.; Thor, A.; Rahm, E. Proc. 35th Intl. Conference on Very Large Databases (VLDB), 2009 (demo) | 2009 / 8 |
| | | Köpcke, H.; Rahm, E. 6th International Workshop on Quality in Databases and Management of Uncertain Data (QDB/MUD 2008) | 2008 / 8 |
| | | Thor, A.; Kirsten, T.; Rahm, E. Proc. of 12. GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW), 2007 | 2007 / 3 |
| | | | 2007 / 2 |
| | | | 2007 / 1 |