Object Matching (Entity Resolution)

Duration

2009

Description

Object matching (entity resolution) is a critical data integration task that aims at identifying semantically corresponding objects (records, instances) in one or several data sources. A typical example is the redundant and heterogeneous representation of customers in different enterprise databases. Finding corresponding customer representations is a key task, e.g., for customer relationship management or, more generally, master data management. On the web, finding matching objects is typically even more challenging due to the higher degree of heterogeneity (less structured data with many text attributes, more sources, more data quality problems, etc.).

MOMA, STEM, FEVER

We have been developing comprehensive prototypes for object matching since 2006. A key idea is to support the combination of several match techniques (matchers) to improve the overall effectiveness in terms of precision and recall. The first prototype, MOMA, supports the construction of flexible workflows for object matching and the reuse of previous match results, which are represented as instance mappings. Furthermore, MOMA not only uses the similarity of attribute values but also incorporates a powerful context matcher called the neighborhood matcher.
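To illustrate the idea of combining matchers, the following minimal sketch merges two attribute matchers into one by a weighted average and returns the result as an instance mapping. It is not MOMA's actual interface: the function names, the record layout, the weights, and the threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical attribute matchers; each returns a similarity in [0, 1].
def title_matcher(a, b):
    """String similarity of the lowercased titles."""
    return SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()

def year_matcher(a, b):
    """1.0 if both records carry the same year, else 0.0."""
    return 1.0 if a["year"] == b["year"] else 0.0

def combine(matchers, weights):
    """Build one matcher as the weighted average of several matchers."""
    total = sum(weights)
    def combined(a, b):
        return sum(w * m(a, b) for m, w in zip(matchers, weights)) / total
    return combined

def match(source_a, source_b, matcher, threshold):
    """Return an instance mapping: (id_a, id_b, similarity) triples
    for all pairs whose similarity reaches the threshold."""
    mapping = []
    for a in source_a:
        for b in source_b:
            sim = matcher(a, b)
            if sim >= threshold:
                mapping.append((a["id"], b["id"], round(sim, 3)))
    return mapping

# Illustrative records (assumed layout: id, title, year).
source_a = [
    {"id": "a1", "title": "Dedoop: Efficient Deduplication with Hadoop", "year": 2012},
    {"id": "a2", "title": "Load Balancing for MapReduce-based Entity Resolution", "year": 2012},
]
source_b = [
    {"id": "b1", "title": "Dedoop - efficient deduplication with hadoop", "year": 2012},
    {"id": "b2", "title": "Parallel Sorted Neighborhood Blocking with MapReduce", "year": 2011},
]

# Assumed weights and threshold; in practice these would need tuning.
combined_matcher = combine([title_matcher, year_matcher], weights=[0.7, 0.3])
mapping = match(source_a, source_b, combined_matcher, threshold=0.8)
print(mapping)
```

Because the mapping is plain data, it could be stored and reused as input to a later match step, which is the role instance mappings play in the description above.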

The more recent frameworks STEM and FEVER support blocking and matching as well as the use of machine learning techniques. The machine learning approaches utilize a limited amount of training data (manually labeled correspondences) to semi-automatically find effective combinations of matchers. FEVER also supports the comparative evaluation of different match approaches for a given match task.
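Blocking reduces the number of comparisons by only matching records that share a blocking key. The sketch below shows one simple variant, key-based blocking, under stated assumptions: the key function (first three letters of the name) and the record layout are hypothetical and do not reflect how STEM or FEVER implement blocking.

```python
from collections import defaultdict
from itertools import combinations

def blocking_key(record):
    """Hypothetical blocking key: first three letters of the lowercased name."""
    return record["name"].lower()[:3]

def build_blocks(records, key_fn=blocking_key):
    """Group records by blocking key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[key_fn(record)].append(record)
    return blocks

def candidate_pairs(blocks):
    """Yield record pairs that share a block; only these are passed to matchers."""
    for block in blocks.values():
        yield from combinations(block, 2)

# Illustrative product records (assumed layout: id, name).
records = [
    {"id": 1, "name": "Canon EOS 500D"},
    {"id": 2, "name": "canon eos 500 d"},
    {"id": 3, "name": "Nikon D90"},
    {"id": 4, "name": "Nikon D-90"},
    {"id": 5, "name": "Sony Alpha 230"},
]

blocks = build_blocks(records)
pairs = [(a["id"], b["id"]) for a, b in candidate_pairs(blocks)]
all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {all_pairs}: {pairs}")
```

The trade-off is that true matches whose keys differ (for example, because of a typo in the first letters) are never compared, so the choice of blocking key affects recall.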

Specific entity resolution approaches have been developed to categorize and match product offers and product descriptions of web shops.

Benchmarking existing entity resolution approaches:

In a VLDB 2010 paper, we used FEVER to comparatively evaluate existing entity resolution implementations. The datasets used in the evaluation can be downloaded here.
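A comparative evaluation of match approaches is commonly based on precision, recall, and F-measure against a set of known correct correspondences. The following sketch shows how such scores can be computed; the function name, the approach labels, and the example correspondences are invented for illustration and are neither FEVER's interface nor results from the paper.

```python
def evaluate(found, gold):
    """Compare a match result with a gold standard.

    Both arguments are collections of (id_a, id_b) correspondences.
    Returns (precision, recall, f1).
    """
    found, gold = set(found), set(gold)
    true_positives = len(found & gold)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical gold standard and results of two hypothetical approaches.
gold = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
results = {
    "approach_A": {("a1", "b1"), ("a2", "b2"), ("a3", "b9")},
    "approach_B": {("a1", "b1"), ("a2", "b2"), ("a3", "b3"),
                   ("a4", "b7"), ("a5", "b5")},
}

scores = {name: evaluate(found, gold) for name, found in results.items()}
for name, (p, r, f1) in scores.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```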

Please also see our recent work on cloud-based entity resolution and load balancing.

Project members

Publications (20)

Files	Cover	Description	Year
		Don't Match Twice: Redundancy-free Similarity Computation with MapReduce Kolb, L. ; Thor, A. ; Rahm, E. Proc. 2nd Intl. Workshop on Data Analytics in the Cloud (DanaC), 2013	2013 / 6
		When to Reach for the Cloud: Using Parallel Hardware for Link Discovery Kolb, L. ; Heino, N. ; Hartung, M. ; Auer, S. ; Rahm, E. Proc. 10th Intl. Extended Semantic Web Conference (ESWC), 2013	2013 / 5
		Parallel Entity Resolution with Dedoop Kolb, L. ; Rahm, E. Datenbank-Spektrum 13 (1), 2013	2013 / 2
		Dedoop: Efficient Deduplication with Hadoop Kolb, L. ; Thor, A. ; Rahm, E. Proc. 38th Intl. Conference on Very Large Databases (VLDB) / Proc. of the VLDB Endowment 5(12), 2012	2012 / 8
		Load Balancing for MapReduce-based Entity Resolution Kolb, L. ; Thor, A. ; Rahm, E. Proc. 28th Intl. Conference on Data Engineering (ICDE), 2012	2012 / 4
		Tailoring entity resolution for matching product offers Köpcke, H. ; Thor, A. ; Thomas, S. ; Rahm, E. Proc. 15th Intl. Conf. on Extending Database Technology (EDBT), 2012, pp. 545-550	2012 / 3
		Multi-pass Sorted Neighborhood Blocking with MapReduce Kolb, L. ; Thor, A. ; Rahm, E. Computer Science - Research and Development 27(1), 2012	2012 / 2
		Block-based Load Balancing for Entity Resolution with MapReduce Kolb, L. ; Thor, A. ; Rahm, E. Proc. 20th Intl. Conference on Information and Knowledge Management (CIKM), 2011	2011 / 10
		Learning-based Entity Resolution with MapReduce Kolb, L. ; Köpcke, H. ; Thor, A. ; Rahm, E. Proc. 3rd Intl. Workshop on Cloud Data Management (CloudDB), 2011	2011 / 10
		Parallel Sorted Neighborhood Blocking with MapReduce Kolb, L. ; Thor, A. ; Rahm, E. Proc. 14th GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW), 2011	2011 / 3

Database Group Leipzig

within the department of computer science
