Database Group Leipzig

within the department of computer science

Object Matching (Entity Resolution)

Duration

2009

Description

Object matching (entity resolution) is a critical data integration task that aims at identifying semantically corresponding objects (records, instances) in one or several data sources. A typical example is the redundant and heterogeneous representation of customers in different enterprise databases. Finding corresponding customer representations is a key task, e.g., for customer relationship management or, more generally, for master data management. On the web, finding matching objects is typically even more challenging due to the higher degree of heterogeneity (less structured data with many text attributes, more sources, more data quality problems, etc.).
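As a minimal sketch of the basic task, the Python code below compares customer records from two hypothetical sources on a single attribute. It assumes difflib's `SequenceMatcher` ratio as the similarity measure and a threshold of 0.8; the record data, the function names, and both choices are illustrative assumptions, not the techniques used in the prototypes described here. Note that comparing every record pair is quadratic in the source sizes.

```python
from difflib import SequenceMatcher
from itertools import product

def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] after normalizing case and whitespace (assumed measure)."""
    norm = lambda s: " ".join(s.lower().split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

def match(source_a, source_b, attr, threshold=0.8):
    """Compare every record of A with every record of B (Cartesian product)
    and return correspondences (id_a, id_b, similarity) above the threshold."""
    result = []
    for ra, rb in product(source_a, source_b):
        sim = name_similarity(ra[attr], rb[attr])
        if sim >= threshold:
            result.append((ra["id"], rb["id"], round(sim, 2)))
    return result

# Hypothetical customer records from two enterprise databases
crm = [{"id": "c1", "name": "Erika Mustermann"}, {"id": "c2", "name": "John Smith"}]
billing = [{"id": "b7", "name": "erika  mustermann"}, {"id": "b9", "name": "Jane Doe"}]
print(match(crm, billing, "name"))  # prints [('c1', 'b7', 1.0)]
```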

MOMA, STEM, FEVER

We have been developing comprehensive prototypes for object matching since 2006. A key idea is to support the combination of several match techniques (matchers) to improve the overall match effectiveness in terms of precision and recall. The first prototype, MOMA, supports the construction of flexible workflows for object matching and the reuse of previous match results, which are represented as instance mappings. Furthermore, MOMA uses not only the similarity of attribute values but also a powerful context matcher called the neighborhood matcher.
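The combination of matchers and the reuse of instance mappings can be illustrated with a small sketch. It represents an instance mapping as a Python dict from `(id_a, id_b)` pairs to similarity values; the functions `attribute_matcher`, `merge_avg`, `select`, and `compose` are hypothetical and only loosely follow the ideas named above, not MOMA's actual operators. Averaging matcher results and multiplying similarities during composition are assumptions chosen for simplicity.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical instance mapping representation: {(id_a, id_b): similarity in [0, 1]}

def attribute_matcher(src_a, src_b, attr):
    """One matcher: string similarity of a single attribute for every record pair."""
    return {
        (ra["id"], rb["id"]): SequenceMatcher(None, ra[attr].lower(), rb[attr].lower()).ratio()
        for ra in src_a
        for rb in src_b
    }

def merge_avg(*mappings):
    """Combine matcher results by averaging; a pair missing from a mapping counts as 0."""
    keys = set().union(*mappings)
    return {k: sum(m.get(k, 0.0) for m in mappings) / len(mappings) for k in keys}

def select(mapping, threshold):
    """Keep only correspondences whose similarity reaches the threshold."""
    return {k: s for k, s in mapping.items() if s >= threshold}

def compose(m_ab, m_bc):
    """Reuse existing mappings A->B and B->C to derive A->C via shared B objects.
    A derived pair gets the best product of the two similarities."""
    by_b = defaultdict(list)
    for (b, c), s in m_bc.items():
        by_b[b].append((c, s))
    result = {}
    for (a, b), s1 in m_ab.items():
        for c, s2 in by_b.get(b, []):
            result[(a, c)] = max(result.get((a, c), 0.0), s1 * s2)
    return result

src_a = [
    {"id": "a1", "name": "Anna Schmidt", "city": "Leipzig"},
    {"id": "a2", "name": "Anna Schmitt", "city": "Berlin"},
]
src_b = [{"id": "b1", "name": "Anna Schmitt", "city": "Berlin"}]

name_map = attribute_matcher(src_a, src_b, "name")
city_map = attribute_matcher(src_a, src_b, "city")
print(sorted(select(name_map, 0.8)))                       # name alone: a1-b1 and a2-b1
print(sorted(select(merge_avg(name_map, city_map), 0.8)))  # combined: only a2-b1
```

Here the name matcher alone also accepts a1 for b1 because the surnames differ by a single letter; averaging in the city matcher removes that false positive.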

The more recent frameworks STEM and FEVER support blocking and matching as well as the use of machine learning techniques. The machine learning approaches utilize a limited amount of training data (manually labeled correspondences) to semi-automatically find effective combinations of matchers. FEVER also supports the comparative evaluation of different match approaches for a given match task.
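Blocking and learning a matcher combination from a few labeled pairs can be sketched as follows. The blocking key (first two letters of the name), the grid search over a weight `w` and a threshold `t`, the helper names, and all data are assumptions for illustration; STEM and FEVER use their own blocking methods and learners, which this sketch does not reproduce.

```python
from collections import defaultdict
from itertools import combinations

def standard_blocking(records, key):
    """Group records by a blocking key; only records in the same block are compared."""
    blocks = defaultdict(list)
    for r in records:
        blocks[key(r)].append(r)
    return blocks

def candidate_pairs(blocks):
    """Yield the id pairs that still need a (more expensive) match comparison."""
    for block in blocks.values():
        for r1, r2 in combinations(block, 2):
            yield r1["id"], r2["id"]

def f1(predicted, gold):
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def learn_combination(features, labels, steps=10):
    """Grid-search weight w and threshold t for the rule
    w * sim1 + (1 - w) * sim2 >= t that maximizes F1 on the labeled pairs."""
    gold = {pair for pair, is_match in labels.items() if is_match}
    grid = [i / steps for i in range(steps + 1)]
    best = (-1.0, None, None)
    for w in grid:
        for t in grid:
            predicted = {p for p, (s1, s2) in features.items() if w * s1 + (1 - w) * s2 >= t}
            score = f1(predicted, gold)
            if score > best[0]:
                best = (score, w, t)
    return best  # (F1 on the training data, w, t)

records = [{"id": "1", "name": "Miller"}, {"id": "2", "name": "Millar"},
           {"id": "3", "name": "Smith"}, {"id": "4", "name": "Smyth"}]
blocks = standard_blocking(records, key=lambda r: r["name"][:2].lower())
print(sorted(candidate_pairs(blocks)))  # 2 comparisons instead of 6

# Hypothetical labeled training pairs: (name_sim, address_sim) from two matchers
features = {("r1", "r2"): (0.95, 0.20),   # match, address changed
            ("r3", "r4"): (0.90, 0.90),   # match
            ("r5", "r6"): (0.30, 0.95),   # non-match: same address, different person
            ("r7", "r8"): (0.92, 0.10)}   # non-match: same name, different person
labels = {("r1", "r2"): True, ("r3", "r4"): True, ("r5", "r6"): False, ("r7", "r8"): False}
print(learn_combination(features, labels))
```

In this toy training set neither matcher separates matches from non-matches on its own, but the learned weighted combination does.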

Specific entity resolution approaches have been developed to categorize and match product offers and product descriptions from web shops.
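Product offers show why a tailored matcher can help: titles from different shops share many generic tokens, while a model code often decides identity. The heuristic below is a hypothetical sketch; the functions `product_codes` and `offer_similarity`, the rule that a token mixing letters and digits is a model code, and the example titles are all assumptions, not the published approach.

```python
import re

def tokens(title):
    """Lower-cased word tokens; hyphenated words such as 'dsc-w800' stay together."""
    return set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", title.lower()))

def product_codes(title):
    """Assumed heuristic: tokens mixing letters and digits are model codes.
    Hyphens are dropped so that 'DSC-W800' and 'DSCW800' compare equal."""
    return {t.replace("-", "") for t in tokens(title)
            if re.search(r"[0-9]", t) and re.search(r"[a-z]", t)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def offer_similarity(title1, title2):
    c1, c2 = product_codes(title1), product_codes(title2)
    if c1 and c2:
        # Both offers name a model code: shared code -> match, otherwise different products.
        return 1.0 if c1 & c2 else 0.0
    # Fall back to generic token overlap when a code is missing.
    return jaccard(tokens(title1), tokens(title2))

print(offer_similarity("Sony Cyber-shot DSC-W800 digital camera, silver",
                       "SONY DSCW800 20.1 MP compact camera"))      # 1.0
print(offer_similarity("Sony Cyber-shot DSC-W800 digital camera",
                       "Sony Cyber-shot DSC-W830 digital camera"))  # 0.0
```

The second pair shares most of its tokens, so a plain token overlap would rate it as similar; the model-code rule separates the two products.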

Benchmarking existing entity resolution approaches:

In a VLDB 2010 paper, we used FEVER to comparatively evaluate existing entity resolution implementations. The datasets used in the evaluation are available for download.
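A comparative evaluation of this kind scores each approach's match result against a perfect (manually verified) mapping using precision, recall, and F-measure. The following sketch computes these standard measures for two hypothetical approaches on made-up data; it is not FEVER's code and does not reproduce the VLDB 2010 results.

```python
def evaluate(result, perfect):
    """Precision, recall and F-measure of a match result against the perfect mapping."""
    tp = len(result & perfect)  # correctly identified correspondences
    precision = tp / len(result) if result else 0.0
    recall = tp / len(perfect) if perfect else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f_measure

perfect = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
approaches = {
    "approach_x": {("a1", "b1"), ("a2", "b2"), ("a3", "b9")},
    "approach_y": {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4"),
                   ("a5", "b5"), ("a6", "b6")},
}
for name, result in sorted(approaches.items()):
    p, r, f = evaluate(result, perfect)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Both hypothetical approaches reach the same precision (0.67), but approach_y finds all four correct correspondences, so its F-measure is higher (0.80 vs. 0.57).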

Please also see our more recent work on cloud-based entity resolution and load balancing.
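In a typical MapReduce implementation of blocking, all records of a block are sent to one reduce task, so skewed block sizes translate into skewed workloads. The sketch below assumes this setting, counts the n(n-1)/2 comparisons of each block, and assigns whole blocks greedily (largest first) to the currently least-loaded reducer. This is a generic scheduling heuristic chosen for illustration, not the load-balancing strategies from the publications on this page; the function names and block sizes are hypothetical.

```python
import heapq

def comparisons(block_size):
    """Number of record pairs compared within one block."""
    return block_size * (block_size - 1) // 2

def assign_blocks(block_sizes, num_reducers):
    """Greedy largest-first assignment of whole blocks to reducers.
    Returns (load per reducer in comparisons, block keys per reducer)."""
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    assignment = {r: [] for r in range(num_reducers)}
    loads = {r: 0 for r in range(num_reducers)}
    for key in sorted(block_sizes, key=lambda k: -comparisons(block_sizes[k])):
        load, r = heapq.heappop(heap)  # least-loaded reducer
        assignment[r].append(key)
        loads[r] = load + comparisons(block_sizes[key])
        heapq.heappush(heap, (loads[r], r))
    return loads, assignment

block_sizes = {"mi": 5, "sm": 4, "sc": 4, "ra": 3, "th": 2}
loads, assignment = assign_blocks(block_sizes, num_reducers=2)
print(loads)  # {0: 13, 1: 13}
print(assignment)
```

If one block dominates, e.g. a block of 100 records (4,950 comparisons), the greedy assignment still leaves it on a single reducer; handling that case requires splitting large blocks, which this sketch does not do.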

Project members

  • Prof. Dr. Erhard Rahm
  • Prof. Dr. Hanna Köpcke
  • Prof. Dr. Andreas Thor

Publications (20)

Don't Match Twice: Redundancy-free Similarity Computation with MapReduce
Kolb, L. ; Thor, A. ; Rahm, E.
Proc. 2nd Intl. Workshop on Data Analytics in the Cloud (DanaC), 2013
2013 / 6
When to Reach for the Cloud: Using Parallel Hardware for Link Discovery
Kolb, L. ; Heino, N. ; Hartung, M. ; Auer, S. ; Rahm, E.
Proc. 10th Intl. Extended Semantic Web Conference (ESWC), 2013
2013 / 5
Parallel Entity Resolution with Dedoop
Kolb, L. ; Rahm, E.
Datenbank-Spektrum 13 (1), 2013
2013 / 2
Dedoop: Efficient Deduplication with Hadoop
Kolb, L. ; Thor, A. ; Rahm, E.
Proc. 38th Intl. Conference on Very Large Data Bases (VLDB) / Proc. of the VLDB Endowment 5(12), 2012
2012 / 8
Load Balancing for MapReduce-based Entity Resolution
Kolb, L. ; Thor, A. ; Rahm, E.
Proc. 28th Intl. Conference on Data Engineering (ICDE), 2012
2012 / 4
Tailoring entity resolution for matching product offers
Köpcke, H. ; Thor, A. ; Thomas, S. ; Rahm, E.
Proc. 15th Intl. Conf. on Extending Database Technology (EDBT), 2012, pp. 545-550
2012 / 3
Multi-pass Sorted Neighborhood Blocking with MapReduce
Kolb, L. ; Thor, A. ; Rahm, E.
Computer Science - Research and Development 27(1), 2012
2012 / 2
Block-based Load Balancing for Entity Resolution with MapReduce
Kolb, L. ; Thor, A. ; Rahm, E.
Proc. 20th Intl. Conference on Information and Knowledge Management (CIKM), 2011
2011 / 10
Learning-based Entity Resolution with MapReduce
Kolb, L. ; Köpcke, H. ; Thor, A. ; Rahm, E.
Proc. 3rd Intl. Workshop on Cloud Data Management (CloudDB), 2011
2011 / 10
Parallel Sorted Neighborhood Blocking with MapReduce
Kolb, L. ; Thor, A. ; Rahm, E.
Proc. 14th GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW), 2011
2011 / 3
