Training Selection for Tuning Entity Matching

Köpcke, H.; Rahm, E.

6th International Workshop on Quality in Databases and Management of Uncertain Data (QDB/MUD 2008)

August 2008

Abstract

Entity matching is a crucial and difficult task for data integration.
An effective solution strategy typically has to combine several
techniques and to find suitable settings for critical configuration
parameters such as similarity thresholds. Supervised (training-based)
approaches promise to reduce the manual work for
determining (learning) effective strategies for entity matching.
However, they critically depend on the selection of training data,
a difficult problem that has so far mostly been addressed
manually by human experts. In this paper, we propose a training-based
framework called STEM for entity matching and present
different generic methods for automatically selecting training data
to combine and configure several matching techniques. We
evaluate the proposed methods for different match tasks and
small- and medium-sized training sets.
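The abstract describes two ideas: automatically selecting training data, and learning match settings such as similarity thresholds from it. The Python sketch below illustrates both at toy scale. Everything in it is an assumption and none of it comes from STEM. A single string matcher built on `difflib.SequenceMatcher` stands in for the combined matching techniques. `select_training` is a hypothetical threshold-based selector that balances likely matches against likely non-matches. `learn_threshold` tunes one similarity threshold by maximizing F1 on labeled pairs. These are not the paper's methods.

```python
import random
from difflib import SequenceMatcher

# Hypothetical toy data: ((record_a, record_b), is_match) pairs.
LABELED = [
    (("Apple iPhone 12", "apple iphone 12"), True),
    (("Samsung Galaxy S21", "samsung galaxy s21"), True),
    (("Apple iPhone 12", "Samsung Galaxy S21"), False),
    (("Sony WH-1000XM4", "Bose QC35"), False),
]


def similarity(a, b):
    # Assumed stand-in matcher: case-insensitive string similarity in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def select_training(pairs, threshold, n, seed=0):
    # Hypothetical selection rule (not STEM's): take half of the n pairs from
    # candidates scoring >= threshold (likely matches) and the rest from
    # candidates below it (likely non-matches), giving a roughly balanced set.
    rng = random.Random(seed)
    high = [p for p in pairs if similarity(*p) >= threshold]
    low = [p for p in pairs if similarity(*p) < threshold]
    k = n // 2
    return rng.sample(high, min(k, len(high))) + rng.sample(low, min(n - k, len(low)))


def learn_threshold(labeled):
    # Supervised tuning of one parameter: try each observed similarity as a
    # cut-off and keep the one with the highest F1 on the labeled pairs.
    scored = [(similarity(a, b), y) for (a, b), y in labeled]
    best_t, best_f1 = 1.0, -1.0
    for t, _ in scored:
        tp = sum(1 for s, y in scored if s >= t and y)
        fp = sum(1 for s, y in scored if s >= t and not y)
        fn = sum(1 for s, y in scored if s < t and y)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


if __name__ == "__main__":
    candidates = [pair for pair, _ in LABELED]
    print(select_training(candidates, threshold=0.9, n=2))
    print(learn_threshold(LABELED))
```

With one matcher and one threshold, the learned cut-off simply separates the labeled matches from the non-matches. A real training-based framework would learn a combination of several matchers, for example with a classifier, but the dependence on which pairs end up in the training set is the same.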