Database Group Leipzig

within the department of computer science

(Privately) Estimating Linkage Quality for Record Linkage

Franke, M. ; Christen, V. ; Christen, P. ; Rohde, F. ; Rahm, E.

27th International Conference on Extending Database Technology

2024 / 03

Paper

Further information: http://dx.doi.org/10.48786/edbt.2024.26

Abstract

Record linkage is the task of identifying records from different databases that refer to the same real-world entity. This task is an essential component of data integration to facilitate data analysis in a variety of domains, including healthcare, national security, and e-commerce. To evaluate the quality of record linkage approaches, the performance measures of precision, recall, and F-measure are commonly used. These measures require ground truth data that specifies known matches and non-matches. However, in practical linkage applications there typically is no such ground truth data available. Although linkage quality can be assessed manually by domain experts, such a clerical review process is time- and resource-consuming and generally not feasible when linking databases that are very large or that contain sensitive (personal) data. We review existing and propose improved unsupervised approaches for estimating the quality of linkage results. We evaluate our approaches on multiple datasets from three different domains. This evaluation shows that our approaches outperform existing methods and lead to estimates that are close to the actual linkage quality.
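
To illustrate why ground truth is needed for the measures named in the abstract, the following minimal sketch computes precision, recall, and F-measure from a set of predicted record pairs and a set of known true matches. It uses only the standard textbook definitions and is not the estimation method proposed in the paper. The function name `linkage_quality`, the pair representation, and the example record identifiers are assumptions made for illustration.

```python
from typing import Set, Tuple

# A record pair is (id in database A, id in database B); this layout is assumed.
Pair = Tuple[str, str]


def linkage_quality(predicted: Set[Pair], true_matches: Set[Pair]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) of a linkage result.

    Requires ground truth: the set of record pairs known to be true matches.
    """
    tp = len(predicted & true_matches)   # predicted pairs that are true matches
    fp = len(predicted - true_matches)   # predicted pairs that are not matches
    fn = len(true_matches - predicted)   # true matches the linkage missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example data: three predicted links, three true matches.
predicted = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
truth = {("a1", "b1"), ("a2", "b2"), ("a4", "b4")}

p, r, f = linkage_quality(predicted, truth)
print(f"precision={p:.3f} recall={r:.3f} f-measure={f:.3f}")
```

Without the `truth` set, none of the three counts can be computed directly, which is the gap that unsupervised quality estimation aims to close.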
