Database Group Leipzig

within the department of computer science

Incremental Clustering Techniques for Multi-Party Privacy-Preserving Record Linkage

Vatsalan, D. ; Christen, P. ; Rahm, E.

Data & Knowledge Engineering

2020 / 06

Other

Further information: https://arxiv.org/pdf/1911.12930.pdf

Abstract

Privacy-Preserving Record Linkage (PPRL) supports the integration of sensitive information from multiple datasets, in particular the privacy-preserving matching of records referring to the same entity. PPRL has gained much attention in many application areas, with the most prominent ones in the healthcare domain. PPRL techniques tackle this problem by conducting linkage on masked (encoded) values. Employing PPRL on records from multiple (more than two) parties/sources (multi-party PPRL, MP-PPRL) is an increasingly important but challenging problem that so far has not been sufficiently solved. Existing MP-PPRL approaches are limited to finding only those entities that are present in all parties, thereby missing entities that match only in a subset of parties. Furthermore, previous MP-PPRL approaches face substantial scalability limitations due to the need of a large number of comparisons between masked records. We thus propose and evaluate new MP-PPRL approaches that find matches in any subset of parties and still scale to many parties. Our approaches maintain all matches within clusters, where these clusters are incrementally extended or refined by considering records from one party after the other. An empirical evaluation using multiple real datasets ranging from 3 to 26 parties each containing up to 5 million records validates that our protocols are efficient, and significantly outperform existing MP-PPRL approaches in terms of linkage quality and scalability.
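
The idea of extending clusters incrementally, one party at a time, can be illustrated with a small sketch. This is not the protocol from the paper: it assumes that each masked record is represented as the set of 1-bit positions of a Bloom-filter-style encoding, that similarity is the Dice coefficient, that each party's dataset is already deduplicated, and that a greedy best-match assignment is acceptable. The names `dice`, `incremental_cluster`, and `threshold` are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

# A masked record: (party id, set of 1-bit positions of its encoding).
# This representation is an assumption made for illustration only.
Record = Tuple[str, Set[int]]


def dice(a: Set[int], b: Set[int]) -> float:
    """Dice coefficient of two bit-position sets."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def incremental_cluster(
    parties: List[List[Record]], threshold: float = 0.8
) -> List[List[Record]]:
    """Greedy sketch: process one party after the other and either attach
    each record to the most similar existing cluster or open a new one."""
    clusters: List[List[Record]] = []
    for records in parties:
        # Only clusters that existed before this party was processed are
        # candidates, so two records of the same party never share a cluster.
        existing = len(clusters)
        taken = set()  # clusters that already received a record of this party
        for rec in records:
            best_idx, best_sim = -1, threshold
            for idx in range(existing):
                if idx in taken:
                    continue
                sim = max(dice(rec[1], member[1]) for member in clusters[idx])
                if sim >= best_sim:
                    best_idx, best_sim = idx, sim
            if best_idx >= 0:
                clusters[best_idx].append(rec)
                taken.add(best_idx)
            else:
                clusters.append([rec])
    return clusters


# Toy data for three parties; the bit positions are made up.
party_a = [("A", {1, 2, 3, 4}), ("A", {10, 11, 12, 13})]
party_b = [("B", {1, 2, 3, 4}), ("B", {20, 21, 22, 23})]
party_c = [("C", {10, 11, 12, 13}), ("C", {1, 2, 3, 5})]

result = incremental_cluster([party_a, party_b, party_c], threshold=0.7)
for cluster in result:
    print(sorted(party for party, _ in cluster))
```

In this toy run, one cluster contains records from all three parties and another contains records from parties A and C only, which is the kind of subset match that approaches restricted to entities present in all parties would miss. A real deployment would add blocking to limit comparisons and a secure protocol for the similarity computation, both of which are outside this sketch.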
