Rohde, F.; Christen, V.; Rahm, E.

Exploring Privacy-Preserving Record Linkage: A Holistic Framework for Dataset Generation and Detailed Result Analysis

VLDB 2025 Workshop: 14th International Workshop on Quality in Databases (QDB’25)

2025 / 09

Abstract

Privacy-preserving record linkage (PPRL) methods facilitate the integration of sensitive data without disclosing plaintext information among data owners or to third parties. However, PPRL techniques are sensitive to data quality problems, and their typically rigid matching strategies can deter data custodians from using them in practice because of potential linkage quality issues. In this work, we present a framework for studying the robustness of PPRL algorithms against dataset variation in order to guide data custodians in selecting suitable methods. Our framework offers multiple ways to create test datasets for linkage tasks, depending on the available input data. Furthermore, the implementation includes a new synthetic data generator for creating realistic population records, including common household structures for Germany.
At the heart of our contribution lies the creation and tracking of descriptive tags that outline the characteristics of datasets across various levels of granularity.
We describe an approach for exploring linkage quality outcomes based on these record and record-pair features, enabling researchers to better understand their own linkage results and to assess those of others.
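
The idea of analysing linkage quality by descriptive tags can be illustrated with a minimal sketch. It is not the framework's implementation: the function name `recall_by_tag`, the tag names, and the record identifiers are all hypothetical and chosen only for illustration. The sketch assumes that each true matching record pair carries a set of tags and reports, per tag, the share of tagged true matches that the linkage actually found.

```python
from collections import defaultdict


def recall_by_tag(true_matches, predicted_matches, pair_tags):
    """Return recall per tag, computed over the true matching record pairs.

    true_matches / predicted_matches: sets of (id_a, id_b) tuples.
    pair_tags: dict mapping a record pair to its set of descriptive tags
    (hypothetical structure, assumed for this example).
    """
    found = defaultdict(int)
    total = defaultdict(int)
    for pair in true_matches:
        for tag in pair_tags.get(pair, ()):
            total[tag] += 1
            if pair in predicted_matches:
                found[tag] += 1
    return {tag: found[tag] / total[tag] for tag in total}


# Toy data: four true matches, one of which the linkage missed,
# plus one false positive that does not affect recall.
true_matches = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
predicted_matches = {("a1", "b1"), ("a2", "b2"), ("a4", "b4"), ("a5", "b9")}
pair_tags = {
    ("a1", "b1"): {"exact"},
    ("a2", "b2"): {"typo_in_name"},
    ("a3", "b3"): {"typo_in_name", "moved_household"},
    ("a4", "b4"): {"moved_household"},
}

result = recall_by_tag(true_matches, predicted_matches, pair_tags)
for tag in sorted(result):
    print(f"{tag}: {result[tag]:.2f}")
```

Breaking recall down this way shows which kinds of dataset variation a given linkage method handles poorly, which a single overall score would hide.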