Franke, M. ; Gladbach, M. ; Sehili, Z. ; Rohde, F. ; Rahm, E.

ScaDS Research on Scalable Privacy-preserving Record Linkage

Datenbank-Spektrum

2019 / 02

Paper

Further information: https://doi.org/10.1007/s13222-019-00305-y

Abstract

Privacy-preserving record linkage (PPRL) supports the matching and integration of person-related data, e.g., on patients or customers, without compromising privacy. It is based on the encoding of sensitive attribute values needed for matching and often involves trusted parties for linkage. We report on recent research results from the Big Data center ScaDS Dresden/Leipzig to improve the efficiency, scalability and quality of PPRL, and to apply PPRL in the medical domain. In particular, we present the use of pivot-based filtering techniques and LSH (locality-sensitive hashing)-based blocking to reduce the number of comparisons. Furthermore, we report on parallel linkage implementations based on Apache Flink supporting scalability to millions of records.
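
To illustrate how LSH-based blocking can reduce the number of comparisons, the following is a minimal Python sketch, not the implementation described in the paper. It assumes that records are encoded as fixed-length bit vectors built from character bigrams (a Bloom-filter-style encoding) and that blocking keys are formed by sampling bit positions (Hamming LSH). The abstract specifies neither choice. All names and parameters (`encode`, `lsh_keys`, `candidate_pairs`, `BITS`, `num_tables`, `key_len`) are hypothetical.

```python
import hashlib
import random
from collections import defaultdict

BITS = 64  # assumed length of each encoded record


def bigrams(value):
    """Split a string into its set of lowercase character bigrams."""
    value = value.lower()
    return {value[i:i + 2] for i in range(len(value) - 1)}


def encode(value, num_hashes=2):
    """Map the bigrams of a string to a Bloom-filter-style bit vector."""
    bits = [0] * BITS
    for gram in bigrams(value):
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode()).digest()
            bits[int.from_bytes(digest[:4], "big") % BITS] = 1
    return bits


def lsh_keys(bits, num_tables=4, key_len=8, seed=42):
    """Build one blocking key per table by sampling fixed bit positions.

    The fixed seed ensures every record is sampled at the same positions.
    """
    rng = random.Random(seed)
    keys = []
    for table in range(num_tables):
        positions = rng.sample(range(BITS), key_len)
        keys.append((table, tuple(bits[p] for p in positions)))
    return keys


def candidate_pairs(encoded_a, encoded_b):
    """Return only those cross-source pairs that share a blocking key."""
    blocks = defaultdict(lambda: ([], []))
    for record_id, bits in encoded_a.items():
        for key in lsh_keys(bits):
            blocks[key][0].append(record_id)
    for record_id, bits in encoded_b.items():
        for key in lsh_keys(bits):
            blocks[key][1].append(record_id)
    pairs = set()
    for left, right in blocks.values():
        pairs.update((a, b) for a in left for b in right)
    return pairs
```

Records with identical encodings always share every blocking key, while similar encodings share a key with high probability. Only the returned candidate pairs would then be passed to the more expensive similarity computation, instead of the full cross product of both sources.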