Speeding up Privacy Preserving Record Linkage for Metric Space Similarity Measures
Further information: http://link.springer.com/article/10.1007/s13222-016-0222-9
The analysis of person-related data in Big Data applications faces the tradeoff of finding useful results while preserving a high degree of privacy. This is especially challenging when person-related data from multiple sources need to be integrated and analyzed. Privacy-preserving record linkage (PPRL) addresses this problem by encoding sensitive attribute values such that the identification of persons is prevented but records can still be matched. In this paper we study how to improve the efficiency and scalability of PPRL by restricting the search space for matching encoded records. We focus on similarity measures for metric spaces and investigate the use of M‑trees as well as pivot-based solutions. Our evaluation shows that the new schemes outperform previous filter approaches by an order of magnitude.