Value-specific Weighting for Record-level Encodings in Privacy-Preserving Record Linkage
Conference on Database Systems for Business, Technology and Web (BTW) 2023
Privacy-preserving record linkage (PPRL) identifies records that represent the same entity while protecting the privacy of the individuals involved. A common approach is to encode the plaintext data of records into Bloom filters, which enable efficient similarity computation. A crucial step of PPRL is the classification of Bloom filter pairs as match or non-match based on the computed similarities. In conventional record linkage, several weighting schemes and classification methods are available. The majority of weighting methods determine and adapt weights by applying the Fellegi-Sunter model to each attribute. In the PPRL domain, however, the attributes of a record are encoded in a joint record-level Bloom filter to impede cryptanalysis attacks, so existing attribute-wise weighting approaches cannot be applied directly. We study methods that use attribute-specific weights in record-level encodings and integrate weight adaptation approaches based on individual value frequencies. Experiments on real-world datasets show that frequency-dependent weighting schemes improve both linkage quality and robustness with regard to threshold selection.
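To make the idea concrete, the following is a minimal illustrative sketch and not the encoding or weighting scheme evaluated in the paper. It assumes that attribute weights are realized as the number of hash functions applied per q-gram, that similarity is measured with the Dice coefficient, and that a value-specific adjustment scales this number by the log-frequency of the value relative to an average frequency. All names (`encode_record`, `value_specific_k`, the filter length `M`, and the example weights) are hypothetical.

```python
import hashlib
import math

M = 1024  # assumed Bloom filter length in bits


def qgrams(value, q=2):
    """Split a padded, lower-cased string into its set of q-grams."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def positions(token, k, salt):
    """Map a token to at most k bit positions via double hashing."""
    digest = hashlib.sha256((salt + "|" + token).encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big")
    return {(h1 + i * h2) % M for i in range(k)}


def encode_record(record, hashes_per_attr):
    """Encode all attributes into one record-level filter (set of set bits).

    Attributes with a larger number of hash functions set more bits and
    therefore contribute more to the similarity (assumed weighting).
    """
    bits = set()
    for attr, value in record.items():
        for gram in qgrams(value):
            bits |= positions(gram, hashes_per_attr[attr], salt=attr)
    return bits


def dice(a, b):
    """Dice coefficient of two filters given as sets of set bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def value_specific_k(base_k, rel_freq, avg_freq):
    """Hypothetical frequency adjustment: rarer values get more hashes.

    rel_freq and avg_freq are relative frequencies in (0, 1); a value with
    average frequency keeps base_k.
    """
    factor = math.log(rel_freq) / math.log(avg_freq)
    return max(1, round(base_k * factor))


if __name__ == "__main__":
    weights = {"first": 4, "last": 8}  # assumed attribute weights
    a = encode_record({"first": "anna", "last": "mueller"}, weights)
    b = encode_record({"first": "anna", "last": "muller"}, weights)
    print(f"Dice similarity: {dice(a, b):.3f}")
```

In this sketch, both data owners would have to derive the number of hash functions for a given value from the same frequency information, otherwise equal values would produce different bit patterns. How such frequencies are obtained and shared is outside the scope of the example.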