Record linkage is the process of matching records that refer to the same entity across two or more databases. In many application areas, ranging from healthcare to government services, the databases to be linked contain sensitive personal information, and hence, cannot be shared across organisations. Privacy-Preserving Record Linkage (PPRL) aims to overcome this challenge by facilitating the comparison of records that have been encoded or encrypted, thereby allowing linkage without the need of sharing any sensitive data. While various PPRL techniques have been developed, most of them do not properly address privacy concerns, such as the various vulnerabilities of encoded data with regard to cryptanalysis attacks. Existing PPRL methods, furthermore, do not provide conceptual analyses of how a user should set the various parameters required, possibly leading to sub-optimal results with regard to both linkage quality and privacy protection. Here we present a novel encoding method for PPRL that employs reference q-gram sets to generate bit arrays that represent sensitive values. Our method requires a single user parameter that determines a trade-off between linkage quality, scalability, and privacy. All other parameters are either data driven or have strong bounds based on the user-set parameter. Furthermore, our method addresses the length, frequency, and pattern-based PPRL vulnerabilities that are exploited by existing PPRL attacks. We conceptually analyse our method and experimentally evaluate it using multiple databases. Our results show that our method provides robust results for both high linkage quality and strong privacy protection.
扫码关注我们
求助内容:
应助结果提醒方式:
