{"title":"Private Collaborative Data Cleaning via Non-Equi PSI","authors":"Erik-Oliver Blass, F. Kerschbaum","doi":"10.1109/SP46215.2023.10179337","DOIUrl":null,"url":null,"abstract":"We introduce and investigate the privacy-preserving version of collaborative data cleaning. With collaborative data cleaning, two parties want to reconcile their data sets to filter out badly classified, misclassified data items. In the privacy-preserving (private) version of data cleaning, the additional security goal is that parties should only learn their misclassified data items, but nothing else about the other party’s data set. The problem of private data cleaning is essentially a variation of private set intersection (PSI), and one could employ recent circuit-PSI techniques to compute misclassifications with privacy. However, we design, analyze, and implement three new protocols tailored to the specifics of private data cleaning that outperform a circuit-PSI-based approach. With the first protocol, we exploit the idea that a small additional leakage (the differentially private size of the intersection of data items) allows for a reduction in complexity over circuit-PSI. The other two protocols convert the problem of finding a mismatch in data classifications into finding a match, and then follow the standard technique of using oblivious pseudorandom functions (OPRF) for computing PSI. Depending on the number of data classes, this leads to a concrete runtime improvement over circuit-PSI.","PeriodicalId":439989,"journal":{"name":"2023 IEEE Symposium on Security and Privacy (SP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE Symposium on Security and Privacy (SP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SP46215.2023.10179337","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
We introduce and investigate the privacy-preserving version of collaborative data cleaning. With collaborative data cleaning, two parties want to reconcile their data sets to filter out badly classified, misclassified data items. In the privacy-preserving (private) version of data cleaning, the additional security goal is that parties should only learn their misclassified data items, but nothing else about the other party’s data set. The problem of private data cleaning is essentially a variation of private set intersection (PSI), and one could employ recent circuit-PSI techniques to compute misclassifications with privacy. However, we design, analyze, and implement three new protocols tailored to the specifics of private data cleaning that outperform a circuit-PSI-based approach. With the first protocol, we exploit the idea that a small additional leakage (the differentially private size of the intersection of data items) allows for a reduction in complexity over circuit-PSI. The other two protocols convert the problem of finding a mismatch in data classifications into finding a match, and then follow the standard technique of using oblivious pseudorandom functions (OPRF) for computing PSI. Depending on the number of data classes, this leads to a concrete runtime improvement over circuit-PSI.