Title: Privacy protection against user profiling through optimal data generalization
Authors: César Gil, Javier Parra-Arnau, Jordi Forné
DOI: 10.1016/j.cose.2024.104178
Journal: Computers & Security, Volume 148, Article 104178 (Q1, Computer Science, Information Systems)
Publication date: 2024-11-05
URL: https://www.sciencedirect.com/science/article/pii/S0167404824004838
Citations: 0
Abstract
Personalized information systems are information-filtering systems that endeavor to tailor information-exchange functionality to the specific interests of their users. The ability of these systems to profile users based on their search queries at Google, disclosed locations at Twitter or rated movies at Netflix is, on the one hand, what enables such intelligent functionality, but on the other, the source of serious privacy concerns. Leveraging the principle of data minimization, we propose a data-generalization mechanism that aims to protect users' privacy against non-fully-trusted personalized information systems. In our approach, users may disclose personal data to such systems when they feel comfortable doing so. When they do not, they may wish to replace specific, sensitive data with more general and thus less sensitive data before sharing this information with the personalized system in question. Generalization may therefore protect user privacy to a certain extent, but clearly at the cost of some information loss. In this work, we mathematically model an optimized version of this mechanism and theoretically investigate some key properties of the privacy-utility trade-off it poses. Experimental results on two real-world datasets demonstrate how our approach may contribute to privacy protection and show that it can outperform state-of-the-art perturbation techniques such as data forgery and suppression by providing higher utility for the same privacy level. On a practical level, the implications of our work for personalized online services are diverse. On the one hand, our mechanism allows each user to take charge of their own privacy individually, without the need to resort to third parties or share resources with other users. On the other hand, it provides privacy designers and engineers with a new data-perturbative mechanism with which to evaluate their systems in the presence of data that can be generalized according to a certain hierarchy, notably spatial generalization, with practical application in popular location-based services. Overall, we contribute a data-perturbation mechanism for privacy protection against user profiling that is optimal, deterministic, and local, based on a model in which third parties are untrusted.
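The hierarchy-based generalization the abstract describes can be sketched in a few lines: a specific, sensitive value is replaced by one of its ancestors in a fixed generalization hierarchy, trading utility for privacy. The spatial taxonomy and function below are illustrative assumptions for intuition only, not the paper's actual formulation or optimization.

```python
# Illustrative sketch of hierarchy-based data generalization (NOT the
# paper's exact mechanism): a disclosed value is climbed up a fixed
# generalization hierarchy; higher levels are less specific, hence
# less sensitive, but carry less information.

# Hypothetical spatial hierarchy: child -> parent (one level more general).
PARENT = {
    "Sagrada Familia": "Barcelona",
    "Barcelona": "Catalonia",
    "Catalonia": "Spain",
    "Spain": "Europe",
}

def generalize(value: str, levels: int) -> str:
    """Replace `value` by its ancestor `levels` steps up; stop at the root."""
    for _ in range(levels):
        if value not in PARENT:
            break  # already at the most general value in the hierarchy
        value = PARENT[value]
    return value

print(generalize("Sagrada Familia", 0))  # Sagrada Familia (full utility)
print(generalize("Sagrada Familia", 2))  # Catalonia (partial generalization)
print(generalize("Sagrada Familia", 9))  # Europe (root: most private, least useful)
```

The paper's contribution lies in choosing such generalization levels optimally (deterministically and locally, per user) to balance the privacy gained against the information lost; the sketch only shows the replacement step itself.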
About the journal:
Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world.
Computers & Security provides a unique blend of leading-edge research and sound practical management advice. It is aimed at professionals involved with computer security, audit, control and data integrity in all sectors: industry, commerce and academia. Recognized worldwide as the primary source of reference for applied research and technical expertise, it is your first step toward fully secure systems.