Practical anonymization for data streams: z-anonymity and relation with k-anonymity
Nikhil Jha, Luca Vassio, Martino Trevisan, Emilio Leonardi, Marco Mellia
Performance Evaluation, vol. 159, Article 102329, 2023
DOI: 10.1016/j.peva.2022.102329
https://www.sciencedirect.com/science/article/pii/S0166531622000372
Citations: 2
Abstract
With the advent of big data and the emergence of data markets, preserving individuals’ privacy has become of utmost importance. The classical response to this need is anonymization, i.e., sanitizing the information that, directly or indirectly, can allow users’ re-identification. Among the various approaches, k-anonymity provides a simple and easy-to-understand protection. However, k-anonymity is challenging to achieve in a continuous stream of data and scales poorly as the number of attributes grows.
In this paper, we study a novel anonymization property called z-anonymity, explicitly designed for data streams, i.e., settings where the decision to publish a given attribute (an atomic piece of information) is made in real time. The idea behind z-anonymity is to release an attribute about a user only if at least z − 1 other users have exposed the same attribute in a past time window. Depending on the value of z, the output stream is k-anonymized with a certain probability. To quantify this, we present a probabilistic model that maps the z-anonymity property onto the k-anonymity property. The model is not only helpful for studying z-anonymity but is also general enough to evaluate the probability of achieving k-anonymity in data streams, making it a contribution of general interest.
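The release rule described above can be sketched in Python as a small sliding-window filter. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `Observation`, `ZAnonymizer`, and `delta_t` are hypothetical. The "past time window" is assumed to be the interval (t − Δt, t] ending at the current observation. Events are assumed to arrive in non-decreasing time order.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One (time, user, attribute) event of the input stream (hypothetical record type)."""
    t: float
    user: str
    attribute: str


class ZAnonymizer:
    """Illustrative z-anonymity filter (an assumption-based sketch, not the paper's code).

    An observation is released only if at least z - 1 *other* users exposed the
    same attribute within the assumed window (t - delta_t, t]. Observations are
    assumed to arrive in non-decreasing time order.
    """

    def __init__(self, z: int, delta_t: float) -> None:
        self.z = z
        self.delta_t = delta_t
        # attribute -> {user: time of that user's latest exposure of the attribute}
        self.last_seen: dict[str, dict[str, float]] = defaultdict(dict)

    def process(self, obs: Observation) -> bool:
        """Record obs and return True if it may be published."""
        users = self.last_seen[obs.attribute]
        users[obs.user] = obs.t  # repeated exposures by one user count only once
        horizon = obs.t - self.delta_t
        # Forget users whose latest exposure has left the time window.
        for user in [u for u, ts in users.items() if ts <= horizon]:
            del users[user]
        # `users` includes obs.user itself, so "z - 1 others" means len(users) >= z.
        return len(users) >= self.z


if __name__ == "__main__":
    stream = [
        Observation(0.0, "alice", "A"),
        Observation(1.0, "bob", "A"),
        Observation(2.0, "carol", "A"),
        Observation(50.0, "dave", "A"),
    ]
    anonymizer = ZAnonymizer(z=3, delta_t=10.0)
    released = [o for o in stream if anonymizer.process(o)]
    print([o.user for o in released])  # ['carol']
```

In this example, with z = 3, the observations from alice and bob are withheld. When each of them exposed attribute A, fewer than z − 1 = 2 other users had done so within the window. Carol's observation is released. Dave's is withheld because the earlier exposures have already left the 10-unit window. Storing only each user's latest timestamp per attribute ensures that repeated exposures by the same user never count toward z.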
About the journal:
Performance Evaluation functions as a leading journal in the area of modeling, measurement, and evaluation of performance aspects of computing and communication systems. As such, it aims to present a balanced and complete view of the entire Performance Evaluation profession. Hence, the journal is interested in papers that focus on one or more of the following dimensions:
- Define new performance evaluation tools, including measurement and monitoring tools as well as modeling and analytic techniques
- Provide new insights into the performance of computing and communication systems
- Introduce new application areas where performance evaluation tools can play an important role, or propose creative new uses for these tools.
More specifically, common application areas of interest include the performance of:
- Resource allocation and control methods and algorithms (e.g., routing and flow control in networks, bandwidth allocation, processor scheduling, memory management)
- System architecture, design and implementation
- Cognitive radio
- VANETs
- Social networks and media
- Energy efficient ICT
- Energy harvesting
- Data centers
- Data centric networks
- System reliability
- System tuning and capacity planning
- Wireless and sensor networks
- Autonomic and self-organizing systems
- Embedded systems
- Network science