Local Hashing and Fake Data for Privacy-Aware Frequency Estimation

Gatha Varma
2023 17th International Conference on Ubiquitous Information Management and Communication (IMCOM)
Published: 2023-01-03
DOI: 10.1109/IMCOM56909.2023.10035583
https://doi.org/10.1109/IMCOM56909.2023.10035583

Abstract

Data collected from users of services and applications contain identifying attributes. The categorical attributes of user data take values from a fixed set of domain values $\boldsymbol{D}_{\boldsymbol{m}}$. Statistical analysis of the collected data drives modeling, and for categorical attributes this analysis is frequency estimation: it gives the approximate number of individuals who reported a specific value from the set $\boldsymbol{D}_{\boldsymbol{m}}$. When user data is collected repeatedly, frequency estimation may carry a risk of disclosure. It is therefore important to privatize user data so that the statistics remain relevant while privacy risks are minimized. This is achieved by a set of algorithms called Frequency Oracles. Local Differential Privacy is a widely used technique in this setting, and several methods, including sampling and randomization, are used to amplify its privacy guarantees. In this paper, I propose the first sample-based frequency oracle that uses Optimized Local Hashing (OLH) and is further enhanced by replacing some attribute values with fake data. The adaptive solution utilizes the benefits that OLH offers for large-dimensional datasets and its variance, which is independent of dimensionality. The privacy-utility trade-off of the proposed solution was found to be better than that of existing solutions in certain general and strict privacy regimes for multi-dimensional datasets.
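For readers unfamiliar with the building block named above, the following is a minimal sketch of a plain OLH frequency oracle as it is commonly described in the Local Differential Privacy literature. It is not the sample-based, fake-data variant proposed in the paper, whose details the abstract does not give. Each user hashes their value into $g = \lfloor e^{\epsilon} + 1 \rceil$ buckets with a personal seed, reports the true bucket with probability $p = e^{\epsilon} / (e^{\epsilon} + g - 1)$ and a different bucket otherwise, and the aggregator debiases the support counts with $q = 1/g$. All function names, the seeded BLAKE2b hash standing in for a universal hash family, and the parameter choices are assumptions made for illustration.

```python
import hashlib
import math
import random


def olh_params(epsilon):
    """Return (g, p): number of hash buckets and probability of reporting the true bucket."""
    g = max(2, round(math.exp(epsilon) + 1))
    p = math.exp(epsilon) / (math.exp(epsilon) + g - 1)
    return g, p


def hash_to_bucket(seed, value, g):
    """Seeded hash of `value` into {0, ..., g-1} (assumed stand-in for a universal hash family)."""
    digest = hashlib.blake2b(f"{seed}:{value}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % g


def olh_report(value, epsilon, rng):
    """Client side: hash the value, then perturb the bucket with generalized randomized response."""
    g, p = olh_params(epsilon)
    seed = rng.getrandbits(32)
    bucket = hash_to_bucket(seed, value, g)
    if rng.random() >= p:
        # Report one of the other g-1 buckets, chosen uniformly at random.
        other = rng.randrange(g - 1)
        bucket = other if other < bucket else other + 1
    return seed, bucket


def olh_estimate(reports, domain, epsilon):
    """Server side: count reports that support each value, then remove the expected noise."""
    g, p = olh_params(epsilon)
    q = 1.0 / g  # chance that a report supports a value the user does not hold
    n = len(reports)
    estimates = {}
    for v in domain:
        support = sum(1 for seed, bucket in reports if hash_to_bucket(seed, v, g) == bucket)
        estimates[v] = (support - n * q) / (p - q)
    return estimates
```

A typical use is to call `olh_report(value, epsilon, random.Random())` once per user and pass the collected `(seed, bucket)` pairs to `olh_estimate` together with the candidate domain values. The estimates are unbiased but noisy, so individual values can come out negative or above the number of users.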