Proximity Test for Sensitive Categorical Attributes in Big Data

Zakariae El Ouazzani, H. Bakkali
DOI: 10.1109/CloudTech.2018.8713359
Published in: 2018 4th International Conference on Cloud Computing Technologies and Applications (Cloudtech), November 2018
Citations: 3

Abstract

Nowadays, organizations collect and store huge amounts of data in large data sets for research and mining purposes. Such data are useful only if they are published or shared between companies. However, they contain individuals' sensitive information, so ensuring privacy in big data has become a significant issue. Privacy protection aims to shield this private information from threats that may expose an individual's identity. Anonymization techniques have therefore become a subject of research, and they must be applied before a data set is transmitted to other organizations. They provide a way to ensure privacy in mixed data sets containing both numerical and categorical attributes. Several works build on the idea of horizontal clustering; l-diversity is one of them and handles both sensitive numerical and categorical attributes. Although l-diversity places only distinct values in each bucket, those distinct values may, after anonymization, still belong to a single category, so an adversary can learn that category without learning the exact value. In this paper, a new method called “Proximity test for sensitive categorical attributes” is proposed to deal with non-numerical attributes. The proposed algorithm tests the degree of proximity between values within each bucket of the data set, and it does not require any threshold. The algorithm is implemented and evaluated on a test table, and all of its steps are described with detailed comments.
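The abstract does not specify how proximity is measured, so the following is only a minimal sketch of the problem it targets, not the authors' algorithm. It assumes a hypothetical one-level category hierarchy (`CATEGORY`) and hypothetical helper names (`is_distinct_l_diverse`, `proximity_check`, `audit`). A bucket is flagged when it satisfies distinct l-diversity but all of its values map to the same category. Like the method described above, this check uses no numeric threshold: it only asks whether more than one category is present.

```python
# Hypothetical generalization hierarchy: each sensitive value maps to a
# parent category. Values and categories are illustrative, not from the paper.
CATEGORY = {
    "gastric ulcer": "stomach disease",
    "gastritis": "stomach disease",
    "stomach cancer": "stomach disease",
    "flu": "respiratory infection",
    "bronchitis": "respiratory infection",
    "pneumonia": "respiratory infection",
}


def is_distinct_l_diverse(bucket: list[str], l: int) -> bool:
    """Distinct l-diversity: the bucket holds at least l distinct sensitive values."""
    return len(set(bucket)) >= l


def proximity_check(bucket: list[str], category: dict[str, str]) -> bool:
    """Return True if the bucket's values span more than one category.

    A bucket whose values are distinct but share one parent category still
    discloses that category. Unknown values are treated as their own category.
    """
    categories = {category.get(value, value) for value in bucket}
    return len(categories) > 1


def audit(buckets: dict[str, list[str]], l: int, category: dict[str, str]) -> dict[str, str]:
    """Label each bucket as 'not l-diverse', 'proximate', or 'ok'."""
    report = {}
    for name, bucket in buckets.items():
        if not is_distinct_l_diverse(bucket, l):
            report[name] = "not l-diverse"
        elif not proximity_check(bucket, category):
            report[name] = "l-diverse but values are proximate"
        else:
            report[name] = "ok"
    return report


buckets = {
    "B1": ["gastric ulcer", "gastritis", "stomach cancer"],
    "B2": ["gastritis", "flu", "pneumonia"],
    "B3": ["flu", "flu", "bronchitis"],
}
print(audit(buckets, 3, CATEGORY))
# {'B1': 'l-diverse but values are proximate', 'B2': 'ok', 'B3': 'not l-diverse'}
```

Bucket B1 passes distinct 3-diversity yet reveals that every record in it has a stomach disease, which is the kind of leak the proximity test is meant to catch. A real implementation would need a richer notion of proximity than a single shared parent category.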