Anonymization and Analysis of Horizontally and Vertically Divided User Profile Databases with Multiple Sensitive Attributes

Yuki Ina, Y. Sei, Yasuyuki Tahara, Akihiko Ohsuga
Published in: 2018 IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI), 2018-07-01
DOI: 10.1109/SOLI.2018.8476730 (https://doi.org/10.1109/SOLI.2018.8476730)
Citations: 1

Abstract

Preventing the identification of individuals is important when data analysts must guarantee the safety of the data they analyze. Methods proposed to address this problem alter or delete part of the data values. In these methods, the attributes of individual records are divided into three groups: identifier (ID), quasi-identifier (QID), and sensitive attribute (SA). An ID is an attribute that identifies an individual directly, such as a name. QIDs are attributes that can identify an individual when combined, such as age and birthplace. An SA is sensitive information that should not be exposed once a record is linked to an individual. Building on these concepts, safety metrics for data, such as l-diversity, have been proposed. l-diversity assumes that no one knows the SA values, and the data are processed so that attackers cannot identify individuals. However, there are scenarios in which existing methods cannot protect the data against an invasion of privacy. When multiple organizations carry out an analysis jointly, they integrate their data to make the research more effective. Although the results can be valuable, the integrated data may include information that attackers can use to identify people. Specifically, if the attacker is one of the institutions providing data, it can use the SA values in its own data as QID values. The assumption of l-diversity is then violated, so the existing safety metric no longer protects the data. In this paper, we propose a new anonymization method that conceals organizations’ important data by inserting dummy values, thereby enabling analysts to use the data safely. We also provide a calculation method that reduces the influence of the noise generated by the dummy insertion. We confirm the effectiveness of these methods by measuring accuracy in data analysis experiments.
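The insider scenario described above can be illustrated with a minimal sketch. It is not the paper's method: it only checks distinct l-diversity (the number of distinct SA values within each group of records sharing the same QID values) and shows how that number drops when a data-providing organization treats the SA it contributed as an additional QID. The table, the attribute names (`age`, `birthplace`, `disease`, `income`), and the helper `min_distinct_l` are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def min_distinct_l(records, qid_keys, sa_key):
    """Return the smallest number of distinct SA values in any QID group."""
    groups = defaultdict(set)
    for rec in records:
        # Records with identical QID values fall into the same group.
        groups[tuple(rec[k] for k in qid_keys)].add(rec[sa_key])
    return min(len(values) for values in groups.values())


# Hypothetical integrated table: organization A contributed "disease",
# organization B contributed "income". Both are sensitive attributes.
records = [
    {"age": "30-39", "birthplace": "Tokyo", "disease": "flu", "income": "low"},
    {"age": "30-39", "birthplace": "Tokyo", "disease": "asthma", "income": "high"},
    {"age": "30-39", "birthplace": "Tokyo", "disease": "diabetes", "income": "mid"},
    {"age": "40-49", "birthplace": "Osaka", "disease": "flu", "income": "mid"},
    {"age": "40-49", "birthplace": "Osaka", "disease": "asthma", "income": "low"},
    {"age": "40-49", "birthplace": "Osaka", "disease": "diabetes", "income": "high"},
]

# An outside attacker knows only the ordinary QIDs.
outsider_l = min_distinct_l(records, ["age", "birthplace"], "income")

# Organization A already knows each person's "disease" value, so it can
# use that SA as an extra QID when targeting "income".
insider_l = min_distinct_l(records, ["age", "birthplace", "disease"], "income")

print(outsider_l, insider_l)  # prints "3 1"
```

In this toy table each QID group holds three distinct income values for an outsider, but every group collapses to a single record once `disease` is used as a QID, so the insider learns each person's income exactly. This is the failure of the l-diversity assumption that the proposed dummy insertion is meant to address.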