Cluster-based anonymity model and algorithm for 1:1 dataset with a single sensitive attribute using machine learning technique

Egyptian Informatics Journal | IF 5 | Zone 3 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2024-06-13 | DOI: 10.1016/j.eij.2024.100485
J. Jayapradha, Ghaida Muttashar Abdulsahib, Osamah Ibrahim Khalaf, M. Prakash, Mueen Uddin, Maha Abdelhaq, Raed Alsaqour
Citations: 0

Abstract

Privacy is a significant issue that requires consideration in all applications. Data collected from individuals and organizations often must be disclosed to public or private parties for analysis and research, where it is studied digitally to extract patterns that support decision-making. Privacy-preserving data publishing matters because privacy violations involving patient data may harm an individual's reputation. An efficient cluster-based anonymity model is proposed to anonymize a 1:1 dataset with a single sensitive attribute by introducing a concept named the "semi-sensitive attribute." Based on correlation, the attributes are categorized as quasi-identifier or semi-sensitive attributes. k-anonymity is applied to the table of quasi-identifier and semi-sensitive attributes, and fuzzy c-means clustering is used to fix the range of values that anonymizes each semi-sensitive attribute. Because the work focuses on a medical dataset, the disease is treated as the sensitive attribute. The proposed model is shown to resist three privacy attacks: i) identity disclosure, ii) attribute disclosure, and iii) membership disclosure. Utility loss is calculated for each record, and the per-record losses are aggregated into the total information loss for each attribute. Utility loss was measured for all attributes, and the average utility loss for the anonymized patient dataset is 3.78%.
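The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea, under stated assumptions: a single numeric semi-sensitive attribute is clustered with standard fuzzy c-means, each value is replaced by the range of its cluster, and the released rows are checked for k-anonymity. The function names, the toy data, the deterministic initialization, and the range-width loss are all illustrative assumptions, not the authors' method or their utility-loss metric.

```python
from collections import Counter

def fuzzy_c_means_1d(values, c=2, m=2.0, iters=100, tol=1e-6):
    """Standard fuzzy c-means on one numeric attribute; returns the cluster centers."""
    lo, hi = min(values), max(values)
    # Assumption: deterministic start, centers spread evenly across the value range.
    centers = [lo + (j + 0.5) * (hi - lo) / c for j in range(c)]
    exponent = 2.0 / (m - 1.0)
    for _ in range(iters):
        memberships = []
        for x in values:
            dists = [abs(x - v) for v in centers]
            if min(dists) == 0.0:
                # x sits exactly on a center: give it full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                row = [1.0 / sum((d / other) ** exponent for other in dists) for d in dists]
            memberships.append(row)
        # Each new center is the mean of the values weighted by membership**m.
        updated = []
        for j in range(c):
            weights = [row[j] ** m for row in memberships]
            updated.append(sum(w * x for w, x in zip(weights, values)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(centers, updated))
        centers = updated
        if shift < tol:
            break
    return centers

def generalize_to_ranges(values, centers):
    """Replace each value by the (min, max) range of its cluster.

    The nearest center is also the center with the highest fuzzy membership.
    """
    labels = [min(range(len(centers)), key=lambda j: abs(x - centers[j])) for x in values]
    bounds = {}
    for x, label in zip(values, labels):
        low, high = bounds.get(label, (x, x))
        bounds[label] = (min(low, x), max(high, x))
    return [bounds[label] for label in labels]

def is_k_anonymous(rows, k):
    """True if every distinct released row occurs at least k times."""
    return all(count >= k for count in Counter(rows).values())

def average_range_loss(values, ranges):
    """Assumed stand-in metric: mean range width divided by the attribute's full span."""
    span = max(values) - min(values)
    if span == 0:
        return 0.0
    return sum((high - low) / span for low, high in ranges) / len(ranges)

ages = [23, 25, 27, 52, 55, 58]            # made-up semi-sensitive attribute
zip_prefixes = ["130"] * 3 + ["148"] * 3   # made-up, already generalized quasi-identifier
ranges = generalize_to_ranges(ages, fuzzy_c_means_1d(ages, c=2))
released = list(zip(zip_prefixes, ranges))
print(released)
print(is_k_anonymous(released, k=3), average_range_loss(ages, ranges))
```

In this toy table each released row is shared by three records, so the check passes for k = 3 and fails for k = 4. A wider cluster range hides more but raises the loss, which is the trade-off the reported average utility loss summarizes.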

Source journal
Egyptian Informatics Journal (Decision Sciences - Management Science and Operations Research)
CiteScore: 11.10
Self-citation rate: 1.90%
Articles published: 59
Review time: 110 days
About the journal: The Egyptian Informatics Journal is published by the Faculty of Computers and Artificial Intelligence, Cairo University. The Journal provides a forum for state-of-the-art research and development in the fields of computing, including computer sciences, information technologies, information systems, operations research and decision support. It encourages submission of innovative, previously unpublished work in the subjects it covers, whether from academic, research or commercial sources.
Latest articles from this journal
HD-MVCNN: High-density ECG signal based diabetic prediction and classification using multi-view convolutional neural network
A hybrid encryption algorithm based approach for secure privacy protection of big data in hospitals
A new probabilistic linguistic decision-making process based on PL-BWM and improved three-way TODIM methods
Interval valued inventory model with different payment strategies for green products under interval valued Grey Wolf optimizer Algorithm fitness function
Intelligent SDN to enhance security in IoT networks