J. Jayapradha , Ghaida Muttashar Abdulsahib , Osamah Ibrahim Khalaf , M. Prakash , Mueen Uddin , Maha Abdelhaq , Raed Alsaqour
Egyptian Informatics Journal, published 2024-06-13
DOI: 10.1016/j.eij.2024.100485
https://www.sciencedirect.com/science/article/pii/S1110866524000483
Citations: 0
Cluster-based anonymity model and algorithm for 1:1 dataset with a single sensitive attribute using machine learning technique
Privacy is a significant issue that requires consideration in all applications. Data collected from individuals and organizations must often be disclosed to public or private parties for analysis and research, where it is mined for patterns that support decision-making. Privacy-preserving data publishing matters because a privacy violation in a patient’s data may harm that individual’s reputation. This work proposes an efficient cluster-based anonymity model that anonymizes a 1:1 dataset with a single sensitive attribute by introducing the concept of a “semi-sensitive attribute.” Based on correlation, attributes are categorized as quasi-identifiers or semi-sensitive attributes. k-anonymity is applied to the table of quasi-identifiers and semi-sensitive attributes, and fuzzy c-means clustering is used to fix the range of values with which each semi-sensitive attribute is anonymized. Because the work focuses on a medical dataset, the disease is treated as the sensitive attribute. The proposed model is shown to resist three privacy attacks: (i) identity disclosure, (ii) attribute disclosure, and (iii) membership disclosure. Utility loss is calculated for each record, and the per-record losses are aggregated to give the total information loss for each attribute. Measured across all attributes, the average utility loss for the anonymized patient dataset is 3.78%.
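The abstract does not give the algorithm itself, so the following is only a minimal Python sketch of the clustering step it describes, not the authors' implementation. It assumes a single numeric semi-sensitive attribute (the `ages` column is hypothetical), assigns each value to its highest-membership fuzzy c-means cluster, publishes that cluster's [min, max] range in place of the value, and scores utility loss as average range width divided by the attribute's domain width. The function names and the loss measure are illustrative assumptions.

```python
import numpy as np


def fuzzy_c_means(values, n_clusters=3, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a 1-D numeric attribute.

    Returns (centers, memberships); memberships[i, j] is the degree to
    which values[i] belongs to cluster j, and each row sums to 1.
    """
    x = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised so each row sums to 1.
    u = rng.random((x.size, n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        w = u ** m
        # Centers are membership-weighted means of the values.
        centers = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
        # Distances to every center; the epsilon avoids division by zero.
        d = np.abs(x[:, None] - centers[None, :]) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centers, u


def generalize_to_ranges(values, memberships):
    """Replace each value by the [min, max] range of its strongest cluster.

    Hard assignment via argmax is an assumption made for this sketch.
    """
    x = np.asarray(values, dtype=float)
    labels = memberships.argmax(axis=1)
    ranges = {}
    for c in np.unique(labels):
        members = x[labels == c]
        ranges[c] = (float(members.min()), float(members.max()))
    return [ranges[c] for c in labels]


def utility_loss(values, intervals):
    """Assumed metric: mean published range width over the domain width."""
    x = np.asarray(values, dtype=float)
    domain = x.max() - x.min()
    if domain == 0:
        return 0.0
    widths = np.array([hi - lo for lo, hi in intervals])
    return float(widths.mean() / domain)


# Hypothetical semi-sensitive attribute values.
ages = [23, 25, 27, 41, 43, 45, 62, 64, 66]
centers, u = fuzzy_c_means(ages, n_clusters=3)
intervals = generalize_to_ranges(ages, u)
loss = utility_loss(ages, intervals)
print(intervals)
print(f"utility loss: {loss:.2%}")
```

Under this assumed measure, a loss of 0 means every value is published exactly and a loss of 1 means every value is replaced by the full domain, so tighter clusters give lower loss at the cost of weaker generalization.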
About the journal:
The Egyptian Informatics Journal is published by the Faculty of Computers and Artificial Intelligence, Cairo University. The journal provides a forum for state-of-the-art research and development in the fields of computing, including computer sciences, information technologies, information systems, operations research and decision support. It encourages submissions of innovative, previously unpublished work in the subjects it covers, whether from academic, research or commercial sources.