Cluster-based anonymity model and algorithm for 1:1 dataset with a single sensitive attribute using machine learning technique
J. Jayapradha, Ghaida Muttashar Abdulsahib, Osamah Ibrahim Khalaf, M. Prakash, Mueen Uddin, Maha Abdelhaq, Raed Alsaqour
Egyptian Informatics Journal (journal article), published 2024-06-13. DOI: 10.1016/j.eij.2024.100485
Article page: https://www.sciencedirect.com/science/article/pii/S1110866524000483
Citations: 0
Abstract
Privacy is a significant issue that must be considered in all applications. Data collected from individuals and organizations frequently must be disclosed to public or private parties for analysis and research, where they are studied digitally to extract useful patterns for decision-making. Privacy-preserving data publishing matters because a privacy violation involving a patient's data can damage that individual's reputation. This work proposes an efficient Cluster-Based anonymity model that anonymizes a 1:1 dataset (one record per individual) with a single sensitive attribute by introducing the concept of a "semi-sensitive attribute." Based on correlation, attributes are categorized as either quasi-identifiers or semi-sensitive attributes. k-anonymity is applied to the table of quasi-identifiers and semi-sensitive attributes, and Fuzzy c-means clustering is used to fix the value ranges used to anonymize the semi-sensitive attributes. Because the work focuses on medical data, disease is treated as the sensitive attribute. The proposed model is shown to resist three privacy attacks: (i) identity disclosure, (ii) attribute disclosure, and (iii) membership disclosure. Utility loss is calculated for each record, and the per-record losses are aggregated to give the total information loss for each attribute. Measured across all attributes, the average utility loss of the anonymized patient dataset under the Cluster-Based anonymity model is 3.78%.
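The abstract states that Fuzzy c-means (FCM) clustering fixes the value ranges used to anonymize the semi-sensitive attributes, but gives no further detail. The following is a minimal sketch under stated assumptions, not the authors' code: a single numeric semi-sensitive attribute, fuzzifier m = 2, cluster centers seeded at evenly spaced quantiles, and each value replaced by the [min, max] span of the cluster in which it has the highest membership. The function names (`fuzzy_c_means_1d`, `semi_sensitive_ranges`) and the sample ages are hypothetical.

```python
from typing import List, Tuple


def fuzzy_c_means_1d(values: List[float], c: int, m: float = 2.0,
                     max_iter: int = 100,
                     tol: float = 1e-6) -> Tuple[List[float], List[List[float]]]:
    """Standard fuzzy c-means on one numeric attribute.

    Returns (centers, u) where u[i][j] is the membership of values[i]
    in cluster j; every row of u sums to 1.
    """
    n = len(values)
    if not 1 <= c <= n:
        raise ValueError("need 1 <= c <= number of values")
    # Assumption: deterministic start, centers at evenly spaced quantiles.
    ordered = sorted(values)
    centers = [float(ordered[int((j + 0.5) * n / c)]) for j in range(c)]
    u = [[0.0] * c for _ in range(n)]
    p = 2.0 / (m - 1.0)  # exponent in the FCM membership update
    for _ in range(max_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        for i, x in enumerate(values):
            d = [abs(x - cj) for cj in centers]
            if min(d) == 0.0:
                # Value sits exactly on a center: crisp membership there.
                hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
                u[i] = [h / sum(hits) for h in hits]
            else:
                u[i] = [1.0 / sum((dj / dk) ** p for dk in d) for dj in d]
        # Center update: mean of the values weighted by u_ij^m.
        new_centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            new_centers.append(sum(wi * x for wi, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


def semi_sensitive_ranges(values: List[float], c: int) -> List[Tuple[float, float]]:
    """Generalize each value to the [min, max] span of its dominant cluster.

    Assumption: this range rule is illustrative; the paper does not specify it.
    """
    _, u = fuzzy_c_means_1d(values, c)
    labels = [max(range(c), key=lambda j: row[j]) for row in u]
    spans = {}
    for x, lab in zip(values, labels):
        lo, hi = spans.get(lab, (x, x))
        spans[lab] = (min(lo, x), max(hi, x))
    return [spans[lab] for lab in labels]


if __name__ == "__main__":
    ages = [21, 23, 25, 47, 50, 52, 80, 83]  # hypothetical semi-sensitive values
    print(semi_sensitive_ranges(ages, c=3))
```

Because an FCM membership decreases with distance to a center, a value's highest-membership cluster is its nearest center, so on a single numeric attribute the resulting ranges are contiguous and do not overlap. On the sample ages the sketch yields three bands: 21 to 25, 47 to 52, and 80 to 83.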
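The abstract says per-record utility loss is aggregated into a per-attribute information loss and then averaged over attributes, but it does not give the formula. The sketch below assumes a normalized certainty penalty (NCP) style measure: the loss of a generalized interval is its width divided by the attribute's domain width, the per-attribute loss is the mean over records, and the dataset figure is the mean over attributes, as a percentage. It also includes a plain k-anonymity check on the generalized quasi-identifier columns. All names and the toy table are hypothetical; this is not the authors' implementation and does not reproduce the reported 3.78%.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]  # a generalized numeric value [lo, hi]
Record = Dict[str, Interval]    # one anonymized row (quasi-identifiers only)


def is_k_anonymous(records: Sequence[Record], qi: Sequence[str], k: int) -> bool:
    """True if every combination of generalized quasi-identifier values
    is shared by at least k records."""
    groups = Counter(tuple(r[a] for a in qi) for r in records)
    return all(size >= k for size in groups.values())


def record_loss(value: Interval, domain: Interval) -> float:
    """Assumed NCP-style loss of one cell: 0 if exact, 1 if it spans the whole domain."""
    (lo, hi), (d_lo, d_hi) = value, domain
    return (hi - lo) / (d_hi - d_lo)


def attribute_loss(records: Sequence[Record], attr: str, domain: Interval) -> float:
    """Aggregate the per-record losses for one attribute (assumed: their mean)."""
    return sum(record_loss(r[attr], domain) for r in records) / len(records)


def average_utility_loss(records: Sequence[Record],
                         domains: Dict[str, Interval]) -> float:
    """Average of the per-attribute losses, expressed as a percentage."""
    losses = [attribute_loss(records, a, dom) for a, dom in domains.items()]
    return 100.0 * sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical anonymized table: age and weight already generalized to ranges.
    table: List[Record] = [
        {"age": (20, 30), "weight_kg": (60, 70)},
        {"age": (20, 30), "weight_kg": (60, 70)},
        {"age": (40, 50), "weight_kg": (80, 89)},
        {"age": (40, 50), "weight_kg": (80, 89)},
    ]
    domains = {"age": (0, 100), "weight_kg": (40, 140)}
    print(is_k_anonymous(table, ["age", "weight_kg"], k=2))
    print(f"{average_utility_loss(table, domains):.2f}%")
```

On the toy table this prints `True` (two groups of two records each) and `9.75%`: age loses 0.1 per record, weight loses 0.1 or 0.09, and the two attribute means (0.1 and 0.095) average to 0.0975. Normalizing by domain width keeps attributes with different units comparable before they are averaged.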
About the journal:
The Egyptian Informatics Journal is published by the Faculty of Computers and Artificial Intelligence, Cairo University. The Journal provides a forum for state-of-the-art research and development in computing, including computer science, information technology, information systems, operations research, and decision support. Innovative, previously unpublished work on subjects covered by the Journal is welcome from academic, research, or commercial sources.