{"title":"多敏感属性横向和纵向划分用户档案数据库的匿名化与分析","authors":"Yuki Ina, Y. Sei, Yasuyuki Tahara, Akihiko Ohsuga","doi":"10.1109/SOLI.2018.8476730","DOIUrl":null,"url":null,"abstract":"Preventing the identification of individuals is important when data analyzers have to guarantee the safety of the data analysis they work with. A method proposed to solve this problem entails altering a part of the data value or deleting it. As to the processes, attributes of the individual data are divided into three groups: identifier (ID), quasi-identifier (QID), and sensitive attribute (SA). ID is the data that identify an individual directly, such as name. QID is the attributes that could identify an individual by combining them, such as age and birthplace. SA is very important information and should not be exposed when the data is identified to an individual. Utilizing these concepts, a safety metric for the data, such as l-diversity, is proposed so far. Under l-diversity, we use the assumption that the SA value is not known for anyone, and we process the data to prevent attackers from identifying. However, there are scenarios in which existing methods cannot protect the data against an invasion of privacy. In an analysis completed by multiple organizations, they integrated their data to carry out the effective data research. Although they can obtain profitable results, the integrated data could include information that attackers use to identify people. Specifically speaking, if the attacker is an institute providing data, they can use their own data’ SA value as a QID value. The assumption of l-diversity is violated, so the existing safety metric loses its effect on protecting data. In this paper, we propose a new anonymization method to conceal organizations’ important data by inserting dummy values, thereby enabling analysts to use the data safely. At the same time, we provide a calculating method to decrease the influence of the noise generated from the dummy insertion. 
We confirm these methods’ effectiveness by measuring accuracy in a data analysis experiments.","PeriodicalId":424115,"journal":{"name":"2018 IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI)","volume":"219 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Anonymization and Analysis of Horizontally and Vertically Divided User Profile Databases with Multiple Sensitive Attributes\",\"authors\":\"Yuki Ina, Y. Sei, Yasuyuki Tahara, Akihiko Ohsuga\",\"doi\":\"10.1109/SOLI.2018.8476730\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Preventing the identification of individuals is important when data analyzers have to guarantee the safety of the data analysis they work with. A method proposed to solve this problem entails altering a part of the data value or deleting it. As to the processes, attributes of the individual data are divided into three groups: identifier (ID), quasi-identifier (QID), and sensitive attribute (SA). ID is the data that identify an individual directly, such as name. QID is the attributes that could identify an individual by combining them, such as age and birthplace. SA is very important information and should not be exposed when the data is identified to an individual. Utilizing these concepts, a safety metric for the data, such as l-diversity, is proposed so far. Under l-diversity, we use the assumption that the SA value is not known for anyone, and we process the data to prevent attackers from identifying. However, there are scenarios in which existing methods cannot protect the data against an invasion of privacy. In an analysis completed by multiple organizations, they integrated their data to carry out the effective data research. 
Although they can obtain profitable results, the integrated data could include information that attackers use to identify people. Specifically speaking, if the attacker is an institute providing data, they can use their own data’ SA value as a QID value. The assumption of l-diversity is violated, so the existing safety metric loses its effect on protecting data. In this paper, we propose a new anonymization method to conceal organizations’ important data by inserting dummy values, thereby enabling analysts to use the data safely. At the same time, we provide a calculating method to decrease the influence of the noise generated from the dummy insertion. We confirm these methods’ effectiveness by measuring accuracy in a data analysis experiments.\",\"PeriodicalId\":424115,\"journal\":{\"name\":\"2018 IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI)\",\"volume\":\"219 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SOLI.2018.8476730\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Service Operations and Logistics, and Informatics 
(SOLI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SOLI.2018.8476730","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Anonymization and Analysis of Horizontally and Vertically Divided User Profile Databases with Multiple Sensitive Attributes
Preventing the identification of individuals is important when data analysts must guarantee the safety of the data they work with. One proposed approach is to alter or delete part of the data values. In this process, the attributes of individual records are divided into three groups: identifiers (ID), quasi-identifiers (QID), and sensitive attributes (SA). An ID, such as a name, identifies an individual directly. QIDs, such as age and birthplace, can identify an individual when combined. SAs are highly sensitive values that must not be exposed if a record is linked to an individual. Based on these concepts, safety metrics for data, such as l-diversity, have been proposed. l-diversity assumes that SA values are not known to anyone, and the data are processed so that attackers cannot identify individuals. However, there are scenarios in which existing methods cannot protect the data against privacy invasion. When multiple organizations collaborate on an analysis, they integrate their data to conduct more effective research. Although this can yield valuable results, the integrated data may include information that attackers can use to identify people. Specifically, if the attacker is one of the institutions providing data, it can use the SA values of its own data as QID values. This violates the assumption of l-diversity, so the existing safety metric no longer protects the data. In this paper, we propose a new anonymization method that conceals organizations' important data by inserting dummy values, thereby enabling analysts to use the data safely. We also provide a calculation method that reduces the influence of the noise introduced by the dummy insertion. We confirm the effectiveness of these methods by measuring accuracy in data analysis experiments.
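To make the ideas in the abstract concrete, the sketch below illustrates three of the concepts it mentions: checking l-diversity over QID equivalence classes, inserting dummy SA values, and subtracting the expected dummy contribution from an observed count. This is a minimal illustration under assumed simplifications (records as dicts, uniform random dummy values, the helper names `satisfies_l_diversity`, `insert_dummies`, and `corrected_count` are ours), not the authors' actual algorithm.

```python
import random

def satisfies_l_diversity(records, qid_keys, sa_key, l):
    """Check that every QID equivalence class contains at least
    l distinct sensitive-attribute (SA) values."""
    groups = {}
    for rec in records:
        key = tuple(rec[k] for k in qid_keys)  # QID equivalence class
        groups.setdefault(key, set()).add(rec[sa_key])
    return all(len(values) >= l for values in groups.values())

def insert_dummies(records, sa_key, sa_domain, num_dummies, rng=random):
    """Append dummy records with uniformly random SA values, so that a
    data-providing organization cannot use its own SA column as a QID.
    (Illustrative: real anonymization would also fill the QID fields.)"""
    dummies = [{sa_key: rng.choice(sa_domain)} for _ in range(num_dummies)]
    return records + dummies

def corrected_count(observed, num_dummies, domain_size):
    """Estimate the true count of one SA value by subtracting the
    expected number of uniform dummies that took that value."""
    return observed - num_dummies / domain_size
```

The `corrected_count` helper reflects the abstract's idea of reducing dummy-induced noise: if dummies are inserted uniformly over a domain of size d, on average `num_dummies / d` of them land on each value, so an analyst can subtract that expectation from each observed frequency.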