Towards Identity Anonymization in Large Survey Rating Data
Authors: Xiaoxun Sun, Hua Wang
Published in: 2010 Fourth International Conference on Network and System Security, 2010-09-01
DOI: 10.1109/NSS.2010.11 (https://doi.org/10.1109/NSS.2010.11)
We study the challenge of identity protection in large public survey rating data. Even though survey participants do not reveal any of their ratings, their survey records are potentially identifiable using information from other public sources. None of the existing anonymisation principles (e.g., $k$-anonymity, $l$-diversity) can effectively prevent such breaches in large survey rating data sets. In this paper, we tackle the problem by defining the $(k, \epsilon)$-anonymity principle. The principle requires that, for each transaction $t$ in the given survey rating data $T$, at least $(k-1)$ other transactions in $T$ have ratings similar to those of $t$, where the similarity is controlled by $\epsilon$. We propose a greedy approach to anonymizing survey rating data and apply the method to two real-life data sets to demonstrate its efficiency and practical utility.
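The $(k, \epsilon)$-anonymity condition stated above can be illustrated with a minimal Python sketch. The abstract does not say how similarity between ratings is measured, so the `dissimilarity` function below (the largest absolute difference across issues) is an assumption made purely for illustration, as are the function names and the toy data. The sketch only checks the condition; it is not the authors' greedy anonymization method.

```python
from typing import List, Sequence


def dissimilarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed measure (not from the paper): largest per-issue rating gap."""
    return max(abs(x - y) for x, y in zip(a, b))


def violating_transactions(
    T: Sequence[Sequence[float]], k: int, eps: float
) -> List[int]:
    """Return indices of transactions with fewer than k-1 eps-similar peers."""
    violators = []
    for i, t in enumerate(T):
        # Count the *other* transactions whose ratings lie within eps of t.
        neighbours = sum(
            1
            for j, u in enumerate(T)
            if j != i and dissimilarity(t, u) <= eps
        )
        if neighbours < k - 1:
            violators.append(i)
    return violators


def is_k_eps_anonymous(
    T: Sequence[Sequence[float]], k: int, eps: float
) -> bool:
    """True when every transaction has at least k-1 eps-similar peers."""
    return not violating_transactions(T, k, eps)


# Hypothetical toy data: each row is one participant's ratings on three issues.
ratings = [
    [5, 4, 1],
    [5, 3, 1],
    [4, 4, 2],
    [1, 1, 5],
]

print(violating_transactions(ratings, k=2, eps=1))  # [3]
print(is_k_eps_anonymous(ratings[:3], k=3, eps=1))  # True
```

In this toy data the first three participants rate every issue within one point of each other, so each has two similar peers. The fourth participant has none, so that record would stand out to an attacker who knows some of that participant's ratings. The pairwise check is quadratic in the number of transactions, which is acceptable for illustration but would need indexing or clustering on large data sets.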