基于差分隐私的序列模式挖掘的两阶段算法

Proceedings of the 22nd ACM international conference on Information & Knowledge Management Pub Date : 2013-10-27 DOI:10.1145/2505515.2505553

Luca Bonomi, Li Xiong

{"title":"基于差分隐私的序列模式挖掘的两阶段算法","authors":"Luca Bonomi, Li Xiong","doi":"10.1145/2505515.2505553","DOIUrl":null,"url":null,"abstract":"Frequent sequential pattern mining is a central task in many fields such as biology and finance. However, release of these patterns is raising increasing concerns on individual privacy. In this paper, we study the sequential pattern mining problem under the differential privacy framework which provides formal and provable guarantees of privacy. Due to the nature of the differential privacy mechanism which perturbs the frequency results with noise, and the high dimensionality of the pattern space, this mining problem is particularly challenging. In this work, we propose a novel two-phase algorithm for mining both prefixes and substring patterns. In the first phase, our approach takes advantage of the statistical properties of the data to construct a model-based prefix tree which is used to mine prefixes and a candidate set of substring patterns. The frequency of the substring patterns is further refined in the successive phase where we employ a novel transformation of the original data to reduce the perturbation noise. Extensive experiment results using real datasets showed that our approach is effective for mining both substring and prefix patterns in comparison to the state-of-the-art solutions.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"13 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"87","resultStr":"{\"title\":\"A two-phase algorithm for mining sequential patterns with differential privacy\",\"authors\":\"Luca Bonomi, Li Xiong\",\"doi\":\"10.1145/2505515.2505553\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Frequent sequential pattern mining is a central task in many fields such as biology and finance. However, release of these patterns is raising increasing concerns on individual privacy. In this paper, we study the sequential pattern mining problem under the differential privacy framework which provides formal and provable guarantees of privacy. Due to the nature of the differential privacy mechanism which perturbs the frequency results with noise, and the high dimensionality of the pattern space, this mining problem is particularly challenging. In this work, we propose a novel two-phase algorithm for mining both prefixes and substring patterns. In the first phase, our approach takes advantage of the statistical properties of the data to construct a model-based prefix tree which is used to mine prefixes and a candidate set of substring patterns. The frequency of the substring patterns is further refined in the successive phase where we employ a novel transformation of the original data to reduce the perturbation noise. Extensive experiment results using real datasets showed that our approach is effective for mining both substring and prefix patterns in comparison to the state-of-the-art solutions.\",\"PeriodicalId\":20528,\"journal\":{\"name\":\"Proceedings of the 22nd ACM international conference on Information & Knowledge Management\",\"volume\":\"13 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-10-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"87\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 22nd ACM international conference on Information & Knowledge Management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2505515.2505553\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2505515.2505553","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 87

摘要

频繁的序列模式挖掘是生物和金融等许多领域的核心任务。然而，这些模式的发布引起了人们对个人隐私的日益关注。本文研究了差分隐私框架下的顺序模式挖掘问题，差分隐私框架提供了形式化的、可证明的隐私保证。由于差分隐私机制的特性会使频率结果受到噪声的干扰，以及模式空间的高维性，使得这种挖掘问题特别具有挑战性。在这项工作中，我们提出了一种新的两阶段算法来挖掘前缀和子字符串模式。在第一阶段，我们的方法利用数据的统计属性来构建一个基于模型的前缀树，该树用于挖掘前缀和候选子字符串模式集。子串模式的频率在连续阶段进一步细化，我们采用了原始数据的新变换来降低扰动噪声。使用真实数据集的大量实验结果表明，与最先进的解决方案相比，我们的方法在挖掘子字符串和前缀模式方面都是有效的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A two-phase algorithm for mining sequential patterns with differential privacy

Frequent sequential pattern mining is a central task in many fields such as biology and finance. However, release of these patterns is raising increasing concerns on individual privacy. In this paper, we study the sequential pattern mining problem under the differential privacy framework which provides formal and provable guarantees of privacy. Due to the nature of the differential privacy mechanism which perturbs the frequency results with noise, and the high dimensionality of the pattern space, this mining problem is particularly challenging. In this work, we propose a novel two-phase algorithm for mining both prefixes and substring patterns. In the first phase, our approach takes advantage of the statistical properties of the data to construct a model-based prefix tree which is used to mine prefixes and a candidate set of substring patterns. The frequency of the substring patterns is further refined in the successive phase where we employ a novel transformation of the original data to reduce the perturbation noise. Extensive experiment results using real datasets showed that our approach is effective for mining both substring and prefix patterns in comparison to the state-of-the-art solutions.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 22nd ACM international conference on Information & Knowledge Management

自引率

0.00%

发文量

期刊最新文献

Exploring XML data is as easy as using maps Mining-based compression approach of propositional formulae Flexible and dynamic compromises for effective recommendations Efficient parsing-based search over structured data Recommendation via user's personality and social contextual