Linking Users Across Domains with Location Data: Theory and Validation

Proceedings of the 25th International Conference on World Wide Web; Pub Date: 2016-04-11; DOI: 10.1145/2872427.2883002

Christopher J. Riederer, Yunsung Kim, A. Chaintreau, Nitish Korula, Silvio Lattanzi


Citations: 144

Abstract

Linking accounts of the same user across datasets, even when personally identifying information is removed or unavailable, is an important open problem studied in many contexts. Beyond its many practical applications (such as cross-domain analysis, recommendation, and link prediction), understanding this problem more generally informs us about the privacy implications of data disclosure. Previous work has typically addressed this question either by using different portions of the same dataset or by observing the same behavior across thematically similar domains. In contrast, the general cross-domain case, where users have different profiles independently generated from a common but unknown pattern, raises new challenges, including difficulties in validation, and remains under-explored. In this paper, we address the reconciliation problem for location-based datasets and introduce a robust method for this general setting. Location datasets are a particularly fruitful domain to study: such records are frequently produced by users in an increasing number of applications and are highly sensitive, especially when linked to other datasets. Our main contribution is a generic and self-tunable algorithm that leverages any pair of sporadic location-based datasets to determine the most likely matching between the users they contain. While making very general assumptions on the patterns of mobile users, we show that the maximum weight matching we compute is provably correct. Although true cross-domain datasets are a rarity, our experimental evaluation uses two entirely new data collections, including one we crawled, on an unprecedented scale. The method we design outperforms naive rules and prior heuristics. As it combines both sparse and dense properties of location-based data and accounts for probabilistic dynamics of observation, it can be shown to be robust even when data gets sparse.
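The matching step described in the abstract can be illustrated with a small sketch. The abstract does not give the edge weights the authors use, so the `score` function below is a placeholder assumption (a plain count of shared location/time bins), and the record format, user ids, and function names are all hypothetical. The exhaustive search is only meant for toy inputs; it is not the authors' algorithm.

```python
from collections import Counter
from itertools import permutations


def score(records_a, records_b):
    """Placeholder edge weight (an assumption, not the paper's score):
    the number of (location_bin, time_bin) observations two users share."""
    shared = Counter(records_a) & Counter(records_b)  # per-bin minimum counts
    return sum(shared.values())


def max_weight_matching(users_a, users_b):
    """Exhaustively find the one-to-one assignment of users in A to users
    in B that maximizes the total score. Assumes len(users_a) <= len(users_b)
    and is only practical for a handful of users."""
    ids_a, ids_b = sorted(users_a), sorted(users_b)
    best, best_total = None, -1
    for perm in permutations(ids_b, len(ids_a)):
        total = sum(score(users_a[u], users_b[v]) for u, v in zip(ids_a, perm))
        if total > best_total:
            best, best_total = dict(zip(ids_a, perm)), total
    return best, best_total


# Hypothetical toy data: each record is a (location_bin, hour_bin) pair.
dataset_a = {
    "a1": [("cafe", 9), ("office", 10), ("gym", 18)],
    "a2": [("park", 8), ("library", 14)],
}
dataset_b = {
    "b1": [("park", 8), ("library", 14), ("cafe", 9)],
    "b2": [("office", 10), ("gym", 18), ("cafe", 9)],
}

matching, total = max_weight_matching(dataset_a, dataset_b)
print(matching, total)
```

On this toy input the best assignment pairs a1 with b2 and a2 with b1, because those pairs share the most bins. For realistic dataset sizes a polynomial-time assignment solver would replace the exhaustive loop.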

