Browsing Unicity: On the Limits of Anonymizing Web Tracking Data

2020 IEEE Symposium on Security and Privacy (SP) Pub Date : 2020-05-01 DOI:10.1109/SP40000.2020.00018

Clemens Deusser, Steffen Passmann, T. Strufe

Pages: 777-790


Abstract

Cross domain tracking has become the rule, rather than the exception, and scripts that collect behavioral data from visitors across sites have become ubiquitous on the Web. The collections form comprehensive profiles of browsing patterns and contain personal, sensitive information. This data can easily be linked back to the tracked individuals, most of whom are likely unaware of this information’s mere existence, let alone its perpetual storage and processing. As public pressure has increased, tracking companies like Google, Facebook, or Baidu now claim to anonymize their datasets, thus limiting or eliminating the possibility of linking it back to data subjects.

In cooperation with Europe’s largest audience measurement association we use access to a comprehensive tracking dataset to assess both identifiability and the possibility of convincingly anonymizing browsing data. Our results show that anonymization through generalization does not sufficiently protect anonymity. Reducing unicity of browsing data to negligible levels would necessitate removal of all client and web domain information as well as click timings. In tangible adversary scenarios, supposedly anonymized datasets are highly vulnerable to dataset enrichment and shoulder surfing adversaries, with almost half of all browsing sessions being identified by just two observations. We conclude that while it may be possible to store single coarsened clicks anonymously, any collection of higher complexity will contain large amounts of pseudonymous data.
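To make the notion of unicity more concrete, the following is a minimal Python sketch, not the authors' implementation or data. It assumes one common reading of unicity: the share of p-observation subsets of a session that match exactly one session in the dataset. The session contents, the `unicity` and `generalize` helpers, and the bucket size are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def unicity(sessions, p):
    """Average fraction of p-click subsets that match exactly one session.

    Assumed definition for illustration; the paper may measure unicity differently.
    """
    scores = []
    for clicks in sessions.values():
        subsets = list(combinations(sorted(clicks), p))
        if not subsets:
            continue  # session has fewer than p distinct clicks
        unique = sum(
            1
            for subset in subsets
            # Count how many sessions contain every click in this subset.
            if sum(1 for other in sessions.values() if set(subset) <= other) == 1
        )
        scores.append(unique / len(subsets))
    return sum(scores) / len(scores) if scores else 0.0


def generalize(sessions, hours_per_bucket):
    """Coarsen click timings into wider buckets (a toy form of generalization)."""
    return {
        sid: {(domain, hour // hours_per_bucket) for domain, hour in clicks}
        for sid, clicks in sessions.items()
    }


# Hypothetical toy data: session id -> set of (web domain, hour of day) clicks.
sessions = {
    "a": {("news.example", 9), ("shop.example", 10), ("mail.example", 11)},
    "b": {("news.example", 9), ("shop.example", 14), ("mail.example", 11)},
    "c": {("news.example", 8), ("shop.example", 10), ("video.example", 20)},
}

coarse = generalize(sessions, hours_per_bucket=12)

for p in (1, 2):
    print(f"p={p}: raw={unicity(sessions, p):.3f} coarse={unicity(coarse, p):.3f}")
```

In this toy dataset, coarsening the click timings lowers unicity but does not remove it, and two observations identify a session far more often than one. That mirrors the direction of the abstract's claims, though the numbers here carry no empirical weight.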


