Browsing Unicity: On the Limits of Anonymizing Web Tracking Data

Clemens Deusser, Steffen Passmann, T. Strufe
{"title":"Browsing Unicity: On the Limits of Anonymizing Web Tracking Data","authors":"Clemens Deusser, Steffen Passmann, T. Strufe","doi":"10.1109/SP40000.2020.00018","DOIUrl":null,"url":null,"abstract":"Cross domain tracking has become the rule, rather than the exception, and scripts that collect behavioral data from visitors across sites have become ubiquitous on the Web. The collections form comprehensive profiles of browsing patterns and contain personal, sensitive information. This data can easily be linked back to the tracked individuals, most of whom are likely unaware of this information’s mere existence, let alone its perpetual storage and processing. As public pressure has increased, tracking companies like Google, Facebook, or Baidu now claim to anonymize their datasets, thus limiting or eliminating the possibility of linking it back to data subjects.In cooperation with Europe’s largest audience measurement association we use access to a comprehensive tracking dataset to assess both identifiability and the possibility of convincingly anonymizing browsing data. Our results show that anonymization through generalization does not sufficiently protect anonymity. Reducing unicity of browsing data to negligible levels would necessitate removal of all client and web domain information as well as click timings. In tangible adversary scenarios, supposedly anonymized datasets are highly vulnerable to dataset enrichment and shoulder surfing adversaries, with almost half of all browsing sessions being identified by just two observations. We conclude that while it may be possible to store single coarsened clicks anonymously, any collection of higher complexity will contain large amounts of pseudonymous data.","PeriodicalId":6849,"journal":{"name":"2020 IEEE Symposium on Security and Privacy (SP)","volume":"28 1","pages":"777-790"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE Symposium on Security and Privacy (SP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SP40000.2020.00018","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13

Abstract

Cross domain tracking has become the rule, rather than the exception, and scripts that collect behavioral data from visitors across sites have become ubiquitous on the Web. The collections form comprehensive profiles of browsing patterns and contain personal, sensitive information. This data can easily be linked back to the tracked individuals, most of whom are likely unaware of this information’s mere existence, let alone its perpetual storage and processing. As public pressure has increased, tracking companies like Google, Facebook, or Baidu now claim to anonymize their datasets, thus limiting or eliminating the possibility of linking it back to data subjects.In cooperation with Europe’s largest audience measurement association we use access to a comprehensive tracking dataset to assess both identifiability and the possibility of convincingly anonymizing browsing data. Our results show that anonymization through generalization does not sufficiently protect anonymity. Reducing unicity of browsing data to negligible levels would necessitate removal of all client and web domain information as well as click timings. In tangible adversary scenarios, supposedly anonymized datasets are highly vulnerable to dataset enrichment and shoulder surfing adversaries, with almost half of all browsing sessions being identified by just two observations. We conclude that while it may be possible to store single coarsened clicks anonymously, any collection of higher complexity will contain large amounts of pseudonymous data.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
浏览唯一性:论匿名化网络跟踪数据的局限性
跨域跟踪已经成为一种规则,而不是例外,从跨站点的访问者那里收集行为数据的脚本在Web上已经变得无处不在。这些集合形成了浏览模式的综合配置文件,并包含个人敏感信息。这些数据可以很容易地追溯到被跟踪的个人,他们中的大多数人可能都不知道这些信息的存在,更不用说它的永久存储和处理了。随着公众压力的增加,谷歌、Facebook或百度等追踪公司现在声称对其数据集进行匿名化处理,从而限制或消除了将其与数据主体联系起来的可能性。我们与欧洲最大的受众测量协会合作,使用全面的跟踪数据集来评估可识别性和令人信服的匿名浏览数据的可能性。我们的研究结果表明,通过泛化进行匿名化并不能充分保护匿名性。将浏览数据的唯一性降低到可以忽略不计的水平将需要删除所有客户端和web域信息以及点击时间。在有形的对手场景中,所谓的匿名数据集非常容易受到数据集丰富和肩部冲浪对手的攻击,几乎一半的浏览会话仅通过两次观察就被识别出来。我们的结论是,虽然匿名存储单个粗化点击是可能的,但任何更高复杂性的集合都将包含大量的假名数据。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Unexpected Data Dependency Creation and Chaining: A New Attack to SDN TextExerciser: Feedback-driven Text Input Exercising for Android Applications Ijon: Exploring Deep State Spaces via Fuzzing Efficient and Secure Multiparty Computation from Fixed-Key Block Ciphers EverCrypt: A Fast, Verified, Cross-Platform Cryptographic Provider
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1