Aaron Cahn, Scott Alfeld, P. Barford, S. Muthukrishnan
{"title":"An Empirical Study of Web Cookies","authors":"Aaron Cahn, Scott Alfeld, P. Barford, S. Muthukrishnan","doi":"10.1145/2872427.2882991","DOIUrl":null,"url":null,"abstract":"Web cookies are used widely by publishers and 3rd parties to track users and their behaviors. Despite the ubiquitous use of cookies, there is little prior work on their characteristics such as standard attributes, placement policies, and the knowledge that can be amassed via 3rd party cookies. In this paper, we present an empirical study of web cookie characteristics, placement practices and information transmission. To conduct this study, we implemented a lightweight web crawler that tracks and stores the cookies as it navigates to websites. We use this crawler to collect over 3.2M cookies from the two crawls, separated by 18 months, of the top 100K Alexa web sites. We report on the general cookie characteristics and add context via a cookie category index and website genre labels. We consider privacy implications by examining specific cookie attributes and placement behavior of 3rd party cookies. We find that 3rd party cookies outnumber 1st party cookies by a factor of two, and we illuminate the connection between domain genres and cookie attributes. We find that less than 1% of the entities that place cookies can aggregate information across 75% of web sites. Finally, we consider the issue of information transmission and aggregation by domains via 3rd party cookies. We develop a mathematical framework to quantify user information leakage for a broad class of users, and present findings using real world domains. In particular, we demonstrate the interplay between a domain's footprint across the Internet and the browsing behavior of users, which has significant impact on information transmission.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"86","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 25th International Conference on World Wide Web","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2872427.2882991","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 86
Abstract
Web cookies are used widely by publishers and 3rd parties to track users and their behaviors. Despite the ubiquitous use of cookies, there is little prior work on their characteristics such as standard attributes, placement policies, and the knowledge that can be amassed via 3rd party cookies. In this paper, we present an empirical study of web cookie characteristics, placement practices and information transmission. To conduct this study, we implemented a lightweight web crawler that tracks and stores the cookies as it navigates to websites. We use this crawler to collect over 3.2M cookies from the two crawls, separated by 18 months, of the top 100K Alexa web sites. We report on the general cookie characteristics and add context via a cookie category index and website genre labels. We consider privacy implications by examining specific cookie attributes and placement behavior of 3rd party cookies. We find that 3rd party cookies outnumber 1st party cookies by a factor of two, and we illuminate the connection between domain genres and cookie attributes. We find that less than 1% of the entities that place cookies can aggregate information across 75% of web sites. Finally, we consider the issue of information transmission and aggregation by domains via 3rd party cookies. We develop a mathematical framework to quantify user information leakage for a broad class of users, and present findings using real world domains. In particular, we demonstrate the interplay between a domain's footprint across the Internet and the browsing behavior of users, which has significant impact on information transmission.