How to Catch when Proxies Lie: Verifying the Physical Locations of Network Proxies with Active Geolocation
Zachary Weinberg, Shinyoung Cho, Nicolas Christin, V. Sekar, Phillipa Gill
https://doi.org/10.1145/3278532.3278551
Internet users worldwide rely on commercial network proxies both to conceal their true location and identity, and to control their apparent location. Their reasons range from mundane to security-critical. Proxy operators offer no proof that their advertised server locations are accurate. IP-to-location databases tend to agree with the advertised locations, but there have been many reports of serious errors in such databases. In this study we estimate the locations of 2269 proxy servers from ping-time measurements to hosts in known locations, combined with AS and network information. These servers are operated by seven proxy services, and, according to the operators, spread over 222 countries and territories. Our measurements show that one-third of them are definitely not located in the advertised countries, and another third might not be. Instead, they are concentrated in countries where server hosting is cheap and reliable (e.g. Czech Republic, Germany, Netherlands, UK, USA). In the process, we address a number of technical challenges with applying active geolocation to proxy servers, which may not be directly pingable, and may restrict the types of packets that can be sent through them, e.g. forbidding traceroute. We also test three geolocation algorithms from previous literature, plus two variations of our own design, at the scale of the whole world.
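The core physical constraint behind such ping-based estimates is simple: packets cannot travel faster than light in fiber, so even a single small round-trip time from a landmark in a known location can rule out a claimed country outright. Below is a minimal sketch of that sanity check; the landmark coordinates, RTT values, and the two-thirds-of-c propagation factor are illustrative assumptions, not the paper's exact algorithm.

```python
from math import radians, sin, cos, asin, sqrt

# Assumption: signals propagate through fiber at roughly 2/3 the speed of
# light, ~200 km per millisecond one-way, so an RTT of t ms bounds the
# distance from the landmark at roughly 100*t km.
KM_PER_RTT_MS = 100.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6371.0 * 2 * asin(sqrt(a))

def consistent_with_claim(claimed, landmarks):
    """claimed: (lat, lon) the proxy advertises.
    landmarks: list of (lat, lon, min_rtt_ms) to hosts in known locations.
    Returns False if any single RTT already makes the claim physically
    impossible; True means merely 'not yet ruled out'."""
    for lat, lon, rtt in landmarks:
        if haversine_km(claimed[0], claimed[1], lat, lon) > rtt * KM_PER_RTT_MS:
            return False  # packet would have had to outrun light in fiber
    return True

# Hypothetical example: a proxy claiming Sydney, measured at 25 ms RTT from
# Frankfurt, is bounded to ~2500 km of Frankfurt -- the claim is false.
print(consistent_with_claim((-33.87, 151.21), [(50.11, 8.68, 25.0)]))  # False
```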
{"title":"How to Catch when Proxies Lie: Verifying the Physical Locations of Network Proxies with Active Geolocation","authors":"Zachary Weinberg, Shinyoung Cho, Nicolas Christin, V. Sekar, Phillipa Gill","doi":"10.1145/3278532.3278551","DOIUrl":"https://doi.org/10.1145/3278532.3278551","url":null,"abstract":"Internet users worldwide rely on commercial network proxies both to conceal their true location and identity, and to control their apparent location. Their reasons range from mundane to security-critical. Proxy operators offer no proof that their advertised server locations are accurate. IP-to-location databases tend to agree with the advertised locations, but there have been many reports of serious errors in such databases. In this study we estimate the locations of 2269 proxy servers from ping-time measurements to hosts in known locations, combined with AS and network information. These servers are operated by seven proxy services, and, according to the operators, spread over 222 countries and territories. Our measurements show that one-third of them are definitely not located in the advertised countries, and another third might not be. Instead, they are concentrated in countries where server hosting is cheap and reliable (e.g. Czech Republic, Germany, Netherlands, UK, USA). In the process, we address a number of technical challenges with applying active geolocation to proxy servers, which may not be directly pingable, and may restrict the types of packets that can be sent through them, e.g. forbidding traceroute. We also test three geolocation algorithms from previous literature, plus two variations of our own design, at the scale of the whole world.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"171 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73077423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A First Look at SIM-Enabled Wearables in the Wild
Harini Kolamunna, Ilias Leontiadis, Diego Perino, Suranga Seneviratne, Kanchana Thilakarathna, A. Seneviratne
https://doi.org/10.1145/3278532.3278540
Recent advances are driving wearables towards stand-alone devices with cellular network support (e.g. the SIM-enabled Apple Watch Series 3). Nonetheless, little has been studied about SIM-enabled wearable traffic in ISP networks to gain customer insights and to understand traffic characteristics. In this paper, we characterize the network traffic of several thousand SIM-enabled wearable users in a large European mobile ISP. We present insights on user behavior, application characteristics such as popularity and usage, and wearable traffic patterns. We observed a 9% increase in SIM-enabled wearable users over a five-month observation period. However, only 34% of such users actually generate any network transactions. Our analysis also indicates that SIM-enabled wearable users are significantly more active in terms of mobility, data consumption, and frequency of app usage compared to the remaining customers of the ISP, who are mostly equipped with a smartphone. Finally, wearable apps communicate directly with third parties such as advertising and analytics networks, much as smartphone apps do.
{"title":"A First Look at SIM-Enabled Wearables in the Wild","authors":"Harini Kolamunna, Ilias Leontiadis, Diego Perino, Suranga Seneviratne, Kanchana Thilakarathna, A. Seneviratne","doi":"10.1145/3278532.3278540","DOIUrl":"https://doi.org/10.1145/3278532.3278540","url":null,"abstract":"Recent advances are driving wearables towards stand-alone devices with cellular network support (e.g. SIM-enabled Apple Watch series-3). Nonetheless, a little has been studied on SIM-enabled wearable traffic in ISP networks to gain customer insights and to understand traffic characteristics. In this paper, we characterize the network traffic of several thousand SIM-enabled wearable users in a large European mobile ISP. We present insights on user behavior, application characteristics such as popularity and usage, and wearable traffic patterns. We observed a 9% increase in SIM-enabled wearable users over a five month observation period. However, only 34% of such users actually generate any network transaction. Our analysis also indicates that SIM-enabled wearable users are significantly more active in terms of mobility, data consumption and frequency of app usage compared to the remaining customers of the ISP who are mostly equipped with a smartphone. Finally, wearable apps directly communicate with third parties such as advertisement and analytics networks similarly to smartphone apps.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"165 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78852756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A First Joint Look at DoS Attacks and BGP Blackholing in the Wild
M. Jonker, A. Pras, A. Dainotti, A. Sperotto
https://doi.org/10.1145/3278532.3278571
BGP blackholing is an operational countermeasure that builds upon the capabilities of BGP to achieve DoS mitigation. Although empirical evidence of blackholing activity is documented in the literature, a clear understanding of how blackholing is used in practice when attacks occur is still missing. This paper presents a first joint look at DoS attacks and BGP blackholing in the wild. We do this on the basis of two complementary data sets of DoS attacks, inferred from a large network telescope and DoS honeypots, and on a data set of blackholing events. All data sets span a period of three years, thus providing a longitudinal overview of the operational deployment of blackholing during DoS attacks.
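In practice, a victim network triggers blackholing by announcing a very specific prefix (typically an IPv4 /32 host route) tagged with a blackhole BGP community, such as the well-known BLACKHOLE community 65535:666 standardized in RFC 7999. The sketch below shows how one might flag candidate blackholing events in a BGP update feed; it is a simplified heuristic, and inferring real blackholing events relies on curated lists of provider-specific communities rather than just this check.

```python
# Well-known BLACKHOLE community from RFC 7999; many providers additionally
# use their own ASN:666 convention. Treated here purely as a heuristic.
RFC7999_BLACKHOLE = (65535, 666)

def is_blackholing_candidate(prefix_len, communities, provider_asns=frozenset()):
    """Flag a BGP announcement as a candidate blackholing event if it is
    an IPv4 host route (/32) tagged with a blackhole community.
    communities: iterable of (asn, value) tuples from the announcement."""
    tagged = any(c == RFC7999_BLACKHOLE or
                 (c[0] in provider_asns and c[1] == 666)
                 for c in communities)
    return tagged and prefix_len == 32

# A /32 announced with 65535:666 is a textbook blackholing request.
print(is_blackholing_candidate(32, [(65535, 666)]))  # True
print(is_blackholing_candidate(24, [(64512, 100)]))  # False
```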
{"title":"A First Joint Look at DoS Attacks and BGP Blackholing in the Wild","authors":"M. Jonker, A. Pras, A. Dainotti, A. Sperotto","doi":"10.1145/3278532.3278571","DOIUrl":"https://doi.org/10.1145/3278532.3278571","url":null,"abstract":"BGP blackholing is an operational countermeasure that builds upon the capabilities of BGP to achieve DoS mitigation. Although empirical evidence of blackholing activities are documented in literature, a clear understanding of how blackholing is used in practice when attacks occur is still missing. This paper presents a first joint look at DoS attacks and BGP blackholing in the wild. We do this on the basis of two complementary data sets of DoS attacks, inferred from a large network telescope and DoS honeypots, and on a data set of blackholing events. All data sets span a period of three years, thus providing a longitudinal overview of operational deployment of blackholing during DoS attacks.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74598355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
403 Forbidden: A Global View of CDN Geoblocking
Allison McDonald, Matthew Bernhard, Luke Valenta, Benjamin VanderSloot, W. Scott, N. Sullivan, J. A. Halderman, Roya Ensafi
https://doi.org/10.1145/3278532.3278552
We report the first wide-scale measurement study of server-side geographic restriction, or geoblocking, a phenomenon in which server operators intentionally deny access to users from particular countries or regions. Many sites practice geoblocking due to legal requirements or other business reasons, but excessive blocking can needlessly deny valuable content and services to entire national populations. To help researchers and policymakers understand this phenomenon, we develop a semi-automated system to detect instances where whole websites were rendered inaccessible due to geoblocking. By focusing on detecting geoblocking capabilities offered by large CDNs and cloud providers, we can reliably distinguish the practice from dynamic anti-abuse mechanisms and network-based censorship. We apply our techniques to test for geoblocking across the Alexa Top 10K sites from thousands of vantage points in 177 countries. We then expand our measurement to a sample of CDN customers in the Alexa Top 1M. We find that geoblocking occurs across a broad set of countries and sites. We observe geoblocking in nearly all countries we study, with Iran, Syria, Sudan, Cuba, and Russia experiencing the highest rates. These countries experience particularly high rates of geoblocking for finance and banking sites, likely as a result of U.S. economic sanctions. We also verify our measurements with data provided by Cloudflare, and find our observations to be accurate.
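CDN-level geoblocking is detectable because the block pages major CDNs serve are distinctive; Cloudflare, for example, returns a 403 whose body carries its error code 1009 ("country or region banned"). Below is a minimal sketch of this style of classification, assuming the Python requests library. The signature strings are illustrative, and the paper's semi-automated system is considerably more careful about distinguishing geoblocking from anti-abuse blocking.

```python
import requests

# Illustrative block-page signatures; real block pages vary by CDN and
# configuration, and a robust classifier needs many more markers.
GEOBLOCK_SIGNATURES = {
    "cloudflare": ["error 1009", "banned"],
}

def probe(url):
    """Fetch a URL from this vantage point and guess whether a 403 is a
    CDN geoblock rather than a generic forbidden response."""
    r = requests.get(url, timeout=10)
    if r.status_code != 403:
        return "accessible" if r.ok else f"other ({r.status_code})"
    body = r.text.lower()
    for cdn, needles in GEOBLOCK_SIGNATURES.items():
        if any(n in body for n in needles):
            return f"geoblocked ({cdn})"
    return "403 (unclassified)"
```

Running the same probe from vantage points in many countries and comparing outcomes is what separates country-specific denial from a site that is simply down or blocking everyone.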
{"title":"403 Forbidden: A Global View of CDN Geoblocking","authors":"Allison McDonald, Matthew Bernhard, Luke Valenta, Benjamin VanderSloot, W. Scott, N. Sullivan, J. A. Halderman, Roya Ensafi","doi":"10.1145/3278532.3278552","DOIUrl":"https://doi.org/10.1145/3278532.3278552","url":null,"abstract":"We report the first wide-scale measurement study of server-side geographic restriction, or geoblocking, a phenomenon in which server operators intentionally deny access to users from particular countries or regions. Many sites practice geoblocking due to legal requirements or other business reasons, but excessive blocking can needlessly deny valuable content and services to entire national populations. To help researchers and policymakers understand this phenomenon, we develop a semi-automated system to detect instances where whole websites were rendered inaccessible due to geoblocking. By focusing on detecting geoblocking capabilities offered by large CDNs and cloud providers, we can reliably distinguish the practice from dynamic anti-abuse mechanisms and network-based censorship. We apply our techniques to test for geoblocking across the Alexa Top 10K sites from thousands of vantage points in 177 countries. We then expand our measurement to a sample of CDN customers in the Alexa Top 1M. We find that geoblocking occurs across a broad set of countries and sites. We observe geoblocking in nearly all countries we study, with Iran, Syria, Sudan, Cuba, and Russia experiencing the highest rates. These countries experience particularly high rates of geoblocking for finance and banking sites, likely as a result of U.S. economic sanctions. We also verify our measurements with data provided by Cloudflare, and find our observations to be accurate.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73656109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How Tracking Companies Circumvented Ad Blockers Using WebSockets
M. Bashir, Sajjad Arshad, E. Kirda, William K. Robertson, Christo Wilson
https://doi.org/10.1145/3278532.3278573
In this study of 100,000 websites, we document how Advertising and Analytics (A&A) companies have used WebSockets to bypass ad blocking, exfiltrate user tracking data, and deliver advertisements. Specifically, our measurements investigate how a long-standing bug in the chrome.webRequest API of Chrome (the world's most popular browser) prevented blocking extensions from interposing on WebSocket connections. We conducted large-scale crawls of top publishers before and after this bug was patched in April 2017 to examine which A&A companies were using WebSockets, what information was being transferred, and whether companies altered their behavior after the patch. We find that a small but persistent group of A&A companies use WebSockets, and that several of them engaged in troubling behavior that would have circumvented blocking due to the Chrome bug, such as browser fingerprinting, exfiltrating the DOM, and serving advertisements.
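A crawl-side way to surface this traffic is simply to look for ws:// or wss:// requests whose hosts match tracker filter lists. The sketch below does exactly that; TRACKER_DOMAINS is a hypothetical stand-in for a real list such as EasyPrivacy, and the paper's A&A classification is more involved than a suffix match.

```python
from urllib.parse import urlparse

# Hypothetical tracker domains; a real analysis would load filter-list rules.
TRACKER_DOMAINS = {"tracker.example", "analytics.example"}

def flag_websocket_trackers(request_urls):
    """From a crawl's request log, pick out WebSocket connections to known
    tracking domains -- the traffic that the pre-patch chrome.webRequest
    bug hid from blocking extensions."""
    hits = []
    for url in request_urls:
        u = urlparse(url)
        if u.scheme in ("ws", "wss") and u.hostname and any(
                u.hostname == d or u.hostname.endswith("." + d)
                for d in TRACKER_DOMAINS):
            hits.append(url)
    return hits

print(flag_websocket_trackers(["wss://cdn.tracker.example/beacon",
                               "https://news.example/index.html"]))
```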
{"title":"How Tracking Companies Circumvented Ad Blockers Using WebSockets","authors":"M. Bashir, Sajjad Arshad, E. Kirda, William K. Robertson, Christo Wilson","doi":"10.1145/3278532.3278573","DOIUrl":"https://doi.org/10.1145/3278532.3278573","url":null,"abstract":"In this study of 100,000 websites, we document how Advertising and Analytics (A&A) companies have used WebSockets to bypass ad blocking, exfiltrate user tracking data, and deliver advertisements. Specifically, our measurements investigate how a long-standing bug in Chrome's (the world's most popular browser) chrome.webRequest API prevented blocking extensions from being able to interpose on WebSocket connections. We conducted large-scale crawls of top publishers before and after this bug was patched in April 2017 to examine which A&A companies were using WebSockets, what information was being transferred, and whether companies altered their behavior after the patch. We find that a small but persistent group of A&A companies use WebSockets, and that several of them engaged in troubling behavior, such as browser fingerprinting, exfiltrating the DOM, and serving advertisements, that would have circumvented blocking due to the Chrome bug.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80282889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tracing Cross Border Web Tracking
Costas Iordanou, Georgios Smaragdakis, Ingmar Poese, Nikolaos Laoutaris
https://doi.org/10.1145/3278532.3278561
A tracking flow is a flow between an end user and a Web tracking service. We develop an extensive measurement methodology for quantifying at scale the number of tracking flows that cross data protection borders, be they national or international, such as the EU28 border within which the General Data Protection Regulation (GDPR) applies. Our methodology uses a browser extension to fully render advertising and tracking code, various lists and heuristics to extract well-known trackers, passive DNS replication to obtain all the IP ranges of trackers, and state-of-the-art geolocation. We employ our methodology on a dataset from 350 real users of the browser extension over a period of more than four months, and then generalize our results by analyzing billions of web tracking flows from more than 60 million broadband and mobile users from 4 large European ISPs. We show that the majority of tracking flows cross national borders in Europe but, contrary to popular belief, are fairly well confined within the larger GDPR jurisdiction. Simple DNS redirection and PoP mirroring can increase national confinement while sealing almost all tracking flows within Europe. Lastly, we show that cross-border tracking is prevalent even in sensitive and hence protected data categories and groups, including health, sexual orientation, minors, and others.
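Once tracker endpoints are geolocated, classifying each flow reduces to bucketing the user's country against the tracker's. A minimal sketch, with an abbreviated EU28 membership set (as of 2018, hence GB) and country codes assumed to come from a geolocation lookup:

```python
# Abbreviated EU28 set (2018 membership, hence GB); a real analysis lists
# all 28 members and geolocates tracker IPs via passive DNS + geolocation.
EU28 = {"AT", "BE", "DE", "ES", "FR", "GB", "IT", "NL", "PL", "SE"}

def classify_flow(user_cc, tracker_cc):
    """Bucket a tracking flow by where the tracker endpoint sits relative
    to the user's country and the GDPR jurisdiction."""
    if tracker_cc == user_cc:
        return "confined nationally"
    if user_cc in EU28 and tracker_cc in EU28:
        return "cross-border, but within EU28 (GDPR applies)"
    return "leaves the GDPR jurisdiction"

print(classify_flow("DE", "NL"))  # cross-border, but within EU28 (GDPR applies)
```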
{"title":"Tracing Cross Border Web Tracking","authors":"Costas Iordanou, Georgios Smaragdakis, Ingmar Poese, Nikolaos Laoutaris","doi":"10.1145/3278532.3278561","DOIUrl":"https://doi.org/10.1145/3278532.3278561","url":null,"abstract":"A tracking flow is a flow between an end user and a Web tracking service. We develop an extensive measurement methodology for quantifying at scale the amount of tracking flows that cross data protection borders, be it national or international, such as the EU28 border within which the General Data Protection Regulation (GDPR) applies. Our methodology uses a browser extension to fully render advertising and tracking code, various lists and heuristics to extract well known trackers, passive DNS replication to get all the IP ranges of trackers, and state-of-the art geolocation. We employ our methodology on a dataset from 350 real users of the browser extension over a period of more than four months, and then generalize our results by analyzing billions of web tracking flows from more than 60 million broadband and mobile users from 4 large European ISPs. We show that the majority of tracking flows cross national borders in Europe but, unlike popular belief, are pretty well confined within the larger GDPR jurisdiction. Simple DNS redirection and PoP mirroring can increase national confinement while sealing almost all tracking flows within Europe. Last, we show that cross boarder tracking is prevalent even in sensitive and hence protected data categories and groups including health, sexual orientation, minors, and others.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87583280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comments on DNS Robustness
M. Allman
https://doi.org/10.1145/3278532.3278541
The Domain Name System (DNS) maps human-friendly names into the network addresses necessary for network communication. Therefore, the robustness of the DNS is crucial to the general operation of the Internet. As such, the DNS protocol and architecture were designed to facilitate structural robustness within the system. For instance, a domain can depend on authoritative nameservers in several topologically disparate datacenters to aid robustness. However, the actual operation of the system need not utilize these robustness tools. In this paper we provide an initial analysis of the structural robustness of the DNS ecosystem over the last nine years.
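One concrete way to probe the structural robustness of a single domain is to ask how many distinct nameservers serve it and how spread out they are in address space. A minimal sketch, assuming the dnspython package; the paper's analysis works from nine years of historical data and also considers topological (AS-level) diversity rather than just /24 spread.

```python
import dns.resolver  # assumes the dnspython package is installed

def ns_diversity(domain):
    """Rough robustness check: count the distinct nameservers for a domain
    and the distinct /24 networks their addresses fall in."""
    nameservers = [str(r.target) for r in dns.resolver.resolve(domain, "NS")]
    slash24s = set()
    for ns in nameservers:
        for a in dns.resolver.resolve(ns, "A"):
            ip = a.to_text()
            slash24s.add(ip.rsplit(".", 1)[0])  # drop the last octet
    return len(nameservers), len(slash24s)

# A domain whose NS records all resolve into a single /24 has little of the
# structural robustness the protocol was designed to enable.
print(ns_diversity("example.com"))
```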
{"title":"Comments on DNS Robustness","authors":"M. Allman","doi":"10.1145/3278532.3278541","DOIUrl":"https://doi.org/10.1145/3278532.3278541","url":null,"abstract":"The Domain Name System (DNS) maps human-friendly names into the network addresses necessary for network communication. Therefore, the robustness of the DNS is crucial to the general operation of the Internet. As such, the DNS protocol and architecture were designed to facilitate structural robustness within system. For instance, a domain can depend on authoritative nameservers in several topologically disparate datacenters to aid robustness. However, the actual operation of the system need not utilize these robustness tools. In this paper we provide an initial analysis of the structural robustness of the DNS ecosystem over the last nine years.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83650027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Coming of Age: A Longitudinal Study of TLS Deployment
Platon Kotzias, Abbas Razaghpanah, J. Amann, K. Paterson, N. Vallina-Rodriguez, Juan Caballero
https://doi.org/10.1145/3278532.3278568
The Transport Layer Security (TLS) protocol is the de-facto standard for encrypted communication on the Internet. However, it has been plagued by a number of different attacks and security issues in recent years. Addressing these attacks requires changes to the protocol, to server- or client-software, or to all of them. In this paper we conduct the first large-scale longitudinal study examining the evolution of the TLS ecosystem over the last six years. We place a special focus on the ecosystem's evolution in response to high-profile attacks. For our analysis, we use a passive measurement dataset with more than 319.3B connections since February 2012, and an active dataset that contains TLS and SSL scans of the entire IPv4 address space since August 2015. To identify the evolution of specific clients we also create the, to our knowledge, largest TLS client fingerprint database to date, consisting of 1,684 fingerprints. We observe that the ecosystem has shifted significantly since 2012, with major changes in the cipher suites and TLS extensions that clients offer and servers accept. Where possible, we correlate these changes with the timing of specific attacks on TLS. At the same time, our results show that while clients, especially browsers, are quick to adopt new algorithms, they are also slow to drop support for older ones. We also encounter significant amounts of client software that probably unwittingly offers unsafe ciphers. We discuss these findings in the context of long-tail effects in the TLS ecosystem.
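A TLS client fingerprint of this kind is essentially a stable hash over the fields a client sends in its ClientHello: two connections carrying identical field values hash identically, so the hash identifies a client implementation across connections. The sketch below illustrates the idea in the spirit of schemes like JA3; the example field values are made up, and the paper's database is not necessarily built with this exact scheme.

```python
import hashlib

def client_fingerprint(version, ciphers, extensions):
    """Hash the ClientHello fields that characterize a client: the offered
    protocol version, the ordered cipher-suite list, and the ordered
    extension list (JA3-style scheme, shown here for illustration)."""
    raw = ",".join([str(version),
                    "-".join(map(str, ciphers)),
                    "-".join(map(str, extensions))])
    return hashlib.md5(raw.encode()).hexdigest()

# Made-up ClientHello values: 771 is TLS 1.2 (0x0303) on the wire.
print(client_fingerprint(771, [4865, 4866, 49195], [0, 10, 11, 35]))
```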
{"title":"Coming of Age: A Longitudinal Study of TLS Deployment","authors":"Platon Kotzias, Abbas Razaghpanah, J. Amann, K. Paterson, N. Vallina-Rodriguez, Juan Caballero","doi":"10.1145/3278532.3278568","DOIUrl":"https://doi.org/10.1145/3278532.3278568","url":null,"abstract":"The Transport Layer Security (TLS) protocol is the de-facto standard for encrypted communication on the Internet. However, it has been plagued by a number of different attacks and security issues over the last years. Addressing these attacks requires changes to the protocol, to server- or client-software, or to all of them. In this paper we conduct the first large-scale longitudinal study examining the evolution of the TLS ecosystem over the last six years. We place a special focus on the ecosystem's evolution in response to high-profile attacks. For our analysis, we use a passive measurement dataset with more than 319.3B connections since February 2012, and an active dataset that contains TLS and SSL scans of the entire IPv4 address space since August 2015. To identify the evolution of specific clients we also create the---to our knowledge---largest TLS client fingerprint database to date, consisting of 1,684 fingerprints. We observe that the ecosystem has shifted significantly since 2012, with major changes in which cipher suites and TLS extensions are offered by clients and accepted by servers having taken place. Where possible, we correlate these with the timing of specific attacks on TLS. At the same time, our results show that while clients, especially browsers, are quick to adopt new algorithms, they are also slow to drop support for older ones. We also encounter significant amounts of client software that probably unwittingly offer unsafe ciphers. We discuss these findings in the context of long tail effects in the TLS ecosystem.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"151 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86645731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LDplayer
Liang Zhu, J. Heidemann
https://doi.org/10.1145/3278532.3278544
DNS has evolved over the last 20 years, improving in security and privacy and broadening the kinds of applications it supports. However, this evolution has been slowed by the large installed base and the wide range of implementations. The impact of changes is difficult to model due to complex interactions between DNS optimizations, caching, and distributed operation. We suggest that experimentation at scale is needed to evaluate changes and facilitate DNS evolution. This paper presents LDplayer, a configurable, general-purpose DNS experimental framework that enables DNS experiments to scale in several dimensions: many zones, multiple levels of DNS hierarchy, high query rates, and diverse query sources. LDplayer provides high-fidelity experiments while meeting these requirements through its distributed DNS query replay system, methods to rebuild the relevant DNS hierarchy from traces, and efficient emulation of this hierarchy on minimal hardware. We show that a single DNS server can correctly emulate multiple independent levels of the DNS hierarchy while providing correct responses as if they were independent. We validate that our system can replay DNS root traffic with tiny error (±8 ms quartiles in query timing and ±0.1% difference in query rate). We show that our system can replay queries at 87k queries/s while using only one CPU, more than twice the rate of normal DNS root traffic. LDplayer's trace replay has the unique ability to evaluate important design questions with confidence that we capture the interplay of caching, timeouts, and resource constraints. As an example, we demonstrate the memory requirements of a DNS root server with all traffic running over TCP and TLS, and identify performance discontinuities in latency as a function of client RTT.
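The heart of trace replay is preserving each query's original offset from the start of the trace. Below is a minimal single-process sketch over UDP; LDplayer itself distributes replay across many processes to reach rates like 87k queries/s and also supports TCP and TLS. The trace format here, a list of (offset-in-seconds, wire-format-query) pairs, is a hypothetical simplification.

```python
import socket
import time

def replay(trace, server, port=53):
    """Send each pre-encoded DNS query over UDP at its original offset
    from the trace start, preserving inter-query timing.
    trace: iterable of (offset_seconds, wire_format_query_bytes)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    start = time.monotonic()
    for offset_s, wire_query in trace:
        delay = offset_s - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)  # wait until this query's scheduled time
        sock.sendto(wire_query, (server, port))
```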
{"title":"LDplayer","authors":"Liang Zhu, J. Heidemann","doi":"10.1145/3278532.3278544","DOIUrl":"https://doi.org/10.1145/3278532.3278544","url":null,"abstract":"DNS has evolved over the last 20 years, improving in security and privacy and broadening the kinds of applications it supports. However, this evolution has been slowed by the large installed base and the wide range of implementations. The impact of changes is difficult to model due to complex interactions between DNS optimizations, caching, and distributed operation. We suggest that experimentation at scale is needed to evaluate changes and facilitate DNS evolution. This paper presents LDplayer, a configurable, general-purpose DNS experimental framework that enables DNS experiments to scale in several dimensions: many zones, multiple levels of DNS hierarchy, high query rates, and diverse query sources. LDplayer provides high fidelity experiments while meeting these requirements through its distributed DNS query replay system, methods to rebuild the relevant DNS hierarchy from traces, and efficient emulation of this hierarchy on minimal hardware. We show that a single DNS server can correctly emulate multiple independent levels of the DNS hierarchy while providing correct responses as if they were independent. We validate that our system can replay a DNS root traffic with tiny error (± 8 ms quartiles in query timing and ± 0.1% difference in query rate). We show that our system can replay queries at 87k queries/s while using only one CPU, more than twice of a normal DNS Root traffic rate. LDplayer's trace replay has the unique ability to evaluate important design questions with confidence that we capture the interplay of caching, timeouts, and resource constraints. As an example, we demonstrate the memory requirements of a DNS root server with all traffic running over TCP and TLS, and identify performance discontinuities in latency as a function of client RTT.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78147177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Large Scale Study of Data Center Network Reliability
Justin Meza, Tianyin Xu, K. Veeraraghavan, O. Mutlu
https://doi.org/10.1145/3278532.3278566
The ability to tolerate, remediate, and recover from network incidents (caused by device failures and fiber cuts, for example) is critical for building and operating highly-available web services. Achieving fault tolerance and failure preparedness requires system architects, software developers, and site operators to have a deep understanding of network reliability at scale, along with its implications for the software systems that run in data centers. Unfortunately, little has been reported on the reliability characteristics of large-scale data center network infrastructure, let alone its impact on the availability of services powered by software running on that network infrastructure. This paper fills the gap by presenting a large-scale, longitudinal study of data center network reliability based on operational data collected from the production network infrastructure at Facebook, one of the largest web service providers in the world. Our study covers the reliability characteristics of both intra- and inter-data center networks. For intra-data center networks, we study seven years of operational data comprising thousands of network incidents across two different data center network designs, a cluster network design and a state-of-the-art fabric network design. For inter-data center networks, we study eighteen months of recent repair tickets from the field to understand the reliability of Wide Area Network (WAN) backbones. In contrast to prior work, we study the effects of network reliability on software systems, and how these reliability characteristics evolve over time. We discuss the implications of network reliability for the design, implementation, and operation of large-scale data center systems and how it affects highly-available web services. We hope our study forms a foundation for understanding the reliability of large-scale network infrastructure, and inspires new reliability solutions to network incidents.
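When comparing two network designs on operational data like this, the basic normalization is incidents per device-year, so that fleets of different sizes observed over different windows become comparable. A tiny worked example with invented numbers, purely to illustrate the metric rather than the paper's findings:

```python
# Invented numbers, for illustration only: normalize incident counts by
# fleet size and observation window so two designs become comparable.
def incidents_per_device_year(incidents, devices, years):
    return incidents / (devices * years)

cluster = incidents_per_device_year(incidents=350, devices=1000, years=7.0)
fabric = incidents_per_device_year(incidents=120, devices=1000, years=7.0)
print(f"cluster: {cluster:.3f} vs fabric: {fabric:.3f} incidents per device-year")
```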
{"title":"A Large Scale Study of Data Center Network Reliability","authors":"Justin Meza, Tianyin Xu, K. Veeraraghavan, O. Mutlu","doi":"10.1145/3278532.3278566","DOIUrl":"https://doi.org/10.1145/3278532.3278566","url":null,"abstract":"The ability to tolerate, remediate, and recover from network incidents (caused by device failures and fiber cuts, for example) is critical for building and operating highly-available web services. Achieving fault tolerance and failure preparedness requires system architects, software developers, and site operators to have a deep understanding of network reliability at scale, along with its implications on the software systems that run in data centers. Unfortunately, little has been reported on the reliability characteristics of large scale data center network infrastructure, let alone its impact on the availability of services powered by software running on that network infrastructure. This paper fills the gap by presenting a large scale, longitudinal study of data center network reliability based on operational data collected from the production network infrastructure at Facebook, one of the largest web service providers in the world. Our study covers reliability characteristics of both intra and inter data center networks. For intra data center networks, we study seven years of operation data comprising thousands of network incidents across two different data center network designs, a cluster network design and a state-of-the-art fabric network design. For inter data center networks, we study eighteen months of recent repair tickets from the field to understand reliability of Wide Area Network (WAN) backbones. In contrast to prior work, we study the effects of network reliability on software systems, and how these reliability characteristics evolve over time. We discuss the implications of network reliability on the design, implementation, and operation of large scale data center systems and how it affects highly-available web services. We hope our study forms a foundation for understanding the reliability of large scale network infrastructure, and inspires new reliability solutions to network incidents.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"44 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83637139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}