The Transport Layer Security (TLS) protocol has evolved in response to different attacks and is increasingly relied on to secure Internet communications. Web browsers have led the adoption of newer and more secure cryptographic algorithms and protocol versions, and thus improved the security of the TLS ecosystem. Other application categories, however, are increasingly using TLS, but too often rely on obsolete and insecure protocol options. To understand in detail which applications are using TLS, and how they are using it, we developed a novel system for obtaining process information from end hosts and fusing it with network data to produce a TLS fingerprint knowledge base. The dataset provides rich context for each fingerprint, is representative of enterprise TLS deployments, and is automatically updated from ongoing data collection. It is based on 471 million endpoint-labeled and 8 billion unlabeled TLS sessions obtained from enterprise edge networks in five countries, plus millions of sessions from a malware analysis sandbox. We actively maintain an open source dataset that, at 4,500+ fingerprints and counting, is both the largest and most informative ever published. In this paper, we use the knowledge base to identify trends in enterprise TLS applications beyond the browser: application categories such as storage, communication, system, and email. We identify a rise in the use of TLS by non-browser applications and a corresponding decline in the fraction of sessions using version 1.3. Finally, we highlight the shortcomings of naïvely applying TLS fingerprinting to detect malware, and we present recent trends in malware's use of TLS, such as the adoption of cipher suite randomization.
{"title":"TLS Beyond the Browser: Combining End Host and Network Data to Understand Application Behavior","authors":"Blake Anderson, D. McGrew","doi":"10.1145/3355369.3355601","DOIUrl":"https://doi.org/10.1145/3355369.3355601","url":null,"abstract":"The Transport Layer Security (TLS) protocol has evolved in response to different attacks and is increasingly relied on to secure Internet communications. Web browsers have led the adoption of newer and more secure cryptographic algorithms and protocol versions, and thus improved the security of the TLS ecosystem. Other application categories, however, are increasingly using TLS, but too often are relying on obsolete and insecure protocol options. To understand in detail what applications are using TLS, and how they are using it, we developed a novel system for obtaining process information from end hosts and fusing it with network data to produce a TLS fingerprint knowledge base. This data has a rich set of context for each fingerprint, is representative of enterprise TLS deployments, and is automatically updated from ongoing data collection. Our dataset is based on 471 million endpoint-labeled and 8 billion unlabeled TLS sessions obtained from enterprise edge networks in five countries, plus millions of sessions from a malware analysis sandbox. We actively maintain an open source dataset that, at 4,500+ fingerprints and counting, is both the largest and most informative ever published. In this paper, we use the knowledge base to identify trends in enterprise TLS applications beyond the browser: application categories such as storage, communication, system, and email. We identify a rise in the use of TLS by nonbrowser applications and a corresponding decline in the fraction of sessions using version 1.3. Finally, we highlight the shortcomings of naïvely applying TLS fingerprinting to detect malware, and we present recent trends in malware's use of TLS such as the adoption of cipher suite randomization.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75877321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Content delivery networks (CDNs) commonly use DNS to map end-users to the best edge servers. A recently proposed EDNS0-Client-Subnet (ECS) extension allows recursive resolvers to include end-user subnet information in DNS queries, so that authoritative DNS servers, especially those belonging to CDNs, can use this information to improve user mapping. In this paper, we study the ECS behavior of ECS-enabled recursive resolvers from the perspectives of both sides of a DNS interaction: the authoritative DNS servers of a major CDN and a busy DNS resolution service. We find a range of erroneous (i.e., deviating from the protocol specification) and detrimental (even if compliant) behaviors that may unnecessarily erode client privacy, reduce the effectiveness of DNS caching, diminish ECS benefits, and in some cases turn ECS from a facilitator into an obstacle to authoritative DNS servers' ability to optimize user-to-edge-server mappings.
{"title":"A Look at the ECS Behavior of DNS Resolvers","authors":"R. Al-Dalky, M. Rabinovich, Kyle Schomp","doi":"10.1145/3355369.3355586","DOIUrl":"https://doi.org/10.1145/3355369.3355586","url":null,"abstract":"Content delivery networks (CDNs) commonly use DNS to map end-users to the best edge servers. A recently proposed EDNS0-Client-Subnet (ECS) extension allows recursive resolvers to include end-user subnet information in DNS queries, so that authoritative DNS servers, especially those belonging to CDNs, could use this information to improve user mapping. In this paper, we study the ECS behavior of ECS-enabled recursive resolvers from the perspectives of the opposite sides of a DNS interaction, the authoritative DNS servers of a major CDN and a busy DNS resolution service. We find a range of erroneous (i.e., deviating from the protocol specification) and detrimental (even if compliant) behaviors that may unnecessarily erode client privacy, reduce the effectiveness of DNS caching, diminish ECS benefits, and in some cases turn ECS from facilitator into an obstacle to authoritative DNS servers' ability to optimize user-to-edge-server mappings.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73330591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modern web security and privacy research depends on accurate measurement of an often evasive and hostile web. No longer just a network of static, hyperlinked documents, the modern web is alive with JavaScript (JS) loaded from third parties of unknown trustworthiness. Dynamic analysis of potentially hostile JS currently presents a cruel dilemma: use heavyweight in-browser solutions that prove impossible to maintain, or use lightweight inline JS solutions that are detectable by evasive JS and which cannot match the scope of coverage provided by in-browser systems. We present VisibleV8, a dynamic analysis framework hosted inside V8, the JS engine of the Chrome browser, that logs native function or property accesses during any JS execution. At less than 600 lines (only 67 of which modify V8's existing behavior), our patches are lightweight and have been maintained from Chrome versions 63 through 72 without difficulty. VV8 consistently outperforms equivalent inline instrumentation, and it intercepts accesses impossible to instrument inline. This comprehensive coverage allows us to isolate and identify 46 JavaScript namespace artifacts used by JS code in the wild to detect automated browsing platforms and to discover that 29% of the Alexa top 50k sites load content which actively probes these artifacts.
{"title":"VisibleV8","authors":"Jordan Jueckstock, A. Kapravelos","doi":"10.1145/3355369.3355599","DOIUrl":"https://doi.org/10.1145/3355369.3355599","url":null,"abstract":"Modern web security and privacy research depends on accurate measurement of an often evasive and hostile web. No longer just a network of static, hyperlinked documents, the modern web is alive with JavaScript (JS) loaded from third parties of unknown trustworthiness. Dynamic analysis of potentially hostile JS currently presents a cruel dilemma: use heavyweight in-browser solutions that prove impossible to maintain, or use lightweight inline JS solutions that are detectable by evasive JS and which cannot match the scope of coverage provided by in-browser systems. We present VisibleV8, a dynamic analysis framework hosted inside V8, the JS engine of the Chrome browser, that logs native function or property accesses during any JS execution. At less than 600 lines (only 67 of which modify V8's existing behavior), our patches are lightweight and have been maintained from Chrome versions 63 through 72 without difficulty. VV8 consistently outperforms equivalent inline instrumentation, and it intercepts accesses impossible to instrument inline. This comprehensive coverage allows us to isolate and identify 46 JavaScript namespace artifacts used by JS code in the wild to detect automated browsing platforms and to discover that 29% of the Alexa top 50k sites load content which actively probes these artifacts.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"574 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77079075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Online scan engines such as VirusTotal are heavily used by researchers to label malicious URLs and files. Unfortunately, it is not well understood how the labels are generated and how reliable the scanning results are. In this paper, we focus on VirusTotal and its 68 third-party vendors to examine their labeling process for phishing URLs. We perform a series of measurements by setting up our own phishing websites (mimicking PayPal and the IRS) and submitting the URLs for scanning. By analyzing the incoming network traffic and the dynamic label changes at VirusTotal, we reveal new insights into how VirusTotal works and the quality of its labels. Among other things, we show that vendors have trouble flagging all phishing sites, and even the best vendors missed 30% of our phishing sites. In addition, scanning results are not immediately reflected in VirusTotal after a scan completes, and the results reported by VirusTotal are sometimes inconsistent with those of some vendors' own scanners. Our results reveal the need to develop more rigorous methodologies to assess and make use of the labels obtained from VirusTotal.
{"title":"Opening the Blackbox of VirusTotal: Analyzing Online Phishing Scan Engines","authors":"Peng Peng, Limin Yang, Linhai Song, Gang Wang","doi":"10.1145/3355369.3355585","DOIUrl":"https://doi.org/10.1145/3355369.3355585","url":null,"abstract":"Online scan engines such as VirusTotal are heavily used by researchers to label malicious URLs and files. Unfortunately, it is not well understood how the labels are generated and how reliable the scanning results are. In this paper, we focus on VirusTotal and its 68 third-party vendors to examine their labeling process on phishing URLs. We perform a series of measurements by setting up our own phishing websites (mimicking PayPal and IRS) and submitting the URLs for scanning. By analyzing the incoming network traffic and the dynamic label changes at VirusTotal, we reveal new insights into how VirusTotal works and the quality of their labels. Among other things, we show that vendors have trouble flagging all phishing sites, and even the best vendors missed 30% of our phishing sites. In addition, the scanning results are not immediately updated to VirusTotal after the scanning, and there are inconsistent results between VirusTotal scan and some vendors' own scanners. Our results reveal the need for developing more rigorous methodologies to assess and make use of the labels obtained from VirusTotal.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"76 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86094158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Louis F. DeKoven, A. Randall, A. Mirian, Gautam Akiwate, Ansel Blume, L. Saul, Aaron Schulman, G. Voelker, S. Savage
Security is a discipline that places significant expectations on lay users. Thus, there is a wide array of technologies and behaviors that we exhort end users to adopt in order to reduce their security risk. However, the adoption of these "best practices" --- ranging from the use of antivirus products to actively keeping software updated --- is not well understood, nor is their practical impact on security risk well established. This paper explores both of these issues via a large-scale empirical measurement study covering approximately 15,000 computers over six months. We use passive monitoring to infer and characterize the prevalence of various security practices in situ, as well as a range of other potentially security-relevant behaviors. We then explore the extent to which differences in key security behaviors impact real-world outcomes (i.e., whether a device shows clear evidence of having been compromised).
{"title":"Measuring Security Practices and How They Impact Security","authors":"Louis F. DeKoven, A. Randall, A. Mirian, Gautam Akiwate, Ansel Blume, L. Saul, Aaron Schulman, G. Voelker, S. Savage","doi":"10.1145/3355369.3355571","DOIUrl":"https://doi.org/10.1145/3355369.3355571","url":null,"abstract":"Security is a discipline that places significant expectations on lay users. Thus, there are a wide array of technologies and behaviors that we exhort end users to adopt and thereby reduce their security risk. However, the adoption of these \"best practices\" --- ranging from the use of antivirus products to actively keeping software updated --- is not well understood, nor is their practical impact on security risk well-established. This paper explores both of these issues via a large-scale empirical measurement study covering approximately 15,000 computers over six months. We use passive monitoring to infer and characterize the prevalence of various security practices in situ as well as a range of other potentially security-relevant behaviors. We then explore the extent to which differences in key security behaviors impact real-world outcomes (i.e., that a device shows clear evidence of having been compromised).","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84790312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chaoyi Lu, Baojun Liu, Zhou Li, S. Hao, Haixin Duan, Mingming Zhang, Chunying Leng, Y. Liu, Zaifeng Zhang, Jianping Wu
Under its original standard, DNS packets travel through the Internet in unencrypted form. Recent discoveries show that real-world adversaries are actively exploiting this design vulnerability to compromise Internet users' security and privacy. To mitigate such threats, several protocols have been proposed to encrypt DNS queries between DNS clients and servers, which we collectively term DNS-over-Encryption. While some proposals have been standardized and are gaining strong support from the industry, little has been done to understand their status from the view of global users. This paper presents the first end-to-end, large-scale analysis of DNS-over-Encryption. By collecting data from Internet scanning, user-end measurement, and passive monitoring logs, we gain several unique insights. In general, the service quality of DNS-over-Encryption is satisfactory in terms of accessibility and latency. For DNS clients, DNS-over-Encryption queries are less likely to be disrupted by in-path interception than traditional DNS, and the extra overhead is tolerable. However, we also discover several issues regarding how the services are operated. For example, we find that 25% of DNS-over-TLS service providers use invalid SSL certificates. Compared to traditional DNS, DNS-over-Encryption is used by far fewer users, but usage is growing. We therefore believe the community should push for broader adoption of DNS-over-Encryption, and we suggest that service providers carefully review their implementations.
{"title":"An End-to-End, Large-Scale Measurement of DNS-over-Encryption: How Far Have We Come?","authors":"Chaoyi Lu, Baojun Liu, Zhou Li, S. Hao, Haixin Duan, Mingming Zhang, Chunying Leng, Y. Liu, Zaifeng Zhang, Jianping Wu","doi":"10.1145/3355369.3355580","DOIUrl":"https://doi.org/10.1145/3355369.3355580","url":null,"abstract":"DNS packets are designed to travel in unencrypted form through the Internet based on its initial standard. Recent discoveries show that real-world adversaries are actively exploiting this design vulnerability to compromise Internet users' security and privacy. To mitigate such threats, several protocols have been proposed to encrypt DNS queries between DNS clients and servers, which we jointly term as DNS-over-Encryption. While some proposals have been standardized and are gaining strong support from the industry, little has been done to understand their status from the view of global users. This paper performs by far the first end-to-end and large-scale analysis on DNS-over-Encryption. By collecting data from Internet scanning, user-end measurement and passive monitoring logs, we have gained several unique insights. In general, the service quality of DNS-over-Encryption is satisfying, in terms of accessibility and latency. For DNS clients, DNS-over-Encryption queries are less likely to be disrupted by in-path interception compared to traditional DNS, and the extra overhead is tolerable. However, we also discover several issues regarding how the services are operated. As an example, we find 25% DNS-over-TLS service providers use invalid SSL certificates. Compared to traditional DNS, DNS-over-Encryption is used by far fewer users but we have witnessed a growing trend. As such, we believe the community should push broader adoption of DNS-over-Encryption and we also suggest the service providers carefully review their implementations.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"130 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84452108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Pastrana, Alice Hutchings, Daniel R. Thomas, J. Tapiador
eWhoring is the term used by offenders to refer to a type of online fraud in which cybersexual encounters are simulated for financial gain. Perpetrators use social engineering techniques to impersonate young women in online communities, e.g., chat or social networking sites. They engage potential customers in conversation with the aim of selling misleading sexual material -- mostly photographs and interactive video shows -- illicitly compiled from third-party sites. eWhoring is a popular topic in underground communities, with forums acting as a gateway into offending. Users not only share knowledge and tutorials, but also trade in goods and services, such as packs of images and videos. In this paper, we present a processing pipeline to quantitatively analyse various aspects of eWhoring. Our pipeline integrates multiple tools to crawl, annotate, and classify material in a semi-automatic way. It builds in precautions to safeguard against significant ethical issues, such as avoiding the researchers' exposure to pornographic material, and against legal concerns, which proved justified, as some of the images were classified as child exploitation material. We use it to perform a longitudinal measurement of eWhoring activities in 10 specialised underground forums from 2008 to 2019. Our study focuses on three of the main eWhoring components: (i) the acquisition and provenance of images; (ii) the financial profits and monetisation techniques; and (iii) a social network analysis of the offenders, including their relationships, interests, and pathways before and after engaging in this fraudulent activity. We provide recommendations, including potential intervention approaches.
{"title":"Measuring eWhoring","authors":"S. Pastrana, Alice Hutchings, Daniel R. Thomas, J. Tapiador","doi":"10.1145/3355369.3355597","DOIUrl":"https://doi.org/10.1145/3355369.3355597","url":null,"abstract":"eWhoring is the term used by offenders to refer to a type of online fraud in which cybersexual encounters are simulated for financial gain. Perpetrators use social engineering techniques to impersonate young women in online communities, e.g., chat or social networking sites. They engage potential customers in conversation with the aim of selling misleading sexual material -- mostly photographs and interactive video shows -- illicitly compiled from third-party sites. eWhoring is a popular topic in underground communities, with forums acting as a gateway into offending. Users not only share knowledge and tutorials, but also trade in goods and services, such as packs of images and videos. In this paper, we present a processing pipeline to quantitatively analyse various aspects of eWhoring. Our pipeline integrates multiple tools to crawl, annotate, and classify material in a semi-automatic way. It builds in precautions to safeguard against significant ethical issues, such as avoiding the researchers' exposure to pornographic material, and legal concerns, which were justified as some of the images were classified as child exploitation material. We use it to perform a longitudinal measurement of eWhoring activities in 10 specialised underground forums from 2008 to 2019. Our study focuses on three of the main eWhoring components: (i) the acquisition and provenance of images; (ii) the financial profits and monetisation techniques; and (iii) a social network analysis of the offenders, including their relationships, interests, and pathways before and after engaging in this fraudulent activity. We provide recommendations, including potential intervention approaches.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"505 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76394274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marcin Nawrocki, Jeremias Blendin, C. Dietzel, T. Schmidt, Matthias Wählisch
Large Distributed Denial-of-Service (DDoS) attacks pose a major threat not only to end systems but also to the Internet infrastructure as a whole. Remote Triggered Black Hole filtering (RTBH) has been established as a tool to mitigate inter-domain DDoS attacks by discarding unwanted traffic early in the network, e.g., at Internet eXchange Points (IXPs). As of today, little is known about how RTBH is used, how effective it is, and whether more fine-grained filtering is needed. In this paper, we present the first in-depth statistical analysis of all RTBH events at a large European IXP by correlating data-plane and control-plane measurements over a period of 104 days. We identify a surprising practice that significantly deviates from the expected mitigation use patterns. First, we show that only one third of all 34k visible RTBH events correlate with indicators of DDoS attacks. Second, we witness over 2,000 blackhole events announced for prefixes that belong not to servers but to clients situated in DSL networks. Third, we find that blackholing on average drops only 50% of the unwanted traffic and is hence a much less reliable tool for mitigating DDoS attacks than expected. Our analysis also yields first estimates of the collateral damage caused by RTBH-based DDoS mitigation.
{"title":"Down the Black Hole: Dismantling Operational Practices of BGP Blackholing at IXPs","authors":"Marcin Nawrocki, Jeremias Blendin, C. Dietzel, T. Schmidt, Matthias Wählisch","doi":"10.1145/3355369.3355593","DOIUrl":"https://doi.org/10.1145/3355369.3355593","url":null,"abstract":"Large Distributed Denial-of-Service (DDoS) attacks pose a major threat not only to end systems but also to the Internet infrastructure as a whole. Remote Triggered Black Hole filtering (RTBH) has been established as a tool to mitigate inter-domain DDoS attacks by discarding unwanted traffic early in the network, e.g., at Internet eXchange Points (IXPs). As of today, little is known about the kind and effectiveness of its use, and about the need for more fine-grained filtering. In this paper, we present the first in-depth statistical analysis of all RTBH events at a large European IXP by correlating measurements of the data and the control plane for a period of 104 days. We identify a surprising practice that significantly deviates from the expected mitigation use patterns. First, we show that only one third of all 34k visible RTBH events correlate with indicators of DDoS attacks. Second, we witness over 2000 blackhole events announced for prefixes not of servers but of clients situated in DSL networks. Third, we find that blackholing on average causes dropping of only 50% of the unwanted traffic and is hence a much less reliable tool for mitigating DDoS attacks than expected. Our analysis gives also rise to first estimates of the collateral damage caused by RTBH-based DDoS mitigation.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"55 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86916868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Santiago Vargas, U. Goel, Moritz Steiner, A. Balasubramanian
Content delivery networks serve a major fraction of Internet traffic, and their geographically distributed infrastructure makes them a good vantage point to observe traffic access patterns. We perform a large-scale investigation to characterize Web traffic patterns observed from a major CDN infrastructure. Specifically, we discover that responses with application/json content-type form a growing majority of all HTTP requests. As a result, we seek to understand what types of devices and applications are requesting JSON objects and explore opportunities to optimize CDN delivery of JSON traffic. Our study shows that mobile applications account for at least 52% of JSON traffic on the CDN and embedded devices account for another 12% of all JSON traffic. We also find that more than 55% of JSON traffic on the CDN is uncacheable, showing that a large portion of JSON traffic on the CDN is dynamic. By further looking at patterns of periodicity in requests, we find that 6.3% of JSON traffic is periodically requested and reflects the use of (partially) autonomous software systems, IoT devices, and other kinds of machine-to-machine communication. Finally, we explore dependencies in JSON traffic through the lens of n-gram models and find that these models can capture patterns between subsequent requests. We can potentially leverage this to prefetch requests, improving the cache hit ratio.
{"title":"Characterizing JSON Traffic Patterns on a CDN","authors":"Santiago Vargas, U. Goel, Moritz Steiner, A. Balasubramanian","doi":"10.1145/3355369.3355594","DOIUrl":"https://doi.org/10.1145/3355369.3355594","url":null,"abstract":"Content delivery networks serve a major fraction of the Internet traffic, and their geographically deployed infrastructure makes them a good vantage point to observe traffic access patterns. We perform a large-scale investigation to characterize Web traffic patterns observed from a major CDN infrastructure. Specifically, we discover that responses with application/json content-type form a growing majority of all HTTP requests. As a result, we seek to understand what types of devices and applications are requesting JSON objects and explore opportunities to optimize CDN delivery of JSON traffic. Our study shows that mobile applications account for at least 52% of JSON traffic on the CDN and embedded devices account for another 12% of all JSON traffic. We also find that more than 55% of JSON traffic on the CDN is uncacheable, showing that a large portion of JSON traffic on the CDN is dynamic. By further looking at patterns of periodicity in requests, we find that 6.3% of JSON traffic is periodically requested and reflects the use of (partially) autonomous software systems, IoT devices, and other kinds of machine-to-machine communication. Finally, we explore dependencies in JSON traffic through the lens of ngram models and find that these models can capture patterns between subsequent requests. We can potentially leverage this to prefetch requests, improving the cache hit ratio.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87056063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Malicious ads often use social engineering (SE) tactics to coax users into downloading unwanted software, purchasing fake products or services, or giving up valuable personal information. These ads are often served by low-tier ad networks that may not have the technical means (or simply the will) to patrol the ad content they serve to curtail abuse. In this paper, we propose a system for large-scale automatic discovery and tracking of SE Attack Campaigns delivered via Malicious Advertisements (SEACMA). Our system aims to be generic, allowing us to study the SEACMA ad distribution problem without being biased towards specific categories of ad-publishing websites or SE attacks. Starting with a seed of low-tier ad networks, we measure which of these networks are the most likely to distribute malicious ads and propose a mechanism to discover new ad networks that are also leveraged to support the distribution of SEACMA campaigns. The results of our study aim to be useful in a number of ways. For instance, we show that SEACMA ads use a number of tactics to successfully evade URL blacklists and ad blockers. By tracking SEACMA campaigns, our system provides a mechanism to more proactively detect and block such evasive ads. Therefore, our results provide valuable information that could be used to improve defense systems against social engineering attacks and malicious ads in general.
{"title":"What You See is NOT What You Get: Discovering and Tracking Social Engineering Attack Campaigns","authors":"Phani Vadrevu, R. Perdisci","doi":"10.1145/3355369.3355600","DOIUrl":"https://doi.org/10.1145/3355369.3355600","url":null,"abstract":"Malicious ads often use social engineering (SE) tactics to coax users into downloading unwanted software, purchasing fake products or services, or giving up valuable personal information. These ads are often served by low-tier ad networks that may not have the technical means (or simply the will) to patrol the ad content they serve to curtail abuse. In this paper, we propose a system for large-scale automatic discovery and tracking of SE Attack Campaigns delivered via Malicious Advertisements (SEACMA). Our system aims to be generic, allowing us to study the SEACMA ad distribution problem without being biased towards specific categories of ad-publishing websites or SE attacks. Starting with a seed of low-tier ad networks, we measure which of these networks are the most likely to distribute malicious ads and propose a mechanism to discover new ad networks that are also leveraged to support the distribution of SEACMA campaigns. The results of our study aim to be useful in a number of ways. For instance, we show that SEACMA ads use a number of tactics to successfully evade URL blacklists and ad blockers. By tracking SEACMA campaigns, our system provides a mechanism to more proactively detect and block such evasive ads. Therefore, our results provide valuable information that could be used to improve defense systems against social engineering attacks and malicious ads in general.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87723891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}