A Systematic Analysis of the Capital One Data Breach: Critical Lessons Learned
Pub Date: 2022-11-07. DOI: https://dl.acm.org/doi/10.1145/3546068
Shaharyar Khan, Ilya Kabanov, Yunke Hua, Stuart Madnick
The 2019 Capital One data breach was one of the largest data breaches, impacting the privacy and security of the personal information of over 100 million individuals. Reports about a cyberattack often state that it succeeded because a single employee clicked on a link in a phishing email or forgot to patch some software, making it seem like an isolated, one-off, trivial problem involving perhaps one person making a mistake or being negligent. But that is usually not the complete story. By ignoring the related managerial and organizational failures, you leave in place the conditions for the next breach. Using our Cybersafety analysis methodology, we identified control failures spanning multiple control levels, from rather technical issues up to top management, the Board of Directors, and government regulators. In this analysis, we reconstruct the Capital One hierarchical cyber safety control structure, identify which parts failed and why, and provide recommendations for improvements. This work demonstrates how to discover the true causes of security failures in complex information systems and derive systematic cybersecurity improvements that likely apply to many other organizations. It also provides an approach that individuals can use to evaluate and better secure their organizations.
Differentially Private Real-Time Release of Sequential Data
Pub Date: 2022-11-07. DOI: https://dl.acm.org/doi/10.1145/3544837
Xueru Zhang, Mohammad Mahdi Khalili, Mingyan Liu
Many data analytics applications rely on temporal data, generated (and possibly acquired) sequentially for online analysis. How to release this type of data in a privacy-preserving manner is of great interest and more challenging than releasing one-time, static data. Because of the (potentially strong) temporal correlation within the data sequence, the overall privacy loss can accumulate significantly over time; an attacker with statistical knowledge of the correlation can be particularly hard to defend against. An idea that has been explored in the literature to mitigate this problem is to factor this correlation into the perturbation/noise mechanism. Existing work, however, either focuses on the offline setting (where perturbation is designed and introduced after the entire sequence has become available), or requires a priori information on the correlation in generating perturbation. In this study we propose an approach where the correlation is learned as the sequence is generated, and is used for estimating future data in the sequence. This estimate then drives the generation of the noisy released data. This method allows us to design better perturbation and is suitable for real-time operations. Using the notion of differential privacy, we show this approach achieves high accuracy with lower privacy loss compared to existing methods.
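The per-step baseline that such correlation-aware schemes improve upon can be sketched with the standard Laplace mechanism. The function name, the unit-sensitivity default, and the fixed seed below are illustrative assumptions, not details from the paper:

```python
import numpy as np

def dp_release_stream(stream, epsilon, sensitivity=1.0, seed=0):
    """Release each value of a sequence under per-step differential privacy.

    Baseline Laplace mechanism: every step is perturbed independently with
    noise of scale sensitivity/epsilon. Privacy loss therefore accumulates
    across the sequence -- the weakness that prediction-based, correlation-
    aware perturbation is designed to mitigate.
    """
    rng = np.random.default_rng(seed)
    return [x + rng.laplace(scale=sensitivity / epsilon) for x in stream]
```

A smaller epsilon means more noise per step; the paper's contribution is to spend that privacy budget only on the part of each value that the learned correlation cannot predict.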
A Novel Cross-Network Embedding for Anchor Link Prediction with Social Adversarial Attacks
Pub Date: 2022-11-07. DOI: https://dl.acm.org/doi/10.1145/3548685
Huanran Wang, Wu Yang, Wei Wang, Dapeng Man, Jiguang Lv
Anchor link prediction across social networks plays an important role in the analysis of multiple social networks. Traditional methods rely heavily on user privacy information or high-quality network topology information, which makes them unsuitable for real-life analysis of multiple social networks. Deep learning methods based on graph embedding are restricted by the impact that users' active privacy-protection policies have on the graph structure. In this paper, we propose a novel method that neutralizes the impact of users' evasion strategies. First, graph embedding with conditional estimation analysis is used to obtain a robust embedding vector space. Second, a cross-network feature space for supervised learning is constructed via constraints on cross-network feature collisions. The combination of robustness enhancement and cross-network feature-collision constraints eliminates the impact of evasion strategies. Extensive experiments on large-scale real-life social networks demonstrate that the proposed method significantly outperforms state-of-the-art methods in terms of precision, adaptability, and robustness in scenarios with evasion strategies.
What Users Want From Cloud Deletion and the Information They Need: A Participatory Action Study
Pub Date: 2022-11-07. DOI: https://dl.acm.org/doi/10.1145/3546578
Kopo Marvin Ramokapane, Jose Such, Awais Rashid
Current cloud deletion mechanisms fall short of meeting users' various deletion needs. They assume all data is deleted the same way: data is temporarily removed (or hidden) from users' cloud accounts before being completely deleted. This assumption neglects users' desire to have data completely deleted instantly, or their preference to have it recoverable for a more extended period. To date, these preferences have not been explored. To address this gap, we conducted a participatory study with four groups of active cloud users (five subjects per group). We examined their deletion preferences and the information they require to aid deletion. In particular, we explored how users want to delete cloud data and identified what information about cloud deletion they consider essential, when it should be made available to them, and which communication channel should be used. We show that cloud deletion preferences are complex and multi-dimensional, varying between subjects and groups. Information about deletion should be within reach when needed, for instance as part of deletion controls. Based on these findings, we discuss the implications of our study for improving current deletion mechanisms to accommodate these preferences.
DeviceWatch: A Data-Driven Network Analysis Approach to Identifying Compromised Mobile Devices with Graph-Inference
Pub Date: 2022-11-07. DOI: https://dl.acm.org/doi/10.1145/3558767
Euijin Choo, Mohamed Nabeel, Mashael Alsabah, Issa Khalil, Ting Yu, Wei Wang
We propose to identify compromised mobile devices from a network administrator's point of view. Intuitively, inadvertent users (and thus their devices) who download apps through untrustworthy markets are often lured into installing malicious apps through in-app advertisements or phishing. We thus hypothesize that devices sharing similar apps have a similar likelihood of being compromised, resulting in an association between a compromised device and its apps. We propose to leverage such associations to identify unknown compromised devices using the guilt-by-association principle. Admittedly, such associations can be relatively weak, as it is hard, if not impossible, for an app to automatically download and install other apps without explicit user initiation. We describe how we can magnify such associations by carefully choosing parameters when applying graph-based inferences. We empirically evaluate the effectiveness of our approach on real datasets provided by a major mobile service provider. Specifically, we show that our approach achieves nearly 98% AUC (area under the ROC curve) and, by expanding the limited knowledge of known devices, further detects 6 to 7 times as many new compromised devices as are covered by the ground truth. We show that the newly detected devices indeed exhibit undesirable behavior in terms of leaking private information and accessing risky IPs and domains. We further conduct an in-depth analysis of the effectiveness of graph inference to understand the unique structure of the associations between mobile devices and their apps and its impact on the inference, based on which we propose how to choose key parameters.
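The guilt-by-association principle can be illustrated with a small label-propagation sketch over the device-app bipartite graph. The scoring rule, damping factor, and round count below are illustrative choices, not the paper's actual inference algorithm:

```python
from collections import defaultdict

def propagate_risk(edges, known_compromised, rounds=10, damping=0.85):
    """Guilt-by-association scoring on a device-app bipartite graph.

    edges: list of (device, app) pairs observed in network traffic.
    known_compromised: set of devices with confirmed infections (seeds).
    Seed devices pass suspicion to the apps they install; apps pass it
    back to the other devices installing them, so devices that share apps
    with compromised devices accumulate higher risk scores.
    """
    device_apps, app_devices = defaultdict(set), defaultdict(set)
    for d, a in edges:
        device_apps[d].add(a)
        app_devices[a].add(d)
    score = {d: (1.0 if d in known_compromised else 0.0) for d in device_apps}
    for _ in range(rounds):
        # An app's suspicion is the mean suspicion of its installers.
        app_score = {a: sum(score[d] for d in ds) / len(ds)
                     for a, ds in app_devices.items()}
        for d, apps in device_apps.items():
            if d in known_compromised:
                continue  # seeds keep their confirmed label
            score[d] = damping * sum(app_score[a] for a in apps) / len(apps)
    return score
```

The damping factor plays the role of the inference parameters the paper tunes: too high and weak associations are over-amplified, too low and the seed knowledge never spreads.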
Paralinguistic Privacy Protection at the Edge
Pub Date: 2022-11-03. DOI: https://dl.acm.org/doi/10.1145/3570161
Ranya Aloufi, Hamed Haddadi, David Boyle
Voice user interfaces and digital assistants are rapidly entering our lives and becoming singular touch points spanning our devices. These always-on services capture and transmit our audio data to powerful cloud services for further processing and subsequent actions. Our voices and the raw audio signals collected through these devices contain a host of sensitive paralinguistic information that is transmitted to service providers regardless of deliberate or false triggers. As our emotional patterns and sensitive attributes, such as our identity, gender, and well-being, are easily inferred using deep acoustic models, we encounter a new generation of privacy risks by using these services. One approach to mitigating the risk of paralinguistic privacy breaches is to combine cloud-based processing with privacy-preserving, on-device paralinguistic information learning and filtering before transmitting voice data.
In this paper we introduce EDGY, a configurable, lightweight, disentangled representation learning framework that transforms and filters high-dimensional voice data to identify and contain sensitive attributes at the edge, prior to offloading to the cloud. We evaluate EDGY's on-device performance and explore optimization techniques, including model quantization and knowledge distillation, to enable private, accurate, and efficient representation learning on resource-constrained devices. Our results show that EDGY runs in tens of milliseconds with a 0.2% relative improvement in 'zero-shot' ABX score, or minimal performance penalties of approximately 5.95% word error rate (WER), in learning linguistic representations from raw voice signals, using a CPU and a single-core ARM processor without specialized hardware.
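Model quantization, one of the on-device optimizations evaluated above, can be sketched generically as symmetric post-training int8 quantization. This is a textbook illustration (assuming a nonzero weight tensor), not EDGY's actual pipeline:

```python
import numpy as np

def quantize_int8(weights):
    # Symmetric post-training quantization: map float weights to int8 by
    # scaling the largest absolute value to 127, shrinking the model to
    # one byte per weight for resource-constrained edge devices.
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights for inference.
    return q.astype(np.float32) * scale
```

The round trip loses at most about half a quantization step per weight, which is why quantized models usually pay only a small accuracy penalty in exchange for a large memory and latency win.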
Pareto-Optimal Defenses for the Web Infrastructure: Theory and Practice
Pub Date: 2022-10-13. DOI: https://doi.org/10.1145/3567595
Giorgio Di Tizio, Patrick Speicher, Milivoj Simeonovski, Michael Backes, Ben Stock, Robert Künnemann
The integrity of the content a user is exposed to when browsing the web relies on a plethora of non-web technologies and an infrastructure of interdependent hosts, communication technologies, and trust relations. Incidents like the Chinese Great Cannon or the MyEtherWallet attack make it painfully clear: the security of end users hinges on the security of the surrounding infrastructure: routing, DNS, content delivery, and the PKI. There are many competing, but isolated, proposals to increase security, from the network up to the application layer. So far, researchers have focused on analyzing attacks and defenses on specific layers. We still lack an evaluation of how, given the status quo of the web, these proposals can be combined, how effective they are, and at what cost the increase in security comes. In this work, we propose a graph-based analysis based on Stackelberg planning that considers a rich attacker model and a multitude of proposals, from IPsec to DNSSEC and SRI. Our threat model considers the security of billions of users against attackers ranging from small hacker groups to nation-state actors. Analyzing the infrastructure of the top 5k Alexa domains, we discover that the security mechanisms currently deployed are ineffective and that some infrastructure providers have a threat potential comparable to nations. We find that a considerable increase in security (up to 13% of web visits protected) is possible at a relatively modest cost, due to the effectiveness of mitigations at the application and transport layers, which dominate expensive infrastructure enhancements such as DNSSEC and IPsec.
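The Pareto-optimality criterion behind this cost/benefit trade-off can be stated compactly: a bundle of mitigations is kept only if no other bundle protects at least as many web visits at no greater cost. A minimal sketch, where the (cost, protected-fraction) tuple encoding is an illustrative assumption rather than the paper's planner:

```python
def pareto_front(options):
    # options: list of (cost, protected_fraction) mitigation bundles.
    # A bundle is dominated if some other bundle costs no more and
    # protects at least as much, with at least one strict improvement.
    front = []
    for c, p in options:
        dominated = any(c2 <= c and p2 >= p and (c2 < c or p2 > p)
                        for c2, p2 in options)
        if not dominated:
            front.append((c, p))
    return sorted(front)
```

Only the bundles on this front are worth recommending; every other combination of defenses is strictly beaten on both axes by something cheaper or more protective.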
Malware remains a widespread problem, routinely used by malicious actors to compromise the security of computer systems. Consumers typically rely on a single AV product to detect and block possible malware infections, while corporations often install multiple security products, activate several layers of defenses, and establish security policies among employees. However, while a better security posture should lower the risk of malware infections, the actual extent to which it does is still debated by risk analysis experts. Moreover, the difference in the risks faced by consumers and enterprises has never been empirically studied with real-world data. In fact, the mere use of third-party software and network services, together with the interconnected nature of our society, necessarily exposes both classes of users to undiversifiable risks: independently of how careful users are and how well they manage their cyber hygiene, a portion of that risk exists simply because they use a computer, share the same networks, and run the same software.

In this work, we shed light on both systemic (i.e., diversifiable and dependent on the security posture) and systematic (i.e., undiversifiable and independent of cyber hygiene) risk classes. Leveraging the telemetry data of a popular security company, we compare, in the first part of our study, the effects that different security measures have on malware encounter risks in consumer and enterprise environments. In the second part, we conduct exploratory research on systematic risk, investigate the quality of nine different indicators we were able to extract from our telemetry, and provide, for the first time, quantitative indicators of their predictive power.

Our results show that even though consumers have a slightly lower encounter rate than enterprises (9.8% vs. 12.0%), the latter fare considerably better when we restrict the comparison to machines with increasingly higher uptime (89% vs. 53%). The two segments also diverge when we separately consider Adware and Potentially Unwanted Applications (PUA) and the generic samples detected through behavioral signatures: while consumers' encounter rate for Adware and PUA is six times that of enterprise machines, the latter on average match behavioral signatures twice as frequently. We find, instead, similar trends when analyzing the age of encountered signatures and the prevalence of different classes of traditional malware (such as Ransomware and Cryptominers). Finally, our findings show that the amount of time a host is active, the volume of files generated on the machine, the number and reputation of the vendors of the installed applications, the host's geographical location, and its recurrent infected state all carry useful information as indicators of the systematic risk of malware encounters. Activity days and hours have a higher influence on the risk of consumers, increasing the odds of encountering malware.
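The abstract's core measurements — per-segment encounter rates and the effect of a risk indicator on the odds of encountering malware — can be sketched in a few lines. This is not the authors' pipeline; the toy telemetry records and the `high_uptime` indicator below are invented purely for illustration.

```python
# Illustrative sketch: per-segment malware encounter rates and a simple
# odds ratio for one candidate indicator, computed from toy telemetry.
# Record layout: (segment, encountered_malware, high_uptime) -- all made up.
telemetry = [
    ("consumer", True, False), ("consumer", False, False),
    ("consumer", False, True), ("consumer", False, False),
    ("enterprise", True, True), ("enterprise", False, True),
    ("enterprise", False, True), ("enterprise", True, False),
]

def encounter_rate(records, segment):
    """Fraction of machines in a segment that encountered malware."""
    seg = [r for r in records if r[0] == segment]
    return sum(r[1] for r in seg) / len(seg)

def odds_ratio(records, indicator_index):
    """Odds of a malware encounter with the indicator vs. without it.

    Assumes each group contains both encounters and non-encounters
    (misses > 0), which holds for the toy data above.
    """
    def odds(rs):
        hits = sum(r[1] for r in rs)
        return hits / (len(rs) - hits)

    with_ind = [r for r in records if r[indicator_index]]
    without = [r for r in records if not r[indicator_index]]
    return odds(with_ind) / odds(without)

print(encounter_rate(telemetry, "consumer"))    # -> 0.25
print(encounter_rate(telemetry, "enterprise"))  # -> 0.5
print(round(odds_ratio(telemetry, 2), 3))       # -> 0.333 (uptime lowers odds here)
```

In this toy data, as in the paper's direction of results, enterprises show a higher raw encounter rate than consumers, and an odds ratio below 1 indicates the indicator is associated with fewer encounters.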
{"title":"A Comparison of Systemic and Systematic Risks of Malware Encounters in Consumer and Enterprise Environments","authors":"Savino Dambra, Leyla Bilge, D. Balzarotti","doi":"10.1145/3565362","DOIUrl":"https://doi.org/10.1145/3565362","url":null,"abstract":"Malware is still a widespread problem, and it is used by malicious actors to routinely compromise the security of computer systems. Consumers typically rely on a single AV product to detect and block possible malware infections, while corporations often install multiple security products, activate several layers of defenses, and establish security policies among employees. However, if a better security posture should lower the risk of malware infections, then the actual extent to which this happens is still under debate by risk analysis experts. Moreover, the difference in risks encountered by consumers and enterprises has never been empirically studied by using real-world data. In fact, the mere use of third-party software, network services, and the interconnected nature of our society necessarily exposes both classes of users to undiversifiable risks: Independently from how careful users are and how well they manage their cyber hygiene, a portion of that risk would simply exist because of the fact of using a computer, sharing the same networks, and running the same software. In this work, we shed light on both systemic (i.e., diversifiable and dependent on the security posture) and systematic (i.e., undiversifiable and independent of the cyber hygiene) risk classes. Leveraging the telemetry data of a popular security company, we compare, in the first part of our study, the effects that different security measures have on malware encounter risks in consumer and enterprise environments. 
In the second part, we conduct exploratory research on systematic risk, investigate the quality of nine different indicators we were able to extract from our telemetry, and provide, for the first time, quantitative indicators of their predictive power. Our results show that even if consumers have a slightly lower encounter rate than enterprises (9.8% vs. 12.0%), the latter do considerably better when selecting machines with an increasingly higher uptime (89% vs. 53%). The two segments also diverge when we separately consider the presence of Adware and Potentially Unwanted Applications (PUA) and the generic samples detected through behavioral signatures: While consumers have an encounter rate for Adware and PUA that is 6 times higher than enterprise machines, those on average match behavioral signatures 2 times more frequently than the counterpart. We find, instead, similar trends when analyzing the age of encountered signatures, and the prevalence of different classes of traditional malware (such as Ransomware and Cryptominers). Finally, our findings show that the amount of time a host is active, the volume of files generated on the machine, the number and reputation of vendors of the installed applications, the host geographical location, and its recurrent infected state carry useful information as indicators of systematic risk of malware encounters. 
Activity days and hours have a higher influence in the risk of consumers, increasing the odds of encountering malw","PeriodicalId":56050,"journal":{"name":"ACM Transactions on Privacy and Security","volume":" ","pages":"1 - 30"},"PeriodicalIF":2.3,"publicationDate":"2022-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46675772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-10-03 | DOI: https://dl.acm.org/doi/10.1145/3565362
Savino Dambra, Leyla Bilge, Davide Balzarotti