
ACM Transactions on Privacy and Security: Latest Articles

OptiClass: An Optimized Classifier for Application Layer Protocols Using Bit Level Signatures
IF 2.3 | Zone 4, Computer Science | Q1 Computer Science | Pub Date: 2023-11-22 | DOI: 10.1145/3633777
Mayank Swarnkar, Neha Sharma

Network traffic classification has many applications, such as security monitoring, quality of service, and traffic engineering. For these applications, Deep Packet Inspection (DPI) is a popular classification technique because it scrutinizes the payload and provides comprehensive information for accurate analysis of network traffic. However, DPI-based methods reduce network performance because they are computationally expensive, and they compromise end-user privacy because they analyze the payload. To overcome these challenges, bit-level signatures are widely used for network traffic classification. However, most of these methods still leave room for performance improvement because they match an unknown payload against application signatures one by one. Moreover, these methods scale poorly as the number of application signatures grows. To fill this gap, we propose OptiClass, an optimized classifier for application protocols using bit-level signatures. OptiClass matches application signatures against unknown flows in parallel, which results in faster, more accurate, and more efficient network traffic classification. OptiClass improves on state-of-the-art methods in two ways. First, OptiClass generates bit-level signatures of just 32 bits for all applications, which keeps it swift and privacy-preserving. Second, OptiClass uses a novel data structure called BiTSPLITTER for fast and accurate signature matching. We evaluated OptiClass on three datasets consisting of twenty application protocols. Experimental results show that OptiClass has an average recall, precision, and F1-score of 97.36%, 97.38%, and 97.37%, respectively, and an average classification speed 9.08 times faster than five closely related state-of-the-art methods.
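The contrast between one-by-one signature matching and a single indexed lookup can be illustrated with a minimal Python sketch. This is not BiTSPLITTER, whose internals the abstract does not describe; a plain dictionary stands in for it. The choice of the first four payload bytes as the 32-bit signature, the function names, and the example signature values are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def first_32_bits(payload: bytes) -> int:
    """Pack the first four payload bytes into one 32-bit integer.

    Assumption: the signature sits at the start of the payload.
    Short payloads are zero-padded on the right.
    """
    return int.from_bytes(payload[:4].ljust(4, b"\x00"), "big")


def classify_linear(payload: bytes, signatures: List[Tuple[int, str]]) -> Optional[str]:
    """One-by-one matching: cost grows with the number of signatures."""
    key = first_32_bits(payload)
    for signature, application in signatures:
        if signature == key:
            return application
    return None


def classify_indexed(payload: bytes, index: Dict[int, str]) -> Optional[str]:
    """Single hash lookup: cost does not depend on how many signatures exist."""
    return index.get(first_32_bits(payload))


# Hypothetical signature table, for illustration only.
SIGNATURES = [
    (first_32_bits(b"GET "), "http"),
    (first_32_bits(b"SSH-"), "ssh"),
]
INDEX = dict(SIGNATURES)
```

Both functions return the same label for a given payload; only the lookup cost differs as the signature table grows.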

Citations: 0
Semi-supervised Classification of Malware Families Under Extreme Class Imbalance via Hierarchical Non-Negative Matrix Factorization with Automatic Model Selection
Zone 4, Computer Science | Q1 Computer Science | Pub Date: 2023-11-13 | DOI: 10.1145/3624567
Maksim E. Eren, Manish Bhattarai, Robert J. Joyce, Edward Raff, Charles Nicholas, Boian S. Alexandrov
Identifying the family to which a malware specimen belongs is essential for understanding the malware's behavior and developing mitigation strategies. Solutions proposed by prior work, however, are often impractical because their evaluations omit realistic factors. These factors include learning under class imbalance, the ability to identify new malware, and the cost of production-quality labeled data. In practice, deployed models face prominent, rare, and new malware families. At the same time, obtaining a large quantity of up-to-date labeled malware for training a model can be expensive. In this article, we address these problems and propose a novel hierarchical semi-supervised algorithm, which we call the HNMFk Classifier, that can be used in the early stages of the malware family labeling process. Our method is based on non-negative matrix factorization with automatic model selection, that is, with an estimation of the number of clusters. With the HNMFk Classifier, we exploit the hierarchical structure of the malware data together with a semi-supervised setup, which enables us to classify malware families under conditions of extreme class imbalance. Our solution can make abstaining predictions (a reject option), which yields promising results in identifying novel malware families and helps maintain model performance when little labeled data is available. We perform bulk classification of nearly 2,900 malware families, both rare and prominent, through static analysis, using nearly 388,000 samples from the EMBER-2018 corpus. In our experiments, we surpass both supervised and semi-supervised baseline models with an F1 score of 0.80.
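An abstaining prediction can be sketched in a few lines of Python. The abstract does not state how the HNMFk Classifier decides to abstain, so the confidence threshold, the score dictionary, and the function name below are assumptions for illustration, not the authors' criterion.

```python
from typing import Dict, Optional


def predict_or_abstain(scores: Dict[str, float], threshold: float = 0.6) -> Optional[str]:
    """Return the best-scoring family, or None to abstain.

    Assumption: each family has a confidence score in [0, 1], and the
    model abstains when even the best score falls below `threshold`.
    """
    if not scores:
        return None
    family, best_score = max(scores.items(), key=lambda item: item[1])
    return family if best_score >= threshold else None
```

A sample that the model abstains on can then be routed to an analyst or treated as a candidate for a previously unseen family.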
Citations: 1
symbSODA: Configurable and Verifiable Orchestration Automation for Active Malware Deception
Zone 4, Computer Science | Q1 Computer Science | Pub Date: 2023-11-13 | DOI: 10.1145/3624568
Md Sajidul Islam Sajid, Jinpeng Wei, Ehab Al-Shaer, Qi Duan, Basel Abdeen, Latifur Khan
Malware is commonly used by adversaries to compromise and infiltrate cyber systems in order to steal sensitive information or destroy critical assets. Active Cyber Deception (ACD) has emerged as an effective proactive cyber defense against malware that misleads adversaries by presenting fake data and engages them in order to learn novel attack techniques. However, real-time malware deception is a complex and challenging task because (1) it requires a comprehensive understanding of malware behavior at the technical and tactical levels in order to create appropriate deception ploys and resources that can leverage this behavior and mislead the malware, and (2) it requires configurable yet provably valid deception planning to guarantee effective and safe real-time deception orchestration. This article presents symbSODA, a highly configurable and verifiable cyber deception system that analyzes real-world malware using multipath execution to discover API patterns that represent attack techniques and tactics critical for deception, enables users to create their own customized deception ploys based on the malware type and objectives, allows for constructing conflict-free Deception Playbooks, and finally automates the deception orchestration to execute the malware inside a deceptive environment. symbSODA extracts Malicious Sub-graphs (MSGs) consisting of WinAPIs from real-world malware and maps them to tactics and techniques using the ATT&CK framework to facilitate the construction of meaningful user-defined deception playbooks. We conducted a comprehensive evaluation of symbSODA using 255 recent malware samples. We demonstrated that the accuracy of end-to-end malware deception is 95% on average, with negligible overhead, across various deception goals and strategies. Furthermore, our approach extracted MSGs with 97% recall, and our MSG-to-MITRE mapping achieved a top-1 accuracy of 88.75%. Our study suggests that symbSODA can serve as a general-purpose Malware Deception Factory to automatically produce customized deception playbooks against arbitrary malware behavior.
Citations: 0
Measures of Information Leakage for Incomplete Statistical Information: Application to a Binary Privacy Mechanism
Zone 4, Computer Science | Q1 Computer Science | Pub Date: 2023-11-13 | DOI: 10.1145/3624982
Shahnewaz Karim Sakib, George T Amariucai, Yong Guan
Information leakage is usually defined as the logarithmic increment in the adversary's probability of correctly guessing the legitimate user's private data, or some arbitrary function of the private data, when presented with the legitimate user's publicly disclosed information. However, this definition of information leakage implicitly assumes that both the privacy mechanism and the prior probability of the original data are entirely known to the attacker. In reality, assuming that an attacker has complete knowledge of the privacy mechanism is often impractical. The attacker usually has access only to an approximation of the true privacy mechanism, computed from a limited set of disclosed data for which the corresponding undistorted data is also available. In this scenario, the conventional definition of leakage no longer has an operational meaning. To address this problem, in this article, we propose novel, meaningful information-theoretic metrics for information leakage when the attacker has incomplete information about the privacy mechanism. We call them average subjective leakage, average confidence boost, and average objective leakage, respectively. For the simplest, binary scenario, we demonstrate how to find an optimized privacy mechanism that minimizes the worst-case value of either of these leakages.
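The conventional definition in the first sentence can be made concrete with a short Python sketch. It computes the base-2 logarithm of the ratio between the adversary's posterior and prior probabilities of guessing the private value correctly, under the assumption that the adversary knows both the prior and the mechanism exactly (the very assumption the article relaxes). The function names and the symmetric bit-flip mechanism are illustrative assumptions; the three metrics proposed in the article are not reproduced here.

```python
import math
from typing import List


def guessing_leakage_bits(prior: List[float], channel: List[List[float]]) -> float:
    """log2(posterior guessing probability / prior guessing probability).

    prior[x] is P(X = x); channel[x][y] is P(Y = y | X = x).
    Assumes the adversary knows both exactly.
    """
    # Best blind guess: pick the most likely private value.
    prior_guess = max(prior)
    # Best guess after seeing y: pick the x maximizing the joint P(x, y).
    posterior_guess = sum(
        max(prior[x] * channel[x][y] for x in range(len(prior)))
        for y in range(len(channel[0]))
    )
    return math.log2(posterior_guess / prior_guess)


def binary_flip_mechanism(flip_prob: float) -> List[List[float]]:
    """Hypothetical binary mechanism: report the true bit, flipped with flip_prob."""
    keep = 1.0 - flip_prob
    return [[keep, flip_prob], [flip_prob, keep]]
```

With a uniform prior, a flip probability of 0.1 raises the guessing probability from 0.5 to 0.9, so the leakage is log2(1.8), roughly 0.85 bits; a flip probability of 0.5 makes the output independent of the input and the leakage is 0.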
Citations: 0
Sound-based Two-Factor Authentication: Vulnerabilities and Redesign
Zone 4, Computer Science | Q1 Computer Science | Pub Date: 2023-11-11 | DOI: 10.1145/3632175
Prakash Shrestha, Ahmed Tanvir Mahdad, Nitesh Saxena
Reducing the level of user effort involved in traditional two-factor authentication (TFA) constitutes an important research topic. An interesting representative approach, Sound-Proof, leverages ambient sounds to detect the proximity between the second-factor device (phone) and the login terminal (browser), and eliminates the need for the user to transfer PIN codes. In this paper, we identify a weakness of the Sound-Proof system that makes it completely vulnerable to passive "environment guessing" and active "environment manipulating" remote attackers and proximity attackers. Addressing these security issues, we propose Listening-Watch, a new TFA mechanism based on a wearable device (watch/bracelet) and active browser-generated random speech sounds. As the user attempts to log in, the browser plays a short random code encoded into speech, and the login succeeds if the watch's audio recording contains this code (decoded using speech recognition) and is similar enough to the browser's audio recording. The remote attacker, who has guessed or manipulated the user's environment, will be defeated because authentication success relies on the presence of the random code in the watch's recording. The proximity attacker will also be defeated unless it is extremely close (< 50 cm) to the watch, since wearable microphones are usually designed to capture only nearby sounds (e.g., voice commands).
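The login rule described above combines two checks, which the following Python sketch makes explicit. Speech recognition and audio comparison are outside its scope: the decoded transcript and a similarity score are taken as inputs. The digit-only code format, the 0.8 similarity threshold, and all function names are assumptions for illustration and are not taken from the paper.

```python
import secrets


def new_login_code(n_digits: int = 6) -> str:
    """Draw a fresh random code for one login attempt (assumed digit-only)."""
    return "".join(secrets.choice("0123456789") for _ in range(n_digits))


def login_allowed(
    code: str,
    watch_transcript: str,
    audio_similarity: float,
    min_similarity: float = 0.8,
) -> bool:
    """Accept only if both conditions from the abstract hold.

    1. The watch recording, decoded by speech recognition into
       `watch_transcript`, contains the random code.
    2. The watch and browser recordings are similar enough, summarized
       here by a precomputed `audio_similarity` score in [0, 1].
    """
    spoken_digits = "".join(ch for ch in watch_transcript if ch.isdigit())
    contains_code = code in spoken_digits
    return contains_code and audio_similarity >= min_similarity
```

Requiring the fresh code is what defeats an attacker who merely reproduces the user's ambient environment: matching background audio alone never satisfies the first condition.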
Citations: 0
Eyes See Hazy while Algorithms Recognize Who You Are
Zone 4, Computer Science | Q1 Computer Science | Pub Date: 2023-11-10 | DOI: 10.1145/3632292
Yong Zeng, Jiale Liu, Tong Dong, Qingqi Pei, Jianfeng Ma, Yao Liu
Facial recognition technology has been developed and widely used for decades. However, it has also raised privacy concerns and researchers' expectations for privacy-preserving facial recognition technologies. To provide privacy, detailed or semantic contents in face images should be obfuscated. However, face recognition algorithms have to be tailored to each current obfuscation method; as a result, the face recognition service provider has to update its commercial off-the-shelf (COTS) products for each obfuscation method. Meanwhile, current obfuscation methods lack a clearly quantified explanation. This paper presents a universal face obfuscation method for a family of face recognition algorithms that use the global or local structure of the eigenvector space. Through a specific mathematical analysis, we show that the upper bound of the distance between the original and obfuscated face images is smaller than the given recognition threshold. Experiments show that the recognition degradation is 0% for global-structure-based algorithms and 0.3%-5.3% for local-structure-based algorithms. Meanwhile, we show that even an attacker who knows the whole obfuscation method has to enumerate all the possible roots of a polynomial with an obfuscation coefficient, which makes reconstructing the original faces computationally infeasible. Our method therefore performs well in both privacy and recognition accuracy without modifying the recognition algorithms.
Citations: 0
Efficient History-Driven Adversarial Perturbation Distribution Learning in Low Frequency Domain
Zone 4, Computer Science | Q1 Computer Science | Pub Date: 2023-11-08 | DOI: 10.1145/3632293
Han Cao, Qindong Sun, Yaqi Li, Rong Geng, Xiaoxiong Wang
The existence of adversarial images casts doubt on the credibility of artificial intelligence systems. Attackers can use carefully crafted adversarial images to carry out a variety of attacks. Inspired by the theory of image compressed sensing, this paper proposes a new black-box attack, \(\mathcal{N}\text{-HSA}_{LF}\). It uses the covariance matrix adaptation evolution strategy (CMA-ES) to learn the distribution of adversarial perturbations in the low-frequency domain, reducing the dimensionality of the solution space. sep-CMA-ES is used to restrict the covariance matrix to a diagonal matrix, which further reduces the number of covariance entries of the learned multivariate Gaussian distribution that must be updated during an attack, thereby reducing the attack's computational cost. On this basis, we propose history-driven mean update and current-optimal-solution-guided improvement strategies to keep the distribution from evolving in a worse direction. Experimental results show that the proposed \(\mathcal{N}\text{-HSA}_{LF}\) achieves a higher attack success rate with fewer queries when attacking both CNN-based and transformer-based target models under \(L_2\)-norm and \(L_\infty\)-norm perturbation constraints. We also conduct an ablation study, whose results show that the proposed improvement strategies effectively reduce the number of queries to the target model when crafting adversarial examples for hard examples. In addition, our attack is able to render the integrated defense strategy of GRIP-GAN and noise-embedded training ineffective to a certain extent.
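The saving from a diagonal covariance matrix can be illustrated with a small Python sketch. It counts the free covariance parameters in the full and diagonal cases and draws one candidate perturbation from a diagonal Gaussian. The step-size control, evolution paths, and the history-driven mean update of the actual attack are not shown; the function names and the search dimension in the usage note are assumptions for illustration.

```python
import random
from typing import List


def covariance_parameter_count(dim: int, diagonal: bool) -> int:
    """Free parameters of a dim x dim covariance matrix.

    A full symmetric matrix has dim * (dim + 1) / 2 free entries;
    a diagonal one, as in sep-CMA-ES, has only dim.
    """
    return dim if diagonal else dim * (dim + 1) // 2


def sample_perturbation(
    mean: List[float], variances: List[float], rng: random.Random
) -> List[float]:
    """Draw one candidate from N(mean, diag(variances)).

    With a diagonal covariance each coordinate is sampled independently,
    so no matrix decomposition is needed.
    """
    return [rng.gauss(m, v ** 0.5) for m, v in zip(mean, variances)]
```

For a hypothetical 64-dimensional low-frequency search space, the full covariance has 2,080 free entries while the diagonal version has 64, which is why restricting both the frequency band and the covariance structure lowers the per-iteration cost.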
{"title":"Efficient History-Driven Adversarial Perturbation Distribution Learning in Low Frequency Domain","authors":"Han Cao, Qindong Sun, Yaqi Li, Rong Geng, Xiaoxiong Wang","doi":"10.1145/3632293","DOIUrl":"https://doi.org/10.1145/3632293","url":null,"abstract":"The existence of adversarial image makes us have to doubt the credibility of artificial intelligence system. Attackers can use carefully processed adversarial images to carry out a variety of attacks. Inspired by the theory of image compressed sensing, this paper proposes a new black-box attack, (mathcal {N}text{-HSA}_{LF} ) . It uses covariance matrix adaptive evolution strategy (CMA-ES) to learn the distribution of adversarial perturbation in low frequency domain, reducing the dimensionality of solution space. And sep-CMA-ES is used to set the covariance matrix as a diagonal matrix, which further reduces the dimensions that need to be updated for the covariance matrix of multivariate Gaussian distribution learned in attacks, thereby reducing the computational cost of attack. And on this basis, we propose history-driven mean update and current optimal solution-guided improvement strategies to avoid the evolution of distribution to a worse direction. The experimental results show that the proposed (mathcal {N}text{-HSA}_{LF} ) can achieve a higher attack success rate with fewer queries on attacking both CNN-based and transformer-based target models under L 2 -norm and L ∞ -norm constraints of perturbation. We also conduct an ablation study and the results show that the proposed improved strategies can effectively reduce the number of visits to the target model when making adversarial examples for hard examples. 
In addition, our attack is able to make the integrated defense strategy of GRIP-GAN and noise-embedded training ineffective to a certain extent.","PeriodicalId":56050,"journal":{"name":"ACM Transactions on Privacy and Security","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135342518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Forward Security with Crash Recovery for Secure Logs
Zone 4 (Computer Science) | Q1 Computer Science | Pub Date: 2023-11-03 | DOI: 10.1145/3631524
Erik-Oliver Blass, Guevara Noubir
Logging is a key mechanism in the security of computer systems. Beyond supporting important forward security properties, it is critical that logging withstands both failures and intentional tampering, to prevent subtle attacks from leaving the system in an inconsistent state with inconclusive evidence. We propose new techniques combining forward security with crash recovery for secure log data storage. Because the need to support forward integrity and the online nature of logging rule out conventional coding, we propose and analyze a coding scheme that resolves these unique design constraints. Specifically, our coding enables forward integrity, online encoding, and, most importantly, a constant number of operations per encoding. It adds a new log item by \(\mathsf{XOR}\)ing it into k cells of a table. If up to a certain threshold of cells is modified by the adversary, or lost due to a crash, we still guarantee recovery of all stored log items. The main advantage of the coding scheme is its efficiency and compatibility with forward integrity. The key contribution of the paper is the use of spectral graph theory techniques to prove that k is constant in the number n of all log items ever stored and small in practice, e.g., k = 5. Moreover, we prove that to cope with up to \(\sqrt{n}\) modified or lost log items, storage expansion is constant in n and small in practice. For k = 5, the table is only 12% larger than the simple concatenation of all n items. We propose and evaluate original techniques to scale the computation cost of recovery to several GBytes of security logs. We instantiate our scheme as an abstract data structure that allows either detecting adversarial modifications to log items or treating modifications like data loss in a system crash. The data structure can recover lost log items, thereby effectively reverting adversarial modifications.
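The encoding step described above (XOR each new item into k cells of a table that is about 12% larger than the raw log) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' construction: the SHA-256 based cell selection, the fixed 16-byte cell size, the static `key`, and all function names are hypothetical, and recovery and key evolution for forward integrity are omitted.

```python
import hashlib
from functools import reduce

K = 5            # cells touched per log item (the abstract's example value)
CELL_SIZE = 16   # bytes per cell; fixed-size items are an assumption of this sketch


def table_size(n: int, overhead_percent: int = 12) -> int:
    """Cells needed for n items with the ~12% expansion quoted for k = 5."""
    return (n * (100 + overhead_percent) + 99) // 100  # integer ceiling


def cell_indices(key: bytes, item_no: int, m: int, k: int = K) -> list[int]:
    """Pick k distinct cells out of m (hypothetical SHA-256 based derivation)."""
    if m < k:
        raise ValueError("table must have at least k cells")
    chosen: list[int] = []
    counter = 0
    while len(chosen) < k:
        seed = key + item_no.to_bytes(8, "big") + counter.to_bytes(8, "big")
        idx = int.from_bytes(hashlib.sha256(seed).digest()[:8], "big") % m
        if idx not in chosen:
            chosen.append(idx)
        counter += 1
    return chosen


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def append_item(table: list[bytes], key: bytes, item_no: int, item: bytes) -> None:
    """XOR one fixed-size log item into its k cells: constant work per item."""
    if len(item) != CELL_SIZE:
        raise ValueError("items must be padded to CELL_SIZE bytes")
    for idx in cell_indices(key, item_no, len(table)):
        table[idx] = xor_bytes(table[idx], item)


n = 100
key = b"demo-key"  # placeholder; a real system would evolve keys over time
items = [hashlib.sha256(i.to_bytes(8, "big")).digest()[:CELL_SIZE] for i in range(n)]
table = [bytes(CELL_SIZE)] * table_size(n)
for i, item in enumerate(items):
    append_item(table, key, i, item)

# With odd k, every item appears an odd number of times, so the XOR of all
# cells equals the XOR of all items.
assert reduce(xor_bytes, table) == reduce(xor_bytes, items)
```

Each append touches exactly k cells regardless of how many items are already stored, which mirrors the constant number of operations per encoding claimed in the abstract.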
Citations: 0
DeepMark: A Scalable and Robust Framework for DeepFake Video Detection
Zone 4 (Computer Science) | Q1 Computer Science | Pub Date: 2023-11-01 | DOI: 10.1145/3629976
Li Tang, Qingqing Ye, Haibo Hu, Qiao Xue, Yaxin Xiao, Jin Li
With the rapid growth of DeepFake video techniques, it is becoming increasingly challenging to identify DeepFake videos visually, which poses a huge threat to society. Unfortunately, existing detection schemes are limited to exploiting the artifacts left by DeepFake manipulations, so they struggle to keep pace with ever-improving DeepFake models. In this work, we propose DeepMark, a scalable and robust framework for detecting DeepFakes. It imprints essential visual features of a video into DeepMark Meta (DMM) and uses DMM to detect DeepFake manipulations by comparing the extracted visual features with the ground truth stored in DMM. DeepMark is therefore future-proof, because a DeepFake video must alter some visual feature, no matter how “natural” it looks. Furthermore, DMM contains a signature for verifying the integrity of these features. An essential link to the features and their signature is augmented with error correction codes and embedded in the video watermark. To improve the efficiency of DMM creation, we also present a threshold-based feature selection scheme and a deduced face detection scheme. Experimental results demonstrate the effectiveness and efficiency of DeepMark for DeepFake video detection across various datasets and parameter settings.
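The detection idea above (store reference features plus an integrity tag, then compare freshly extracted features against the stored ones) can be sketched roughly as follows. Everything here is an assumption for illustration: features are modelled as a plain list of floats, an HMAC stands in for the signature mentioned in the abstract, and the names `create_dmm`, `check_video`, and the `tolerance` value are hypothetical. Watermark embedding and error correction are not shown.

```python
import hashlib
import hmac
import json


def create_dmm(features: list[float], key: bytes) -> dict:
    """Bundle reference features with an integrity tag (HMAC as a stand-in)."""
    payload = json.dumps(features).encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"features": features, "signature": tag}


def check_video(dmm: dict, extracted: list[float], key: bytes,
                tolerance: float = 0.05) -> str:
    """Verify the metadata first, then compare extracted features with it."""
    payload = json.dumps(dmm["features"]).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, dmm["signature"]):
        return "metadata tampered"
    if len(extracted) != len(dmm["features"]):
        return "manipulated"
    deviation = max(
        (abs(a - b) for a, b in zip(dmm["features"], extracted)), default=0.0
    )
    return "authentic" if deviation <= tolerance else "manipulated"


key = b"demo-key"                      # placeholder secret
dmm = create_dmm([0.12, 0.80, 0.33], key)

print(check_video(dmm, [0.13, 0.79, 0.33], key))   # small re-encoding noise
print(check_video(dmm, [0.12, 0.40, 0.33], key))   # one feature clearly altered
forged = dict(dmm, features=[0.12, 0.40, 0.33])    # attacker edits the metadata
print(check_video(forged, [0.12, 0.40, 0.33], key))
```

The point of the design is that detection does not depend on artifacts of any particular DeepFake model: a manipulation is flagged because a protected feature changed, or because the protected metadata itself no longer verifies.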
Citations: 0
On Detecting and Measuring Exploitable JavaScript Functions in Real-World Applications
Zone 4 (Computer Science) | Q1 Computer Science | Pub Date: 2023-10-26 | DOI: 10.1145/3630253
Maryna Kluban, Mohammad Mannan, Amr Youssef
JavaScript is often rated as the most popular programming language for the development of both client-side and server-side applications. Because of its popularity, JavaScript has become a frequent target for attackers who exploit vulnerabilities in the source code to take control of the application. To address these JavaScript security issues, such vulnerabilities must be identified first. Existing studies on vulnerable code detection in JavaScript mostly consider package-level vulnerability tracking and measurements. However, such package-level analysis is largely imprecise, as real-world services that include a vulnerable package may not use the vulnerable functions in the package. Moreover, even the inclusion of a vulnerable function may not lead to a security problem if the function cannot be triggered with exploitable inputs. In this paper, we develop a vulnerability detection framework that uses vulnerable pattern recognition and textual similarity methods to detect vulnerable functions in real-world JavaScript projects, combined with a static multi-file taint analysis mechanism to further assess the impact of the vulnerabilities on the whole project (i.e., whether the vulnerability can be exploited in a given project). We compose a comprehensive dataset of 1,360 verified vulnerable JavaScript functions using the Snyk vulnerability database and the VulnCode-DB project. From this ground-truth dataset, we build our vulnerable patterns for two common vulnerability types: prototype pollution and Regular Expression Denial of Service (ReDoS). With our framework, we analyze 9,205,654 functions (from 3,000 NPM packages, 1,892 websites, and 557 Chrome Web extensions), and detect 117,601 prototype pollution and 7,333 ReDoS vulnerabilities. By further processing all 5,839 findings from NPM packages with our taint analyzer, we verify the exploitability of 290 zero-day cases across 134 NPM packages. 
In addition, we conduct an in-depth contextual analysis of the findings in 17 popular/critical projects and study the practical security exposure of 20 functions. With our semi-automated vulnerability reporting functionality, we disclosed all verified findings to project owners. We also obtained 25 published CVEs for our findings, 19 of them rated as “Critical” severity, and six rated as “High” severity. Additionally, we obtained 169 CVEs that are currently “Reserved” (as of Apr. 2023). As evident from the results, our approach can shift JavaScript vulnerability detection from the coarse package/library level to the function level, and thus improve the accuracy of detection and aid timely patching.
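As a rough illustration of what a function-level textual pattern for ReDoS might look like, the sketch below flags regular expression source strings that contain a quantified group whose body is itself quantified, a classic cause of catastrophic backtracking. This is a deliberately crude heuristic written for this listing, not the framework's actual patterns. It is written in Python and treats the regex sources purely as text; the name `looks_redos_prone` is hypothetical.

```python
import re

# Rough heuristic: a parenthesised group that contains a quantifier and is
# itself quantified, e.g. (a+)+ or (\d*)*. Escaped parentheses, nested groups
# and character classes are not handled.
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*]")


def looks_redos_prone(regex_source: str) -> bool:
    """Return True if the regex source matches the nested-quantifier heuristic."""
    return NESTED_QUANTIFIER.search(regex_source) is not None


candidates = [
    r"^(a+)+$",          # nested quantifier
    r"(\d*)*",           # nested quantifier
    r"(a|b+)*",          # quantified alternative inside a quantified group
    r"^[a-z]+@[a-z]+$",  # no group at all
    r"(ab)+",            # quantified group, but no quantifier inside
]
for source in candidates:
    print(f"{source!r}: {looks_redos_prone(source)}")
```

A textual match like this only says that a function contains a suspicious pattern. Whether attacker-controlled input can actually reach it is the separate question that the taint analysis described in the abstract is meant to answer.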
Citations: 0