
Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security: Latest Publications

Thwarting Fake OSN Accounts by Predicting their Victims
Pub Date : 2015-10-16 DOI: 10.1145/2808769.2808772
Yazan Boshmaf, M. Ripeanu, K. Beznosov, E. Santos-Neto
Traditional defense mechanisms for fighting against automated fake accounts in online social networks are victim-agnostic. Even though victims of fake accounts play an important role in the viability of subsequent attacks, there is no work on utilizing this insight to improve the status quo. In this position paper, we take the first step and propose to incorporate predictions about victims of unknown fakes into the workflows of existing defense mechanisms. In particular, we investigated how such an integration could lead to more robust fake account defense mechanisms. We also used real-world datasets from Facebook and Tuenti to evaluate the feasibility of predicting victims of fake accounts using supervised machine learning.
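As a rough illustration of the victim-prediction idea (not the authors' actual pipeline or feature set), the following Python sketch trains a supervised classifier on hypothetical per-user features; the feature names and the synthetic data are assumptions.

```python
# Minimal sketch of victim prediction as supervised learning (illustrative only).
# Feature names below are hypothetical; the paper's actual feature set is not reproduced here.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical per-user features: [friend_count, account_age_days, posts_per_week, profile_completeness]
X = rng.random((1000, 4))
# Label: 1 if the user accepted a friend request from a known fake account, else 0.
y = (rng.random(1000) < 0.2).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print("Cross-validated AUC:", cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())
```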
Citations: 21
Detecting Clusters of Fake Accounts in Online Social Networks
Pub Date : 2015-10-16 DOI: 10.1145/2808769.2808779
Cao Xiao, D. Freeman, Theodore Hwa
Fake accounts are a preferred means for malicious users of online social networks to send spam, commit fraud, or otherwise abuse the system. A single malicious actor may create dozens to thousands of fake accounts in order to scale their operation to reach the maximum number of legitimate members. Detecting and taking action on these accounts as quickly as possible is imperative in order to protect legitimate members and maintain the trustworthiness of the network. However, any individual fake account may appear to be legitimate on first inspection, for example by having a real-sounding name or a believable profile. In this work we describe a scalable approach to finding groups of fake accounts registered by the same actor. The main technique is a supervised machine learning pipeline for classifying an entire cluster of accounts as malicious or legitimate. The key features used in the model are statistics on fields of user-generated text such as name, email address, company or university; these include both frequencies of patterns within the cluster (e.g., do all of the emails share a common letter/digit pattern) and comparison of text frequencies across the entire user base (e.g., are all of the names rare?). We apply our framework to analyze account data on LinkedIn grouped by registration IP address and registration date. Our model achieved AUC 0.98 on a held-out test set and AUC 0.95 on out-of-sample testing data. The model has been productionalized and has identified more than 250,000 fake accounts since deployment.
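A minimal sketch of the cluster-level feature idea, assuming a hypothetical email regex and a name-rarity statistic; the regex, feature names, and data below are illustrative stand-ins, not the authors' actual pipeline.

```python
# Illustrative sketch of cluster-level features in the spirit of the paper:
# pattern frequency within a cluster and name rarity against the whole user base.
# The regex, feature set, and example data are assumptions, not the authors' exact pipeline.
import re
from collections import Counter

def cluster_features(accounts, global_name_counts, total_users):
    """accounts: list of dicts with 'name' and 'email' for one registration cluster."""
    # Fraction of emails in the cluster matching a shared letters+digits pattern (assumption).
    pattern = re.compile(r"^[a-z]+\d{2,}@")
    frac_patterned_email = sum(bool(pattern.match(a["email"])) for a in accounts) / len(accounts)
    # Average rarity of names relative to the entire user base.
    avg_name_rarity = sum(
        1.0 - global_name_counts.get(a["name"], 0) / total_users for a in accounts
    ) / len(accounts)
    return [len(accounts), frac_patterned_email, avg_name_rarity]

global_name_counts = Counter({"John Smith": 5000, "xk9 qwerty": 1})
cluster = [{"name": "xk9 qwerty", "email": "user123@example.com"},
           {"name": "xk9 qwerty", "email": "user124@example.com"}]
print(cluster_features(cluster, global_name_counts, total_users=1_000_000))
```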
Citations: 169
Better Malware Ground Truth: Techniques for Weighting Anti-Virus Vendor Labels
Pub Date : 2015-10-16 DOI: 10.1145/2808769.2808780
Alex Kantchelian, Michael Carl Tschantz, Sadia Afroz, Brad Miller, Vaishaal Shankar, Rekha Bachwani, A. Joseph, J. D. Tygar
We examine the problem of aggregating the results of multiple anti-virus (AV) vendors' detectors into a single authoritative ground-truth label for every binary. To do so, we adapt a well-known generative Bayesian model that postulates the existence of a hidden ground truth upon which the AV labels depend. We use training based on Expectation Maximization for this fully unsupervised technique. We evaluate our method using 279,327 distinct binaries from VirusTotal, each of which appeared for the first time between January 2012 and June 2014. Our evaluation shows that our statistical model is consistently more accurate at predicting the future-derived ground truth than all unweighted rules of the form "k out of n" AV detections. In addition, we evaluate the scenario where partial ground truth is available for model building. We train a logistic regression predictor on the partial label information. Our results show that as few as 100 randomly selected training instances with ground truth are enough to achieve an 80% true positive rate at a 0.1% false positive rate. In comparison, the best unweighted threshold rule provides only a 60% true positive rate at the same false positive rate.
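The sketch below illustrates the two baselines the abstract contrasts, an unweighted "k out of n" voting rule and a logistic regression trained on a small amount of ground truth, on synthetic AV verdicts; the vendor accuracies and data are assumptions, and the paper's EM-based generative model is not reproduced here.

```python
# Sketch of an unweighted "k out of n" rule over AV verdicts versus a logistic regression
# trained on a small labeled subset. All data below is synthetic and for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_binaries, n_vendors = 5000, 10
truth = (rng.random(n_binaries) < 0.5).astype(int)           # hidden ground truth
accuracy = rng.uniform(0.6, 0.95, n_vendors)                  # per-vendor accuracy (assumption)
votes = np.where(rng.random((n_binaries, n_vendors)) < accuracy,
                 truth[:, None], 1 - truth[:, None])

# Unweighted rule: label a binary malicious if at least k vendors flag it.
k = 4
rule_pred = (votes.sum(axis=1) >= k).astype(int)
print("k-of-n accuracy:", (rule_pred == truth).mean())

# Weighted aggregation: logistic regression on 100 labeled binaries (partial ground truth).
idx = rng.choice(n_binaries, 100, replace=False)
lr = LogisticRegression().fit(votes[idx], truth[idx])
print("weighted accuracy:", (lr.predict(votes) == truth).mean())
```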
Citations: 96
Remote Operating System Classification over IPv6
Pub Date : 2015-10-16 DOI: 10.1145/2808769.2808777
D. Fifield, A. Geana, Luis MartinGarcia, M. Morbitzer, J. D. Tygar
Differences in the implementation of common networking protocols make it possible to identify the operating system of a remote host by the characteristics of its TCP and IP packets, even in the absence of application-layer information. This technique, "OS fingerprinting," is relevant to network security because of its relationship to network inventory, vulnerability scanning, and tailoring of exploits. Various techniques of fingerprinting over IPv4 have been in use for over a decade; however IPv6 has had comparatively scant attention in both research and in practical tools. In this paper we describe an IPv6-based OS fingerprinting engine that is based on a linear classifier. It introduces innovative classification features and network probes that take advantage of the specifics of IPv6, while also making use of existing proven techniques. The engine is deployed in Nmap, a widely used network security scanner. This engine provides good performance at a fraction of the maintenance costs of classical signature-based systems. We describe our work in progress to enhance the deployed system: new network probes that help to further distinguish operating systems, and imputation of incomplete feature vectors.
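A toy sketch of the linear-classifier idea, assuming a handful of invented probe-response features (hop limit, TCP window size, flow-label usage); Nmap's real IPv6 engine uses a far richer feature set and probe suite.

```python
# Toy sketch: probe-response measurements become a feature vector and a linear model scores
# each candidate OS. Feature names and values are invented for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features: [initial hop limit, TCP window size / 1000, flow label nonzero]
X = np.array([
    [64, 29.2, 1],   # Linux-like responses
    [64, 28.9, 1],
    [128, 65.5, 0],  # Windows-like responses
    [128, 64.2, 0],
    [255, 65.5, 1],  # BSD-like responses
    [255, 64.0, 1],
])
y = np.array(["Linux", "Linux", "Windows", "Windows", "BSD", "BSD"])

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([[64, 29.0, 1]]))   # expected to score closest to "Linux"
```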
Citations: 8
Differential Privacy for Classifier Evaluation
Pub Date : 2015-10-16 DOI: 10.1145/2808769.2808775
Kendrick Boyd, Eric Lantz, David Page
Differential privacy provides powerful guarantees that individuals incur minimal additional risk by including their personal data in a database. Most work in differential privacy has focused on differentially private algorithms that produce models, counts, and histograms. Nevertheless, even with a classification model produced by a differentially private algorithm, directly reporting the classifier's performance on a database has the potential for disclosure. Thus, differentially private computation of evaluation metrics for machine learning is an important research area. We find effective mechanisms for area under the receiver-operating characteristic (ROC) curve and average precision.
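For orientation, here is a generic exponential-mechanism sketch for releasing a metric value privately; the discretized output space, utility function, and sensitivity value are placeholders, while the paper derives the mechanism-specific details for ROC AUC and average precision.

```python
# Generic exponential-mechanism sketch for privately releasing an evaluation metric.
# The utility function and its sensitivity here are illustrative placeholders.
import numpy as np

def exponential_mechanism(candidates, utility, epsilon, sensitivity, rng):
    scores = np.array([utility(c) for c in candidates])
    # Subtract the max before exponentiating for numerical stability.
    weights = np.exp(epsilon * (scores - scores.max()) / (2.0 * sensitivity))
    probs = weights / weights.sum()
    return rng.choice(candidates, p=probs)

rng = np.random.default_rng(0)
true_auc = 0.87
candidates = np.round(np.linspace(0.0, 1.0, 101), 2)   # discretized output space
utility = lambda r: -abs(r - true_auc)                  # closer candidates score higher
print(exponential_mechanism(candidates, utility, epsilon=1.0, sensitivity=0.01, rng=rng))
```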
Citations: 25
Machine Learning for Enterprise Security
Pub Date : 2015-10-16 DOI: 10.1145/2808769.2808782
P. Manadhata
Enterprise security is about protecting an enterprise's computing infrastructure and the enterprise's sensitive information stored and processed by the infrastructure. We secure the infrastructure and the information by combining three steps: (a) prevention, i.e., preventing security breaches to the extent possible, (b) detection, i.e., detecting breaches as soon as possible since prevention is not fool-proof, and (c) recovery, i.e., recovering from and responding to breaches after detection. Prior work, both in academia and in industry, has focused on prevention and detection, whereas recovery is a relatively unexplored area. Machine learning as a discipline has had a significant impact over the state of the art in enterprise security in the last few years, especially in the prevention and detection steps. However, widespread adoption remains a challenge for several reasons. In this talk, we describe current uses of machine learning in the prevention and detection steps, and highlight a few key challenges. We then discuss future opportunities for machine learning to improve the state of the art in recovery.
Citations: 1
Fast, Privacy Preserving Linear Regression over Distributed Datasets based on Pre-Distributed Data
Pub Date : 2015-10-16 DOI: 10.1145/2808769.2808774
M. D. Cock, Rafael Dowsley, Anderson C. A. Nascimento, S. Newman
This work proposes a protocol for performing linear regression over a dataset that is distributed over multiple parties. The parties will jointly compute a linear regression model without actually sharing their own private datasets. We provide security definitions, a protocol, and security proofs. Our solution is information-theoretically secure and is based on the assumption that a Trusted Initializer pre-distributes random, correlated data to the parties during a setup phase. The actual computation happens later on, during an online phase, and does not involve the trusted initializer. Our online protocol is orders of magnitude faster than previous solutions. In the case where a trusted initializer is not available, we propose a computationally secure two-party protocol based on additive homomorphic encryption that substitutes the trusted initializer. In this case, the online phase remains the same and the offline phase is computationally heavy. However, because the computations in the offline phase happen over random data, the overall problem is embarrassingly parallelizable, making it faster than existing solutions for processors with an appropriate number of cores.
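A minimal sketch of the trusted-initializer building block: pre-distributed correlated randomness (a Beaver-style multiplication triple) lets two parties compute a product of secret-shared values in a cheap online phase. This shows only the primitive, under an assumed additive sharing over a prime field, not the paper's full linear-regression protocol.

```python
# Pre-distributed correlated randomness (a Beaver multiplication triple) enables secure
# two-party multiplication of secret-shared values in the online phase.
import random

P = 2**61 - 1  # a prime modulus for additive secret sharing (assumption)

def share(x):
    r = random.randrange(P)
    return r, (x - r) % P

# Setup phase: the trusted initializer samples a triple (a, b, c) with c = a*b and shares it.
a, b = random.randrange(P), random.randrange(P)
c = (a * b) % P
a0, a1 = share(a); b0, b1 = share(b); c0, c1 = share(c)

# Online phase: the parties hold shares of secrets x and y and want shares of x*y.
x, y = 42, 17
x0, x1 = share(x); y0, y1 = share(y)

# Each party locally masks its shares; the masked values d = x - a and e = y - b are opened.
d = (x0 - a0 + x1 - a1) % P
e = (y0 - b0 + y1 - b1) % P

# Each party computes its share of x*y; party 0 additionally adds the public term d*e.
z0 = (d * e + d * b0 + e * a0 + c0) % P
z1 = (d * b1 + e * a1 + c1) % P
assert (z0 + z1) % P == (x * y) % P
print("shared product reconstructs to", (z0 + z1) % P)
```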
Citations: 80
Automated Attacks on Compression-Based Classifiers
Pub Date : 2015-10-16 DOI: 10.1145/2808769.2808778
Igor Burago, Daniel Lowd
Methods of compression-based text classification have proven their usefulness for various applications. However, in some classification problems, such as spam filtering, a classifier confronts one or many adversaries willing to induce errors in the classifier's judgment on certain kinds of input. In this paper, we consider the problem of finding thrifty strategies for character-based text modification that allow an adversary to revert the classifier's verdict on a given family of input texts. We propose three statistical statements of the problem that can be used by an attacker to obtain transformation models which are optimal in some sense. Evaluating these three techniques on a realistic spam corpus, we find that an adversary can transform a spam message (detectable as such by an entropy-based text classifier) into a legitimate one by generating and appending additional characters amounting, in some cases, to as little as 11% of the original length of the message.
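A toy sketch of a compression-based spam score and a greedy append attack in that spirit; the corpora, scoring rule, and stopping condition are assumptions, not the paper's optimal transformation models.

```python
# Toy compression-based classifier plus a greedy append attack: keep appending the single
# character that most shifts the score toward "legitimate". Corpora and scoring are stand-ins.
import zlib, string

ham = b"meeting agenda project schedule review thanks regards report budget "
spam = b"free money winner click now viagra lottery prize claim urgent offer "

def cost(corpus, msg):
    # Extra bytes needed to compress msg after the corpus: smaller means more similar.
    return len(zlib.compress(corpus + msg)) - len(zlib.compress(corpus))

def is_spam(msg):
    return cost(spam, msg) < cost(ham, msg)

msg = b"click now to claim your free lottery prize"
appended = b""
while is_spam(msg + appended) and len(appended) < len(msg):
    # Greedily pick the printable character that best increases similarity to the ham corpus.
    best = min(string.printable.encode(),
               key=lambda ch: cost(ham, msg + appended + bytes([ch])))
    appended += bytes([best])

print("appended", len(appended), "chars; still spam?", is_spam(msg + appended))
```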
Citations: 3
Subsampled Exponential Mechanism: Differential Privacy in Large Output Spaces
Pub Date : 2015-10-16 DOI: 10.1145/2808769.2808776
Eric Lantz, Kendrick Boyd, David Page
In the last several years, differential privacy has become the leading framework for private data analysis. It provides bounds on the amount that a randomized function can change as the result of a modification to one record of a database. This requirement can be satisfied by using the exponential mechanism to perform a weighted choice among the possible alternatives, with better options receiving higher weights. However, in some situations the number of possible outcomes is too large to compute all weights efficiently. We present the subsampled exponential mechanism, which scores only a sample of the outcomes. We show that it still preserves differential privacy, and fulfills a similar accuracy bound. Using a clustering application, we show that the subsampled exponential mechanism outperforms a previously published private algorithm and is comparable to the full exponential mechanism but more scalable.
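A sketch of the subsampling idea, assuming a placeholder utility and sensitivity: draw a random sample of candidate outputs and run the exponential mechanism over the sample instead of the full output space.

```python
# Sketch of the subsampled exponential mechanism: score only a random sample of candidates.
# The utility and sensitivity below are placeholders; the paper gives the formal analysis.
import numpy as np

def subsampled_exponential_mechanism(candidates, utility, epsilon, sensitivity, sample_size, rng):
    sample = rng.choice(candidates, size=sample_size, replace=False)
    scores = np.array([utility(c) for c in sample])
    weights = np.exp(epsilon * (scores - scores.max()) / (2.0 * sensitivity))
    return rng.choice(sample, p=weights / weights.sum())

rng = np.random.default_rng(0)
candidates = np.arange(1_000_000)                    # output space too large to score fully
target = 123_456
utility = lambda c: -abs(int(c) - target) / 1000.0   # toy utility peaked at the target
print(subsampled_exponential_mechanism(candidates, utility, 1.0, 1.0, sample_size=2000, rng=rng))
```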
Citations: 12
Malicious Behavior Detection using Windows Audit Logs
Pub Date : 2015-06-13 DOI: 10.1145/2808769.2808773
Konstantin Berlin, David Slater, Joshua Saxe
As antivirus and network intrusion detection systems have increasingly proven insufficient to detect advanced threats, large security operations centers have moved to deploy endpoint-based sensors that provide deeper visibility into low-level events across their enterprises. Unfortunately, for many organizations in government and industry, the installation, maintenance, and resource requirements of these newer solutions pose barriers to adoption and are perceived as risks to organizations' missions. To mitigate this problem we investigated the utility of agentless detection of malicious endpoint behavior, using only the standard built-in Windows audit logging facility as our signal. We found that Windows audit logs, while emitting manageably sized data streams on the endpoints, provide enough information to allow robust detection of malicious behavior. Audit logs provide an effective, low-cost alternative to deploying additional expensive agent-based breach detection systems in many government and industrial settings, and can be used to detect, in our tests, 83% of malware samples with a 0.1% false positive rate. They can also supplement already existing host signature-based antivirus solutions, like Kaspersky, Symantec, and McAfee, detecting, in our testing environment, 78% of malware missed by those antivirus systems.
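An illustrative sketch of turning audit events into classifier features by counting Windows security event IDs per host; the event IDs, labels, and model are synthetic assumptions, and the paper's actual feature engineering over processes and paths is richer.

```python
# Count security event IDs per host (e.g., 4688 = process creation, 4624 = logon) and train
# a simple model. Event sequences and labels below are synthetic, for illustration only.
from collections import Counter
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

host_events = [
    (["4624", "4688", "4688", "4689"], 0),                  # benign host (assumption)
    (["4624", "4688", "4688", "4688", "4672", "4688"], 1),  # infected host (assumption)
    (["4624", "4634", "4688"], 0),
    (["4688", "4688", "4688", "4672", "4719"], 1),
]

X_dicts = [Counter(events) for events, _ in host_events]
y = [label for _, label in host_events]

vec = DictVectorizer()
X = vec.fit_transform(X_dicts)
clf = LogisticRegression().fit(X, y)
print(clf.predict(vec.transform([Counter(["4688", "4688", "4672"])])))
```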
Citations: 77