
Latest publications: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security

Session details: Authentication and Intrusion Detection
Sam Bretheim
{"title":"Session details: Authentication and Intrusion Detection","authors":"Sam Bretheim","doi":"10.1145/3252888","DOIUrl":"https://doi.org/10.1145/3252888","url":null,"abstract":"","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"127 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128028295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Malware Analysis of Imaged Binary Samples by Convolutional Neural Network with Attention Mechanism
Pub Date : 2017-11-03 DOI: 10.1145/3128572.3140457
Hiromu Yakura, S. Shinozaki, R. Nishimura, Y. Oyama, Jun Sakuma
This paper presents a method to extract important byte sequences in malware samples by application of convolutional neural network (CNN) to images converted from binary data. This method, by combining a technique called the attention mechanism into CNN, enables calculation of an "attention map," which shows regions having higher importance for classification in the image. The extracted region with higher importance can provide useful information for human analysts who investigate the functionalities of unknown malware samples. Results of our evaluation experiment using malware dataset show that the proposed method provides higher classification accuracy than a conventional method. Furthermore, analysis of malware samples based on the calculated attention map confirmed that the extracted sequences provide useful information for manual analysis.
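As a rough illustration of the binary-to-image conversion such approaches rely on (the exact preprocessing is not specified in this abstract, so the fixed row width and zero padding below are assumptions), a minimal sketch packs raw bytes row by row into a 2D grayscale array:

```python
import math

def bytes_to_image(data: bytes, width: int = 64) -> list[list[int]]:
    """Pack raw bytes row by row into a 2D grayscale array (one 0-255 value
    per pixel), zero-padding the final row to the fixed width."""
    rows = math.ceil(len(data) / width)
    padded = data + bytes(rows * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(rows)]

# 160 bytes at width 16 -> 10 rows of 16 pixels
img = bytes_to_image(b"\x4d\x5a\x90\x00" * 40, width=16)
```

The resulting array can be fed to a CNN like any grayscale image; an attention layer then scores which rows (byte regions) drive the classification.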
{"title":"Malware Analysis of Imaged Binary Samples by Convolutional Neural Network with Attention Mechanism","authors":"Hiromu Yakura, S. Shinozaki, R. Nishimura, Y. Oyama, Jun Sakuma","doi":"10.1145/3128572.3140457","DOIUrl":"https://doi.org/10.1145/3128572.3140457","url":null,"abstract":"This paper presents a method to extract important byte sequences in malware samples by application of convolutional neural network (CNN) to images converted from binary data. This method, by combining a technique called the attention mechanism into CNN, enables calculation of an \"attention map,\" which shows regions having higher importance for classification in the image. The extracted region with higher importance can provide useful information for human analysts who investigate the functionalities of unknown malware samples. Results of our evaluation experiment using malware dataset show that the proposed method provides higher classification accuracy than a conventional method. Furthermore, analysis of malware samples based on the calculated attention map confirmed that the extracted sequences provide useful information for manual analysis.","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115432580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Malware Classification and Class Imbalance via Stochastic Hashed LZJD
Pub Date : 2017-11-03 DOI: 10.1145/3128572.3140446
Edward Raff, Charles K. Nicholas
There are currently few methods that can be applied to malware classification problems which don't require domain knowledge to apply. In this work, we develop our new SHWeL feature vector representation, by extending the recently proposed Lempel-Ziv Jaccard Distance. These SHWeL vectors improve upon LZJD's accuracy, outperform byte n-grams, and allow us to build efficient algorithms for both training (a weakness of byte n-grams) and inference (a weakness of LZJD). Furthermore, our new SHWeL method also allows us to directly tackle the class imbalance problem, which is common for malware-related tasks. Compared to existing methods like SMOTE, SHWeL provides significantly improved accuracy while reducing algorithmic complexity to O(N). Because our approach is developed without the use of domain knowledge, it can be easily re-applied to any new domain where there is a need to classify byte sequences.
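A minimal sketch of the Lempel-Ziv Jaccard Distance idea underlying SHWeL: greedily parse each byte stream into its set of previously unseen substrings, then take the Jaccard distance between the sets. The `lz_set`/`lzjd_distance` names are illustrative, and SHWeL's stochastic hashing and feature-vector extensions are not shown:

```python
def lz_set(data: bytes) -> set:
    """Greedy Lempel-Ziv parsing: split the byte stream into the shortest
    substrings not seen before; the resulting set is the sequence's signature."""
    seen, start = set(), 0
    for end in range(1, len(data) + 1):
        sub = data[start:end]
        if sub not in seen:
            seen.add(sub)
            start = end
    return seen

def lzjd_distance(a: bytes, b: bytes) -> float:
    """Jaccard distance between the two LZ sets (0 = identical, 1 = disjoint)."""
    sa, sb = lz_set(a), lz_set(b)
    return 1.0 - len(sa & sb) / len(sa | sb)
```

Because the signature is built purely from byte statistics, the same distance applies unchanged to any domain that classifies byte sequences.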
{"title":"Malware Classification and Class Imbalance via Stochastic Hashed LZJD","authors":"Edward Raff, Charles K. Nicholas","doi":"10.1145/3128572.3140446","DOIUrl":"https://doi.org/10.1145/3128572.3140446","url":null,"abstract":"There are currently few methods that can be applied to malware classification problems which don't require domain knowledge to apply. In this work, we develop our new SHWeL feature vector representation, by extending the recently proposed Lempel-Ziv Jaccard Distance. These SHWeL vectors improve upon LZJD's accuracy, outperform byte n-grams, and allow us to build efficient algorithms for both training (a weakness of byte n-grams) and inference (a weakness of LZJD). Furthermore, our new SHWeL method also allows us to directly tackle the class imbalance problem, which is common for malware-related tasks. Compared to existing methods like SMOTE, SHWeL provides significantly improved accuracy while reducing algorithmic complexity to O(N). Because our approach is developed without the use of domain knowledge, it can be easily re-applied to any new domain where there is a need to classify byte sequences.","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133994936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 36
An Early Warning System for Suspicious Accounts
Pub Date : 2017-11-03 DOI: 10.1145/3128572.3140455
Hassan Halawa, M. Ripeanu, K. Beznosov, Baris Coskun, Meizhu Liu
In the face of large-scale automated cyber-attacks to large online services, fast detection and remediation of compromised accounts are crucial to limit the spread of new attacks and to mitigate the overall damage to users, companies, and the public at large. We advocate a fully automated approach based on machine learning to enable large-scale online service providers to quickly identify potentially compromised accounts. We develop an early warning system for the detection of suspicious account activity with the goal of quick identification and remediation of compromised accounts. We demonstrate the feasibility and applicability of our proposed system in a four month experiment at a large-scale online service provider using real-world production data encompassing hundreds of millions of users. We show that - even using only login data, features with low computational cost, and a basic model selection approach - around one out of five accounts later flagged as suspicious are correctly predicted a month in advance based on one week's worth of their login activity.
{"title":"An Early Warning System for Suspicious Accounts","authors":"Hassan Halawa, M. Ripeanu, K. Beznosov, Baris Coskun, Meizhu Liu","doi":"10.1145/3128572.3140455","DOIUrl":"https://doi.org/10.1145/3128572.3140455","url":null,"abstract":"In the face of large-scale automated cyber-attacks to large online services, fast detection and remediation of compromised accounts are crucial to limit the spread of new attacks and to mitigate the overall damage to users, companies, and the public at large. We advocate a fully automated approach based on machine learning to enable large-scale online service providers to quickly identify potentially compromised accounts. We develop an early warning system for the detection of suspicious account activity with the goal of quick identification and remediation of compromised accounts. We demonstrate the feasibility and applicability of our proposed system in a four month experiment at a large-scale online service provider using real-world production data encompassing hundreds of millions of users. We show that - even using only login data, features with low computational cost, and a basic model selection approach - around one out of five accounts later flagged as suspicious are correctly predicted a month in advance based on one week's worth of their login activity.","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121211586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Generating Look-alike Names For Security Challenges
Pub Date : 2017-11-03 DOI: 10.1145/3128572.3140441
Shuchu Han, Yifan Hu, S. Skiena, Baris Coskun, Meizhu Liu, Hong Qin, Jaime Perez
Motivated by the need to automatically generate behavior-based security challenges to improve user authentication for web services, we consider the problem of large-scale construction of realistic-looking names to serve as aliases for real individuals. We aim to use these names to construct security challenges, where users are asked to identify their real contacts among a presented pool of names. We seek these look-alike names to preserve name characteristics like gender, ethnicity, and popularity, while being unlinkable back to the source individual, thereby making the real contacts not easily guessable by attackers. To achieve this, we introduce the technique of distributed name embeddings, representing names in a high-dimensional space such that distance between name components reflects the degree of cultural similarity between these strings. We present different approaches to construct name embeddings from contact lists observed at a large web-mail provider, and evaluate their cultural coherence. We demonstrate that name embeddings strongly encode gender and ethnicity, as well as name popularity. We applied this algorithm to generate imitation names in an email contact list challenge. Our controlled user study verified that the proposed technique reduced the attacker's success rate to 26.08%, indistinguishable from random guessing, compared to a success rate of 62.16% from previous name generation algorithms. Finally, we use these embeddings to produce an open synthetic name resource of 1 million names for security applications, constructed to respect both cultural coherence and U.S. census name frequencies.
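To illustrate how a look-alike could be drawn from an embedding space, here is a toy nearest-neighbor lookup by cosine similarity. The 2-D vectors and names are hypothetical; the paper's embeddings are high-dimensional and learned from contact lists:

```python
import math

def cosine(u, v) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# toy embedding table: culturally similar names sit close together
embeddings = {
    "maria":  (0.90, 0.10),
    "marina": (0.85, 0.20),
    "bob":    (0.10, 0.90),
}

def look_alike(name: str) -> str:
    """Return the nearest other name in embedding space."""
    target = embeddings[name]
    return max((n for n in embeddings if n != name),
               key=lambda n: cosine(embeddings[n], target))
```

In the real system the neighbor would additionally be constrained to be unlinkable to the source name, not merely nearby.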
{"title":"Generating Look-alike Names For Security Challenges","authors":"Shuchu Han, Yifan Hu, S. Skiena, Baris Coskun, Meizhu Liu, Hong Qin, Jaime Perez","doi":"10.1145/3128572.3140441","DOIUrl":"https://doi.org/10.1145/3128572.3140441","url":null,"abstract":"Motivated by the need to automatically generate behavior-based security challenges to improve user authentication for web services, we consider the problem of large-scale construction of realistic-looking names to serve as aliases for real individuals. We aim to use these names to construct security challenges, where users are asked to identify their real contacts among a presented pool of names. We seek these look-alike names to preserve name characteristics like gender, ethnicity, and popularity, while being unlinkable back to the source individual, thereby making the real contacts not easily guessable by attackers. To achive this, we introduce the technique of distributed name embeddings, representing names in a high-dimensional space such that distance between name components reflects the degree of cultural similarity between these strings. We present different approaches to construct name embeddings from contact lists observed at a large web-mail provider, and evaluate their cultural coherence. We demonstrate that name embeddings strongly encode gender and ethnicity, as well as name popularity. We applied this algorithm to generate imitation names in email contact list challenge. Our controlled user study verified that the proposed technique reduced the attacker's success rate to 26.08%, indistinguishable from random guessing, compared to a success rate of 62.16% from previous name generation algorithms. Finally, we use these embeddings to produce an open synthetic name resource of 1 million names for security applications, constructed to respect both cultural coherence and U.S. 
census name frequencies.","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116347279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Robust Linear Regression Against Training Data Poisoning
Pub Date : 2017-11-03 DOI: 10.1145/3128572.3140447
Chang Liu, Bo Li, Yevgeniy Vorobeychik, Alina Oprea
The effectiveness of supervised learning techniques has made them ubiquitous in research and practice. In high-dimensional settings, supervised learning commonly relies on dimensionality reduction to improve performance and identify the most important factors in predicting outcomes. However, the economic importance of learning has made it a natural target for adversarial manipulation of training data, which we term poisoning attacks. Prior approaches to dealing with robust supervised learning rely on strong assumptions about the nature of the feature matrix, such as feature independence and sub-Gaussian noise with low variance. We propose an integrated method for robust regression that relaxes these assumptions, assuming only that the feature matrix can be well approximated by a low-rank matrix. Our techniques integrate improved robust low-rank matrix approximation and robust principal component regression, and yield strong performance guarantees. Moreover, we experimentally show that our methods significantly outperform state of the art both in running time and prediction error.
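A minimal sketch of principal component regression over a low-rank approximation, one ingredient the abstract names. This plain NumPy version omits the paper's robustness machinery; it simply projects the centered features onto their top-k singular directions before ordinary least squares:

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression: fit OLS in the top-k singular
    subspace of the centered feature matrix (a low-rank approximation),
    damping directions outside the dominant subspace."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                          # top-k principal directions
    Z = Xc @ V_k                            # reduced design matrix
    gamma, *_ = np.linalg.lstsq(Z, y - y_mean, rcond=None)
    beta = V_k @ gamma                      # map back to feature space
    return beta, x_mean, y_mean

def pcr_predict(X, beta, x_mean, y_mean):
    return (X - x_mean) @ beta + y_mean

# demo: features lie on a 1-D subspace, so k=1 recovers the fit exactly
base = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
X = base @ np.array([[1.0, 2.0, 3.0]])
y = 4.0 * base[:, 0] + 1.0
beta, x_mean, y_mean = pcr_fit(X, y, k=1)
```

When the clean feature matrix is approximately low-rank, poisoned points that deviate from the dominant subspace contribute little to the reduced design matrix.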
{"title":"Robust Linear Regression Against Training Data Poisoning","authors":"Chang Liu, Bo Li, Yevgeniy Vorobeychik, Alina Oprea","doi":"10.1145/3128572.3140447","DOIUrl":"https://doi.org/10.1145/3128572.3140447","url":null,"abstract":"The effectiveness of supervised learning techniques has made them ubiquitous in research and practice. In high-dimensional settings, supervised learning commonly relies on dimensionality reduction to improve performance and identify the most important factors in predicting outcomes. However, the economic importance of learning has made it a natural target for adversarial manipulation of training data, which we term poisoning attacks. Prior approaches to dealing with robust supervised learning rely on strong assumptions about the nature of the feature matrix, such as feature independence and sub-Gaussian noise with low variance. We propose an integrated method for robust regression that relaxes these assumptions, assuming only that the feature matrix can be well approximated by a low-rank matrix. Our techniques integrate improved robust low-rank matrix approximation and robust principle component regression, and yield strong performance guarantees. Moreover, we experimentally show that our methods significantly outperform state of the art both in running time and prediction error.","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134431218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 76
Beyond Big Data: What Can We Learn from AI Models?: Invited Keynote
Pub Date : 2017-11-03 DOI: 10.1145/3128572.3140452
Aylin Caliskan
My research involves the heavy use of machine learning and natural language processing in novel ways to interpret big data, develop privacy and security attacks, and gain insights about humans and society through these methods. I do not use machine learning only as a tool; I also analyze machine learning models' internal representations to investigate how the artificial intelligence perceives the world. This work [3] has been recently featured in Science, where I showed that societal bias exists at the construct level of machine learning models, namely semantic space word embeddings, which are dictionaries for machines to understand language. When I use machine learning as a tool to uncover privacy and security problems, I characterize and quantify human behavior in language, including programming languages, by coming up with a linguistic fingerprint for each individual. By extracting linguistic features from natural language or programming language texts of humans, I show that humans have unique linguistic fingerprints since they all learn language on an individual basis. Based on this finding, I can de-anonymize humans that have written certain text, source code, or even executable binaries of compiled code [2, 4, 5]. This is a serious privacy threat for individuals that would like to remain anonymous, such as activists, programmers in oppressed regimes, or malware authors. Nevertheless, being able to identify authors of malicious code enhances security. On the other hand, identifying authors can be used to resolve copyright disputes or detect plagiarism. The methods in this realm [1] have been used to identify so-called doppelgängers, linking accounts that belong to the same identities across platforms, especially underground forums that serve as business platforms for cyber criminals. By analyzing machine learning models' internal representations and linguistic human fingerprints, I am able to uncover facts about the world, society, and the use of language, which have implications for privacy, security, and fairness in machine learning.
{"title":"Beyond Big Data: What Can We Learn from AI Models?: Invited Keynote","authors":"Aylin Caliskan","doi":"10.1145/3128572.3140452","DOIUrl":"https://doi.org/10.1145/3128572.3140452","url":null,"abstract":"My research involves the heavy use of machine learning and natural language processing in novel ways to interpret big data, develop privacy and security attacks, and gain insights about humans and society through these methods. I do not use machine learning only as a tool but I also analyze machine learning models? internal representations to investigate how the artificial intelligence perceives the world. This work [3] has been recently featured in Science where I showed that societal bias exists at the construct level of machine learning models, namely semantic space word embeddings which are dictionaries for machines to understand language. When I use machine learning as a tool to uncover privacy and security problems, I characterize and quantify human behavior in language, including programming languages, by coming up with a linguistic fingerprint for each individual. By extracting linguistic features from natural language or programming language texts of humans, I show that humans have unique linguistic fingerprints since they all learn language on an individual basis. Based on this finding, I can de-anonymize humans that have written certain text, source code, or even executable binaries of compiled code [2, 4, 5]. This is a serious privacy threat for individuals that would like to remain anonymous, such as activists, programmers in oppressed regimes, or malware authors. Nevertheless, being able to identify authors of malicious code enhances security. On the other hand, identifying authors can be used to resolve copyright disputes or detect plagiarism. 
The methods in this realm [1] have been used to identify so called doppelgängers to link the accounts that belong to the same identities across platforms, especially underground forums that are business platforms for cyber criminals. By analyzing machine learning models? internal representation and linguistic human fingerprints, I am able to uncover facts about the world, society, and the use of language, which have implications for privacy, security, and fairness in machine learning.","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"408 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129254182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Session details: Lightning Round
David Mandell Freeman
{"title":"Session details: Lightning Round","authors":"David Mandell Freeman","doi":"10.1145/3252887","DOIUrl":"https://doi.org/10.1145/3252887","url":null,"abstract":"","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114555657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Session details: Defense against Poisoning
Luis Muñoz-González
{"title":"Session details: Defense against Poisoning","authors":"Luis Mu?oz-Gonz?lez","doi":"10.1145/3252889","DOIUrl":"https://doi.org/10.1145/3252889","url":null,"abstract":"","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129200830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Differentially Private Noisy Search with Applications to Anomaly Detection (Abstract)
Pub Date : 2017-11-03 DOI: 10.1145/3128572.3140456
D. M. Bittner, A. Sarwate, R. Wright
We consider the problem of privacy-sensitive anomaly detection - screening to detect individuals, behaviors, areas, or data samples of high interest. What defines an anomaly is context-specific; for example, a spoofed rather than genuine user attempting to log in to a web site, a fraudulent credit card transaction, or a suspicious traveler in an airport. The unifying assumption is that the number of anomalous points is quite small with respect to the population, so that deep screening of all individual data points would potentially be time-intensive, costly, and unnecessarily invasive of privacy. Such privacy violations can raise concerns due to the sensitive nature of the data being used, raise fears about violations of data use agreements, and make people uncomfortable with anomaly detection methods. Anomaly detection is well studied, but methods that provide anomaly detection along with privacy are less well studied. Our overall goal in this research is to provide a framework for identifying anomalous data while guaranteeing quantifiable privacy in a rigorous sense. Once identified, such anomalies could warrant further data collection and investigation, depending on the context and relevant policies. In this research, we focus on privacy protection during the deployment of anomaly detection. Our main contribution is a differentially private access mechanism for finding anomalies using a search algorithm based on adaptive noisy group testing. To achieve this, we take as our starting point the notion of group testing [1], which was most famously used to screen US military draftees for syphilis during World War II. In group testing, individuals are tested in groups to limit the number of tests. Using multiple rounds of screenings, a small number of positive individuals can be detected very efficiently. Group testing has the added benefit of providing privacy to individuals through plausible deniability - since the group tests use aggregate data, individual contributions to the test are masked by the group. We follow on these concepts by demonstrating a search model utilizing adaptive queries on aggregated group data. Our work takes the first steps toward strengthening and formalizing these privacy concepts by achieving differential privacy [2]. Differential privacy is a statistical measure of disclosure risk that captures the intuition that an individual's privacy is protected if the results of a computation have at most a very small and quantifiable dependence on that individual's data. In the last decade, there has been practical adoption underway by high-profile companies such as Apple, Google, and Uber. In order to make differential privacy meaningful in the context of a task that seeks to specifically identify some (anomalous) individuals, we introduce the notion of anomaly-restricted differential privacy. Using ideas from information theory, we show that noise can be added to group query results in a way that provides differential privacy for non-anomalous individuals while still allowing anomalous individuals to be detected effectively and accurately. Our approach ensures differentially private aggregation over groups of points, providing privacy to individuals within a group while refining the group selection so that attention can be focused, probabilistically, on a small number of individuals or samples for further scrutiny. In summary, we introduce a new notion of anomaly-restricted differential privacy, which may be of independent interest; we give a noisy group-based search algorithm satisfying that definition; and we provide theoretical and empirical analyses of the algorithm, showing that it performs well in some cases and exhibits the usual privacy/accuracy tradeoffs of differentially private mechanisms. Potential anomaly detection applications of our work include spatial search for outliers, relying on new sensing technologies that can perform queries to reveal and isolate anomalous outliers - for example, privacy-sensitive search for outlying cell phone or Internet activity patterns in a geographic location.
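A toy adaptive noisy group-testing search in the spirit described: query a Laplace-noised aggregate count per group and split only groups whose noisy count suggests a positive inside. The threshold, noise scale, and binary splitting below are illustrative, not the paper's calibrated mechanism:

```python
import math
import random

def laplace(scale: float) -> float:
    """Sample Laplace(0, scale) noise via the inverse CDF."""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def find_anomalies(population, positives, scale=0.01, threshold=0.5):
    """Adaptive noisy group testing: each query sees only a noisy aggregate
    count for a group, and only groups whose noisy count exceeds the
    threshold are split for deeper screening."""
    found, stack = [], [list(population)]
    while stack:
        group = stack.pop()
        noisy = sum(1 for x in group if x in positives) + laplace(scale)
        if noisy <= threshold:
            continue                      # group looks clean; no deep screening
        if len(group) == 1:
            found.append(group[0])
        else:
            mid = len(group) // 2
            stack += [group[:mid], group[mid:]]
    return found

random.seed(0)
hits = find_anomalies(range(16), {3, 11})
```

Because clean groups are discarded after a single aggregate query, most individuals are never examined in isolation, which is the plausible-deniability benefit the abstract describes.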
{"title":"Differentially Private Noisy Search with Applications to Anomaly Detection (Abstract)","authors":"D. M. Bittner, A. Sarwate, R. Wright","doi":"10.1145/3128572.3140456","DOIUrl":"https://doi.org/10.1145/3128572.3140456","url":null,"abstract":"We consider the problem of privacy-sensitive anomaly detection - screening to detect individuals, behaviors, areas, or data samples of high interest. What defines an anomaly is context-specific; for example, a spoofed rather than genuine user attempting to log in to a web site, a fraudulent credit card transaction, or a suspicious traveler in an airport. The unifying assumption is that the number of anomalous points is quite small with respect to the population, so that deep screening of all individual data points would potentially be time-intensive, costly, and unnecessarily invasive of privacy. Such privacy violations can raise concerns due sensitive nature of data being used, raise fears about violations of data use agreements, and make people uncomfortable with anomaly detection methods. Anomaly detection is well studied, but methods to provide anomaly detection along with privacy are less well studied. Our overall goal in this research is to provide a framework for identifying anomalous data while guaranteeing quantifiable privacy in a rigorous sense. Once identified, such anomalies could warrant further data collection and investigation, depending on the context and relevant policies. In this research, we focus on privacy protection during the deployment of anomaly detection. Our main contribution is a differentially private access mechanism for finding anomalies using a search algorithm based on adaptive noisy group testing. To achieve this, we take as our starting point the notion of group testing [1], which was most famously used to screen US military draftees for syphilis during World War II. In group testing, individuals are tested in groups to limit the number of tests. 
Using multiple rounds of screenings, a small number of positive individuals can be detected very efficiently. Group testing has the added benefit of providing privacy to individuals through plausible deniability - since the group tests use aggregate data, individual contributions to the test are masked by the group. We follow on these concepts by demonstrating a search model utilizing adaptive queries on aggregated group data. Our work takes the first steps toward strengthening and formalizing these privacy concepts by achieving differential privacy [2]. Differential privacy is a statistical measure of disclosure risk that captures the intuition that an individual's privacy is protected if the results of a computation have at most a very small and quantifiable dependence on that individual's data. In the last decade, there has been practical adoption underway by high-profile companies such as Apple, Google, and Uber. In order to make differential privacy meaningful in the context of a task that seeks to specifically identify some (anomalous) individuals, we introduce the notion of anomaly-restricted differential privacy. Using ideas from information theory, we show that noise can be added to group query results in a way that provides differential privacy for non-anomalous individuals while still enabling effective and accurate detection of anomalous individuals","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133481622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
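The core primitive the abstract relies on, adding calibrated noise to an aggregate group query, is the standard Laplace mechanism from differential privacy. A minimal sketch, not code from the paper (the function names are illustrative): a counting query has sensitivity 1, since one member's presence changes the count by at most 1, so Laplace noise with scale 1/epsilon yields epsilon-differential privacy for the individuals aggregated in the group.

```python
import random

def laplace_noise(scale):
    # Laplace(0, scale) sampled as the difference of two exponentials.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_group_count(group, predicate, epsilon):
    """epsilon-DP count of group members matching `predicate`.
    Sensitivity of the count is 1, so noise scale 1/epsilon suffices."""
    true_count = sum(1 for member in group if predicate(member))
    return true_count + laplace_noise(1.0 / epsilon)
```

Smaller epsilon means a wider noise distribution: each individual's contribution is better hidden inside the group aggregate, at the cost of less reliable screening decisions.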
Citations: 2
Journal:
Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security