Proceedings of the 2021 ACM Workshop on Security and Privacy Analytics: Latest Articles

An Empirical Evaluation of Automated Machine Learning Techniques for Malware Detection

Pub Date : 2021-04-28 DOI: 10.1145/3445970.3451155

P. P. Kundu, Lux Anatharaman, Tram Truong-Huu

Machine learning techniques are developing so quickly that even experts find it increasingly difficult to incorporate all of the recent best practices into their modeling. For applications that handle large data sets, choosing the best-performing model with the best hyper-parameter setting becomes even harder. In this work, we present an empirical evaluation of automated machine learning (AutoML) frameworks that optimize the hyper-parameters of machine learning models to achieve the best achievable performance. We apply AutoML techniques to the malware detection problem, which requires a true positive rate that is as high as possible at a false positive rate that is as low as possible. We adopt two AutoML frameworks, AutoGluon-Tabular and Microsoft Neural Network Intelligence (NNI), to optimize the hyper-parameters of a Light Gradient Boosted Machine (LightGBM) model for classifying malware samples. We carry out extensive experiments on two data sets. The first is the publicly available EMBER data set, which has been used as a benchmark in many malware detection works. The second is a private data set of recently collected malware samples, acquired from a security company. We provide an empirical analysis and performance comparison of the two AutoML frameworks. The experimental results show that AutoML frameworks can identify hyper-parameter sets that significantly outperform the known best-performing hyper-parameter setting, improving the true positive rate of a LightGBM classifier from 86.8% to 90% at a 0.1% false positive rate on the EMBER data set, and from 80.8% to 87.4% on the private data set.
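
The metric reported in this abstract, the true positive rate at a fixed 0.1% false positive rate, can be computed from classifier scores roughly as in the following Python sketch. The function name `tpr_at_fpr` and its arguments are illustrative assumptions and do not come from the paper; the sketch only assumes that a higher score means "more likely malware" and that a sample is flagged when its score is strictly above the threshold.

```python
import numpy as np

def tpr_at_fpr(y_true, scores, target_fpr=0.001):
    """Return (tpr, threshold) at the lowest threshold whose FPR stays <= target_fpr.

    y_true: 1 for malware, 0 for benign. scores: higher means more suspicious.
    A sample is flagged as malware when score > threshold.
    """
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    benign = np.sort(scores[~y_true])[::-1]  # benign scores, highest first
    # How many benign samples may be flagged without exceeding the target FPR.
    # The small epsilon guards against floating-point error in the product.
    allowed_fp = int(np.floor(target_fpr * benign.size + 1e-9))
    if allowed_fp >= benign.size:
        threshold = -np.inf  # every sample may be flagged
    else:
        # Only benign scores strictly above this value are flagged,
        # so at most `allowed_fp` benign samples become false positives.
        threshold = benign[allowed_fp]
    tpr = float(np.mean(scores[y_true] > threshold))
    return tpr, threshold
```

Fixing the false positive rate first and then reading off the true positive rate reflects the operating constraint described in the abstract: a detector is only useful if it raises very few false alarms on benign files.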

Citations: 8

EMPAware: Analyzing Changes in User Perceptions of Mobile Privacy on iOS with Enhanced Awareness

Pub Date : 2021-04-28 DOI: 10.1145/3445970.3451153

Brian Krupp, Emily Timko, Kyle Cox, William Hicks, Malik Bursey, Christopher Banfield

Smartphones contain intimate details about their users, either inferred from collected data or explicitly stored on the device. These details include daily travel patterns such as the most frequently visited locations, private photos, the addresses and birthdays of contacts, and more. Consumers are generally aware that services they use without financial payment are paid for in part by advertisements, and that these services collect detailed information while in use. In iOS, applications must provide a detailed description of how they will use data that requires permission from the user. However, the provided description often tells only part of the story. Behind the scenes, consumers cannot see how applications share information or which parts of the data an application uses. Nor can they see how often applications communicate with advertisement services or whether they share data gathered through the application's permissions. In this paper, we present EMPAware, a system that gives users enhanced awareness of how applications use their data. Through a web portal, users can view in real time how applications use their data and how they communicate with advertisement servers. Using EMPAware, we performed a study with 32 participants measuring the impact that enhanced awareness has on the perception of mobile privacy. In this study, users became more concerned with privacy: 79% believe applications misuse data and 89% believe they have little control over their data. EMPAware demonstrates that when users better understand how applications use their data, they become more concerned about their privacy.

Citations: 1

Detecting Telephone-based Social Engineering Attacks using Scam Signatures

Pub Date : 2021-04-28 DOI: 10.1145/3445970.3451152

Ali Derakhshan, I. Harris, Mitra Behzadi

As social engineering attacks have become prevalent, people are increasingly being persuaded to give important personal or financial information to attackers. Telephone scams are common and less well studied than phishing emails. We have found that social engineering attacks can be characterized by a set of speech acts that are performed as part of the scam. A speech act is a statement or utterance by an individual that not only conveys information but also performs an action. Although attackers adjust their delivery and wording on the phone to match the victim, scams can be grouped into classes that share common speech acts. Each scam type is identified by a set of speech acts collectively referred to as a scam signature. We present a social engineering detection approach called the Anti-Social Engineering Tool (ASsET), which detects attacks based on the semantic content of the conversation. Our approach uses word embedding techniques from natural language processing to determine whether the meaning of a scam signature is contained in a conversation. To evaluate our approach, we gathered a dataset of telephone scams written by volunteers based on examples of real scams from official websites. To the best of our knowledge, this is the first telephone-based scam dataset. Our detection method was able to distinguish scam and non-scam calls with high accuracy.
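
The idea of checking whether the meaning of a scam signature is contained in a conversation can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the three-dimensional word vectors are invented, averaging word vectors is a simple stand-in for whatever sentence representation the authors use, and the names `embed`, `cosine`, and `matches_signature` as well as the 0.9 threshold are hypothetical.

```python
import numpy as np

# Toy word vectors, invented for this example only (not real embeddings).
TOY_VECTORS = {
    "send": np.array([1.0, 0.0, 0.0]),
    "give": np.array([0.9, 0.1, 0.0]),
    "password": np.array([0.0, 1.0, 0.0]),
    "pin": np.array([0.1, 0.9, 0.0]),
    "weather": np.array([0.0, 0.0, 1.0]),
    "nice": np.array([0.0, 0.1, 0.9]),
}

def embed(sentence):
    """Average the vectors of known words; unknown words are ignored."""
    vecs = [TOY_VECTORS[w] for w in sentence.lower().split() if w in TOY_VECTORS]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def matches_signature(conversation, signature, threshold=0.9):
    """True when every speech act in the signature is close to some utterance."""
    return all(
        max((cosine(embed(act), embed(utt)) for utt in conversation), default=0.0)
        >= threshold
        for act in signature
    )
```

With these toy vectors, the utterance "please give me your pin" matches the signature act "send password" even though the two share no words, which is the benefit of comparing meaning in an embedding space rather than matching keywords.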

Citations: 4

AI vs. AI: Exploring the Intersections of AI and Cybersecurity

Pub Date : 2021-04-28 DOI: 10.1145/3445970.3456286

Ian Molloy, J. Rao, M. Stoecklin

The future of cybersecurity will pit AI against AI. In this talk, we explore the role of AI in strengthening security defenses as well as the role of security in protecting AI services. We expect that the scale, scope and frequency of cyber attacks will increase disruptively as attackers harness AI to develop attacks that are even more targeted, sophisticated and evasive. At the same time, analysts in security operations centers are increasingly overwhelmed in their efforts to keep up with the tasks of detecting, managing and responding to attacks. To cope, the security industry and practitioners are experimenting with the application of AI and machine learning technologies in different areas of security operations. These include detecting (mis)behaviors and malware, extracting and consolidating threat intelligence, reasoning over security alerts, and recommending countermeasures and/or protective measures. Meanwhile, adversarial attacks on machine learning systems have become an indisputable threat. Attackers can compromise the training of machine learning models by injecting malicious data into the training set (so-called poisoning attacks), or craft adversarial samples that exploit the blind spots of machine learning models at test time (so-called evasion attacks). Adversarial attacks have been demonstrated in a number of different application domains, including malware detection, spam filtering, visual recognition, speech-to-text conversion, and natural language understanding. Devising comprehensive defenses against poisoning and evasion attacks by adaptive adversaries is still an open challenge. Thus, gaining a better understanding of the threat posed by adversarial attacks and developing more effective defense systems and methods are paramount for the adoption of machine learning systems in security-critical real-world applications. The talk will provide an industrial research perspective and will cover research conducted at IBM Security Research over the past several years.

Citations: 0

A Scalable Role Mining Approach for Large Organizations

Pub Date : 2021-04-28 DOI: 10.1145/3445970.3451154

Masoumeh Abolfathi, Zohreh Raghebi, J. H. Jafarian, F. Kashani

The role-based access control (RBAC) model has gained significant attention in cybersecurity in recent years. RBAC restricts system access to authorized users based on the roles and regulations within an organization. The flexibility and usability of this model have encouraged organizations to migrate from traditional discretionary access control (DAC) models to RBAC. However, this transition requires accomplishing a very challenging task called role mining, in which users' roles are generated from the existing access control lists. Although various approaches to this NP-complete problem have been proposed in the literature, they either suffer from low scalability, with execution time that increases exponentially with the input size, or rely on fast heuristics with low optimality that generate too many roles. In this paper, we introduce a highly scalable yet optimal approach to the role mining problem. To this end, we use a non-negative rank-reduced matrix decomposition method to decompose a large-scale user-permission assignment into two constituent components, i.e., the user-role and role-permission assignments. Then, we apply a thresholding technique to convert the real-valued components into binary-valued factors. We employ various access control configurations and demonstrate that our proposed model is able to effectively discover the latent relationships behind the user-permission data even with large datasets.
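
The two steps described in this abstract, a non-negative low-rank decomposition followed by thresholding, can be sketched in Python as follows. This is a hedged illustration, not the authors' algorithm: the abstract does not specify the decomposition or the thresholding rule, so the sketch substitutes standard multiplicative-update non-negative matrix factorization and a fixed 0.5 threshold after rescaling. The names `nmf`, `binarize`, and `boolean_product` are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Factor V ~ W @ H with non-negative W, H using multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.1, 1.0, size=(V.shape[0], rank))  # users x roles
    H = rng.uniform(0.1, 1.0, size=(rank, V.shape[1]))  # roles x permissions
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def binarize(W, H, threshold=0.5):
    """Rescale each role so its largest permission weight is 1, then threshold."""
    scale = H.max(axis=1)            # one scale factor per role
    scale[scale == 0] = 1.0          # leave all-zero roles untouched
    W_scaled = W * scale             # rescaling keeps the product W @ H unchanged
    H_scaled = H / scale[:, None]
    user_role = (W_scaled >= threshold).astype(int)
    role_perm = (H_scaled >= threshold).astype(int)
    return user_role, role_perm

def boolean_product(user_role, role_perm):
    """User-permission matrix implied by binary user-role and role-permission matrices."""
    return ((user_role @ role_perm) > 0).astype(int)
```

Comparing `boolean_product(user_role, role_perm)` with the original user-permission matrix shows how many assignments the mined roles reproduce. The rescaling step is included because a real-valued factorization is only determined up to a per-role scale, so thresholding the raw factors directly would be arbitrary.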

Citations: 3

SDGchain: When Service Dependency Graph Meets Blockchain to Enhance Privacy

Pub Date : 2021-04-28 DOI: 10.1145/3445970.3451157

Rofaida Khemaissia, M. Derdour, A. Djeddai, M. Ferrag

The number of services is increasing, allowing users to perform their tasks easily. A huge amount of data is published regularly, and personal data takes the lion's share. Personal data protection remains a major issue, particularly regarding how service providers collect data and for what purpose. The privacy of the data owner therefore has to be preserved through a strategy that controls how services use the data. The concept of inter-service privacy has been marginalized in previous privacy-preserving work. This paper proposes SDGchain, a secure and privacy-preserving decentralized model that combines a secure service dependency graph (SDG) with the permissioned blockchain Hyperledger Fabric, which cooperates with off-chain storage. In our design, the SDG controls inter-service interactions by building dependencies, measuring the level of trust, and calculating the quality of service to ensure that privacy is preserved, while the blockchain supports authentication, access control, and logging operations that provide an immutable and auditable history.

Citations: 4

WeStat: a Privacy-Preserving Mobile Data Usage Statistics System

Pub Date : 2021-04-28 DOI: 10.1145/3445970.3451151

Sébastien Canard, Nicolas Desmoulins, Sébastien Hallay, Adel Hamdi, Dominique Le Hello

The prevalence of smart devices such as smartphones has boosted the development and use of mobile applications (apps) in recent years, generating a large volume of mobile app usage data. Analyzing such information could lead to a better understanding of users' behaviours in using the apps they have installed, even more so if the data can be coupled with a given context (location, time, date, sociological data, etc.). However, mobile and app usage data are very sensitive and are today considered personal data. Their collection and use raise serious concerns about individuals' privacy. To reconcile the harnessing of data with the privacy of users, we investigate in this paper the possibility of computing privacy-preserving mobile data usage statistics that prevent any inference or re-identification risk. The key idea is for each user to encrypt their (private and sensitive) inputs before sending them to the data processor. Statistics can then be computed on those data thanks to functional encryption, a cryptographic building block that permits certain allowed operations over encrypted data. We first show how individuals' usage of their apps can be obtained, a step that is necessary for our use case but that can at the same time pose security problems with respect to those apps. We then design our new encryption scheme, adding a fault tolerance property to a recent dynamic decentralized functional encryption scheme. Finally, we show how we implemented the system and give some benchmarks.

Citations: 1

Large Feature Mining and Deep Learning in Multimedia Forensics

Proceedings of the 2021 ACM Workshop on Security and Privacy Analytics

Pub Date : 2021-04-28 DOI: 10.1145/3445970.3456285

Qingzhong Liu, Naciye Celebi

As one of the most interesting areas in cyber forensics, multimedia forensics faces many challenges as users generate enormous amounts of data through different operations. Forgery detection and steganography detection are two hotspots in multimedia forensics. To address some highly challenging problems in multimedia forensics, especially in image forensics, this tutorial introduces large feature mining-based approaches with ensemble learning for image forgery detection, including seam-carving forgery and inpainting forgery in JPEG images with subsequent anti-forensic operations. We will also introduce deep learning and apply well-known deep learning models, transferred to image forgery detection and image steganalysis, which considerably improve detection accuracy.

Citations: 1

PRICURE: Privacy-Preserving Collaborative Inference in a Multi-Party Setting

Proceedings of the 2021 ACM Workshop on Security and Privacy Analytics

Pub Date : 2021-02-19 DOI: 10.1145/3445970.3451156

Ismat Jarin, Birhanu Eshete

When multiple parties that deal with private data aim for a collaborative prediction task such as medical image classification, they are often constrained by data protection regulations and lack of trust among collaborating parties. If done in a privacy-preserving manner, predictive analytics can benefit from the collective prediction capability of multiple parties holding complementary datasets on the same machine learning task. This paper presents PRICURE, a system that combines complementary strengths of secure multi-party computation (SMPC) and differential privacy (DP) to enable privacy-preserving collaborative prediction among multiple model owners. SMPC enables secret-sharing of private models and client inputs with non-colluding secure servers to compute predictions without leaking model parameters and inputs. DP masks true prediction results via noisy aggregation so as to deter a semi-honest client who may mount membership inference attacks. We evaluate PRICURE on neural networks across four datasets including benchmark medical image classification datasets. Our results suggest PRICURE guarantees privacy for tens of model owners and clients with acceptable accuracy loss. We also show that DP reduces membership inference attack exposure without hurting accuracy.
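
The two building blocks named in this abstract, secret sharing across non-colluding servers and noisy aggregation of predictions, can be illustrated with a minimal sketch. This is not the PRICURE protocol: the additive sharing scheme, the modulus `PRIME`, the one-hot vote encoding, and the function names `share`, `reconstruct`, `aggregate_votes`, and `noisy_argmax` are all assumptions made for illustration, and the sensitivity analysis a real differential privacy guarantee requires is omitted.

```python
import random
import secrets

PRIME = 2**61 - 1  # illustrative field modulus, not taken from the paper


def share(value, n_servers):
    """Split an integer into n_servers additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_servers - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recombine additive shares into the original value."""
    return sum(shares) % PRIME


def aggregate_votes(one_hot_votes, n_servers=2):
    """Sum one-hot label votes so that no single server sees an individual vote."""
    n_labels = len(one_hot_votes[0])
    server_sums = [[0] * n_labels for _ in range(n_servers)]
    for vote in one_hot_votes:
        for label, bit in enumerate(vote):
            for server, piece in enumerate(share(bit, n_servers)):
                server_sums[server][label] = (server_sums[server][label] + piece) % PRIME
    # Only the per-label totals are reconstructed, never one owner's vote.
    return [reconstruct([sums[label] for sums in server_sums]) for label in range(n_labels)]


def noisy_argmax(counts, epsilon, rng):
    """Return the label with the highest count after adding Laplace noise."""
    scale = 1.0 / epsilon  # assumed noise scale; sensitivity analysis omitted
    # The difference of two exponential draws with mean `scale` is Laplace(0, scale).
    noisy = [c + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale) for c in counts]
    return max(range(len(noisy)), key=noisy.__getitem__)


votes = [[1, 0, 0], [1, 0, 0], [0, 1, 0]]  # three hypothetical model owners, three labels
counts = aggregate_votes(votes)
# random.Random is used for reproducibility only; it is not a cryptographic source.
label = noisy_argmax(counts, epsilon=1.0, rng=random.Random(0))
```

Each server only ever holds uniformly random shares, so the per-label totals become visible only when the servers combine their sums. A smaller `epsilon` adds more noise and makes the released label less dependent on any single vote, at the cost of accuracy.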

Citations: 11

TrollHunter2020: Real-time Detection of Trolling Narratives on Twitter During the 2020 U.S. Elections

Proceedings of the 2021 ACM Workshop on Security and Privacy Analytics

Pub Date : 2020-12-04 DOI: 10.1145/3445970.3451158

Peter Jachim, Filipo Sharevski, Emma Pieroni

This paper presents TrollHunter2020, a real-time detection mechanism we used to hunt for trolling narratives on Twitter during and in the aftermath of the 2020 U.S. elections. Trolling narratives form on Twitter as alternative explanations of polarizing events with the goal of conducting information operations or provoking emotional responses. Detecting trolling narratives is thus an imperative step to preserve constructive discourse on Twitter and stem the influx of misinformation. With existing techniques, detecting such content takes time and a wealth of data, which might not be available in a rapidly changing, high-stakes election cycle. To overcome this limitation, we developed TrollHunter2020 to hunt for trolls in real time across several dozen trending Twitter topics and hashtags corresponding to the candidates' debates, the election night, and the election aftermath. TrollHunter2020 uses correspondence analysis to detect meaningful relationships between the top nouns and verbs used in constructing trolling narratives as they emerge on Twitter. Our results suggest that TrollHunter2020 indeed captures emerging trolling narratives at a very early stage of an unfolding polarizing event. We discuss the utility of TrollHunter2020 for early detection of information operations or trolling and the implications of its use in supporting a constructive discourse on the platform around polarizing topics.
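
As a rough illustration of the correspondence analysis mentioned in this abstract, the sketch below computes the standardized residuals of a noun-by-verb co-occurrence table and their total inertia, which is the first step of a correspondence analysis. It is not the TrollHunter2020 pipeline: the toy table, the function names `standardized_residuals` and `total_inertia`, and the assumption that every row and column has at least one count are illustrative only, and the singular value decomposition that yields the final coordinates is omitted.

```python
import math


def standardized_residuals(table):
    """Standardized residuals of a contingency table (rows: nouns, columns: verbs).

    Assumes every row and every column contains at least one count.
    """
    total = sum(sum(row) for row in table)
    n_rows, n_cols = len(table), len(table[0])
    row_mass = [sum(row) / total for row in table]
    col_mass = [sum(table[i][j] for i in range(n_rows)) / total for j in range(n_cols)]
    return [
        [
            # Observed proportion minus the proportion expected under independence,
            # scaled by the square root of that expected proportion.
            (table[i][j] / total - row_mass[i] * col_mass[j])
            / math.sqrt(row_mass[i] * col_mass[j])
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]


def total_inertia(residuals):
    """Sum of squared residuals, equal to the chi-square statistic divided by the total count."""
    return sum(value * value for row in residuals for value in row)


# Hypothetical co-occurrence counts: each noun appears with only one verb.
toy_table = [[10, 0], [0, 10]]
residuals = standardized_residuals(toy_table)
inertia = total_inertia(residuals)
```

Large positive residuals mark noun and verb pairs that co-occur more often than independence would predict. An inertia near zero indicates no association between the rows and columns.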

Citations: 11