2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom): Latest Papers, Page 9

Assessing the Similarity of Smart Contracts by Clustering their Interfaces

2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)

Pub Date : 2020-12-01 DOI: 10.1109/TrustCom50675.2020.00261

Monika Di Angelo, G. Salzer

Like most programs, smart contracts offer their functionality via entry points that constitute the interface. Interface standards, e.g. for token contracts, foster interoperability. Ethereum is the most prominent platform for smart contracts. The number of contract deployments approaches 30 million, corresponding to roughly 300,000 distinct contract codes. In view of these numbers, automated methods for classifying contracts by purpose are necessary for a qualitative and quantitative understanding of what blockchain applications are used for at large. We approach the task by considering contracts as similar if their interfaces are. We encode interfaces and their interrelationships as graphs and explore several algorithms regarding their ability to find clusters of functionally similar contracts. Our evaluation of the quality of clustering relies on a ground truth of token and wallet contracts identified in earlier work. Our analysis is based on the bytecodes deployed on the main chain of Ethereum up to block 10.5 million, mined on July 21, 2020.
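
The idea of treating contracts as similar when their interfaces are similar can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes an interface is represented as a set of 4-byte function selectors, uses Jaccard similarity as the edge criterion, and takes connected components of the resulting graph as clusters. The function names, the threshold, and the choice of Jaccard similarity are all assumptions made for this sketch; the abstract does not say which graph encodings or clustering algorithms the paper explores.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two selector sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_interface(interfaces: dict, threshold: float = 0.5) -> list:
    """Group contracts that are linked by interface similarity >= threshold.

    An edge joins two contracts when their selector sets are similar enough;
    clusters are the connected components, found with a small union-find.
    """
    parent = {name: name for name in interfaces}

    def find(x):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in combinations(interfaces, 2):
        if jaccard(interfaces[u], interfaces[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for name in interfaces:
        groups.setdefault(find(name), set()).add(name)
    # Sort for a deterministic result.
    return sorted(groups.values(), key=lambda g: sorted(g))
```

Two contracts that share most ERC-20 selectors (for example `0xa9059cbb` for `transfer(address,uint256)` and `0x70a08231` for `balanceOf(address)`) would land in one cluster, while a contract with unrelated selectors stays on its own. Comparing all pairs is quadratic, so a sketch like this would need indexing or sampling before being applied to hundreds of thousands of bytecodes.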

Citations: 2

Towards A New Approach to Identify WhatsApp Messages

Pub Date : 2020-12-01 DOI: 10.1109/TrustCom50675.2020.00259

R. Cents, Nhien-An Le-Khac

Today, traditional communication methods such as SMS or phone calls are used less often and are being replaced by chat applications. WhatsApp is one of the most popular chat applications. It offers different ways of communicating, including text messages and phone calls. Its use of encryption makes it more challenging for law enforcement agencies to identify when a suspect is sending or receiving messages via this chat application. Most research in the literature has focused on analyzing WhatsApp data obtained from a physical device, such as a seized mobile device. However, it is not always possible to extract the needed data from a mobile device, either because of the encryption or because no devices have been seized yet. In addition, current techniques for real-time analysis of WhatsApp messages carry a high risk of detection by the suspect. Alternative methods are needed to understand the communication patterns of suspects and criminal organizations. In this paper, we focus on identifying when a suspect is receiving or sending WhatsApp messages using only wiretap data, so no seized devices are needed. Pattern analysis was used to identify patterns in the data sent to and received from the WhatsApp servers. The identified patterns were tested against a large dataset created with different mobile devices to determine whether the patterns are consistent. By using the technique described in this paper, investigators can obtain more information about whether and with whom a suspect is communicating.

Citations: 5

ALBFL: A Novel Neural Ranking Model for Software Fault Localization via Combining Static and Dynamic Features

Pub Date : 2020-12-01 DOI: 10.1109/TrustCom50675.2020.00107

Yuqing Pan, Xi Xiao, Guangwu Hu, Bin Zhang, Qing Li, Haitao Zheng

Automatic fault localization plays a significant role in helping developers fix software bugs efficiently. Although existing approaches, e.g., static and dynamic methods, have greatly alleviated this problem by analyzing static features in source code and diagnosing dynamic behaviors of running software, respectively, fault localization accuracy still does not meet user requirements. To improve fault localization at statement granularity, this paper proposes ALBFL, a novel neural ranking model that combines the attention mechanism with the LambdaRank model, integrating static and dynamic features to achieve very high accuracy in identifying software faults. ALBFL first introduces a transformer encoder to learn semantic features from software source code. It also leverages other static statistical features and dynamic features, i.e., eleven Spectrum-Based Fault Localization (SBFL) features and three mutation features, to evaluate software jointly. Specifically, the two types of features are integrated through a self-attention layer and fed into the LambdaRank model to rank a list of possibly faulty statements. Finally, thorough experiments are conducted on 5 open-source projects with 357 faulty programs in Defects4J. The results show that ALBFL outperforms 11 traditional SBFL methods (by three times) and 2 state-of-the-art approaches (by 13%) on ranking faulty statements in the first position.
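
To make the notion of an SBFL feature concrete, the sketch below computes the Ochiai suspiciousness score for each statement from test coverage and test outcomes, then ranks statements by that score. Ochiai is a widely used SBFL formula, chosen here as an assumption: the abstract does not name the eleven SBFL formulas ALBFL uses, and the function names and data layout are illustrative only. The neural ranking model itself is not reproduced.

```python
import math


def ochiai(ef: int, ep: int, nf: int) -> float:
    """Ochiai score: ef / sqrt((ef + nf) * (ef + ep)).

    ef: failing tests that execute the statement
    ep: passing tests that execute the statement
    nf: failing tests that do not execute the statement
    """
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0


def rank_statements(coverage: dict, outcomes: dict) -> list:
    """Rank statements from most to least suspicious.

    coverage: statement id -> set of test ids that execute it
    outcomes: test id -> True if the test passed, False if it failed
    """
    failed = {t for t, ok in outcomes.items() if not ok}
    passed = set(outcomes) - failed
    scores = {}
    for stmt, tests in coverage.items():
        ef = len(tests & failed)
        ep = len(tests & passed)
        nf = len(failed) - ef
        scores[stmt] = ochiai(ef, ep, nf)
    # Highest score first; ties broken by statement id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

A statement executed only by failing tests scores 1.0, and one never executed by a failing test scores 0.0. In a learning-to-rank setting, scores like this would be one input feature per statement rather than the final ranking.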

Citations: 8

Asset-Oriented Threat Modeling

Pub Date : 2020-12-01 DOI: 10.1109/TrustCom50675.2020.00073

Nan Messe, Vanea Chiprianov, Nicolas Belloir, Jamal El Hachem, Régis Fleurquin, Salah Sadou

Threat modeling is recognized as one of the most important activities in software security. It helps to address security issues in software development. Several threat modeling processes are widely used in industry, such as that of the Microsoft SDL. In threat modeling, it is essential to identify assets before enumerating threats, in order to diagnose the threat targets and spot the protection mechanisms. Asset identification and threat enumeration are collaborative activities involving many actors, such as security experts and software architects. These activities are traditionally carried out in brainstorming sessions. Due to the lack of guidance, the lack of a sufficiently formalized process, the high dependence on actors' knowledge, and the variety of actors' backgrounds, these actors often have difficulty collaborating with each other. Brainstorming sessions are thus often conducted sub-optimally and require significant effort. To address this problem, we aim to structure the asset identification phase by proposing a systematic asset identification process based on a reference model. This process structures and identifies relevant assets, facilitating threat enumeration during brainstorming. We illustrate the proposed process with a case study and show its usefulness in supporting threat enumeration and improving existing threat modeling processes such as that of the Microsoft SDL.

Citations: 6

Awareness of Secure Coding Guidelines in the Industry - A first data analysis

Pub Date : 2020-12-01 DOI: 10.1109/TrustCom50675.2020.00055

T. Gasiba, U. Lechner, M. Pinto-Albuquerque, Daniel Méndez Fernández

Software needs to be secure, in particular when deployed to critical infrastructures. Secure coding guidelines capture practices in industrial software engineering to ensure the security of code. This study aims to assess the level of awareness of secure coding in industrial software engineering, the skills of software developers to spot and avoid weaknesses in software code, and the organizational support for adhering to coding guidelines. The approach draws on well-established theories of policy compliance, neutralization theory, and security-related stress, on the authors' many years of experience in industrial software engineering, and on lessons identified from training secure coding in the industry. The paper presents the questionnaire design for the online survey and the first analysis of data from the pilot study.

Citations: 13

IoT-Sphere: A Framework To Secure IoT Devices From Becoming Attack Target And Attack Source

Pub Date : 2020-12-01 DOI: 10.1109/TrustCom50675.2020.00189

Syed Ghazanfar Abbas, M. Husnain, U. U. Fayyaz, F. Shahzad, G. Shah, K. Zafar

In this research, we propose a framework that strengthens the security of IoT devices from two perspectives: preventing devices from becoming an attack target as well as a source of attacks. Unlike traditional devices, IoT devices are equipped with an insufficient host-based defense system and a continuous internet connection. Always-connected devices with insufficient security lure attackers into using them to carry out attacks on the rest of the internet. When a plethora of vulnerable devices becomes the source of an attack, the intensity of such attacks increases exponentially. Mirai was one of the first well-known attacks that exploited a large number of vulnerable IoT devices and brought down a large part of the Internet. To strengthen IoT devices from this dual security perspective, we propose a two-step framework. First, we confine the communication boundary of IoT devices with the IoT-Sphere: a sphere of IPs that are allowed to communicate with a device. Any communication that violates the sphere is blocked at the gateway level. Second, only allowed communication is evaluated for potential attacks and anomalies using advanced detection engines. To show the effectiveness of the proposed framework, we perform a couple of attacks on IoT devices (a camera and a Google Home) and show the feasibility of IoT-Sphere.
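
The two-step decision described in the abstract can be sketched as a per-device allowlist check followed by an anomaly check on traffic that passes it. This is a simplified illustration under stated assumptions: the `Sphere` class, the `gateway_decision` function, and the `looks_anomalous` callback are hypothetical names, and the abstract does not describe how the sphere is learned or which detection engines are used.

```python
import ipaddress


class Sphere:
    """Per-device allowlist of remote networks (hypothetical structure)."""

    def __init__(self):
        self._allowed = {}

    def allow(self, device: str, network: str) -> None:
        """Permit a device to talk to a network given in CIDR notation."""
        self._allowed.setdefault(device, []).append(ipaddress.ip_network(network))

    def permits(self, device: str, remote_ip: str) -> bool:
        """True if remote_ip lies inside any network allowed for the device."""
        addr = ipaddress.ip_address(remote_ip)
        return any(addr in net for net in self._allowed.get(device, []))


def gateway_decision(sphere, device, remote_ip, looks_anomalous) -> str:
    """Step 1: block traffic outside the sphere. Step 2: inspect the rest.

    looks_anomalous is a caller-supplied stand-in for a detection engine.
    """
    if not sphere.permits(device, remote_ip):
        return "block"
    return "alert" if looks_anomalous(device, remote_ip) else "forward"
```

Because the check applies to both directions of a flow, the same allowlist limits who can reach the device and whom a compromised device can reach, which matches the dual target-and-source goal stated above.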

Citations: 4

DNRTI: A Large-scale Dataset for Named Entity Recognition in Threat Intelligence

Pub Date : 2020-12-01 DOI: 10.1109/TrustCom50675.2020.00252

Xuren Wang, Xinpei Liu, Shengqin Ao, Ning Li, Zhengwei Jiang, Zongyi Xu, Zihan Xiong, Mengbo Xiong, Xiaoqing Zhang

Named entity recognition is an important and challenging problem in natural language processing. Although the past decade has witnessed major advances in entity recognition in many fields, such successes have been slow to reach the network security field, not only because the data in this field is highly specialized, but also because of the sensitive information it contains. To advance named entity recognition research in the network security field, we introduce a large-scale Dataset for Named Entity Recognition in Threat Intelligence (DNRTI). To this end, we collect more than 300 pieces of threat intelligence. The data in DNRTI is annotated entirely by experts in threat intelligence interpretation using 13 object categories. The fully annotated DNRTI contains 175,220 words. To build a baseline for named entity recognition in the threat intelligence field, we evaluate some deep learning models on DNRTI. Experiments demonstrate that DNRTI represents the key information in threat intelligence well and is quite challenging.

Citations: 10

A Unified Host-based Intrusion Detection Framework using Spark in Cloud

Pub Date : 2020-12-01 DOI: 10.1109/TrustCom50675.2020.00026

Ming Liu, Zhi Xue, Xiangjian He

The host-based intrusion detection system (HIDS) is an essential research domain of cybersecurity. A HIDS examines host log data to identify intrusive behaviors. Detection efficiency is a significant factor for HIDS. Traditionally, a HIDS is often installed in standalone mode. Training detection engines with a large amount of data on a single physical computer with limited computing resources may be time-consuming. Therefore, this paper offers a unified HIDS framework based on Spark and deployed in the Google cloud. The framework includes a unified machine learning pipeline to implement scalable and efficient HIDS.

Citations: 2

CMIRGen: Automatic Signature Generation Algorithm for Malicious Network Traffic

Pub Date : 2020-12-01 DOI: 10.1109/TrustCom50675.2020.00101

Runzi Zhang, Mingkai Tong, Lei Chen, Jianxin Xue, Wenmao Liu, Feng Xie

Although machine learning (ML) based solutions for attack defense are ever-evolving, signatures of malicious network traffic remain vital resources for intrusion detection systems (IDSs) and network forensic procedures, compensating for the lack of interpretability and stability of ML models. However, signature extraction is still a time- and labor-consuming task, which may increase attackers' dwell time. Existing automatic solutions rely too heavily on sequence-similarity-based and heuristic-based methods and suffer performance degradation in large-scale and dynamic network environments. In this paper, we present a novel method, called Clustering and Model Inference-based Rule Generation (CMIRGen), that automatically generates token-set-based signature rules for the malicious traffic payloads to be inspected. CMIRGen leverages both optimized sequence-similarity-based and black-box model-inference-based methods to extract patterns from homogeneous and heterogeneous payloads, respectively. Experimental evaluations on several datasets show that the CMIRGen framework can extract discriminative signatures, achieving a high recall rate and a low false positive rate at the same time for malicious content recognition.
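
A token-set signature can be illustrated with a deliberately simple stand-in: take the tokens common to every payload in a cluster and flag any payload that contains all of them. This is not CMIRGen itself. The clustering and model-inference stages are omitted, plain set intersection is swapped in for the paper's pattern extraction, and the tokenizer and function names are assumptions made for this sketch.

```python
import re


def tokenize(payload: str) -> set:
    """Split a payload into lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", payload.lower()))


def token_set_signature(cluster: list) -> frozenset:
    """Tokens shared by every payload in the cluster (empty if no payloads)."""
    token_sets = [tokenize(p) for p in cluster]
    if not token_sets:
        return frozenset()
    return frozenset(set.intersection(*token_sets))


def matches(signature: frozenset, payload: str) -> bool:
    """A payload matches when it contains every token of a non-empty signature."""
    return bool(signature) and signature <= tokenize(payload)
```

Token order is ignored, which makes such a rule robust to reordered parameters but also more prone to false positives than a sequence-based rule. A practical generator would additionally drop tokens that are common in benign traffic.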

Citations: 0

More efficient SM9 algorithm based on bilinear pair optimization processing

2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)

Pub Date : 2020-12-01 DOI: 10.1109/TrustCom50675.2020.00234

Xianze Liu, Jihong Liu, B. Jiang, Haozhen Jiang, Zhi Yang

The SM9 algorithm has recently received increasing attention as a new cryptographic product. Its encryption and decryption rely on a mapping relationship on the elliptic curve. Although this mapping improves security, it slightly reduces efficiency. The goal of this article is to improve the efficiency of the SM9 algorithm. Unlike the traditional pipeline acceleration approach, we start from the basic operations of the algorithm itself. SM9 involves a bilinear pairing operation on the elliptic curve, which performs the point-to-point mapping, and its computational complexity directly determines the performance of the SM9 algorithm. For this reason, we propose two new methods for processing the bilinear pairing. The first uses the properties of an isomorphic mapping to transfer the operations involved in computing the pairing from a large feature domain to a small feature domain, reducing the number of operations on the feature domain. The second targets special operations in the pairing computation, adding intermediate variables to convert them into less time-consuming multiplications. With the traditional Miller algorithm, computing the bilinear pairing requires 900 multiplication time units. Our two methods reduce this value to 700 and 800 multiplication time units, respectively. In addition, neither method changes the mapping defined by the bilinear pairing, so the efficiency of the SM9 algorithm is improved while the correct mapping relationship is preserved.
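The cost figures quoted in the abstract can be turned into relative savings with a few lines of Python. This is only arithmetic on the three numbers stated above (900, 700, and 800 multiplication time units). The function name `relative_saving` and the method labels are assumptions introduced for illustration.

```python
def relative_saving(baseline: float, optimized: float) -> float:
    """Fraction of the baseline cost removed by an optimization."""
    if baseline <= 0:
        raise ValueError("baseline cost must be positive")
    return (baseline - optimized) / baseline


# Costs in multiplication time units, as quoted in the abstract.
MILLER_BASELINE = 900
METHODS = {
    "isomorphic mapping (first method)": 700,
    "intermediate variables (second method)": 800,
}

if __name__ == "__main__":
    for name, cost in METHODS.items():
        saving = relative_saving(MILLER_BASELINE, cost)
        print(f"{name}: {cost} units, {saving:.1%} fewer than the baseline")
```

Under these figures the first method removes 200 of 900 units (about 22%) and the second removes 100 of 900 (about 11%). The abstract does not say whether the two methods can be combined, so no combined figure is computed here.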

Citations: 0