Robust Malware identification via deep temporal convolutional network with symmetric cross entropy learning

IET Software (IF 1.3; Zone 4, Computer Science; JCR Q3, Computer Science, Software Engineering) Pub Date: 2023-07-07 DOI: 10.1049/sfw2.12137

Jiankun Sun, Xiong Luo, Weiping Wang, Yang Gao, Wenbing Zhao

IET Software, volume 17, issue 4, pages 392-404. Journal Article, published 2023-07-07. Article page: https://onlinelibrary.wiley.com/doi/10.1049/sfw2.12137

Robust Malware identification via deep temporal convolutional network with symmetric cross entropy learning

Recent developments in the Internet of things (IoT) have drawn growing attention to the security of smart devices. In particular, the amount of malicious software (Malware) on IoT systems is increasing. Researchers have devoted considerable effort to supervised machine learning methods for identifying malicious attacks. High-quality labels are essential for supervised machine learning, but label noise is widespread because of the non-deterministic production environment. Learning from noisy labels is therefore important for machine learning-enabled Malware identification. In this study, motivated by the noise robustness of symmetric cross entropy, the authors propose a robust Malware identification method based on a temporal convolutional network (TCN). Word embedding techniques are commonly used to capture the contextual relationship between input operation codes (opcodes) and application programming interface function names. Considering the many unlabelled samples available in real-world intelligent environments, the authors pre-train the TCN model on an unlabelled set using a word embedding method, namely Word2Vec. In the experiments, the proposed method is compared with several traditional statistical methods and more recent neural networks on a synthetic Malware dataset and a real-world dataset. The comparisons show that the proposed method achieves better performance and noise robustness; in particular, it yields the best identification accuracy, 98.75%, in real-world scenarios.
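
The abstract does not spell out the loss, so the following is only a minimal sketch of symmetric cross entropy as it is commonly defined: a weighted sum of the usual cross entropy and a reverse cross entropy in which log 0 is replaced by a negative constant. The function name, the weights alpha and beta, and the constant log_zero = -4 are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def symmetric_cross_entropy(probs, labels, alpha=1.0, beta=1.0,
                            log_zero=-4.0, eps=1e-7):
    """Sketch of a symmetric cross entropy loss (hypothetical helper).

    probs:  (n, k) array of predicted class probabilities.
    labels: (n,) array of integer class ids, possibly noisy.
    alpha, beta, log_zero: assumed hyperparameters, not from the paper.
    """
    n, k = probs.shape
    one_hot = np.eye(k)[labels]  # label distribution q(k|x)

    # Cross entropy: -sum_k q(k|x) * log p(k|x); clip to avoid log(0).
    ce = -np.sum(one_hot * np.log(np.clip(probs, eps, 1.0)), axis=1)

    # Reverse cross entropy: -sum_k p(k|x) * log q(k|x).
    # log q is 0 for the labelled class and log_zero for all others.
    log_q = np.where(one_hot > 0, 0.0, log_zero)
    rce = -np.sum(probs * log_q, axis=1)

    return float(np.mean(alpha * ce + beta * rce))

# Usage: the same predictions scored against clean and flipped labels.
probs = np.array([[0.9, 0.1], [0.2, 0.8]])
clean_loss = symmetric_cross_entropy(probs, np.array([0, 1]))
flipped_loss = symmetric_cross_entropy(probs, np.array([1, 0]))
```

Because the reverse term is bounded by -log_zero per sample, a confidently predicted sample with a wrong label contributes a limited penalty through that term, which is the usual intuition for the loss's tolerance to label noise.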

Source journal: IET Software (Engineering and Technology; Computer Science: Software Engineering)

CiteScore: 4.20
Self-citation rate: 0.00%
Review time: 9 months

About the journal: IET Software publishes papers on all aspects of the software lifecycle, including design, development, implementation and maintenance. The focus of the journal is on the methods used to develop and maintain software, and their practical application. Authors are especially encouraged to submit papers on the following topics, although papers on all aspects of software engineering are welcome:

- Software and systems requirements engineering
- Formal methods, design methods, practice and experience
- Software architecture, aspect and object orientation, reuse and re-engineering
- Testing, verification and validation techniques
- Software dependability and measurement
- Human systems engineering and human-computer interaction
- Knowledge engineering; expert and knowledge-based systems, intelligent agents
- Information systems engineering
- Application of software engineering in industry and commerce
- Software engineering technology transfer
- Management of software development
- Theoretical aspects of software development
- Machine learning
- Big data and big code
- Cloud computing

Current Special Issues. Call for papers:

- Knowledge Discovery for Software Development - https://digital-library.theiet.org/files/IET_SEN_CFP_KDSD.pdf
- Big Data Analytics for Sustainable Software Development - https://digital-library.theiet.org/files/IET_SEN_CFP_BDASSD.pdf