Robust Malware identification via deep temporal convolutional network with symmetric cross entropy learning

IF 1.5 4区 计算机科学 Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING IET Software Pub Date : 2023-07-07 DOI:10.1049/sfw2.12137
Jiankun Sun, Xiong Luo, Weiping Wang, Yang Gao, Wenbing Zhao
{"title":"Robust Malware identification via deep temporal convolutional network with symmetric cross entropy learning","authors":"Jiankun Sun,&nbsp;Xiong Luo,&nbsp;Weiping Wang,&nbsp;Yang Gao,&nbsp;Wenbing Zhao","doi":"10.1049/sfw2.12137","DOIUrl":null,"url":null,"abstract":"<p>Recent developments in the field of Internet of things (IoT) have aroused growing attention to the security of smart devices. Specifically, there is an increasing number of malicious software (Malware) on IoT systems. Nowadays, researchers have made many efforts concerning supervised machine learning methods to identify malicious attacks. High-quality labels are of great importance for supervised machine learning, but noises widely exist due to the non-deterministic production environment. Therefore, learning from noisy labels is significant for machine learning-enabled Malware identification. In this study, motivated by the symmetric cross entropy with satisfactory noise robustness, the authors propose a robust Malware identification method using temporal convolutional network (TCN). Moreover, word embedding techniques are generally utilised to understand the contextual relationship between the input operation code (opcode) and application programming interface function names. Here, considering the numerous unlabelled samples in real-world intelligent environments, the authors pre-train the TCN model on an unlabelled set using a word embedding method, that is, Word2Vec. In the experiments, the proposed method is compared with several traditional statistical methods and more recent neural networks on a synthetic Malware dataset and a real-world dataset. The performance comparisons demonstrate the better performance and noise robustness of their proposed method, especially that the proposed method can yield the best identification accuracy of 98.75% in real-world scenarios.</p>","PeriodicalId":50378,"journal":{"name":"IET Software","volume":"17 4","pages":"392-404"},"PeriodicalIF":1.5000,"publicationDate":"2023-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/sfw2.12137","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IET Software","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/sfw2.12137","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0

Abstract

Recent developments in the field of Internet of things (IoT) have aroused growing attention to the security of smart devices. Specifically, there is an increasing number of malicious software (Malware) on IoT systems. Nowadays, researchers have made many efforts concerning supervised machine learning methods to identify malicious attacks. High-quality labels are of great importance for supervised machine learning, but noises widely exist due to the non-deterministic production environment. Therefore, learning from noisy labels is significant for machine learning-enabled Malware identification. In this study, motivated by the symmetric cross entropy with satisfactory noise robustness, the authors propose a robust Malware identification method using temporal convolutional network (TCN). Moreover, word embedding techniques are generally utilised to understand the contextual relationship between the input operation code (opcode) and application programming interface function names. Here, considering the numerous unlabelled samples in real-world intelligent environments, the authors pre-train the TCN model on an unlabelled set using a word embedding method, that is, Word2Vec. In the experiments, the proposed method is compared with several traditional statistical methods and more recent neural networks on a synthetic Malware dataset and a real-world dataset. The performance comparisons demonstrate the better performance and noise robustness of their proposed method, especially that the proposed method can yield the best identification accuracy of 98.75% in real-world scenarios.

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于对称交叉熵学习的深度时间卷积网络的恶意软件鲁棒识别
物联网(IoT)领域的最新发展引起了人们对智能设备安全性的日益关注。具体而言,物联网系统上的恶意软件(恶意软件)数量不断增加。如今,研究人员已经在有监督的机器学习方法方面做出了许多努力来识别恶意攻击。高质量的标签在有监督的机器学习中非常重要,但由于生产环境的不确定性,噪声广泛存在。因此,从噪声标签中学习对于启用机器学习的恶意软件识别具有重要意义。在本研究中,受具有令人满意的噪声鲁棒性的对称交叉熵的激励,作者提出了一种使用时间卷积网络(TCN)的鲁棒恶意软件识别方法。此外,单词嵌入技术通常用于理解输入操作码(操作码)和应用程序编程接口函数名之间的上下文关系。在这里,考虑到现实世界智能环境中的大量未标记样本,作者使用单词嵌入方法,即Word2Vec,在未标记集上预训练TCN模型。在实验中,将所提出的方法与几种传统的统计方法和最近的神经网络在合成恶意软件数据集和真实世界数据集上进行了比较。性能比较表明,他们提出的方法具有更好的性能和噪声鲁棒性,特别是在真实场景中,该方法可以产生98.75%的最佳识别精度。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
IET Software
IET Software 工程技术-计算机:软件工程
CiteScore
4.20
自引率
0.00%
发文量
27
审稿时长
9 months
期刊介绍: IET Software publishes papers on all aspects of the software lifecycle, including design, development, implementation and maintenance. The focus of the journal is on the methods used to develop and maintain software, and their practical application. Authors are especially encouraged to submit papers on the following topics, although papers on all aspects of software engineering are welcome: Software and systems requirements engineering Formal methods, design methods, practice and experience Software architecture, aspect and object orientation, reuse and re-engineering Testing, verification and validation techniques Software dependability and measurement Human systems engineering and human-computer interaction Knowledge engineering; expert and knowledge-based systems, intelligent agents Information systems engineering Application of software engineering in industry and commerce Software engineering technology transfer Management of software development Theoretical aspects of software development Machine learning Big data and big code Cloud computing Current Special Issue. Call for papers: Knowledge Discovery for Software Development - https://digital-library.theiet.org/files/IET_SEN_CFP_KDSD.pdf Big Data Analytics for Sustainable Software Development - https://digital-library.theiet.org/files/IET_SEN_CFP_BDASSD.pdf
期刊最新文献
Software Defect Prediction Method Based on Clustering Ensemble Learning ConCPDP: A Cross-Project Defect Prediction Method Integrating Contrastive Pretraining and Category Boundary Adjustment Breaking the Blockchain Trilemma: A Comprehensive Consensus Mechanism for Ensuring Security, Scalability, and Decentralization IC-GraF: An Improved Clustering with Graph-Embedding-Based Features for Software Defect Prediction IAPCP: An Effective Cross-Project Defect Prediction Model via Intra-Domain Alignment and Programming-Based Distribution Adaptation
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1