Family Classification based on Tree Representations for Malware

Yang Xu, Zhuotai Chen
Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems, published 2023-08-24
DOI: https://doi.org/10.1145/3609510.3609818

Abstract

Malware classification supports malware detection and analysis, and classifying malware into families is a multi-class classification task. Many studies use API call sequences as malware features. However, API call sequences do not explicitly express the control structures between API calls, information that may help represent malware behavior more accurately. In this paper, we propose a novel malware family classification method. We model each malware sample as a Behavioral Tree, built from the API call sequence obtained through dynamic analysis, which describes the control structure between the API calls. To reduce computational complexity, we extract a set of binary relations, called Heighted Behavior Relations, from the behavior tree as the behavior features of the malware. TF-IDF is used to compute family behavior features from the behavior features of the samples. A similarity vector is then constructed for each sample based on its similarity to every family. For family classification, the similarity vectors are fed into the Naive Bayes algorithm to train a classifier. Experiments on a dataset of 10,620 malware samples from 43 malware families show that the classification accuracy of our approach is 10% higher than that of classical methods based on API call sequences.
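
The TF-IDF and similarity-vector steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define Heighted Behavior Relations or the similarity measure, so the sketch assumes a relation is an `(api_a, api_b, height)` tuple, that term frequency is the share of a family's samples containing a relation, and that similarity is cosine similarity. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def family_tfidf(samples_by_family):
    """Weight each relation per family: share of the family's samples
    containing it, times a smoothed inverse family frequency (assumed form)."""
    n_families = len(samples_by_family)
    tf = {}
    for family, samples in samples_by_family.items():
        # Each sample is a set, so a relation counts once per sample.
        counts = Counter(rel for sample in samples for rel in sample)
        tf[family] = {rel: c / len(samples) for rel, c in counts.items()}
    # Number of families in which each relation occurs at least once.
    df = Counter(rel for weights in tf.values() for rel in weights)
    return {
        family: {
            rel: w * (math.log((1 + n_families) / (1 + df[rel])) + 1.0)
            for rel, w in weights.items()
        }
        for family, weights in tf.items()
    }


def similarity_vector(sample, family_weights):
    """Cosine similarity between a sample's binary relation set and each
    family's weight vector, in sorted family order (assumed measure)."""
    vector = []
    for family in sorted(family_weights):
        weights = family_weights[family]
        dot = sum(weights.get(rel, 0.0) for rel in sample)
        norm = math.sqrt(len(sample)) * math.sqrt(
            sum(w * w for w in weights.values())
        )
        vector.append(dot / norm if norm else 0.0)
    return vector


# Hypothetical toy data: two families, two samples each.
training = {
    "A": [
        {("CreateFile", "WriteFile", 1), ("WriteFile", "CloseHandle", 1)},
        {("CreateFile", "WriteFile", 1), ("RegOpenKey", "RegSetValue", 2)},
    ],
    "B": [
        {("socket", "connect", 1), ("connect", "send", 2)},
        {("socket", "connect", 1), ("CreateFile", "WriteFile", 1)},
    ],
}
weights = family_tfidf(training)
unknown = {("CreateFile", "WriteFile", 1), ("WriteFile", "CloseHandle", 1)}
print(similarity_vector(unknown, weights))
```

In the pipeline the abstract describes, vectors like this one (one entry per family) would be the input on which the Naive Bayes classifier is trained. The classifier itself is left out of the sketch.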