Behavior-based Malware analysis using profile hidden Markov models

S. Ravi, N. Balakrishnan, Bharath Venkatesh
Published in: 2013 International Conference on Security and Cryptography (SECRYPT), 2013-07-29
DOI: 10.5220/0004528201950206 (https://doi.org/10.5220/0004528201950206)
Citations: 11

Abstract

In malware analysis, static binary analysis is becoming increasingly difficult because of the code obfuscation and code packing methods employed by malware authors. For this reason, behavior-based analysis techniques are used in large malware analysis systems. In these dynamic analysis systems, malware samples are executed and monitored in a controlled environment using tools such as CWSandbox (Willems et al., 2007). In previous work, a number of clustering and classification techniques from machine learning and data mining have been used to classify malware into families, and even to identify new malware families, from the behavior reports. In our work, we propose to use the Profile Hidden Markov Model (PHMM) to classify malware files into families or groups based on their behavior on the host system. PHMMs have been used extensively in bioinformatics to search for similar protein and DNA sequences in large databases. We see that using this model helps overcome the hurdle posed by polymorphism, which is common in malware today. We show that the classification accuracy is high and comparable with state-of-the-art methods, even when very few training samples are used to build the models. The experiments were run initially on a dataset with 24 families, and later on a larger dataset with close to 400 different malware families. For the large dataset, a fast clustering method was used to group malware with similar behavior after scoring against the PHMM profile database. We present the challenges in the evaluation methods and metrics for clustering large numbers of malware files, and show the effectiveness of profile hidden Markov models for known malware families.
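The scoring idea in the abstract (one model per family, with a sample assigned to the best-scoring family) can be illustrated with a minimal sketch. This is not the authors' implementation. It uses a plain discrete HMM with the forward algorithm rather than a full profile HMM, which would add match, insert, and delete states. The event names, family names, and all probabilities below are hypothetical toy values chosen only for illustration.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def safe_log(p):
    """log(p), mapping zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def forward_log_likelihood(seq, start, trans, emit):
    """Return log P(seq | model) for a discrete HMM (forward algorithm).

    start[i]      : probability of starting in state i
    trans[i][j]   : probability of moving from state i to state j
    emit[i][sym]  : probability that state i emits symbol sym
    """
    n = len(start)
    alpha = [safe_log(start[i]) + safe_log(emit[i].get(seq[0], 0.0))
             for i in range(n)]
    for sym in seq[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j].get(sym, 0.0))
            for j in range(n)
        ]
    return log_sum_exp(alpha)


def classify(seq, models):
    """Score seq against every family model and return the best family."""
    scores = {name: forward_log_likelihood(seq, *params)
              for name, params in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical two-state models; the numbers are illustrative, not from the paper.
TRANS = [[0.7, 0.3], [0.3, 0.7]]
MODELS = {
    "family_a": (  # mostly file-system activity
        [0.9, 0.1], TRANS,
        [{"create_file": 0.5, "write_file": 0.3, "connect": 0.1, "send": 0.1},
         {"create_file": 0.3, "write_file": 0.5, "connect": 0.1, "send": 0.1}],
    ),
    "family_b": (  # mostly network activity
        [0.9, 0.1], TRANS,
        [{"connect": 0.5, "send": 0.3, "create_file": 0.1, "write_file": 0.1},
         {"connect": 0.3, "send": 0.5, "create_file": 0.1, "write_file": 0.1}],
    ),
}

if __name__ == "__main__":
    trace = ["create_file", "write_file", "create_file"]
    best, scores = classify(trace, MODELS)
    print(best, scores)
```

In a setting like the one the abstract describes, each family model would be trained from that family's behavior reports, and the per-family scores could also serve as features for a later clustering step. Those stages are outside this sketch.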