Extracting the Representative API Call Patterns of Malware Families Using Recurrent Neural Network

Proceedings of the International Conference on Research in Adaptive and Convergent Systems, Pub Date: 2017-09-20, DOI: 10.1145/3129676.3129712

Iltaek Kwon, E. Im


Citations: 18

Abstract

With thousands of new malware samples appearing every day, how can we reduce malware analysis time and detect malware effectively? Malware family classification is a good way to predict the characteristics of unknown malware, since malware belonging to the same family can have similar features. Static analysis and dynamic analysis are the techniques used to obtain features for classifying malware samples into families. Static analysis relies on specific signatures contained in the malware. Its advantages are that the analysis covers the entire code and can be performed without executing the malware. However, it is very difficult to detect or classify malware variants from static analysis results alone, because malware developers use polymorphic or encryption techniques to evade static-analysis-based detection by anti-virus software. Dynamic analysis examines malware behavior, so its results can be used to detect or classify malware variants. One dynamic feature that can be used for this purpose is the API call sequence. In this paper, we propose a novel method to extract representative API call patterns of malware families using a Recurrent Neural Network (RNN). We conducted experiments with 787 malware samples belonging to 9 families. We extracted representative API call patterns of the 9 families from 551 samples used as a training set and performed classification on the remaining 236 samples used as a test set. Classification accuracy using the API call patterns extracted by the RNN averaged 71%. The results show the feasibility of using an RNN to extract representative API call patterns of malware families for malware family classification.
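The abstract does not describe the network architecture, so the following is only a minimal sketch of the general idea, under stated assumptions: API call traces are encoded as integer ids and fed step by step through a plain (Elman) RNN whose final hidden state is mapped to one score per family. The trace contents, the hidden size, the random untrained weights, and all function names (`build_vocab`, `encode`, `rnn_family_scores`) are illustrative assumptions, not the authors' implementation.

```python
import math
import random

# Hypothetical API call traces; the names are common Windows API calls used
# for illustration only and are not taken from the paper's dataset.
traces = [
    ["CreateFileW", "WriteFile", "CloseHandle"],
    ["RegOpenKeyExW", "RegSetValueExW", "CreateFileW", "WriteFile"],
]

def build_vocab(sequences):
    """Map each distinct API name to an integer id, in first-seen order."""
    vocab = {}
    for seq in sequences:
        for name in seq:
            vocab.setdefault(name, len(vocab))
    return vocab

def encode(seq, vocab):
    """Turn a trace of API names into a list of integer ids."""
    return [vocab[name] for name in seq]

def rand_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def rnn_family_scores(ids, w_in, w_rec, w_out):
    """Run a plain (Elman) RNN over the ids and return softmax family scores."""
    size = len(w_rec)
    hidden = [0.0] * size
    for i in ids:
        # With a one-hot input, w_in @ onehot(i) is just column i of w_in.
        hidden = [
            math.tanh(w_in[h][i] + sum(w_rec[h][k] * hidden[k] for k in range(size)))
            for h in range(size)
        ]
    logits = [sum(row[k] * hidden[k] for k in range(size)) for row in w_out]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
vocab = build_vocab(traces)
n_hidden, n_families = 8, 9  # 9 families as in the paper; hidden size is arbitrary
w_in = rand_matrix(n_hidden, len(vocab), rng)
w_rec = rand_matrix(n_hidden, n_hidden, rng)
w_out = rand_matrix(n_families, n_hidden, rng)

scores = rnn_family_scores(encode(traces[0], vocab), w_in, w_rec, w_out)
print(len(scores), round(sum(scores), 6))
```

Because the weights here are untrained, the scores carry no meaning; the sketch only shows the data flow from an API call sequence to a per-family probability vector. In a setup like the one reported, the weights would be fit on the 551 training samples and evaluated on the 236 test samples.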
