Machine learning-based dynamic analysis of Android apps with improved code coverage

IF 2.5 | Q2, Computer Science, Information Systems | EURASIP Journal on Information Security | Pub Date: 2019-04-29 | DOI: 10.1186/s13635-019-0087-1

Suleiman Y. Yerima, Mohammed K. Alzaylaee, Sakir Sezer


Citations: 22

Abstract

This paper investigates the impact of code coverage on machine learning-based dynamic analysis of Android malware. To maximize code coverage, dynamic analysis on Android typically generates events that exercise the user interface, so that as many run-time behavioral features as possible are discovered. Most existing Android dynamic analysis systems generate these events randomly, using the Monkey tool that ships with the Android SDK. Monkey is used in popular dynamic analysis platforms such as AASandbox, vetDroid, MobileSandbox, TraceDroid, Andrubis, ANANAS, DynaLog, and HADM. In this paper, we propose and investigate approaches based on stateful event generation and compare their code coverage with that of the state-of-the-practice random-based Monkey approach. The two proposed approaches are a state-based method (implemented with DroidBot) and a hybrid method that combines the state-based and random-based methods. We compare the three input generation methods on real devices in terms of their ability to log dynamic behavior features and their impact on various machine learning algorithms that use those features for malware detection. Experiments on 17,444 applications show that, overall, the proposed methods provide much better code coverage, which in turn leads to more accurate machine learning-based malware detection than the state-of-the-art approach.
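To make the contrast between the three input generation strategies concrete, the following is a minimal sketch on a hypothetical toy UI graph. It is not the authors' implementation and does not call Monkey or DroidBot. `TOY_APP`, the function names, the breadth-first selection rule, and the half-and-half budget split in the hybrid are all assumptions made for illustration. The fraction of (screen, event) transitions fired stands in as a rough proxy for code coverage.

```python
import random
from collections import deque

# Hypothetical toy app: each screen maps an event name to the screen it opens.
TOY_APP = {
    "home":     {"tap_settings": "settings", "tap_login": "login", "back": "home"},
    "settings": {"tap_about": "about", "back": "home"},
    "login":    {"submit": "account", "back": "home"},
    "about":    {"back": "settings"},
    "account":  {"tap_sync": "sync", "back": "home"},
    "sync":     {"back": "account"},
}


def coverage(app, seen):
    """Fraction of all (screen, event) transitions that were fired."""
    total = sum(len(events) for events in app.values())
    return len(seen) / total


def random_events(app, budget, rng, start="home", seen=None):
    """Random-based generation: fire a uniformly random event at every step."""
    state = start
    seen = set() if seen is None else set(seen)
    for _ in range(budget):
        event = rng.choice(sorted(app[state]))
        seen.add((state, event))
        state = app[state][event]
    return seen, state


def next_event(app, state, seen):
    """Breadth-first search for the nearest unfired transition.

    Returns the first event on a shortest path towards it, or None when
    every reachable transition has already been fired.
    """
    queue = deque([(state, None)])
    visited = {state}
    while queue:
        current, first = queue.popleft()
        for event in sorted(app[current]):
            step = event if first is None else first
            if (current, event) not in seen:
                return step
            nxt = app[current][event]
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, step))
    return None


def stateful_events(app, budget, start="home", seen=None):
    """State-based generation: track fired transitions and seek out new ones."""
    state = start
    seen = set() if seen is None else set(seen)
    for _ in range(budget):
        event = next_event(app, state, seen)
        if event is None:  # nothing left to discover
            break
        seen.add((state, event))
        state = app[state][event]
    return seen, state


def hybrid_events(app, budget, rng, start="home"):
    """Hybrid (assumed split): state-based for half the budget, then random."""
    seen, state = stateful_events(app, budget // 2, start)
    return random_events(app, budget - budget // 2, rng, state, seen)
```

Calling `coverage(TOY_APP, stateful_events(TOY_APP, 30)[0])` and the equivalent for `random_events` and `hybrid_events` (passing a seeded `random.Random`) gives a feel for how the strategies differ under the same event budget. The toy graph says nothing about the magnitude of the effect on real apps, which is what the experiments in the paper measure.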




Source journal

EURASIP Journal on Information Security (Computer Science, Information Systems)

CiteScore

8.80

Self-citation rate

0.00%

Articles published

Review time

13 weeks

About the journal: The overall goal of the EURASIP Journal on Information Security, sponsored by the European Association for Signal Processing (EURASIP), is to bring together researchers and practitioners working in the general field of information security, with a particular emphasis on the use of signal processing tools in adversarial environments. As such, it addresses any work in which security is achieved through a combination of techniques from cryptography, computer security, machine learning, and multimedia signal processing. Application domains include, for example, secure storage, retrieval, and tracking of multimedia data, secure outsourcing of computations, forgery detection of multimedia data, and secure use of biometrics. The journal also welcomes survey papers that give the reader a gentle introduction to one of the topics covered, as well as papers that report large-scale experimental evaluations of existing techniques. Pure cryptographic papers are outside the scope of the journal.

Topics relevant to the journal include, but are not limited to:

• Multimedia security primitives (such as digital watermarking, perceptual hashing, multimedia authentication)
• Steganography and steganalysis
• Fingerprinting and traitor tracing
• Joint signal processing and encryption, signal processing in the encrypted domain, applied cryptography
• Biometrics (fusion, multimodal biometrics, protocols, security issues)
• Digital forensics
• Multimedia signal processing approaches tailored towards adversarial environments
• Machine learning in adversarial environments
• Digital Rights Management
• Network security (such as physical layer security, intrusion detection)
• Hardware security, Physical Unclonable Functions
• Privacy-Enhancing Technologies for multimedia data
• Private data analysis, security in outsourced computations, cloud privacy