为什么Android应用会被归类为恶意软件

ACM Transactions on Software Engineering and Methodology (TOSEM) Pub Date : 2021-03-10 DOI:10.1145/3423096

Bozhi Wu, Sen Chen, Cuiyun Gao, Lingling Fan, Yang Liu, Weiping Wen, Michael R. Lyu

{"title":"为什么Android应用会被归类为恶意软件","authors":"Bozhi Wu, Sen Chen, Cuiyun Gao, Lingling Fan, Yang Liu, Weiping Wen, Michael R. Lyu","doi":"10.1145/3423096","DOIUrl":null,"url":null,"abstract":"Machine learning–(ML) based approach is considered as one of the most promising techniques for Android malware detection and has achieved high accuracy by leveraging commonly used features. In practice, most of the ML classifications only provide a binary label to mobile users and app security analysts. However, stakeholders are more interested in the reason why apps are classified as malicious in both academia and industry. This belongs to the research area of interpretable ML but in a specific research domain (i.e., mobile malware detection). Although several interpretable ML methods have been exhibited to explain the final classification results in many cutting-edge Artificial Intelligent–based research fields, until now, there is no study interpreting why an app is classified as malware or unveiling the domain-specific challenges. In this article, to fill this gap, we propose a novel and interpretable ML-based approach (named XMal) to classify malware with high accuracy and explain the classification result meanwhile. (1) The first classification phase of XMal hinges multi-layer perceptron and attention mechanism and also pinpoints the key features most related to the classification result. (2) The second interpreting phase aims at automatically producing neural language descriptions to interpret the core malicious behaviors within apps. We evaluate the behavior description results by leveraging a human study and an in-depth quantitative analysis. Moreover, we further compare XMal with the existing interpretable ML-based methods (i.e., Drebin and LIME) to demonstrate the effectiveness of XMal. We find that XMal is able to reveal the malicious behaviors more accurately. Additionally, our experiments show that XMal can also interpret the reason why some samples are misclassified by ML classifiers. Our study peeks into the interpretable ML through the research of Android malware detection and analysis.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"140 1","pages":"1 - 29"},"PeriodicalIF":0.0000,"publicationDate":"2021-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Why an Android App Is Classified as Malware\",\"authors\":\"Bozhi Wu, Sen Chen, Cuiyun Gao, Lingling Fan, Yang Liu, Weiping Wen, Michael R. Lyu\",\"doi\":\"10.1145/3423096\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Machine learning–(ML) based approach is considered as one of the most promising techniques for Android malware detection and has achieved high accuracy by leveraging commonly used features. In practice, most of the ML classifications only provide a binary label to mobile users and app security analysts. However, stakeholders are more interested in the reason why apps are classified as malicious in both academia and industry. This belongs to the research area of interpretable ML but in a specific research domain (i.e., mobile malware detection). Although several interpretable ML methods have been exhibited to explain the final classification results in many cutting-edge Artificial Intelligent–based research fields, until now, there is no study interpreting why an app is classified as malware or unveiling the domain-specific challenges. In this article, to fill this gap, we propose a novel and interpretable ML-based approach (named XMal) to classify malware with high accuracy and explain the classification result meanwhile. (1) The first classification phase of XMal hinges multi-layer perceptron and attention mechanism and also pinpoints the key features most related to the classification result. (2) The second interpreting phase aims at automatically producing neural language descriptions to interpret the core malicious behaviors within apps. We evaluate the behavior description results by leveraging a human study and an in-depth quantitative analysis. Moreover, we further compare XMal with the existing interpretable ML-based methods (i.e., Drebin and LIME) to demonstrate the effectiveness of XMal. We find that XMal is able to reveal the malicious behaviors more accurately. Additionally, our experiments show that XMal can also interpret the reason why some samples are misclassified by ML classifiers. Our study peeks into the interpretable ML through the research of Android malware detection and analysis.\",\"PeriodicalId\":7398,\"journal\":{\"name\":\"ACM Transactions on Software Engineering and Methodology (TOSEM)\",\"volume\":\"140 1\",\"pages\":\"1 - 29\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-03-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Software Engineering and Methodology (TOSEM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3423096\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Software Engineering and Methodology (TOSEM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3423096","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 5

摘要

基于机器学习(ML)的方法被认为是最有前途的Android恶意软件检测技术之一，并且通过利用常用功能实现了高精度。在实践中，大多数ML分类只向移动用户和应用程序安全分析师提供二进制标签。然而，利益相关者更感兴趣的是为什么应用程序在学术界和工业界都被归类为恶意软件。这属于可解释机器学习的研究领域，但在一个特定的研究领域(即移动恶意软件检测)。尽管在许多基于人工智能的前沿研究领域，已经展示了几种可解释的ML方法来解释最终的分类结果，但到目前为止，还没有研究解释为什么一个应用程序被归类为恶意软件或揭示特定领域的挑战。在本文中，为了填补这一空白，我们提出了一种新颖的、可解释的基于ml的方法(称为XMal)来对恶意软件进行高精度分类，同时对分类结果进行解释。(1) xml的第一个分类阶段涉及多层感知器和注意机制，并确定了与分类结果最相关的关键特征。(2)第二阶段旨在自动生成神经语言描述，以解释应用内部的核心恶意行为。我们通过利用人类研究和深入的定量分析来评估行为描述结果。此外，我们进一步将XMal与现有的可解释的基于ml的方法(即Drebin和LIME)进行比较，以证明XMal的有效性。我们发现xml能够更准确地揭示恶意行为。此外，我们的实验表明，xml也可以解释为什么一些样本被ML分类器错误分类的原因。我们的研究通过Android恶意软件检测和分析的研究来窥视可解释的机器学习。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Why an Android App Is Classified as Malware

Machine learning–(ML) based approach is considered as one of the most promising techniques for Android malware detection and has achieved high accuracy by leveraging commonly used features. In practice, most of the ML classifications only provide a binary label to mobile users and app security analysts. However, stakeholders are more interested in the reason why apps are classified as malicious in both academia and industry. This belongs to the research area of interpretable ML but in a specific research domain (i.e., mobile malware detection). Although several interpretable ML methods have been exhibited to explain the final classification results in many cutting-edge Artificial Intelligent–based research fields, until now, there is no study interpreting why an app is classified as malware or unveiling the domain-specific challenges. In this article, to fill this gap, we propose a novel and interpretable ML-based approach (named XMal) to classify malware with high accuracy and explain the classification result meanwhile. (1) The first classification phase of XMal hinges multi-layer perceptron and attention mechanism and also pinpoints the key features most related to the classification result. (2) The second interpreting phase aims at automatically producing neural language descriptions to interpret the core malicious behaviors within apps. We evaluate the behavior description results by leveraging a human study and an in-depth quantitative analysis. Moreover, we further compare XMal with the existing interpretable ML-based methods (i.e., Drebin and LIME) to demonstrate the effectiveness of XMal. We find that XMal is able to reveal the malicious behaviors more accurately. Additionally, our experiments show that XMal can also interpret the reason why some samples are misclassified by ML classifiers. Our study peeks into the interpretable ML through the research of Android malware detection and analysis.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Software Engineering and Methodology (TOSEM)

自引率

0.00%

发文量

期刊最新文献

Turnover of Companies in OpenStack: Prevalence and Rationale Super-optimization of Smart Contracts Verification of Programs Sensitive to Heap Layout Assessing and Improving an Evaluation Dataset for Detecting Semantic Code Clones via Deep Learning Guaranteeing Timed Opacity using Parametric Timed Model Checking