Empirical Evaluation of Minority Oversampling Techniques in the Context of Android Malware Detection

2021 28th Asia-Pacific Software Engineering Conference (APSEC). Pub Date: 2021-12-01. DOI: 10.1109/APSEC53868.2021.00042

Lwin Khin Shar, T. Duong, D. Lo

Abstract

In Android malware classification, the distribution of training data among classes is often imbalanced. This biases the learning algorithm towards the dominant classes, resulting in misclassification of minority classes. One effective way to improve classifier performance is the synthetic generation of minority instances. A pioneering technique in this area is the Synthetic Minority Oversampling Technique (SMOTE). Since its publication in 2002, several variants of SMOTE have been proposed and evaluated on various imbalanced datasets. However, these techniques have not been evaluated in the context of Android malware detection, and studies have shown that the performance of SMOTE and its variants can vary across application domains. In this paper, we conduct a large-scale empirical evaluation of SMOTE and its variants on six datasets that reflect six types of features commonly used in Android malware detection. The datasets are extracted from a benchmark of 4,572 benign and 2,399 malicious Android apps used in our previous study. Through extensive experiments, we set a new baseline in the field of Android malware detection and provide guidance to practitioners on applying different SMOTE variants to Android malware detection.
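For readers unfamiliar with the technique named in the abstract, the following is a minimal sketch of the interpolation idea behind SMOTE: a synthetic minority instance is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. It is a simplified illustration, not the authors' implementation or any of the evaluated variants. The function name `smote`, its parameters, and the toy 2-D points are assumptions made for this example; only the class counts (4,572 benign, 2,399 malicious) come from the abstract.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE sketch (hypothetical helper, not the paper's code).

    Each synthetic point is an interpolation between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Class counts quoted in the abstract.
N_BENIGN, N_MALICIOUS = 4572, 2399
# Synthetic malicious samples needed to reach a 1:1 class balance.
n_needed = N_BENIGN - N_MALICIOUS

# Toy 2-D minority points (hypothetical, not real app features).
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(toy_minority, n_synthetic=6, k=2)
```

Because every synthetic point lies between two existing minority samples, it stays inside the region those samples already occupy. In practice, oversampling of this kind is normally applied to the training split only, so that no synthetic instances leak into the test data.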
