Feature selection based on popularity and value contrast for Android malware classification
Le Duc Thuan, Pham Van Huong, H. Hiep, Nguyen Kim Khanh
2022 14th International Conference on Knowledge and Systems Engineering (KSE), 2022-10-19
DOI: 10.1109/KSE56063.2022.9953762
Abstract
This study proposes a new feature selection approach for Android malware detection that treats feature popularity and feature contrast as multiple objectives. The popularity of a feature is based on how often it occurs in the sample set. Feature contrast is of two types: contrast between malware and benign samples, and contrast among malware classes. Intuitively, the greater a feature's contrast between classes, the better it can support classification. However, popularity and contrast trade off against each other: as popularity increases, contrast may decrease, and vice versa. Therefore, to evaluate the overall value of each feature, we use a global evaluation function (global measurement) based on the Pareto multi-objective approach. To evaluate the feature selection method, the selected features are fed into a convolutional neural network (CNN) model, which is tested on a popular Android malware dataset, the AMD dataset. When 1,000 features (500 permission features and 500 API features) were removed, accuracy decreased by 0.42% and recall increased by 0.08%.
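The abstract does not give the exact scoring formulas, so the following is only a minimal Python sketch of the idea under stated assumptions: apps are binary feature vectors (permissions/APIs present or absent), popularity is a feature's occurrence frequency, the malware-vs-benign contrast is the absolute difference of class frequencies, and the among-malware contrast is the spread (max minus min) of per-class frequencies. The "global evaluation" is approximated here by Pareto ranking (non-dominated sorting), with distance to the ideal point (the classic global criterion method) as a tie-breaker. All function names and the toy data are hypothetical, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[int]]  # rows = apps, columns = binary features (hypothetical layout)


def frequency(rows: Matrix, n_features: int) -> List[float]:
    """Fraction of rows in which each binary feature is present (0.0 if there are no rows)."""
    if not rows:
        return [0.0] * n_features
    return [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]


def objectives(X: Matrix, y: Sequence[str], benign: str = "benign") -> List[Tuple[float, float, float]]:
    """Per-feature (popularity, malware-vs-benign contrast, among-malware-class contrast).

    Assumed definitions, not the paper's exact formulas:
      popularity  = frequency over all samples
      mb_contrast = |freq in malware - freq in benign|
      mc_contrast = max - min of per-class frequency over malware classes
    """
    d = len(X[0])
    pop = frequency(X, d)
    mal = frequency([r for r, c in zip(X, y) if c != benign], d)
    ben = frequency([r for r, c in zip(X, y) if c == benign], d)
    classes = sorted({c for c in y if c != benign})
    per_class = [frequency([r for r, c in zip(X, y) if c == k], d) for k in classes]
    out = []
    for j in range(d):
        col = [f[j] for f in per_class] or [0.0]  # guard: no malware classes
        out.append((pop[j], abs(mal[j] - ben[j]), max(col) - min(col)))
    return out


def dominates(a, b) -> bool:
    """a Pareto-dominates b: >= on every objective and > on at least one (all maximized)."""
    return all(x >= z for x, z in zip(a, b)) and any(x > z for x, z in zip(a, b))


def pareto_ranks(points) -> List[int]:
    """Non-dominated sorting: rank 0 is the Pareto front, rank 1 the next front, and so on."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 0
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


def global_distance(p, ideal) -> float:
    """Euclidean distance to the ideal point, each objective scaled by its ideal value."""
    return sum(((i - x) / i) ** 2 for x, i in zip(p, ideal) if i > 0) ** 0.5


def select_features(X: Matrix, y: Sequence[str], k: int, benign: str = "benign") -> List[int]:
    """Indices of k features: lower Pareto rank first, then closer to the ideal point."""
    pts = objectives(X, y, benign)
    ranks = pareto_ranks(pts)
    ideal = [max(col) for col in zip(*pts)]
    order = sorted(range(len(pts)), key=lambda j: (ranks[j], global_distance(pts[j], ideal)))
    return order[:k]


# Toy data: 6 apps x 4 binary features (hypothetical values).
X = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
y = ["famA", "famA", "famB", "famB", "benign", "benign"]
print(select_features(X, y, k=2))  # prints [1, 2]
```

On this toy data the most common feature (index 0) still lies on the Pareto front, but it is ordered after features 1 and 2, which are rarer yet separate the two malware classes, illustrating the popularity/contrast trade-off described above. Feature 3 is dominated by feature 0 and falls to the second front.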