On the Effect of k Values and Distance Metrics in KNN Algorithm for Android Malware Detection

Advances in Data Science and Adaptive Analysis (IF 0.5, Q4, Mathematics, Interdisciplinary Applications). Pub Date: 2021-09-24. DOI: 10.1142/s2424922x21410011

Durmuş Özkan Şahin, S. Akleylek, E. Kılıç

Volume: 12 1 · Pages: 2141001:1-2141001:20 · Type: Journal Article

Abstract

Mobile device usage has increased remarkably in recent years. Android is by far the most preferred open-source mobile operating system in the world. It is also the operating system of choice for many Internet of Things (IoT) devices, which are used in many areas of daily life: smart cities, smart environments, health, home automation, agriculture, and livestock are some examples, with health among the most frequent. Because Android is both widely used and open source, the vast majority of malware released today is designed for Android platforms, and devices running Android are therefore under serious threat. This study proposes a machine-learning-based system that detects malware on Android. Feature vectors are built from permissions, which play an important role in the security of the Android operating system. These feature vectors are given as input to the k-nearest neighbor (KNN) algorithm, which classifies applications as malicious or benign. In KNN, the value of k and the distance metric used to find the nearest samples directly affect classification performance, yet few permission-based studies examine these parameters in detail. For this reason, the performance of the malware detection system is compared using five different k values and five different distance metrics on different data sets. The results show that higher classification performance is obtained when k is given small values such as 1 and 3 and when metrics such as Euclidean and Minkowski are chosen instead of the Chebyshev distance.
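The parameter study described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the permission names, the toy training set, the k values (1, 3, 5), and the choice of Manhattan and Minkowski with p = 3 as extra metrics are all assumptions made for illustration. The abstract itself names only the Euclidean, Minkowski, and Chebyshev metrics and the k values 1 and 3.

```python
from collections import Counter

# Hypothetical permission vocabulary; each app is a 0/1 vector over it.
PERMISSIONS = ["SEND_SMS", "READ_CONTACTS", "INTERNET", "CAMERA"]


def minkowski(a, b, p):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def chebyshev(a, b):
    """Chebyshev distance: the largest coordinate-wise difference."""
    return max(abs(x - y) for x, y in zip(a, b))


# Illustrative metric set, not the five metrics evaluated in the paper.
METRICS = {
    "euclidean": lambda a, b: minkowski(a, b, 2),
    "manhattan": lambda a, b: minkowski(a, b, 1),
    "minkowski_p3": lambda a, b: minkowski(a, b, 3),
    "chebyshev": chebyshev,
}


def knn_predict(train, query, k, metric):
    """Label the query by majority vote among its k nearest training samples."""
    dist = METRICS[metric]
    # sorted() is stable, so equal distances keep training-set order.
    neighbors = sorted(train, key=lambda sample: dist(sample[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Invented toy data: (permission vector, label).
TRAIN = [
    ((1, 1, 1, 0), "malware"),
    ((1, 0, 1, 0), "malware"),
    ((1, 1, 1, 1), "malware"),
    ((0, 0, 1, 0), "benign"),
    ((0, 0, 1, 1), "benign"),
    ((0, 0, 0, 1), "benign"),
]

if __name__ == "__main__":
    query = (1, 1, 0, 0)  # requests SEND_SMS and READ_CONTACTS only
    for metric in METRICS:
        for k in (1, 3, 5):  # odd k avoids ties between two classes
            print(metric, k, knn_predict(TRAIN, query, k, metric))
```

One property of this sketch is worth noting. On 0/1 permission vectors, the Chebyshev distance between two different vectors is always exactly 1, so every non-identical training sample is equally "near" and the vote is decided by tie-breaking order. The Minkowski family (including Euclidean) grows with the number of differing permissions and can rank neighbors. The abstract does not state this as the reason for its findings; it is only an observation about binary features.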
