On the Effect of k Values and Distance Metrics in KNN Algorithm for Android Malware Detection

Advances in Data Science and Adaptive Analysis (IF 0.5, JCR Q4, Mathematics, Interdisciplinary Applications) | Pub Date: 2021-09-24 | DOI: 10.1142/s2424922x21410011
Durmuş Özkan Şahin, S. Akleylek, E. Kılıç
Volume (as listed): 12 1 | Pages: 2141001:1-2141001:20 | Publication type: Journal Article | Listing source: Semanticscholar

Abstract

Mobile device usage has increased remarkably in recent years. Android is by far the most preferred open-source mobile operating system in the world. It is also the operating system of choice for many Internet of Things (IoT) devices, which are used in many areas of daily life, including smart cities, smart environments, health, home automation, agriculture, and livestock; health is among the most frequent of these. Because Android is both widely used and open source, the vast majority of malware now released targets Android platforms, and devices running Android are therefore under serious threat. This study proposes a machine-learning-based system that detects malware on the Android operating system. Feature vectors are built from permissions, which play an important role in Android security. These feature vectors are given as input to the k-nearest neighbor (KNN) algorithm, a machine learning technique, which classifies applications as malicious or benign. In KNN, the value of k and the distance metric used to find the nearest samples directly affect classification performance, yet few permission-based studies have examined these parameters in detail. For this reason, the performance of the malware detection system is compared across five different k values and five different distance metrics on different data sets. The results show that classification performance is higher when k is set to values such as 1 or 3 and when metrics such as Euclidean and Minkowski are chosen instead of the Chebyshev distance.
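The setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the four permissions, the toy samples, their labels, the query vector, and the Minkowski order p = 3 are all assumptions made for illustration, and the abstract does not list the full set of k values or metrics the paper evaluates.

```python
from collections import Counter

# Hypothetical feature layout: 1 means the app requests the permission.
PERMISSIONS = ["INTERNET", "SEND_SMS", "READ_CONTACTS", "RECEIVE_BOOT_COMPLETED"]

def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def euclidean(a, b):
    """Euclidean distance, i.e. Minkowski distance with p = 2."""
    return minkowski(a, b, 2)

def chebyshev(a, b):
    """Chebyshev distance: the largest coordinate-wise difference."""
    return max(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k, distance):
    """Majority vote among the k training samples closest to the query."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy data invented for this sketch; not taken from the paper's data sets.
TRAIN_X = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]
TRAIN_Y = ["malware", "malware", "malware", "benign", "benign", "benign"]

METRICS = {
    "euclidean": euclidean,
    "minkowski(p=3)": lambda a, b: minkowski(a, b, 3),  # p = 3 is an assumption
    "chebyshev": chebyshev,
}

if __name__ == "__main__":
    query = [0, 1, 1, 1]  # hypothetical unseen app
    for name, metric in METRICS.items():
        for k in (1, 3, 5):  # illustrative k values
            print(name, k, knn_predict(TRAIN_X, TRAIN_Y, query, k, metric))
```

One property worth noting: on 0/1 permission vectors the Chebyshev distance can only be 0 or 1, so every non-identical sample is equally far from the query and the neighbor ranking falls back on tie order. Whether this accounts for the weaker Chebyshev results reported in the paper is not stated in the abstract.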