Adaptive Android APKs Reverse Engineering for Features Processing in Machine Learning Malware Detection

International Journal of Data Science and Analytics (IF 3.4; JCR Q2, Computer Science, Artificial Intelligence). Pub Date: 2023-05-20. DOI: 10.18517/ijods.4.1.10-25.2023
B. A. Gyunka, Aro Taye Oladele, Ojeniyi Adegoke
Citations: 0

Abstract

Detecting Android malware depends on having the right triggers and pointers, known as features or attributes, which are found in Android application packages (APKs). These features are fundamental to training machine learning algorithms to produce the required detection model. The process of extracting them from the APKs is known as reverse engineering. This paper details the experimental process of applying a reverse engineering procedure, using Sublime Text 2 and the Androguard plugin, to Android application packages in order to extract the targeted features, particularly permissions. The study further discusses the cleaning stages, carried out with Notepad++, Microsoft Excel, and MS Word, which sort out the relevant and important features by removing the noisy ones. A total of 1500 Android apps were downloaded from both benign and malicious sources and used for the experiment. At the end of the reverse engineering process, 162 cleaned features remained, and these were used to form a binary feature matrix of size 1500 by 163 (including the class feature).
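The cleaning and matrix-building steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' procedure, which relied on manual work in Notepad++, Excel, and Word; it only shows the shape of the result. The function names, the noise rule (keep only entries whose last dotted segment is an upper-case identifier), and the label encoding (1 for malicious, 0 for benign) are all assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple


def clean_permissions(raw: Iterable[str]) -> Set[str]:
    """Normalize raw permission strings and drop noisy entries.

    Assumed noise rule (not from the paper): keep only entries whose
    last dotted segment is an upper-case identifier such as INTERNET.
    """
    cleaned: Set[str] = set()
    for entry in raw:
        short = entry.strip().rsplit(".", 1)[-1]
        if short and short.replace("_", "").isalnum() and short.isupper():
            cleaned.add(short)  # the set removes duplicates
    return cleaned


def build_feature_matrix(
    apps: List[Tuple[Iterable[str], int]]
) -> Tuple[List[str], List[List[int]]]:
    """Return (header, rows): one 0/1 column per permission plus a class column."""
    cleaned = [(clean_permissions(perms), label) for perms, label in apps]
    # The feature columns are the sorted union of all cleaned permissions.
    columns = sorted(set().union(*(perms for perms, _ in cleaned)))
    rows = [
        [1 if col in perms else 0 for col in columns] + [label]
        for perms, label in cleaned
    ]
    return columns + ["class"], rows


# Hypothetical input: permission strings as they might appear after extraction.
sample_apps = [
    (["android.permission.INTERNET", "android.permission.SEND_SMS",
      "android.permission.INTERNET"], 1),   # assumed label: malicious
    (["android.permission.INTERNET", " android.permission.CAMERA ", "???"], 0),
]
header, matrix = build_feature_matrix(sample_apps)
print(header)
for row in matrix:
    print(row)
```

With 1500 apps and 162 retained permissions, the same construction would yield the 1500 by 163 matrix reported in the abstract: 162 permission columns plus one class column.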
Source journal
CiteScore: 6.40
Self-citation rate: 8.30%
Articles published: 72
Journal description: Data Science has been established as an important emergent scientific field and paradigm driving research evolution in such disciplines as statistics, computing science and intelligence science, and practical transformation in such domains as science, engineering, the public sector, business, social science, and lifestyle. The field encompasses the larger areas of artificial intelligence, data analytics, machine learning, pattern recognition, natural language understanding, and big data manipulation. It also tackles related new scientific challenges, ranging from data capture, creation, storage, retrieval, sharing, analysis, optimization, and visualization, to integrative analysis across heterogeneous and interdependent complex resources for better decision-making, collaboration, and, ultimately, value creation.

The International Journal of Data Science and Analytics (JDSA) brings together thought leaders, researchers, industry practitioners, and potential users of data science and analytics, to develop the field, discuss new trends and opportunities, exchange ideas and practices, and promote transdisciplinary and cross-domain collaborations. The journal is composed of three streams: Regular, to communicate original and reproducible theoretical and experimental findings on data science and analytics; Applications, to report the significant data science applications to real-life situations; and Trends, to report expert opinion and comprehensive surveys and reviews of relevant areas and topics in data science and analytics.

Topics of relevance include all aspects of the trends, scientific foundations, techniques, and applications of data science and analytics, with a primary focus on: statistical and mathematical foundations for data science and analytics; understanding and analytics of complex data, human, domain, network, organizational, social, behavior, and system characteristics, complexities and intelligences; creation and extraction, processing, representation and modelling, learning and discovery, fusion and integration, presentation and visualization of complex data, behavior, knowledge and intelligence; data analytics, pattern recognition, knowledge discovery, machine learning, deep analytics and deep learning, and intelligent processing of various data (including transaction, text, image, video, graph and network), behaviors and systems; active, real-time, personalized, actionable and automated analytics, learning, computation, optimization, presentation and recommendation; big data architecture, infrastructure, computing, matching, indexing, query processing, mapping, search, retrieval, interoperability, exchange, and recommendation; in-memory, distributed, parallel, scalable and high-performance computing, analytics and optimization for big data; review, surveys, trends, prospects and opportunities of data science research, innovation and applications; data science applications, intelligent devices and services in scientific, business, governmental, cultural, behavioral, social and economic, health and medical, human, natural and artificial (including online/Web, cloud, IoT, mobile and social media) domains; and ethics, quality, privacy, safety and security, trust, and risk of data science and analytics.