Revisiting Static Feature-Based Android Malware Detection

Md Tanvirul Alam, Dipkamal Bhusal, Nidhi Rastogi
Source: arXiv - CS - Cryptography and Security
Published: 2024-09-11
DOI: arxiv-2409.07397 (https://doi.org/arxiv-2409.07397)

Abstract

The increasing reliance on machine learning (ML) in computer security, particularly for malware classification, has driven significant advancements. However, the replicability and reproducibility of these results are often overlooked, leading to challenges in verifying research findings. This paper highlights critical pitfalls that undermine the validity of ML research in Android malware detection, focusing on dataset and methodological issues. We comprehensively analyze Android malware detection using two datasets and assess offline and continual learning settings with six widely used ML models. Our study reveals that when properly tuned, simpler baseline methods can often outperform more complex models. To address reproducibility challenges, we propose solutions for improving datasets and methodological practices, enabling fairer model comparisons. Additionally, we open-source our code to facilitate malware analysis, making it extensible for new models and datasets. Our paper aims to support future research in Android malware detection and other security domains, enhancing the reliability and reproducibility of published results.
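The abstract's point about tuning baselines fairly in an offline setting can be illustrated with a minimal sketch. This is not the authors' released code: the synthetic data, the feature frequencies, the linear scorer, and every function name below are hypothetical, chosen only to show a chronological train/validation/test split in which the single hyperparameter (a decision threshold) is tuned on validation data that precedes the test period.

```python
import random


def make_synthetic_apps(n_months=6, per_month=100, n_features=10, seed=0):
    """Hypothetical stand-in for a dated APK dataset: (month, features, label)."""
    rng = random.Random(seed)
    apps = []
    for month in range(n_months):
        for _ in range(per_month):
            label = 1 if rng.random() < 0.3 else 0   # 1 = malware (assumed rate)
            p_on = 0.7 if label else 0.2             # assumed feature frequencies
            features = [1 if rng.random() < p_on else 0 for _ in range(n_features)]
            apps.append((month, features, label))
    return apps


def chronological_split(apps, train_end, val_end):
    """Train on months < train_end, validate on [train_end, val_end), test on the rest."""
    train = [a for a in apps if a[0] < train_end]
    val = [a for a in apps if train_end <= a[0] < val_end]
    test = [a for a in apps if a[0] >= val_end]
    return train, val, test


def fit_weights(train, n_features):
    """Per-feature weight = P(feature on | malware) - P(feature on | benign)."""
    counts = {0: [0] * n_features, 1: [0] * n_features}
    totals = {0: 0, 1: 0}
    for _, features, label in train:
        totals[label] += 1
        for i, x in enumerate(features):
            counts[label][i] += x
    return [
        counts[1][i] / max(totals[1], 1) - counts[0][i] / max(totals[0], 1)
        for i in range(n_features)
    ]


def predict(weights, features, threshold):
    """Flag an app as malware when its weighted feature sum reaches the threshold."""
    score = sum(w * x for w, x in zip(weights, features))
    return 1 if score >= threshold else 0


def f1_score(y_true, y_pred):
    """F1 for the malware class; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate(weights, threshold, apps):
    """F1 of the thresholded scorer on a list of (month, features, label) tuples."""
    y_true = [label for _, _, label in apps]
    y_pred = [predict(weights, f, threshold) for _, f, _ in apps]
    return f1_score(y_true, y_pred)


def tune_threshold(weights, val, candidates):
    """Pick the threshold with the best validation F1 (ties keep the first candidate)."""
    return max(candidates, key=lambda t: evaluate(weights, t, val))


if __name__ == "__main__":
    apps = make_synthetic_apps()
    train, val, test = chronological_split(apps, train_end=3, val_end=4)
    weights = fit_weights(train, n_features=10)
    threshold = tune_threshold(weights, val, [0.5 * k for k in range(1, 10)])
    print("chosen threshold:", threshold)
    print("test F1:", round(evaluate(weights, threshold, test), 3))
```

Because the test months are never seen during fitting or tuning, any other model tuned with the same split and the same validation budget can be compared against this baseline on equal terms.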