Sensitive Behavioral Chain-Focused Android Malware Detection Fused With AST Semantics

IEEE Transactions on Information Forensics and Security (IF 6.3; Zone 1, Computer Science; JCR Q1, Computer Science, Theory & Methods) Pub Date: 2024-09-26 DOI: 10.1109/TIFS.2024.3468891

Jiacheng Gong;Weina Niu;Song Li;Mingxue Zhang;Xiaosong Zhang

IEEE Transactions on Information Forensics and Security, vol. 19, pp. 9216-9229, 2024. https://ieeexplore.ieee.org/document/10695137/

Citations: 0

Abstract

The proliferation of Android malware poses a substantial security threat to mobile devices. Efficient and accurate malware detection and malware family identification are therefore crucial for safeguarding users' property and privacy. Among intelligent Android malware detection methods, graph-based approaches have demonstrated remarkable detection performance, which is attributed to the strong representational capacity of graphs and the rich semantic information they carry. The function call graph (FCG) is the most widely used graph in intelligent Android malware detection. However, existing FCG-based detection methods face challenges, such as the enormous computational and storage costs of modeling large graphs. In addition, their disregard for code semantics leaves them susceptible to structured attacks. In this paper, we propose AndroAnalyzer, which embeds abstract syntax tree (AST) code semantics while focusing on sensitive behavior chains. It leverages FCGs to represent the macroscopic behavior of the application and employs structured code semantics to represent the microscopic behavior of functions. Furthermore, we propose the sensitive function call graph (SFCG) generation algorithm to narrow the analysis scope to sensitive function calls, and the AST vectorization algorithm (AST2Vec) to capture structured code semantics. Experimental results demonstrate that the proposed SFCG generation algorithm noticeably reduces graph size while maintaining robust detection performance. AndroAnalyzer outperforms the baseline methods in binary and multiclass classification tasks, achieving F1-scores of 99.21% and 98.45%, respectively. Moreover, AndroAnalyzer (trained on samples from 2010-2018) generalizes well when detecting samples from 2019-2022.
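The abstract does not spell out how the SFCG is generated, so the following is only a minimal sketch of one plausible reading: keep the functions that can reach a sensitive API in the call graph and drop everything else. The function name `sensitive_subgraph`, the dictionary-based graph encoding, and the example method names are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import deque


def sensitive_subgraph(call_graph, sensitive_apis):
    """Prune a call graph to the functions that can reach a sensitive API.

    call_graph: dict mapping a caller name to an iterable of callee names.
    sensitive_apis: set of function names treated as sensitive (hypothetical list).
    Returns a dict with the same encoding, restricted to the kept nodes.
    """
    # Build reverse edges so we can walk from callees back to callers.
    reverse = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            reverse.setdefault(callee, set()).add(caller)

    # Start from sensitive APIs that actually appear in the graph.
    all_nodes = set(call_graph) | set(reverse)
    keep = {api for api in sensitive_apis if api in all_nodes}

    # Backward breadth-first search: every ancestor of a sensitive API is kept.
    queue = deque(keep)
    while queue:
        node = queue.popleft()
        for caller in reverse.get(node, ()):
            if caller not in keep:
                keep.add(caller)
                queue.append(caller)

    # Keep only edges whose two endpoints both survived the pruning.
    return {
        node: sorted(c for c in call_graph.get(node, ()) if c in keep)
        for node in sorted(keep)
    }


# Toy call graph; all names are made up for the example.
fcg = {
    "MainActivity.onCreate": ["Helper.loadUi", "Tracker.start"],
    "Helper.loadUi": ["View.inflate"],
    "Tracker.start": ["Tracker.collect"],
    "Tracker.collect": ["TelephonyManager.getDeviceId", "Log.d"],
}
sensitive = {"TelephonyManager.getDeviceId"}
sfcg = sensitive_subgraph(fcg, sensitive)
```

In this toy graph the UI helper and logging calls are pruned, while the chain from `MainActivity.onCreate` through `Tracker` to the sensitive call is preserved. That is the kind of size reduction the abstract describes, although the authors' actual algorithm may select nodes differently.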

Source journal

IEEE Transactions on Information Forensics and Security (Engineering and Technology; Engineering: Electrical and Electronic)

CiteScore: 14.40

Self-citation rate: 7.40%

Articles published: 234

Review time: 6.5 months

Journal description: The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.

Latest articles from this journal

Attackers Are Not the Same! Unveiling the Impact of Feature Distribution on Label Inference Attacks

Backdoor Online Tracing With Evolving Graphs

LHADRO: A Robust Control Framework for Autonomous Vehicles Under Cyber-Physical Attacks

Towards Mobile Palmprint Recognition via Multi-view Hierarchical Graph Learning

Succinct Hash-based Arbitrary-Range Proofs