Sensitive Behavioral Chain-Focused Android Malware Detection Fused With AST Semantics
Authors: Jiacheng Gong; Weina Niu; Song Li; Mingxue Zhang; Xiaosong Zhang
Journal: IEEE Transactions on Information Forensics and Security, vol. 19, pp. 9216-9229 (Journal Article, published 2024-09-26)
DOI: 10.1109/TIFS.2024.3468891
URL: https://ieeexplore.ieee.org/document/10695137/
Impact factor: 6.3; JCR: Q1 (Computer Science, Theory & Methods)
Citations: 0
Abstract
The proliferation of Android malware poses a substantial security threat to mobile devices. Efficient and accurate malware detection and malware family identification are therefore crucial for safeguarding users' personal property and privacy. Among intelligent Android malware detection methods, graph-based approaches have demonstrated remarkable detection performance, which is attributed to the robust representation capabilities of graphs and the rich semantic information they carry. The function call graph (FCG) is the most widely used graph in intelligent Android malware detection. However, existing FCG-based detection methods face challenges, such as the enormous computational and storage costs of modeling large graphs. In addition, because they disregard code semantics, they are susceptible to structured attacks. In this paper, we propose AndroAnalyzer, which embeds abstract syntax tree (AST) code semantics while focusing on sensitive behavior chains. It leverages FCGs to represent the macroscopic behavior of the application and employs structured code semantics to represent the microscopic behavior of functions. Furthermore, we propose the sensitive function call graph (SFCG) generation algorithm to narrow the analysis scope to sensitive function calls, and the AST vectorization algorithm (AST2Vec) to capture structured code semantics. Experimental results demonstrate that the proposed SFCG generation algorithm noticeably reduces graph size while preserving robust detection performance. AndroAnalyzer outperforms the baseline methods in binary and multiclass classification tasks, achieving F1-scores of 99.21% and 98.45%, respectively. Moreover, AndroAnalyzer (trained on samples from 2010-2018) generalizes well when detecting samples from 2019-2022.
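The idea of narrowing an FCG to sensitive function calls can be illustrated with a minimal sketch. This is not the paper's SFCG generation algorithm, whose details the abstract does not give. It assumes one plausible reading: keep only the functions that lie on a call chain ending in a sensitive API. The function name `sensitive_subgraph`, the toy call graph, and the choice of which APIs count as sensitive are all hypothetical.

```python
from collections import deque


def sensitive_subgraph(fcg, sensitive_apis):
    """Keep only functions that can reach a sensitive API (illustrative assumption,
    not the SFCG algorithm from the paper).

    fcg maps each caller name to a list of callee names.
    """
    # Build reverse edges: callee -> set of callers.
    callers = {}
    for caller, callees in fcg.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    # Start from the sensitive APIs that actually appear in the graph.
    nodes = set(fcg) | set(callers)
    keep = {api for api in sensitive_apis if api in nodes}

    # Walk backwards along call edges to collect every transitive caller.
    queue = deque(keep)
    while queue:
        node = queue.popleft()
        for caller in callers.get(node, ()):
            if caller not in keep:
                keep.add(caller)
                queue.append(caller)

    # Return the subgraph induced by the kept nodes.
    return {n: sorted(c for c in fcg.get(n, ()) if c in keep) for n in sorted(keep)}


# Toy call graph; all application method names are made up for illustration.
fcg = {
    "MainActivity.onCreate": ["Helper.loadUi", "Tracker.collect"],
    "Helper.loadUi": ["android.widget.TextView.setText"],
    "Tracker.collect": [
        "android.telephony.TelephonyManager.getDeviceId",
        "Net.upload",
    ],
    "Net.upload": ["java.net.URL.openConnection"],
}
# Treating these two APIs as sensitive is an assumption made for the example.
sensitive = {
    "android.telephony.TelephonyManager.getDeviceId",
    "java.net.URL.openConnection",
}

pruned = sensitive_subgraph(fcg, sensitive)
for caller, callees in pruned.items():
    print(caller, "->", callees)
```

In this toy graph the UI-only branch (`Helper.loadUi` and its callee) is dropped, while the chain from `MainActivity.onCreate` through `Tracker.collect` to the two sensitive APIs is retained, which is the kind of graph-size reduction the abstract describes.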
About the journal:
The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.