A temporal analysis and evaluation of fuzzy hashing algorithms for Android malware analysis
Murray Fleming, Oluwafemi Olukoya
Forensic Science International-Digital Investigation, published 2024-05-13
DOI: 10.1016/j.fsidi.2024.301770
URL: https://www.sciencedirect.com/science/article/pii/S2666281724000891
Citations: 0
Abstract
Fuzzy hashing has been utilised in digital forensics and malware analysis for malware detection, malware variant classification, file clustering, document similarity detection, embedded object detection and fragment detection. Previous research considered the efficacy of fuzzy hashing for malware classification at a single point in time and did not specifically address the problem of malware evolution. Android malware presents a significant cybersecurity threat, and since malware is constantly mutating, a temporal analysis of the effectiveness of fuzzy hashing techniques for Android malware detection and classification contributes to understanding the value of fuzzy hashes as malware evolves. Through experimental examination, this study sought to determine whether fuzzy hashes are always effective, how quickly malware is evolving, and how malware evolution affects fuzzy hashing. The performance of different fuzzy hashing algorithms is compared, as are hashes computed at the file and class levels. Experiments with known malware families and analysis of over 4500 APK files, including 100 benign samples, collected from 2012 to 2022 were conducted using various fuzzy hashing algorithms, file-level and section-level similarity hashing, symbolic and raw opcode hashing, and optimisations for improving fuzzy hashing comparisons. The performance of the methods was evaluated using detection and false positive rates. The results show that fuzzy hashing algorithms remain a valuable technique that demonstrates robustness to malware evolution, with 10-year detection rates of over 80%.
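The two evaluation metrics named above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each sample has already been reduced to a best-match similarity score (0 to 100, higher meaning more similar) against a set of known malware hashes; a distance-based scheme would flip the comparison. The scores, the threshold, and the helper name `rate_at_threshold` are all hypothetical.

```python
from typing import Sequence


def rate_at_threshold(scores: Sequence[float], threshold: float) -> float:
    """Return the fraction of scores that meet or exceed the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(1 for s in scores if s >= threshold) / len(scores)


# Hypothetical best-match similarity scores; not data from the study.
malware_scores = [92, 75, 40, 88, 10]   # samples known to be malicious
benign_scores = [5, 0, 35, 12]          # samples known to be benign
THRESHOLD = 30                          # illustrative cut-off, an assumption

# Detection rate: share of malware samples flagged as matching known malware.
detection_rate = rate_at_threshold(malware_scores, THRESHOLD)

# False positive rate: share of benign samples wrongly flagged.
false_positive_rate = rate_at_threshold(benign_scores, THRESHOLD)

print(f"detection rate: {detection_rate:.2f}")
print(f"false positive rate: {false_positive_rate:.2f}")
```

In a temporal setting, the same calculation could be repeated per collection year to see how the detection rate changes as samples move further in time from the reference hashes.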