A Practical Attack on the TLSH Similarity Digest Scheme
Gábor Fuchs, Roland Nagy, L. Buttyán
Proceedings of the 18th International Conference on Availability, Reliability and Security, 2023-08-29
DOI: 10.1145/3600160.3600173 (https://doi.org/10.1145/3600160.3600173)
Abstract
Similarity digest schemes are used in applications such as digital forensics, spam filtering, malware clustering, and malware detection. These applications require the schemes to resist attacks that generate semantically similar inputs with very different similarity digest values. In this paper, we show that TLSH, a widely used similarity digest function, is not sufficiently robust against such attacks. More specifically, we propose an automated method for modifying executable files (binaries) such that the modified binary has exactly the same functionality as the original and remains syntactically similar to it, yet the TLSH difference score between the original and the modified binary becomes high. We evaluate our method on a large data set of malware binaries, and we show that it can be used effectively to generate adversarial samples that evade detection by SIMBIoTA, a recently proposed similarity-based malware detection approach.