Wasif Khan , Saed Alrabaee , Mousa Al-kfairy , Jie Tang , Kim-Kwang Raymond Choo
{"title":"Compiler-provenance identification in obfuscated binaries using vision transformers","authors":"Wasif Khan , Saed Alrabaee , Mousa Al-kfairy , Jie Tang , Kim-Kwang Raymond Choo","doi":"10.1016/j.fsidi.2024.301764","DOIUrl":null,"url":null,"abstract":"<div><p>Extracting compiler-provenance-related information (e.g., the source of a compiler, its version, its optimization settings, and compiler-related functions) is crucial for binary-analysis tasks such as function fingerprinting, detecting code clones, and determining authorship attribution. However, the presence of obfuscation techniques has complicated the efforts to automate such extraction. In this paper, we propose an efficient and resilient approach to provenance identification in obfuscated binaries using advanced pre-trained computer-vision models. To achieve this, we transform the program binaries into images and apply a two-layer approach for compiler and optimization prediction. Extensive results from experiments performed on a large-scale dataset show that the proposed method can achieve an accuracy of over 98 % for both obfuscated and deobfuscated binaries.</p></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":null,"pages":null},"PeriodicalIF":2.0000,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666281724000830/pdfft?md5=4be468a95e1def67152faeccf9135fb9&pid=1-s2.0-S2666281724000830-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Forensic Science International-Digital Investigation","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666281724000830","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Extracting compiler-provenance-related information (e.g., the source of a compiler, its version, its optimization settings, and compiler-related functions) is crucial for binary-analysis tasks such as function fingerprinting, detecting code clones, and determining authorship attribution. However, the presence of obfuscation techniques has complicated the efforts to automate such extraction. In this paper, we propose an efficient and resilient approach to provenance identification in obfuscated binaries using advanced pre-trained computer-vision models. To achieve this, we transform the program binaries into images and apply a two-layer approach for compiler and optimization prediction. Extensive results from experiments performed on a large-scale dataset show that the proposed method can achieve an accuracy of over 98 % for both obfuscated and deobfuscated binaries.