Similarity Regression Of Functions In Different Compiled Forms With Neural Attentions On Dual Control-Flow Graphs
Authors: Yun Zhang, Yuling Liu, Ge Cheng, Jie Wang
Journal: Computer Journal (JCR Q4, Computer Science, Hardware & Architecture; impact factor 1.5)
Published: 2023-10-11
DOI: https://doi.org/10.1093/comjnl/bxad095
Citations: 0
Abstract
Detecting if two functions in different compiled forms are similar has a wide range of applications in software security. We present a method that leverages both semantic and structural features of functions, learned by a neural-net model on the underlying control-flow graphs (CFGs). In particular, we devise a neural function-similarity regressor (NFSR) with attentions on dual CFGs. We train and evaluate NFSR on a dataset consisting of nearly 4 million functions from over 14,900 binary files. Experiments show that NFSR is superior to the SOTA models of SAFE, Gemini and GMN, especially for binary functions with large CFGs. An ablation study shows that attention on dual CFGs plays a significant role in detecting function similarities.
About the journal:
The Computer Journal is one of the longest-established journals serving all branches of the academic computer science community. It is currently published in four sections.