{"title":"Efficient features for function matching between binary executables","authors":"Chariton Karamitas, A. Kehagias","doi":"10.1109/SANER.2018.8330221","DOIUrl":null,"url":null,"abstract":"Binary diffing is the process of reverse engineering two programs, when source code is not available, in order to study their syntactic and semantic differences. For large programs, binary diffing can be performed by function matching which, in turn, is reduced to a graph isomorphism problem between the compared programs' CFGs (Control Flow Graphs) and/or CGs (Call Graphs). In this paper we provide a set of carefully chosen features, extracted from a binary's CG and CFG, which can be used by BinDiff algorithm variants to, first, build a set of initial exact matches with minimal false positives (by scanning for unique perfect matches) and, second, propagate approximate matching information using, for example, a nearest-neighbor scheme. Furthermore, we investigate the benefits of applying Markov lumping techniques to function CFGs (to our knowledge, this technique has not been previously studied). The proposed function features are evaluated in a series of experiments on various versions of the Linux kernel (Intel64), the OpenSSH server (Intel64) and Firefox's xul.dll (IA-32). Our prototype system is also compared to Diaphora, the current state-of-the-art binary diffing software.","PeriodicalId":6602,"journal":{"name":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"86 9 1","pages":"335-345"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SANER.2018.8330221","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9
Abstract
Binary diffing is the process of reverse engineering two programs, when source code is not available, in order to study their syntactic and semantic differences. For large programs, binary diffing can be performed by function matching which, in turn, is reduced to a graph isomorphism problem between the compared programs' CFGs (Control Flow Graphs) and/or CGs (Call Graphs). In this paper we provide a set of carefully chosen features, extracted from a binary's CG and CFG, which can be used by BinDiff algorithm variants to, first, build a set of initial exact matches with minimal false positives (by scanning for unique perfect matches) and, second, propagate approximate matching information using, for example, a nearest-neighbor scheme. Furthermore, we investigate the benefits of applying Markov lumping techniques to function CFGs (to our knowledge, this technique has not been previously studied). The proposed function features are evaluated in a series of experiments on various versions of the Linux kernel (Intel64), the OpenSSH server (Intel64) and Firefox's xul.dll (IA-32). Our prototype system is also compared to Diaphora, the current state-of-the-art binary diffing software.