{"title":"rev.ng:一个统一的二进制分析框架,用于恢复CFGs和功能边界","authors":"A. Federico, Mathias Payer, G. Agosta","doi":"10.1145/3033019.3033028","DOIUrl":null,"url":null,"abstract":"Static binary analysis is a key tool to assess the security of third-party binaries and legacy programs. Most forms of binary analysis rely on the availability of two key pieces of information: the program's control-flow graph and function boundaries. However, current tools struggle to provide accurate and precise results, in particular when dealing with hand-written assembly functions and non-trivial control-flow transfer instructions, such as tail calls. In addition, most of the existing solutions are ad-hoc, rely on hand-coded heuristics, and are tied to a specific architecture. In this paper we highlight the challenges faced by an architecture agnostic static binary analysis framework to provide accurate information about a program's CFG and function boundaries without employing debugging information or symbols. We propose a set of analyses to address predicate instructions, noreturn functions, tail calls, and context-dependent CFG. rev.ng, our binary analysis framework based on QEMU and LLVM, handles all the 17 architectures supported by QEMU and produces a compilable LLVM IR. We implement our described analyses on top of LLVM IR. In an extensive evaluation, we test our tool on binaries compiled for MIPS, ARM, and x86-64 using GCC and clang and compare them to the industry's state of the art tool, IDA Pro, and two well-known academic tools, BAP/ByteWeight and angr. In all cases, the quality of the CFG and function boundaries produced by rev.ng is comparable to or improves over the alternatives.","PeriodicalId":146080,"journal":{"name":"Proceedings of the 26th International Conference on Compiler Construction","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2017-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"63","resultStr":"{\"title\":\"rev.ng: a unified binary analysis framework to recover CFGs and function boundaries\",\"authors\":\"A. Federico, Mathias Payer, G. Agosta\",\"doi\":\"10.1145/3033019.3033028\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Static binary analysis is a key tool to assess the security of third-party binaries and legacy programs. Most forms of binary analysis rely on the availability of two key pieces of information: the program's control-flow graph and function boundaries. However, current tools struggle to provide accurate and precise results, in particular when dealing with hand-written assembly functions and non-trivial control-flow transfer instructions, such as tail calls. In addition, most of the existing solutions are ad-hoc, rely on hand-coded heuristics, and are tied to a specific architecture. In this paper we highlight the challenges faced by an architecture agnostic static binary analysis framework to provide accurate information about a program's CFG and function boundaries without employing debugging information or symbols. We propose a set of analyses to address predicate instructions, noreturn functions, tail calls, and context-dependent CFG. rev.ng, our binary analysis framework based on QEMU and LLVM, handles all the 17 architectures supported by QEMU and produces a compilable LLVM IR. We implement our described analyses on top of LLVM IR. In an extensive evaluation, we test our tool on binaries compiled for MIPS, ARM, and x86-64 using GCC and clang and compare them to the industry's state of the art tool, IDA Pro, and two well-known academic tools, BAP/ByteWeight and angr. In all cases, the quality of the CFG and function boundaries produced by rev.ng is comparable to or improves over the alternatives.\",\"PeriodicalId\":146080,\"journal\":{\"name\":\"Proceedings of the 26th International Conference on Compiler Construction\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-02-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"63\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 26th International Conference on Compiler Construction\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3033019.3033028\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 26th International Conference on Compiler Construction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3033019.3033028","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
rev.ng: a unified binary analysis framework to recover CFGs and function boundaries
Static binary analysis is a key tool to assess the security of third-party binaries and legacy programs. Most forms of binary analysis rely on the availability of two key pieces of information: the program's control-flow graph and function boundaries. However, current tools struggle to provide accurate and precise results, in particular when dealing with hand-written assembly functions and non-trivial control-flow transfer instructions, such as tail calls. In addition, most of the existing solutions are ad-hoc, rely on hand-coded heuristics, and are tied to a specific architecture. In this paper we highlight the challenges faced by an architecture agnostic static binary analysis framework to provide accurate information about a program's CFG and function boundaries without employing debugging information or symbols. We propose a set of analyses to address predicate instructions, noreturn functions, tail calls, and context-dependent CFG. rev.ng, our binary analysis framework based on QEMU and LLVM, handles all the 17 architectures supported by QEMU and produces a compilable LLVM IR. We implement our described analyses on top of LLVM IR. In an extensive evaluation, we test our tool on binaries compiled for MIPS, ARM, and x86-64 using GCC and clang and compare them to the industry's state of the art tool, IDA Pro, and two well-known academic tools, BAP/ByteWeight and angr. In all cases, the quality of the CFG and function boundaries produced by rev.ng is comparable to or improves over the alternatives.