Extracting instruction semantics via symbolic execution of code generators

Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering
Pub Date: 2016-11-01
DOI: 10.1145/2950290.2950335

N. Hasabnis, R. Sekar

Citations: 17

Abstract

Binary analysis and instrumentation form the basis of many tools and frameworks for software debugging, security hardening, and monitoring. Accurate modeling of instruction semantics is paramount in this regard, as errors can lead to program crashes, or worse, bypassing of security checks. Semantic modeling is a daunting task for modern processors such as x86 and ARM that support over a thousand instructions, many of them with complex semantics. This paper describes a new approach to automate this semantic modeling task. Our approach leverages the knowledge of instruction semantics that is already encoded into today's production compilers such as GCC and LLVM. Such an approach can greatly reduce manual effort, and more importantly, avoid errors introduced by manual modeling. Furthermore, it is applicable to any of the numerous architectures already supported by the compiler. In this paper, we develop a new symbolic execution technique to extract instruction semantics from a compiler's source code. Unlike previous applications of symbolic execution, which focused on identifying a single program path that violates a property, our approach addresses the all-paths problem, extracting the entire input/output behavior of the code generator. We have applied it successfully to the 120K lines of C code used in GCC's code generator to extract x86 instruction semantics. To demonstrate architecture neutrality, we have also applied it to AVR, a processor used in the popular Arduino platform.
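
The all-paths idea in the abstract can be illustrated with a toy sketch. This is not the authors' technique or tooling; it only shows, under simplifying assumptions, what it means to extract the entire input/output behavior of a code generator rather than a single path. Every name here (`toy_codegen`, `Explorer`, `Sym`, `explore`, the IR operation names, and the assembly templates) is hypothetical, and the "symbolic" inputs support only equality tests against constants.

```python
class Explorer:
    """Records branch decisions for one run and tracks simple constraints."""

    def __init__(self, forced):
        self.forced = forced    # branch outcomes to replay on this run
        self.decisions = []     # outcomes of every forked branch on this run
        self.eq = {}            # symbol name -> constant it is known to equal
        self.neq = {}           # symbol name -> constants it is known to differ from

    def test_eq(self, name, const):
        # If existing constraints already decide the test, do not fork.
        if name in self.eq:
            return self.eq[name] == const
        if const in self.neq.get(name, set()):
            return False
        # Otherwise this is a real fork: replay a forced outcome or default to True.
        i = len(self.decisions)
        outcome = self.forced[i] if i < len(self.forced) else True
        self.decisions.append(outcome)
        if outcome:
            self.eq[name] = const
        else:
            self.neq.setdefault(name, set()).add(const)
        return outcome

    def path_condition(self):
        parts = [f"{n} == {c!r}" for n, c in self.eq.items()]
        for n, consts in self.neq.items():
            if n not in self.eq:  # inequalities are redundant once equality is known
                parts += [f"{n} != {c!r}" for c in sorted(consts)]
        return " and ".join(parts) or "true"


class Sym:
    """A symbolic input whose equality tests are routed through the explorer."""

    def __init__(self, name, explorer):
        self.name, self.explorer = name, explorer

    def __eq__(self, other):
        return self.explorer.test_eq(self.name, other)


def toy_codegen(op, mode):
    """Hypothetical stand-in for a compiler back end: IR operation -> assembly template."""
    if op == "plus":
        if mode == "reg":
            return "add %src, %dst"
        return "add $imm, %dst"
    if op == "minus":
        return "sub %src, %dst"
    return None  # no instruction is emitted for any other IR operation


def explore(fn, names):
    """Run fn once per feasible path; return (path condition, output) pairs."""
    results, worklist = [], [[]]
    while worklist:
        forced = worklist.pop()
        ex = Explorer(forced)
        out = fn(*[Sym(n, ex) for n in names])
        results.append((ex.path_condition(), out))
        # Schedule the untaken side of every branch that was decided freely.
        for i in range(len(forced), len(ex.decisions)):
            worklist.append(ex.decisions[:i] + [not ex.decisions[i]])
    return results


summary = explore(toy_codegen, ["op", "mode"])
for condition, output in summary:
    print(f"{condition}  ->  {output}")
```

Each resulting pair maps a condition on the inputs to the output produced on that path, so the collection as a whole describes the toy generator's complete behavior. A real code generator has loops, pointers, and arithmetic that this sketch does not attempt to handle.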
