Structured random differential testing of instruction decoders

2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 84-94. Pub Date: 2018-03-20. DOI: 10.1109/SANER.2018.8330199

Nathan Jay, B. Miller


Citations: 5

Abstract

Decoding binary executable files is a critical facility for software analysis, including debugging, performance monitoring, malware detection, cyber forensics, and sandboxing, among other techniques. As a foundational capability, binary decoding must be consistently correct for the techniques that rely on it to be viable. Unfortunately, modern instruction sets are huge and their encodings are complex; as a result, modern binary decoders are buggy. In this paper, we present a testing methodology that automatically infers structural information for an instruction set and uses the inferred structure to efficiently generate structured-random test cases independent of the instruction set being tested. Our testing methodology includes automatic output verification using differential analysis and reassembly to generate error reports. This testing methodology requires little instruction-set-specific knowledge, allowing rapid testing of decoders for new architectures and extensions to existing ones. We have implemented our testing procedure in a tool named Fleece and used it to test multiple binary decoders (Intel XED, libopcodes, LLVM, Dyninst and Capstone) on multiple architectures (x86, ARM and PowerPC). Our testing efficiently covered thousands of instruction format variations for each instruction set and uncovered decoding bugs in every decoder we tested.
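The differential-analysis idea in the abstract can be illustrated with a minimal sketch. This is not Fleece's implementation: the one-byte toy instruction set, the two decoders (`decoder_a`, `decoder_b`), and the helper names are all hypothetical, and the structure-inference and reassembly steps are omitted. The sketch only shows the core loop: feed the same bytes to several decoders and report every input on which their outputs disagree.

```python
import random
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical toy ISA: one byte per instruction,
# high nibble = opcode, low nibble = register number.
OPCODES = {0x1: "inc", 0x2: "dec"}


def decoder_a(byte: int) -> str:
    """Reference-style toy decoder: uses all 4 register bits."""
    op, reg = byte >> 4, byte & 0xF
    if op not in OPCODES:
        return "invalid"
    return f"{OPCODES[op]} r{reg}"


def decoder_b(byte: int) -> str:
    """Toy decoder with a planted bug: it drops the top register bit."""
    op, reg = byte >> 4, byte & 0x7
    if op not in OPCODES:
        return "invalid"
    return f"{OPCODES[op]} r{reg}"


def random_cases(n: int, seed: int = 0) -> Iterable[int]:
    """Yield n pseudo-random one-byte test cases (plain random, no structure)."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.randrange(256)


def differential_test(
    decoders: Dict[str, Callable[[int], str]],
    cases: Iterable[int],
) -> List[Tuple[int, Dict[str, str]]]:
    """Return (input, outputs) for every input on which the decoders disagree."""
    reports = []
    for byte in cases:
        outputs = {name: decode(byte) for name, decode in decoders.items()}
        if len(set(outputs.values())) > 1:
            reports.append((byte, outputs))
    return reports


if __name__ == "__main__":
    decoders = {"A": decoder_a, "B": decoder_b}
    for byte, outputs in differential_test(decoders, random_cases(50)):
        print(f"0x{byte:02x}: {outputs}")
```

A disagreement only shows that at least one decoder is wrong, not which one. The abstract describes reassembly as the additional check used to generate error reports; the sketch leaves that step out.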

