Modular Decompilation of Low-Level Code by Partial Evaluation

2008 Eighth IEEE International Working Conference on Source Code Analysis and Manipulation. Pub Date: 2008-10-03. DOI: 10.1109/SCAM.2008.35

M. Gómez-Zamalloa, E. Albert, G. Puebla


Citations: 9

Abstract

Decompiling low-level code to a high-level intermediate representation facilitates the development of analyzers, model checkers, etc. which reason about properties of the low-level code (e.g., bytecode, .NET). Interpretive decompilation consists in partially evaluating an interpreter for the low-level language (written in the high-level language) w.r.t. the code to be decompiled. There have been proofs-of-concept that interpretive decompilation is feasible, but there remain important open issues when it comes to decompiling a real language: Does the approach scale up? Is the quality of decompiled programs comparable to that obtained by ad-hoc decompilers? Do decompiled programs preserve the structure of the original programs? This paper addresses these issues by presenting, to the best of our knowledge, the first modular scheme to enable interpretive decompilation of low-level code to a high-level representation; namely, we decompile bytecode into PROLOG. We introduce two notions of optimality. The first one requires that each method/block is decompiled just once. The second one requires that each program point is traversed at most once during decompilation. We demonstrate the impact of our modular approach and optimality issues on a series of realistic benchmarks. Decompilation times and decompiled program sizes are linear in the size of the input bytecode program. This demonstrates empirically the scalability of modular decompilation of low-level code by partial evaluation.
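To make the idea of interpretive decompilation concrete, the following is a minimal sketch in Python rather than PROLOG. It is not the scheme of the paper: the four-instruction stack bytecode, the function names `interpret` and `specialize`, and the sample `PROGRAM` are all hypothetical, and the specializer is written by hand for this one interpreter instead of being a general partial evaluator. It only illustrates the principle that, when the bytecode is known and the inputs are not, the interpreter's dispatch loop can be executed ahead of time, leaving a residual high-level program.

```python
# Toy illustration only. The bytecode format and all names are hypothetical;
# the paper itself targets real bytecode and produces PROLOG.

def interpret(code, args):
    """Interpreter for a tiny straight-line stack bytecode."""
    stack = []
    for op, operand in code:
        if op == "push":            # push a constant
            stack.append(operand)
        elif op == "load":          # push the value of argument number `operand`
            stack.append(args[operand])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


def specialize(code, n_args):
    """Specialize interpret() to a fixed `code`, leaving the arguments unknown.

    The instruction dispatch runs now, because the bytecode is static.
    Operations that depend on unknown arguments are emitted as residual
    Python source text; operations on constants are folded immediately.
    """
    stack = []  # entries are ("const", value) or ("expr", source_text)
    for op, operand in code:
        if op == "push":
            stack.append(("const", operand))
        elif op == "load":
            stack.append(("expr", f"a{operand}"))
        elif op in ("add", "mul"):
            (kind_b, b), (kind_a, a) = stack.pop(), stack.pop()
            if kind_a == kind_b == "const":
                stack.append(("const", a + b if op == "add" else a * b))
            else:
                symbol = "+" if op == "add" else "*"
                stack.append(("expr", f"({a} {symbol} {b})"))
        else:
            raise ValueError(f"unknown opcode: {op}")
    _, result = stack.pop()
    params = ", ".join(f"a{i}" for i in range(n_args))
    return f"def decompiled({params}):\n    return {result}\n"


# Hypothetical sample program computing a0 * (2 + 3) + a1.
PROGRAM = [
    ("load", 0), ("push", 2), ("push", 3), ("add", None),
    ("mul", None), ("load", 1), ("add", None),
]

if __name__ == "__main__":
    print(specialize(PROGRAM, 2))
```

The residual function contains no stack and no opcode dispatch, which is the sense in which specializing the interpreter acts as a decompiler. The sketch handles only straight-line code; the questions raised in the abstract (decompiling each method or block once, traversing each program point at most once) arise once the bytecode has branches, loops, and method calls.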
