机器代码的多态类型推断

M. Noonan, Alexey Loginov, D. Cok
{"title":"机器代码的多态类型推断","authors":"M. Noonan, Alexey Loginov, D. Cok","doi":"10.1145/2908080.2908119","DOIUrl":null,"url":null,"abstract":"For many compiled languages, source-level types are erased very early in the compilation process. As a result, further compiler passes may convert type-safe source into type-unsafe machine code. Type-unsafe idioms in the original source and type-unsafe optimizations mean that type information in a stripped binary is essentially nonexistent. The problem of recovering high-level types by performing type inference over stripped machine code is called type reconstruction, and offers a useful capability in support of reverse engineering and decompilation. In this paper, we motivate and develop a novel type system and algorithm for machine-code type inference. The features of this type system were developed by surveying a wide collection of common source- and machine-code idioms, building a catalog of challenging cases for type reconstruction. We found that these idioms place a sophisticated set of requirements on the type system, inducing features such as recursively-constrained polymorphic types. Many of the features we identify are often seen only in expressive and powerful type systems used by high-level functional languages. Using these type-system features as a guideline, we have developed Retypd: a novel static type-inference algorithm for machine code that supports recursive types, polymorphism, and subtyping. Retypd yields more accurate inferred types than existing algorithms, while also enabling new capabilities such as reconstruction of pointer const annotations with 98% recall. Retypd can operate on weaker program representations than the current state of the art, removing the need for high-quality points-to information that may be impractical to compute.","PeriodicalId":178839,"journal":{"name":"Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":"{\"title\":\"Polymorphic type inference for machine code\",\"authors\":\"M. Noonan, Alexey Loginov, D. Cok\",\"doi\":\"10.1145/2908080.2908119\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"For many compiled languages, source-level types are erased very early in the compilation process. As a result, further compiler passes may convert type-safe source into type-unsafe machine code. Type-unsafe idioms in the original source and type-unsafe optimizations mean that type information in a stripped binary is essentially nonexistent. The problem of recovering high-level types by performing type inference over stripped machine code is called type reconstruction, and offers a useful capability in support of reverse engineering and decompilation. In this paper, we motivate and develop a novel type system and algorithm for machine-code type inference. The features of this type system were developed by surveying a wide collection of common source- and machine-code idioms, building a catalog of challenging cases for type reconstruction. We found that these idioms place a sophisticated set of requirements on the type system, inducing features such as recursively-constrained polymorphic types. Many of the features we identify are often seen only in expressive and powerful type systems used by high-level functional languages. Using these type-system features as a guideline, we have developed Retypd: a novel static type-inference algorithm for machine code that supports recursive types, polymorphism, and subtyping. Retypd yields more accurate inferred types than existing algorithms, while also enabling new capabilities such as reconstruction of pointer const annotations with 98% recall. Retypd can operate on weaker program representations than the current state of the art, removing the need for high-quality points-to information that may be impractical to compute.\",\"PeriodicalId\":178839,\"journal\":{\"name\":\"Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation\",\"volume\":\"39 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-03-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"29\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2908080.2908119\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2908080.2908119","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 29

摘要

对于许多编译语言,源级类型在编译过程的早期就被删除了。因此,进一步的编译器传递可能会将类型安全的源代码转换为类型不安全的机器码。原始源代码中的类型不安全习惯用法和类型不安全优化意味着在剥离的二进制文件中基本上不存在类型信息。通过对剥离的机器码执行类型推断来恢复高级类型的问题称为类型重构,它为支持逆向工程和反编译提供了有用的功能。在本文中,我们激发并开发了一种新的机器代码类型推断系统和算法。该类型系统的特性是通过调查广泛的通用源代码和机器码习惯用法的集合来开发的,为类型重构构建了一个具有挑战性的案例目录。我们发现,这些习惯用法对类型系统提出了一组复杂的要求,产生了递归约束多态类型等特性。我们确定的许多特性通常只出现在高级函数式语言使用的表达性强的类型系统中。以这些类型系统特性为指导,我们开发了Retypd:一种新的用于机器码的静态类型推断算法,它支持递归类型、多态性和子类型。Retypd产生比现有算法更准确的推断类型,同时还支持新功能,例如以98%的召回率重建指针const注释。Retypd可以在比当前技术状态更弱的程序表示上操作,从而消除了对可能无法计算的高质量指向信息的需求。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Polymorphic type inference for machine code
For many compiled languages, source-level types are erased very early in the compilation process. As a result, further compiler passes may convert type-safe source into type-unsafe machine code. Type-unsafe idioms in the original source and type-unsafe optimizations mean that type information in a stripped binary is essentially nonexistent. The problem of recovering high-level types by performing type inference over stripped machine code is called type reconstruction, and offers a useful capability in support of reverse engineering and decompilation. In this paper, we motivate and develop a novel type system and algorithm for machine-code type inference. The features of this type system were developed by surveying a wide collection of common source- and machine-code idioms, building a catalog of challenging cases for type reconstruction. We found that these idioms place a sophisticated set of requirements on the type system, inducing features such as recursively-constrained polymorphic types. Many of the features we identify are often seen only in expressive and powerful type systems used by high-level functional languages. Using these type-system features as a guideline, we have developed Retypd: a novel static type-inference algorithm for machine code that supports recursive types, polymorphism, and subtyping. Retypd yields more accurate inferred types than existing algorithms, while also enabling new capabilities such as reconstruction of pointer const annotations with 98% recall. Retypd can operate on weaker program representations than the current state of the art, removing the need for high-quality points-to information that may be impractical to compute.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Assessing the limits of program-specific garbage collection performance Data-driven precondition inference with learned features SDNRacer: concurrency analysis for software-defined networks Exposing errors related to weak memory in GPU applications Effective padding of multidimensional arrays to avoid cache conflict misses
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1