BinSub: The Simple Essence of Polymorphic Type Inference for Machine Code
Ian Smith
arXiv - CS - Programming Languages, 2024-09-03
DOI: arxiv-2409.01841 (https://doi.org/arxiv-2409.01841)
Abstract
Recovering high-level type information in binaries is a key task in reverse
engineering and binary analysis. Binaries contain very little explicit type
information. The structure of binary code is highly flexible, allowing for
ad-hoc subtyping and polymorphism. Prior work has shown that precise type
inference on binary code requires expressive subtyping and polymorphism.
Implementations of these type system features in a binary type inference
algorithm have thus far been too inefficient to achieve widespread adoption.
Recent advances in traditional type inference have achieved simple and
efficient principal type inference in an ML-like language with subtyping and
polymorphism through the framework of algebraic subtyping. BinSub, a new binary
type inference algorithm, recognizes the connection between algebraic subtyping
and the type system features required to analyze binaries effectively. Using
this connection, BinSub achieves simple, precise, and efficient binary type
inference. We show that BinSub maintains precision similar to that of prior work
while achieving a 63x improvement in average runtime across 1568 functions. We
also present a formalization of BinSub and show that BinSub's type system
maintains the expressiveness of prior work.
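
To make the role of subtyping in binary type inference concrete, here is a minimal sketch of subtype-constraint propagation in the general style of algebraic subtyping. It is not BinSub's algorithm or implementation. The names `Prim`, `Record`, `Var`, and `constrain` are hypothetical, and modelling a pointed-to structure as a record of byte offsets with covariant fields is a simplifying assumption. The idea shown is that a function which only reads offset 0 of its argument should accept any structure that has at least that field.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Prim:
    """A primitive type such as a 32-bit integer (hypothetical model)."""
    name: str


@dataclass(frozen=True)
class Record:
    """A structure layout: a tuple of (byte offset, field type) pairs."""
    fields: tuple


@dataclass(eq=False)
class Var:
    """A type variable that accumulates lower and upper bounds."""
    name: str
    lower: list = field(default_factory=list)
    upper: list = field(default_factory=list)


def constrain(lhs, rhs, seen=None):
    """Record the constraint lhs <: rhs, or raise TypeError if it cannot hold."""
    seen = set() if seen is None else seen
    if (lhs, rhs) in seen:  # already processed; guarantees termination
        return
    seen.add((lhs, rhs))

    if isinstance(lhs, Prim) and isinstance(rhs, Prim):
        if lhs != rhs:
            raise TypeError(f"{lhs.name} is not a subtype of {rhs.name}")
    elif isinstance(lhs, Record) and isinstance(rhs, Record):
        # Width subtyping: lhs must provide every offset that rhs requires.
        have = dict(lhs.fields)
        for offset, wanted in rhs.fields:
            if offset not in have:
                raise TypeError(f"missing field at offset {offset}")
            constrain(have[offset], wanted, seen)
    elif isinstance(lhs, Var):
        # New upper bound: every known lower bound must also satisfy it.
        lhs.upper.append(rhs)
        for lo in list(lhs.lower):
            constrain(lo, rhs, seen)
    elif isinstance(rhs, Var):
        # New lower bound: it must satisfy every known upper bound.
        rhs.lower.append(lhs)
        for hi in list(rhs.upper):
            constrain(lhs, hi, seen)
    else:
        raise TypeError("incompatible type shapes")


int32 = Prim("int32")

# The callee reads an int32 at offset 0 of its parameter: p <: {0: int32}.
p = Var("p")
constrain(p, Record(((0, int32),)))

# A caller passes a larger struct {0: int32, 4: int32}; this is accepted.
constrain(Record(((0, int32), (4, int32))), p)
```

Because each variable keeps its bounds instead of being unified with a single type, the callee in this sketch stays usable with any layout that has an `int32` at offset 0. A caller whose structure lacks that offset makes `constrain` raise `TypeError`.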