Macaw: A Machine Code Toolbox for the Busy Binary Analyst
Ryan G. Scott, Brett Boston, Benjamin Davis, Iavor Diatchki, Mike Dodds, Joe Hendrix, Daniel Matichuk, Kevin Quick, Tristan Ravitch, Valentin Robert, Benjamin Selfridge, Andrei Stefănescu, Daniel Wagner, Simon Winwood
arXiv - CS - Programming Languages, 2024-07-08 (arxiv-2407.06375)
Abstract
When attempting to understand the behavior of an executable, a binary analyst
can make use of many different techniques. These include program slicing,
dynamic instrumentation, binary-level rewriting, symbolic execution, and formal
verification, all of which can uncover insights into how a piece of machine
code behaves. As a result, there is no one-size-fits-all binary analysis tool,
so a binary analysis researcher will often combine several different tools.
Sometimes, a researcher will even need to design new tools to study problems
that existing frameworks are not well equipped to handle. Designing such tools
from scratch is rarely time- or cost-effective, however, given the
scale and complexity of modern instruction set architectures.

We present Macaw, a modular framework that makes it possible to rapidly build
reliable binary analysis tools across a range of use cases. Over a decade of
development, we have used Macaw to support an industrial research team in
building tools for machine code-related tasks. As such, the name "Macaw" refers
not just to the framework itself, but also to a suite of tools that are built on
top of the framework. We describe Macaw in depth, including the different
static and dynamic analyses that it performs, many of which are powered by an
SMT-based symbolic execution engine. We put a particular focus on
interoperability between machine code and higher-level languages, including
binary lifting from x86 to LLVM, as well as verifying the correctness of mixed C
and assembly code.