Macaw: A Machine Code Toolbox for the Busy Binary Analyst

Ryan G. Scott, Brett Boston, Benjamin Davis, Iavor Diatchki, Mike Dodds, Joe Hendrix, Daniel Matichuk, Kevin Quick, Tristan Ravitch, Valentin Robert, Benjamin Selfridge, Andrei Stefănescu, Daniel Wagner, Simon Winwood
Published 2024-07-08 in arXiv - CS - Programming Languages. DOI: arxiv-2407.06375 (https://doi.org/arxiv-2407.06375)

Abstract

When attempting to understand the behavior of an executable, a binary analyst can make use of many different techniques. These include program slicing, dynamic instrumentation, binary-level rewriting, symbolic execution, and formal verification, all of which can uncover insights into how a piece of machine code behaves. As a result, there is no one-size-fits-all binary analysis tool, so a binary analysis researcher will often combine several different tools. Sometimes, a researcher will even need to design new tools to study problems that existing frameworks are not well equipped to handle. Designing such tools entirely from scratch is rarely time- or cost-effective, however, given the scale and complexity of modern instruction set architectures. We present Macaw, a modular framework that makes it possible to rapidly build reliable binary analysis tools across a range of use cases. Over a decade of development, we have used Macaw to support an industrial research team in building tools for machine code-related tasks. As such, the name "Macaw" refers not just to the framework itself, but also to a suite of tools that are built on top of the framework. We describe Macaw in depth, along with the static and dynamic analyses that it performs, many of which are powered by an SMT-based symbolic execution engine. We put a particular focus on interoperability between machine code and higher-level languages, including binary lifting from x86 to LLVM, as well as verifying the correctness of mixed C and assembly code.