Unleashing the hidden power of compiler optimization on binary code difference: an empirical study

Xiaolei Ren, Michael Ho, Jiang Ming, Yu Lei, Li Li
{"title":"Unleashing the hidden power of compiler optimization on binary code difference: an empirical study","authors":"Xiaolei Ren, Michael Ho, Jiang Ming, Yu Lei, Li Li","doi":"10.1145/3453483.3454035","DOIUrl":null,"url":null,"abstract":"Hunting binary code difference without source code (i.e., binary diffing) has compelling applications in software security. Due to the high variability of binary code, existing solutions have been driven towards measuring semantic similarities from syntactically different code. Since compiler optimization is the most common source contributing to binary code differences in syntax, testing the resilience against the changes caused by different compiler optimization settings has become a standard evaluation step for most binary diffing approaches. For example, 47 top-venue papers in the last 12 years compared different program versions compiled by default optimization levels (e.g., -Ox in GCC and LLVM). Although many of them claim they are immune to compiler transformations, it is yet unclear about their resistance to non-default optimization settings. Especially, we have observed that adversaries explored non-default compiler settings to amplify malware differences. This paper takes the first step to systematically studying the effectiveness of compiler optimization on binary code differences. We tailor search-based iterative compilation for the auto-tuning of binary code differences. We develop BinTuner to search near-optimal optimization sequences that can maximize the amount of binary code differences. We run BinTuner with GCC 10.2 and LLVM 11.0 on SPEC benchmarks (CPU2006 & CPU2017), Coreutils, and OpenSSL. Our experiments show that at the cost of 279 to 1,881 compilation iterations, BinTuner can find custom optimization sequences that are substantially better than the general -Ox settings. BinTuner's outputs seriously undermine prominent binary diffing tools' comparisons. In addition, the detection rate of the IoT malware variants tuned by BinTuner falls by more than 50%. Our findings paint a cautionary tale for security analysts that attackers have a new way to mutate malware code cost-effectively, and the research community needs to step back to reassess optimization-resistance evaluations.","PeriodicalId":20557,"journal":{"name":"Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"31","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3453483.3454035","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 31

Abstract

Hunting binary code difference without source code (i.e., binary diffing) has compelling applications in software security. Due to the high variability of binary code, existing solutions have been driven towards measuring semantic similarities from syntactically different code. Since compiler optimization is the most common source contributing to binary code differences in syntax, testing the resilience against the changes caused by different compiler optimization settings has become a standard evaluation step for most binary diffing approaches. For example, 47 top-venue papers in the last 12 years compared different program versions compiled by default optimization levels (e.g., -Ox in GCC and LLVM). Although many of them claim they are immune to compiler transformations, it is yet unclear about their resistance to non-default optimization settings. Especially, we have observed that adversaries explored non-default compiler settings to amplify malware differences. This paper takes the first step to systematically studying the effectiveness of compiler optimization on binary code differences. We tailor search-based iterative compilation for the auto-tuning of binary code differences. We develop BinTuner to search near-optimal optimization sequences that can maximize the amount of binary code differences. We run BinTuner with GCC 10.2 and LLVM 11.0 on SPEC benchmarks (CPU2006 & CPU2017), Coreutils, and OpenSSL. Our experiments show that at the cost of 279 to 1,881 compilation iterations, BinTuner can find custom optimization sequences that are substantially better than the general -Ox settings. BinTuner's outputs seriously undermine prominent binary diffing tools' comparisons. In addition, the detection rate of the IoT malware variants tuned by BinTuner falls by more than 50%. Our findings paint a cautionary tale for security analysts that attackers have a new way to mutate malware code cost-effectively, and the research community needs to step back to reassess optimization-resistance evaluations.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
释放编译器优化对二进制代码差异的隐藏力量:一个实证研究
在没有源代码的情况下寻找二进制代码差异(即二进制差异)在软件安全性中具有引人注目的应用。由于二进制码的高度可变性,现有的解决方案已经被驱动到从语法不同的代码中测量语义相似性。由于编译器优化是导致二进制代码语法差异的最常见原因,因此测试针对不同编译器优化设置引起的更改的弹性已成为大多数二进制差异方法的标准评估步骤。例如,过去12年的47篇顶级论文比较了默认优化级别(例如GCC和LLVM中的-Ox)编译的不同程序版本。尽管它们中的许多声称它们不受编译器转换的影响,但它们对非默认优化设置的抵抗力尚不清楚。特别是,我们观察到攻击者利用非默认编译器设置来放大恶意软件的差异。本文首先系统地研究了编译器优化对二进制码差异的有效性。我们定制了基于搜索的迭代编译,用于二进制代码差异的自动调整。我们开发BinTuner来搜索接近最优的优化序列,可以最大化二进制代码差异的数量。我们使用GCC 10.2和LLVM 11.0在SPEC基准(CPU2006和CPU2017), coretils和OpenSSL上运行BinTuner。我们的实验表明,在279到1881次编译迭代的代价下,BinTuner可以找到比一般-Ox设置要好得多的自定义优化序列。BinTuner的输出严重破坏了突出的二进制差分工具的比较。此外,BinTuner调整的物联网恶意软件变体的检测率下降了50%以上。我们的研究结果为安全分析师描绘了一个警示故事,即攻击者有了一种经济有效地改变恶意软件代码的新方法,研究团体需要退后一步,重新评估优化抵抗评估。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Learning to find naming issues with big code and small supervision Cyclic program synthesis Fluid: a framework for approximate concurrency via controlled dependency relaxation Bliss: auto-tuning complex applications using a pool of diverse lightweight learning models Phased synthesis of divide and conquer programs
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1