Unleashing the hidden power of compiler optimization on binary code difference: an empirical study

Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation
Pub Date: 2021-03-23
DOI: 10.1145/3453483.3454035

Xiaolei Ren, Michael Ho, Jiang Ming, Yu Lei, Li Li


Citations: 31

Abstract

Hunting for binary code differences without source code (i.e., binary diffing) has compelling applications in software security. Because binary code is highly variable, existing solutions have been driven toward measuring semantic similarity between syntactically different code. Since compiler optimization is the most common source of syntactic differences in binary code, testing resilience against the changes caused by different compiler optimization settings has become a standard evaluation step for most binary diffing approaches. For example, 47 top-venue papers in the last 12 years compared different program versions compiled at default optimization levels (e.g., -Ox in GCC and LLVM). Although many of them claim to be immune to compiler transformations, their resistance to non-default optimization settings remains unclear. In particular, we have observed that adversaries have explored non-default compiler settings to amplify malware differences. This paper takes the first step toward systematically studying the effect of compiler optimization on binary code differences. We tailor search-based iterative compilation for the auto-tuning of binary code differences. We develop BinTuner to search for near-optimal optimization sequences that maximize the amount of binary code difference. We run BinTuner with GCC 10.2 and LLVM 11.0 on SPEC benchmarks (CPU2006 & CPU2017), Coreutils, and OpenSSL. Our experiments show that, at the cost of 279 to 1,881 compilation iterations, BinTuner can find custom optimization sequences that are substantially better than the general -Ox settings. BinTuner's outputs seriously undermine the comparisons made by prominent binary diffing tools. In addition, the detection rate of IoT malware variants tuned by BinTuner falls by more than 50%. Our findings paint a cautionary tale for security analysts: attackers have a new way to mutate malware code cost-effectively, and the research community needs to step back and reassess optimization-resistance evaluations.
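To make the idea of search-based iterative compilation concrete, the following is a minimal sketch and not BinTuner's implementation. The abstract does not state which search strategy or difference metric is used, so this sketch assumes a simple hill climb over on/off flags and uses normalized compression distance (NCD) as the fitness score. The names `ncd`, `tune_flags`, `toy_compile`, and `TOY_FLAGS` are hypothetical, and `toy_compile` is a stand-in that fabricates bytes instead of invoking a real compiler.

```python
import random
import zlib
from typing import Callable, FrozenSet, Sequence, Tuple


def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance: close to 0 for near-identical
    inputs, close to 1 for unrelated inputs (assumed fitness metric)."""
    ca = len(zlib.compress(a))
    cb = len(zlib.compress(b))
    cab = len(zlib.compress(a + b))
    return (cab - min(ca, cb)) / max(ca, cb)


def tune_flags(
    flags: Sequence[str],
    compile_fn: Callable[[FrozenSet[str]], bytes],
    baseline: bytes,
    iterations: int = 200,
    seed: int = 0,
) -> Tuple[FrozenSet[str], float]:
    """Hill climb over flag subsets, keeping any toggle that makes the
    compiled output differ more from the baseline binary."""
    rng = random.Random(seed)
    best: FrozenSet[str] = frozenset()
    best_score = ncd(baseline, compile_fn(best))
    for _ in range(iterations):
        # Toggle one randomly chosen flag on or off.
        candidate = best ^ {rng.choice(flags)}
        score = ncd(baseline, compile_fn(candidate))
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Flag names used only as labels for the toy compiler below.
TOY_FLAGS = ["-funroll-loops", "-fno-inline", "-ftree-vectorize"]


def toy_compile(flags: FrozenSet[str]) -> bytes:
    """Hypothetical stand-in for a compiler: returns deterministic
    pseudo-random bytes that depend on the chosen flag set. A real
    compile_fn would run the compiler and read back the binary."""
    rng = random.Random(",".join(sorted(flags)))
    return bytes(rng.randrange(256) for _ in range(512))
```

Calling `tune_flags(TOY_FLAGS, toy_compile, toy_compile(frozenset()))` returns the best flag set found and its score. In a real setting each call to `compile_fn` is a full compilation, which is why the iteration budget (279 to 1,881 compilations in the reported experiments) is the dominant cost.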


