A Cocktail Approach to Practical Call Graph Construction

IF 2.2 Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Proceedings of the ACM on Programming Languages Pub Date : 2023-10-16 DOI:10.1145/3622833
Yuandao Cai, Charles Zhang
{"title":"A Cocktail Approach to Practical Call Graph Construction","authors":"Yuandao Cai, Charles Zhang","doi":"10.1145/3622833","DOIUrl":null,"url":null,"abstract":"After decades of research, constructing call graphs for modern C-based software remains either imprecise or inefficient when scaling up to the ever-growing complexity. The main culprit is the difficulty of resolving function pointers, as precise pointer analyses are cubic in nature and become exponential when considering calling contexts. This paper takes a practical stance by first conducting a comprehensive empirical study of function pointer manipulations in the wild. By investigating 5355 indirect calls in five popular open-source systems, we conclude that, instead of the past uniform treatments for function pointers, a cocktail approach can be more effective in “squeezing” the number of difficult pointers to a minimum using a potpourri of cheap methods. In particular, we decompose the costs of constructing highly precise call graphs of big code by tailoring several increasingly precise algorithms and synergizing them into a concerted workflow. As a result, many indirect calls can be precisely resolved in an efficient and principled fashion, thereby reducing the final, expensive refinements. This is, in spirit, similar to the well-known cocktail medical therapy. The results are encouraging — our implemented prototype called Coral can achieve similar precision versus the previous field-, flow-, and context-sensitive Andersen-style call graph construction, yet scale up to millions of lines of code for the first time, to the best of our knowledge. Moreover, by evaluating the produced call graphs through the lens of downstream clients (i.e., use-after-free detection, thin slicing, and directed grey-box fuzzing), the results show that Coral can dramatically improve their effectiveness for better vulnerability hunting, understanding, and reproduction. More excitingly, we found twelve confirmed bugs (six impacted by indirect calls) in popular systems (e.g., MariaDB), spreading across multiple historical versions.","PeriodicalId":20697,"journal":{"name":"Proceedings of the ACM on Programming Languages","volume":"36 4 1","pages":"0"},"PeriodicalIF":2.2000,"publicationDate":"2023-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM on Programming Languages","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3622833","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0

Abstract

After decades of research, constructing call graphs for modern C-based software remains either imprecise or inefficient when scaling up to the ever-growing complexity. The main culprit is the difficulty of resolving function pointers, as precise pointer analyses are cubic in nature and become exponential when considering calling contexts. This paper takes a practical stance by first conducting a comprehensive empirical study of function pointer manipulations in the wild. By investigating 5355 indirect calls in five popular open-source systems, we conclude that, instead of the past uniform treatments for function pointers, a cocktail approach can be more effective in “squeezing” the number of difficult pointers to a minimum using a potpourri of cheap methods. In particular, we decompose the costs of constructing highly precise call graphs of big code by tailoring several increasingly precise algorithms and synergizing them into a concerted workflow. As a result, many indirect calls can be precisely resolved in an efficient and principled fashion, thereby reducing the final, expensive refinements. This is, in spirit, similar to the well-known cocktail medical therapy. The results are encouraging — our implemented prototype called Coral can achieve similar precision versus the previous field-, flow-, and context-sensitive Andersen-style call graph construction, yet scale up to millions of lines of code for the first time, to the best of our knowledge. Moreover, by evaluating the produced call graphs through the lens of downstream clients (i.e., use-after-free detection, thin slicing, and directed grey-box fuzzing), the results show that Coral can dramatically improve their effectiveness for better vulnerability hunting, understanding, and reproduction. More excitingly, we found twelve confirmed bugs (six impacted by indirect calls) in popular systems (e.g., MariaDB), spreading across multiple historical versions.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
实用调用图构建的鸡尾酒方法
经过几十年的研究,为现代基于c语言的软件构建调用图在扩展到不断增长的复杂性时仍然不精确或效率低下。罪魁祸首是解析函数指针的困难,因为精确的指针分析本质上是立方的,而在考虑调用上下文时就会变成指数的。本文从实际出发,首先对函数指针操作进行了全面的实证研究。通过调查五个流行的开源系统中的5355个间接调用,我们得出结论,与过去对函数指针的统一处理不同,混合方法可以更有效地将困难指针的数量“压缩”到最小,使用各种廉价的方法。特别是,我们通过剪裁几个越来越精确的算法并将它们协同到一个协调的工作流中,来分解构建大代码的高精度调用图的成本。因此,许多间接调用可以以有效和有原则的方式精确地解决,从而减少了最终的、昂贵的改进。这在精神上类似于众所周知的鸡尾酒疗法。结果是令人鼓舞的——我们实现的名为Coral的原型与之前的字段、流程和上下文敏感的andersen风格的调用图构造相比,可以达到类似的精度,但据我们所知,这是第一次扩展到数百万行代码。此外,通过下游客户端(即使用后免费检测,薄切片和定向灰盒模糊)的视角评估生成的调用图,结果表明,Coral可以显着提高其有效性,以便更好地寻找,理解和复制漏洞。更令人兴奋的是,我们在流行的系统(例如MariaDB)中发现了12个已确认的漏洞(其中6个受到间接调用的影响),分布在多个历史版本中。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Proceedings of the ACM on Programming Languages
Proceedings of the ACM on Programming Languages Engineering-Safety, Risk, Reliability and Quality
CiteScore
5.20
自引率
22.20%
发文量
192
期刊最新文献
ReLU Hull Approximation An Axiomatic Basis for Computer Programming on the Relaxed Arm-A Architecture: The AxSL Logic The Essence of Generalized Algebraic Data Types Explicit Effects and Effect Constraints in ReML Indexed Types for a Statically Safe WebAssembly
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1