Rapid: Region-Based Pointer Disambiguation

Proceedings of the ACM on Programming Languages (IF 2.8, Q2, Computer Science, Software Engineering). Pub Date: 2023-10-16. DOI: 10.1145/3622859

Khushboo Chitre, Piyus Kedia, Rahul Purandare



Abstract

Interprocedural alias analyses often sacrifice precision for scalability. Thus, modern compilers such as GCC and LLVM implement more scalable but less precise intraprocedural alias analyses. This compromise makes the compilers miss potential optimization opportunities, which affects application performance. To enable the missed optimizations, modern compilers implement loop versioning with dynamic checks for pointer disambiguation. Polyhedral access range analysis and symbolic range analysis enable O(1) range checks that verify that memory accesses inside loops do not overlap. However, these approaches work only for loops whose bounds are loop invariants. To address this limitation, researchers proposed a technique that requires O(log n) memory accesses for pointer disambiguation. Others reduced the cost of the dynamic checks to a single memory access by constraining the object size and alignment. However, the former approach incurs noticeable overhead due to its dynamic checks, whereas the latter has a noticeable allocator overhead. Thus, scalability remains a challenge. In this work, we present a tool, Rapid, that further reduces the allocator and dynamic-check overheads of the existing approaches. The key idea is to use a profiler to identify objects that need disambiguation checks and to allocate them in different regions, which are disjoint memory areas. The disambiguation checks simply compare the regions corresponding to the objects. The regions are aligned such that the top 32 bits of the addresses of any two objects allocated in different regions are always different. As a consequence, the dynamic checks do not require any memory access to ensure that the objects belong to different regions, making them efficient. Rapid achieved a maximum performance benefit of around 52.94% for Polybench and 1.88% for SPEC CPU 2017 benchmarks. The maximum CPU overhead of our allocator is 0.57%, with a geometric mean of -0.2%, for SPEC CPU 2017 benchmarks. Due to the low overhead of the allocator and dynamic checks, Rapid could improve the performance of 12 out of 16 SPEC CPU 2017 benchmarks. In contrast, a state-of-the-art approach used in the comparison could improve only five SPEC CPU 2017 benchmarks.
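The region comparison described in the abstract can be illustrated with a small C sketch. This is not Rapid's implementation: the helper names `region_of`, `in_different_regions`, and `scale_add` are hypothetical, and the sketch assumes 64-bit addresses and an allocator that already guarantees that objects in different regions differ in the top 32 address bits. It shows only the general shape of loop versioning guarded by a check that performs no memory access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: treat the top 32 bits of a 64-bit address as the
 * region identifier. Pure register arithmetic, no memory is read. */
static inline uint32_t region_of(const void *p) {
    return (uint32_t)((uint64_t)(uintptr_t)p >> 32);
}

/* Assumption: the allocator places objects that need disambiguation in
 * regions whose top 32 address bits differ, so unequal region identifiers
 * imply that the two objects cannot overlap. */
static inline int in_different_regions(const void *a, const void *b) {
    return region_of(a) != region_of(b);
}

/* Loop versioning: returns 1 if the no-alias version ran, 0 otherwise. */
int scale_add(float *dst, const float *src, size_t n, float k) {
    if (in_different_regions(dst, src)) {
        /* Optimized version: the check justifies the restrict qualifiers,
         * which lets the compiler vectorize or reorder accesses. */
        float *restrict d = dst;
        const float *restrict s = src;
        for (size_t i = 0; i < n; i++) {
            d[i] += k * s[i];
        }
        return 1;
    }
    /* Fallback version: pointers may alias, so keep conservative code. */
    for (size_t i = 0; i < n; i++) {
        dst[i] += k * src[i];
    }
    return 0;
}
```

Calling `scale_add(a, a, n, k)` with the same array for both arguments always takes the fallback path, since both pointers share a region identifier. With an ordinary allocator most calls would take the fallback path as well, which is why the approach depends on allocating the profiled objects in separate regions.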


