Modeling the Interplay between Loop Tiling and Fusion in Optimizing Compilers Using Affine Relations

IF 2 4区 计算机科学 Q2 COMPUTER SCIENCE, THEORY & METHODS ACM Transactions on Computer Systems Pub Date : 2023-12-01 DOI:10.1145/3635305
Jie Zhao, Jinchen Xu, Peng Di, Wang Nie, Jiahui Hu, Yanzhi Yi, Sijia Yang, Zhen Geng, Renwei Zhang, Bojie Li, Zhiliang Gan, Xuefeng Jin
{"title":"Modeling the Interplay between Loop Tiling and Fusion in Optimizing Compilers Using Affine Relations","authors":"Jie Zhao, Jinchen Xu, Peng Di, Wang Nie, Jiahui Hu, Yanzhi Yi, Sijia Yang, Zhen Geng, Renwei Zhang, Bojie Li, Zhiliang Gan, Xuefeng Jin","doi":"10.1145/3635305","DOIUrl":null,"url":null,"abstract":"<p>Loop tiling and fusion are two essential transformations in optimizing compilers to enhance the data locality of programs. Existing heuristics either perform loop tiling and fusion in a particular order, missing some of their profitable compositions, or execute ad-hoc implementations for domain-specific applications, calling for a generalized and systematic solution in optimizing compilers. </p><p>In this paper, we present a so-called <i>basteln</i> (an abbreviation for backward slicing of tiled loop nests) <i>strategy</i> in polyhedral compilation to better model the interplay between loop tiling and fusion. The basteln strategy first groups loop nests by preserving their parallelism/tilability and next performs rectangular/parallelogram tiling to the output groups that produce data consumed outside the considered program fragment. The memory footprints required by each tile are then computed, from which the upwards exposed data are extracted to determine the tile shapes of the remaining fusion groups. Such a tiling mechanism can construct complex tile shapes imposed by the dependences between these groups, which are further merged by a post-tiling fusion algorithm for enhancing data locality without losing the parallelism/tilability of the output groups. The basteln strategy also takes into account the amount of redundant computations and the fusion of independent groups, exhibiting a general applicability. </p><p>We integrate the basteln strategy into two optimizing compilers, with one a general-purpose optimizer and the other a domain-specific compiler for deploying deep learning models. The experiments are conducted on CPU, GPU, and a deep learning accelerator to demonstrate the effectiveness of the approach for a wide class of application domains, including deep learning, image processing, sparse matrix computation, and linear algebra. In particular, the basteln strategy achieves a mean speedup of 1.8 × over cuBLAS/cuDNN and 1.1 × over TVM on GPU when used to optimize deep learning models; it also outperforms PPCG and TVM by 11% and 20%, respectively, when generating code for the deep learning accelerator.</p>","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"5 3","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Computer Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3635305","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 0

Abstract

Loop tiling and fusion are two essential transformations in optimizing compilers to enhance the data locality of programs. Existing heuristics either perform loop tiling and fusion in a particular order, missing some of their profitable compositions, or execute ad-hoc implementations for domain-specific applications, calling for a generalized and systematic solution in optimizing compilers.

In this paper, we present a so-called basteln (an abbreviation for backward slicing of tiled loop nests) strategy in polyhedral compilation to better model the interplay between loop tiling and fusion. The basteln strategy first groups loop nests by preserving their parallelism/tilability and next performs rectangular/parallelogram tiling to the output groups that produce data consumed outside the considered program fragment. The memory footprints required by each tile are then computed, from which the upwards exposed data are extracted to determine the tile shapes of the remaining fusion groups. Such a tiling mechanism can construct complex tile shapes imposed by the dependences between these groups, which are further merged by a post-tiling fusion algorithm for enhancing data locality without losing the parallelism/tilability of the output groups. The basteln strategy also takes into account the amount of redundant computations and the fusion of independent groups, exhibiting a general applicability.

We integrate the basteln strategy into two optimizing compilers, with one a general-purpose optimizer and the other a domain-specific compiler for deploying deep learning models. The experiments are conducted on CPU, GPU, and a deep learning accelerator to demonstrate the effectiveness of the approach for a wide class of application domains, including deep learning, image processing, sparse matrix computation, and linear algebra. In particular, the basteln strategy achieves a mean speedup of 1.8 × over cuBLAS/cuDNN and 1.1 × over TVM on GPU when used to optimize deep learning models; it also outperforms PPCG and TVM by 11% and 20%, respectively, when generating code for the deep learning accelerator.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用仿射关系对优化编译器中循环平铺和融合的相互作用进行建模
循环平铺和融合是优化编译器以增强程序的数据局部性的两个重要转变。现有的启发式方法要么以特定的顺序执行循环平纹和融合,错过了它们的一些有益的组合,要么为特定领域的应用程序执行特定的实现,在优化编译器时需要一个通用的和系统的解决方案。在本文中,我们提出了一种所谓的basteln(对平铺循环巢的向后切片的缩写)策略,以更好地模拟循环平铺和融合之间的相互作用。basteln策略首先通过保留循环巢的并行性/可伸缩性对它们进行分组,然后对产生在所考虑的程序片段之外使用的数据的输出组执行矩形/平行四边形平铺。然后计算每个瓦片所需的内存占用,从中提取向上暴露的数据,以确定剩余融合组的瓦片形状。这种平铺机制可以构建由这些组之间的依赖关系施加的复杂的平铺形状,并通过平铺后融合算法进一步合并,以增强数据局部性,同时又不会失去输出组的并行性/可平铺性。basteln策略还考虑了冗余计算的数量和独立群体的融合,显示出普遍的适用性。我们将basteln策略集成到两个优化编译器中,其中一个是通用优化器,另一个是用于部署深度学习模型的特定领域编译器。特别是,在优化深度学习模型时,basteln策略比cuBLAS/cuDNN平均加速1.8倍,比GPU上的TVM平均加速1.1倍;
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
ACM Transactions on Computer Systems
ACM Transactions on Computer Systems 工程技术-计算机:理论方法
CiteScore
4.00
自引率
0.00%
发文量
7
审稿时长
1 months
期刊介绍: ACM Transactions on Computer Systems (TOCS) presents research and development results on the design, implementation, analysis, evaluation, and use of computer systems and systems software. The term "computer systems" is interpreted broadly and includes operating systems, systems architecture and hardware, distributed systems, optimizing compilers, and the interaction between systems and computer networks. Articles appearing in TOCS will tend either to present new techniques and concepts, or to report on experiences and experiments with actual systems. Insights useful to system designers, builders, and users will be emphasized. TOCS publishes research and technical papers, both short and long. It includes technical correspondence to permit commentary on technical topics and on previously published papers.
期刊最新文献
PMAlloc: A Holistic Approach to Improving Persistent Memory Allocation Trinity: High-Performance and Reliable Mobile Emulation through Graphics Projection Hardware-software Collaborative Tiered-memory Management Framework for Virtualization Diciclo: Flexible User-level Services for Efficient Multitenant Isolation Modeling the Interplay between Loop Tiling and Fusion in Optimizing Compilers Using Affine Relations
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1