Vortex: Efficient Sample-Free Dynamic Tensor Program Optimization via Hardware-aware Strategy Space Hierarchization

Yangjie Zhou, Honglin Zhu, Qian Qiu, Weihao Cui, Zihan Liu, Cong Guo, Siyuan Feng, Jintao Meng, Haidong Lan, Jingwen Leng, Wenxi Zhu, Minwen Deng
DOI: arxiv-2409.01075
Journal: arXiv - CS - Distributed, Parallel, and Cluster Computing
Publication date: 2024-09-02
Citations: 0

Abstract

Dynamic-shape deep neural networks (DNNs) are rapidly evolving, attracting attention for their ability to handle variable input sizes in real-time applications. However, existing compilation optimization methods for such networks often rely heavily on predefined samples to guide the compilation process, which restricts their adaptability and efficiency. These sample-driven methods struggle to efficiently manage the diverse and unpredictable shapes encountered in real-world scenarios, often resulting in suboptimal performance. To tackle these issues, we introduce Vortex, a hardware-driven and sample-free compiler tailored for dynamic-shape tensor programs. Vortex capitalizes on detailed hardware information and hierarchizes the strategy space to facilitate high-performance code generation without relying on runtime shape samples. It features a unique bidirectional compilation workflow, combining top-down abstraction for aligning tensor program execution with hardware hierarchies and bottom-up kernel construction to narrow the search space, enabling Vortex to achieve remarkable efficiency. Comprehensive evaluations confirm that Vortex reduces compilation time by $176\times$ compared to the existing dynamic-shape compiler. Additionally, it substantially outperforms existing vendor-provided libraries and dynamic-shape compilers on both CPU and GPU platforms, delivering speedups of $2.53\times$ and $3.01\times$, respectively.
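To make the "hardware-aware, sample-free" idea concrete, here is a minimal toy sketch, not taken from the Vortex paper: all names (`HardwareSpec`, `candidate_tiles`, `pick_tile`) are hypothetical. It illustrates how hardware parameters alone (SIMD width, cache capacity) can bound a tiling strategy space at compile time, so that a kernel variant for a dynamic dimension can be chosen analytically at runtime without profiling shape samples.

```python
# Illustrative sketch only: a toy hardware-aware, sample-free tiling strategy.
# Names and heuristics here are hypothetical, not the Vortex paper's algorithm.

from dataclasses import dataclass

@dataclass
class HardwareSpec:
    vector_width: int      # elements per SIMD vector
    l1_bytes: int          # per-core L1 cache capacity in bytes
    elem_bytes: int = 4    # element size (float32)

def candidate_tiles(hw: HardwareSpec) -> list[int]:
    """Top-down pruning: keep only square tile sizes that are multiples of
    the vector width and whose working set fits in L1 (hardware-aware)."""
    tiles = []
    t = hw.vector_width
    while t * t * hw.elem_bytes <= hw.l1_bytes:
        tiles.append(t)
        t *= 2
    return tiles

def pick_tile(dim: int, hw: HardwareSpec) -> int:
    """Bottom-up selection at runtime: pick the largest pre-built tile not
    exceeding the dynamic dimension (purely analytic, no shape samples)."""
    best = hw.vector_width
    for t in candidate_tiles(hw):
        if dim >= t:
            best = t
    return best

if __name__ == "__main__":
    hw = HardwareSpec(vector_width=8, l1_bytes=32 * 1024)
    print(candidate_tiles(hw))  # tile candidates bounded by cache capacity
    print(pick_tile(500, hw))   # variant chosen from the dynamic dim alone
```

The key point mirrored from the abstract is that the strategy space is cut down by hardware facts (here, cache and vector width) before any input shape is seen, so runtime dispatch reduces to a cheap lookup rather than sample-driven tuning.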