{"title":"Vortex: Efficient Sample-Free Dynamic Tensor Program Optimization via Hardware-aware Strategy Space Hierarchization","authors":"Yangjie Zhou, Honglin Zhu, Qian Qiu, Weihao Cui, Zihan Liu, Cong Guo, Siyuan Feng, Jintao Meng, Haidong Lan, Jingwen Leng, Wenxi Zhu, Minwen Deng","doi":"arxiv-2409.01075","DOIUrl":null,"url":null,"abstract":"Dynamic-shape deep neural networks (DNNs) are rapidly evolving, attracting\nattention for their ability to handle variable input sizes in real-time\napplications. However, existing compilation optimization methods for such\nnetworks often rely heavily on predefined samples to guide the compilation\nprocess, which restricts their adaptability and efficiency. These sample-driven\nmethods struggle to efficiently manage the diverse and unpredictable shapes\nencountered in real-world scenarios, often resulting in suboptimal performance. To tackle these issues, we introduce Vortex, a hardware-driven and\nsample-free compiler tailored for dynamic-shape tensor programs. Vortex\ncapitalizes on detailed hardware information and hierarchizes the strategy\nspace to facilitate high-performance code generation without relying on runtime\nshape samples. It features a unique bidirectional compilation workflow,\ncombining top-down abstraction for aligning tensor program execution with\nhardware hierarchies and bottom-up kernel construction to narrow the search\nspace, enabling Vortex to achieve remarkable efficiency. Comprehensive\nevaluations confirm that Vortex reduces compilation time by $176\\times$\ncompared to the existing dynamic-shape compiler. Additionally, it substantially\noutperforms existing vendor-provided libraries and dynamic-shape compilers on\nboth CPU and GPU platforms, delivering speedups of $2.53\\times$ and\n$3.01\\times$, respectively.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"12 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Distributed, Parallel, and Cluster Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.01075","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Dynamic-shape deep neural networks (DNNs) are rapidly evolving, attracting attention for their ability to handle variable input sizes in real-time applications. However, existing compilation optimization methods for such networks often rely heavily on predefined samples to guide the compilation process, which restricts their adaptability and efficiency. These sample-driven methods struggle to efficiently manage the diverse and unpredictable shapes encountered in real-world scenarios, often resulting in suboptimal performance.

To tackle these issues, we introduce Vortex, a hardware-driven and sample-free compiler tailored for dynamic-shape tensor programs. Vortex capitalizes on detailed hardware information and hierarchizes the strategy space to facilitate high-performance code generation without relying on runtime shape samples. It features a unique bidirectional compilation workflow, combining top-down abstraction for aligning tensor program execution with hardware hierarchies and bottom-up kernel construction to narrow the search space, enabling Vortex to achieve remarkable efficiency.

Comprehensive evaluations confirm that Vortex reduces compilation time by $176\times$ compared to the existing dynamic-shape compiler. Additionally, it substantially outperforms existing vendor-provided libraries and dynamic-shape compilers on both CPU and GPU platforms, delivering speedups of $2.53\times$ and $3.01\times$, respectively.
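
To make the hierarchization idea concrete, the Python sketch below illustrates, under loose assumptions, how a tiling strategy space can be built from hardware limits alone and only queried once a concrete shape arrives at dispatch time. It is not the Vortex implementation or API: the HardwareLevel, HIERARCHY, bottom_up_microkernels, top_down_candidates, and toy_cost names are hypothetical, the hardware numbers are placeholders, and the cost model is a toy stand-in for a real hardware-aware analysis.

# Illustrative sketch only -- NOT the actual Vortex implementation or API.
# It assumes a hypothetical two-level hardware hierarchy ("block"/"thread"),
# placeholder hardware numbers, and a toy analytic cost model, purely to show
# how top-down hierarchy composition plus bottom-up, hardware-bounded kernel
# enumeration can bound a tiling search space without runtime shape samples.
from dataclasses import dataclass
from itertools import product


@dataclass
class HardwareLevel:
    name: str
    max_tile: int      # largest tile this level can hold (registers, shared memory, ...)
    parallelism: int   # number of parallel units available at this level


# Hypothetical GPU-like hierarchy; the numbers are placeholders, not real specs.
HIERARCHY = [
    HardwareLevel("block",  max_tile=128, parallelism=108),   # outer level
    HardwareLevel("thread", max_tile=8,   parallelism=256),   # inner level
]


def bottom_up_microkernels(level):
    """Enumerate per-level tile sizes directly from hardware limits
    (powers of two up to max_tile), independent of any tensor shape."""
    t = 1
    while t <= level.max_tile:
        yield t
        t *= 2


def top_down_candidates():
    """Compose per-level tiles into full strategies top-down, pruning combinations
    that violate hardware limits: the inner tile must fit in the outer one, and the
    outer/inner ratio must not exceed the inner level's parallelism."""
    per_level = [list(bottom_up_microkernels(lv)) for lv in HIERARCHY]
    for combo in product(*per_level):
        valid = all(
            combo[i] >= combo[i + 1]
            and combo[i] // combo[i + 1] <= HIERARCHY[i + 1].parallelism
            for i in range(len(combo) - 1)
        )
        if valid:
            yield {lv.name: t for lv, t in zip(HIERARCHY, combo)}


def toy_cost(strategy, dim):
    """Toy analytic cost: block waves times serial per-thread work, plus small
    per-block overhead and padding-waste penalties. A real hardware-aware model
    would account for bandwidth, occupancy, latency, and so on."""
    block, thread = strategy["block"], strategy["thread"]
    padded = -(-dim // block) * block                 # round dim up to a block multiple
    n_blocks = padded // block
    waves = -(-n_blocks // HIERARCHY[0].parallelism)  # rounds of blocks across the device
    return waves * thread + 0.05 * n_blocks + 0.01 * (padded - dim)


if __name__ == "__main__":
    # The candidate set above was built with no knowledge of the input shape;
    # the shape only appears here, at dispatch time -- the "sample-free" part.
    for dim in (37, 512, 4099):
        best = min(top_down_candidates(), key=lambda s: toy_cost(s, dim))
        print(f"dim={dim}: pick {best}")

Because every candidate is derived from hardware capacities rather than profiled shape samples, the enumeration and pruning happen once at compile time, and picking a strategy for a newly seen shape reduces to a cheap analytic lookup over a small, hardware-valid set.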