DISC: A Dynamic Shape Compiler for Machine Learning Workloads

Proceedings of the 1st Workshop on Machine Learning and Systems. Pub Date: 2021-03-09. DOI: 10.1145/3437984.3458838

Kai Zhu, Wenyi Zhao, Zhen Zheng, Tianyou Guo, Pengzhan Zhao, Junjie Bai, Jun Yang, Xiaoyong Liu, Lansong Diao, Wei Lin

Citations: 14

Abstract

Many recent machine learning models exhibit dynamic shape characteristics. However, existing AI compiler optimization systems struggle with the problems that dynamic shape models introduce, including compilation overhead, memory usage, and the complexity of the optimization pipeline and of deployment. This paper presents DISC, a compiler system that natively supports optimization of dynamic shape workloads. DISC extends a set of IRs to form a fully dynamic shape representation. It generates the runtime flow at compile time to handle dynamic-shape-based logic, which avoids interpretation overhead at runtime and enlarges the opportunity for host-device co-optimization. It addresses the kernel fusion problem for dynamic shapes with shape propagation and constraint collection methods. This is the first work to demonstrate how to build an end-to-end dynamic shape compiler on top of the MLIR infrastructure. Experiments show that DISC achieves up to 3.3× speedup over TensorFlow/PyTorch and up to 1.8× over Nimble.
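The shape propagation and constraint collection idea mentioned in the abstract can be illustrated with a toy sketch. This is not DISC's implementation and does not use its MLIR-based IR; the class `DimEquality`, the function `collect_elementwise_constraints`, and the symbolic dimension names `d0` to `d9` are all hypothetical. The sketch assumes only that an elementwise op forces its operands and result to share a shape, so a compiler could conclude that two ops have equal shapes without knowing any concrete dimension value.

```python
class DimEquality:
    """Union-find over symbolic dimension names (hypothetical helper)."""

    def __init__(self):
        self.parent = {}

    def find(self, d):
        # Register unseen dimensions as their own representative.
        self.parent.setdefault(d, d)
        while self.parent[d] != d:
            # Path halving keeps the trees shallow.
            self.parent[d] = self.parent[self.parent[d]]
            d = self.parent[d]
        return d

    def union(self, a, b):
        # Record the constraint "dimension a equals dimension b".
        self.parent[self.find(a)] = self.find(b)

    def same_shape(self, s1, s2):
        # Two symbolic shapes match if every pair of dims is known equal.
        return len(s1) == len(s2) and all(
            self.find(a) == self.find(b) for a, b in zip(s1, s2)
        )


def collect_elementwise_constraints(ops, shapes, eq):
    """For each (out, lhs, rhs) elementwise op, unify the three shapes."""
    for out, lhs, rhs in ops:
        for a, b, c in zip(shapes[out], shapes[lhs], shapes[rhs]):
            eq.union(a, b)
            eq.union(a, c)


# Every tensor starts with fresh, unknown symbolic dimensions.
shapes = {
    "x": ("d0", "d1"),
    "y": ("d2", "d3"),
    "add": ("d4", "d5"),
    "z": ("d6", "d7"),
    "mul": ("d8", "d9"),
}
# add = x + y; mul = add * z (both elementwise, no broadcasting assumed).
ops = [("add", "x", "y"), ("mul", "add", "z")]

eq = DimEquality()
collect_elementwise_constraints(ops, shapes, eq)

# Both ops provably share one shape, so they are fusion candidates
# even though no concrete dimension value is known at compile time.
fusable = eq.same_shape(shapes["add"], shapes["mul"])
```

In this toy model the equality classes stand in for the collected constraints; a real compiler would also need to handle broadcasting, reshapes, and constraints beyond simple dimension equality.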



