Pub Date: 2024-09-24 | DOI: 10.1109/TPDS.2024.3466913
Zhiqi Lin;Youshan Miao;Guanbin Xu;Cheng Li;Olli Saarikivi;Saeed Maleki;Fan Yang
Increasingly complex and diverse deep neural network (DNN) models necessitate distributing execution across multiple devices for training and inference, and they also require carefully planned schedules for performance. However, existing practices often rely on predefined schedules that may not fully exploit the benefits of emerging, diverse model-aware operator placement strategies. Handcrafting high-efficiency schedules is challenging because the schedule space is large and varies across placements. This paper presents Tessel, an automated system that searches for efficient schedules for distributed DNN training and inference under diverse operator placement strategies. To reduce search costs, Tessel leverages the insight that the most efficient schedules often exhibit a repetitive pattern (repetend).