{"title":"M-LAB: scheduling space exploration of multitasks on tiled deep learning accelerators","authors":"Bingya Zhang, Sheng Zhang","doi":"10.1117/12.3032039","DOIUrl":null,"url":null,"abstract":"With the increasing commercialization of deep neural networks (DNN), there is a growing need for running multiple neural networks simultaneously on an accelerator. This creates a new space to explore the allocation of computing resources and the order of computation. However, the majority of current research in multi-DNN scheduling relies predominantly on newly developed accelerators or employs heuristic methods aimed primarily at reducing DRAM traffic, increasing throughput and improving Service Level Agreements (SLA) satisfaction. These approaches often lead to poor portability, incompatibility with other optimization methods, and markedly high energy consumption. In this paper, we introduce a novel scheduling framework, M-LAB, that all scheduling of data is at layer level instead of network level, which means our framework is compatible with the research of inter-layer scheduling, with significant improvement in energy consumption and speed. To facilitate layer-level scheduling, M-LAB eliminates the conventional network boundaries, transforming these dependencies into a layer-to-layer format. Subsequently, M-LAB explores the scheduling space by amalgamating inter-layer and intra-layer scheduling, which allows for a more nuanced and efficient scheduling strategy tailored to the specific needs of multiple neural networks. Compared with current works, M-LAB achieves 2.06x-4.85x speed-up and 2.27-4.12x cost reduction.","PeriodicalId":342847,"journal":{"name":"International Conference on Algorithms, Microchips and Network Applications","volume":" 6","pages":"131711E - 131711E-7"},"PeriodicalIF":0.0000,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Algorithms, Microchips and Network Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1117/12.3032039","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
With the increasing commercialization of deep neural networks (DNNs), there is a growing need to run multiple neural networks simultaneously on a single accelerator. This creates a new space for exploring the allocation of computing resources and the order of computation. However, most current research on multi-DNN scheduling relies on newly developed accelerators or employs heuristic methods aimed primarily at reducing DRAM traffic, increasing throughput, and improving Service Level Agreement (SLA) satisfaction. These approaches often suffer from poor portability, incompatibility with other optimization methods, and markedly high energy consumption. In this paper, we introduce a novel scheduling framework, M-LAB, in which all data scheduling is performed at the layer level rather than the network level; this makes the framework compatible with inter-layer scheduling research and yields significant improvements in energy consumption and speed. To facilitate layer-level scheduling, M-LAB eliminates conventional network boundaries, transforming network-level dependencies into a layer-to-layer format. M-LAB then explores the scheduling space by combining inter-layer and intra-layer scheduling, which allows a more nuanced and efficient scheduling strategy tailored to the specific needs of multiple neural networks. Compared with current works, M-LAB achieves a 2.06x-4.85x speed-up and a 2.27x-4.12x cost reduction.
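To make the layer-level idea concrete, the sketch below shows one plausible way to remove network boundaries by merging several DNNs into a single layer-to-layer dependency graph that a joint inter-/intra-layer scheduler could then explore. This is an illustrative assumption, not the paper's implementation; all names (Layer, merge_networks, ready_layers) and structures are hypothetical.

```python
# Conceptual sketch (assumed, not from the paper): flatten multiple DNNs
# into one layer-level dependency graph so layers from different networks
# can be scheduled against each other rather than network-by-network.
from dataclasses import dataclass, field

@dataclass
class Layer:
    name: str                                   # unique within its network, e.g. "conv1"
    deps: list = field(default_factory=list)    # names of predecessor layers

def merge_networks(networks: dict) -> dict:
    """Flatten {network_name: [Layer, ...]} into one layer-to-layer graph.

    Dependencies are rewritten with a network prefix so layers from all
    DNNs share a single namespace, erasing the network boundary.
    """
    graph = {}
    for net, layers in networks.items():
        for layer in layers:
            qualified = f"{net}/{layer.name}"
            graph[qualified] = [f"{net}/{d}" for d in layer.deps]
    return graph

def ready_layers(graph: dict, finished: set) -> list:
    """Layers whose dependencies are all finished: the candidate set a
    layer-level scheduler would choose from at each scheduling step."""
    return [l for l, deps in graph.items()
            if l not in finished and all(d in finished for d in deps)]

# Toy usage: two small networks scheduled in one shared space.
nets = {
    "A": [Layer("conv1"), Layer("conv2", ["conv1"])],
    "B": [Layer("fc1"), Layer("fc2", ["fc1"])],
}
g = merge_networks(nets)
print(ready_layers(g, finished=set()))  # ['A/conv1', 'B/fc1']
```

In such a formulation, an inter-layer scheduler would pick which ready layer (from any network) runs next and where, while an intra-layer scheduler would decide how that layer's computation is tiled across the accelerator's resources.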