Exploration of task-based scheduling for convolutional neural networks accelerators under memory constraints

Crefeda Faviola Rodrigues, G. Riley, M. Luján
{"title":"Exploration of task-based scheduling for convolutional neural networks accelerators under memory constraints","authors":"Crefeda Faviola Rodrigues, G. Riley, M. Luján","doi":"10.1145/3310273.3323162","DOIUrl":null,"url":null,"abstract":"Development of application specific accelerators for deep convolutional neural networks (ConvNets) have mainly focussed on accelerating the computationally intensive layers, that is the convolutional layers, to improve performance and energy efficiency. Traditional approaches in this space have relied on handcrafted dataflow implementations to leverage the fine-grained parallelism and data-locality properties within these layers. However, ConvNets layers also have an untapped potential from cross-layer data locality. In our work, we explore a novel approach in the context of deep neural networks accelerators by modelling the computation as a task-dependency directed acyclic graph and proposing a memory-aware heuristic based onHeterogeneous Earliest Finish Time (HEFT) for task-graph scheduling on shared memory systems. Our results show the benefits of task graphs in terms of better memory use (23.4 % less) over conventional layer-by-layer processing in a simulated environment with the first three layers of LeNet-5. Certain task-graphs trade-off makespan (10% increase) for memory use (20 % decrease). Finally, our exploration of graphs with different slicing configurations for the pooling layer while using memory-aware HEFT versus the original HEFT reveals that regular shaped tiles across layers offers better makespan and memory use than tiles with large dimensions along one axis.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 16th ACM International Conference on Computing Frontiers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3310273.3323162","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Development of application-specific accelerators for deep convolutional neural networks (ConvNets) has mainly focused on accelerating the computationally intensive layers, that is, the convolutional layers, to improve performance and energy efficiency. Traditional approaches in this space have relied on handcrafted dataflow implementations to leverage the fine-grained parallelism and data-locality properties within these layers. However, ConvNet layers also have untapped potential from cross-layer data locality. In our work, we explore a novel approach in the context of deep neural network accelerators by modelling the computation as a task-dependency directed acyclic graph and proposing a memory-aware heuristic based on Heterogeneous Earliest Finish Time (HEFT) for task-graph scheduling on shared-memory systems. Our results show the benefits of task graphs in terms of better memory use (23.4% less) over conventional layer-by-layer processing in a simulated environment with the first three layers of LeNet-5. Certain task graphs trade off makespan (10% increase) for memory use (20% decrease). Finally, our exploration of graphs with different slicing configurations for the pooling layer, using memory-aware HEFT versus the original HEFT, reveals that regularly shaped tiles across layers offer better makespan and memory use than tiles with large dimensions along one axis.
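To make the abstract's idea concrete, the sketch below shows HEFT-style list scheduling over a task-dependency DAG with a memory-aware priority. It is a minimal illustration, not the authors' heuristic: the task graph, compute costs, output footprints, the tie-break rule, and the live-buffer memory proxy are all assumptions chosen for clarity, and the system is assumed to be a homogeneous shared-memory machine.

```python
# Illustrative sketch (assumed, not from the paper): HEFT-style list scheduling
# with a memory-aware tie-break on a homogeneous shared-memory system.
from collections import defaultdict
from functools import lru_cache

# Hypothetical task graph: task -> (compute cost, output footprint in KB).
tasks = {
    "conv1_tile0": (4.0, 32), "conv1_tile1": (4.0, 32),
    "pool1_tile0": (1.0, 8),  "pool1_tile1": (1.0, 8),
    "conv2_tile0": (6.0, 16),
}
# Dependency edges: producer -> list of consumers.
edges = {
    "conv1_tile0": ["pool1_tile0"], "conv1_tile1": ["pool1_tile1"],
    "pool1_tile0": ["conv2_tile0"], "pool1_tile1": ["conv2_tile0"],
    "conv2_tile0": [],
}
parents = defaultdict(list)
for u, consumers in edges.items():
    for v in consumers:
        parents[v].append(u)

@lru_cache(maxsize=None)
def upward_rank(t):
    """Classic HEFT upward rank: own cost plus the longest path to an exit task."""
    cost, _ = tasks[t]
    return cost + max((upward_rank(c) for c in edges[t]), default=0.0)

def schedule(num_workers=2):
    """List-schedule ready tasks by upward rank; prefer smaller outputs on ties."""
    ready = [t for t in tasks if not parents[t]]
    done, order, finish = set(), [], {}
    worker_free = [0.0] * num_workers
    live_mem, peak_mem = 0, 0
    while ready:
        # Memory-aware priority: highest rank first, smallest footprint on ties.
        ready.sort(key=lambda t: (-upward_rank(t), tasks[t][1]))
        t = ready.pop(0)
        cost, mem = tasks[t]
        w = min(range(num_workers), key=lambda i: worker_free[i])
        start = max([worker_free[w]] + [finish[p] for p in parents[t]])
        finish[t] = start + cost
        worker_free[w] = finish[t]
        order.append((t, w, start, finish[t]))
        done.add(t)
        # Crude memory proxy: outputs stay live until all consumers have run.
        live_mem += mem
        peak_mem = max(peak_mem, live_mem)
        for p in parents[t]:
            if all(c in done for c in edges[p]):
                live_mem -= tasks[p][1]
        ready += [c for c in edges[t]
                  if c not in done and all(p in done for p in parents[c])]
    return order, peak_mem

if __name__ == "__main__":
    order, peak = schedule()
    for t, w, s, f in order:
        print(f"{t:12s} worker {w}  [{s:4.1f}, {f:4.1f}]")
    print("peak live output memory (KB):", peak)
```

Running the sketch prints a per-worker schedule and the peak amount of live output buffers, which illustrates the kind of makespan-versus-memory trade-off the paper reports when cross-layer tiles are scheduled as a task graph instead of layer by layer.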