
Q1 Computer Science ACM Sigplan Notices Pub Date : 2018-11-30 DOI:10.1145/3296957.3173176

Hyoukjun Kwon, A. Samajdar, T. Krishna


Citations: 49

Abstract


MAERI

Deep neural networks (DNNs) have demonstrated highly promising results across computer vision and speech recognition, and are becoming foundational for ubiquitous AI. The computational complexity of these algorithms and the need for high energy efficiency have led to a surge in research on hardware accelerators. To reduce the latency and energy costs of accessing DRAM, most DNN accelerators are spatial in nature, with hundreds of processing elements (PEs) operating in parallel and communicating with each other directly. DNNs are evolving at a rapid rate, and it is common to have convolutional, recurrent, pooling, and fully-connected layers with varying input and filter sizes in the most recent topologies. They may be dense or sparse. They can also be partitioned in myriad ways (within and across layers) to exploit data reuse (weights and intermediate outputs). All of the above can lead to different dataflow patterns within the accelerator substrate. Unfortunately, most DNN accelerators support only fixed dataflow patterns internally, because they rely on a careful co-design of the PEs and the network-on-chip (NoC). In fact, the majority of them are optimized only for traffic within a convolutional layer. This makes it challenging to map arbitrary dataflows onto the fabric efficiently, and can lead to underutilization of the available compute resources. DNN accelerators need to be programmable to enable mass deployment. To be programmable, they need to be internally configurable so that they can support the various dataflow patterns that could be mapped onto them. To address this need, we present MAERI, a DNN accelerator built with a set of modular and configurable building blocks that can easily support myriad DNN partitions and mappings by appropriately configuring tiny switches. MAERI provides 8-459% better utilization across multiple dataflow mappings over baselines with rigid NoC fabrics.
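The underutilization argument in the abstract can be illustrated with a toy model. The sketch below is not MAERI's architecture or its evaluation methodology, and it does not reproduce the 8-459% figure. It only assumes, for illustration, that a rigid fabric groups PEs into fixed-size clusters that a filter must occupy whole, while a flexible fabric can carve out a group of exactly the size each filter needs. The function names, the cluster model, and the example sizes are all hypothetical.

```python
from math import ceil


def rigid_utilization(num_pes: int, cluster_size: int, filter_size: int) -> float:
    """Fraction of PEs doing useful work when each filter must occupy
    whole fixed-size clusters (assumed model of a rigid fabric)."""
    clusters = num_pes // cluster_size
    clusters_per_filter = ceil(filter_size / cluster_size)
    filters_mapped = clusters // clusters_per_filter
    return filters_mapped * filter_size / num_pes


def flexible_utilization(num_pes: int, filter_size: int) -> float:
    """Fraction of PEs doing useful work when PE groups can be sized
    exactly to the filter (assumed model of a configurable fabric)."""
    filters_mapped = num_pes // filter_size
    return filters_mapped * filter_size / num_pes


if __name__ == "__main__":
    # Hypothetical sizes: 64 PEs, clusters of 16, 3x3 and 5x5 filters.
    for filter_size in (9, 25):
        rigid = rigid_utilization(64, 16, filter_size)
        flexible = flexible_utilization(64, filter_size)
        print(f"filter={filter_size:2d}  rigid={rigid:.3f}  flexible={flexible:.3f}")
```

In this toy model a 3x3 filter wastes 7 of the 16 PEs in every fixed cluster, whereas exact-size grouping leaves only one of the 64 PEs idle. For the 5x5 filter both models map two filters and give the same utilization, which is one way to see why the gain from flexibility would vary widely with the mapping.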


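Source journal

ACM SIGPLAN Notices (Engineering & Technology - Computer Science: Software Engineering)

CiteScore

4.90

Self-citation rate

0.00%

Articles published

Review time

2-4 weeks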

About the journal: The ACM Special Interest Group on Programming Languages explores programming language concepts and tools, focusing on design, implementation, practice, and theory. Its members are programming language developers, educators, implementers, researchers, theoreticians, and users. SIGPLAN sponsors several major annual conferences, including the Symposium on Principles of Programming Languages (POPL), the Symposium on Principles and Practice of Parallel Programming (PPoPP), the Conference on Programming Language Design and Implementation (PLDI), the International Conference on Functional Programming (ICFP), the International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), as well as more than a dozen other events that are either smaller or held in cooperation with other SIGs. The monthly "ACM SIGPLAN Notices" publishes proceedings of selected sponsored events and an annual report on SIGPLAN activities. Members receive discounts on conference registrations and free access to ACM SIGPLAN publications in the ACM Digital Library. SIGPLAN recognizes significant research and service contributions of individuals with a variety of awards, supports current members through the Professional Activities Committee, and encourages future programming language enthusiasts with frequent Programming Languages Mentoring Workshops (PLMW).