{"title":"Exploiting Model-Level Parallelism in Recurrent Neural Network Accelerators","authors":"Lu Peng, Wentao Shi, Jian Zhang, Samuel Irving","doi":"10.1109/MCSoC.2019.00042","DOIUrl":null,"url":null,"abstract":"Recurrent Neural Networks (RNNs) have continued to facilitate rapid progress in a variety of academic and industrial fields, though their complexity continues to make efficient deployment difficult; when the RNN model size is not properly matched to hardware resources, performance can suffer from hardware under-utilization. In this work, we propose to explore model-level parallelism for LSTM-RNN accelerators in different levels of the model using a multicore design. The multi-core design proposed in this work operates in three computing modes: multi-programming mode in which independent models are executed; multithreading mode in which parallelism among layers of an LSTM model is explored and properly scheduled; and helper-core mode in which cores collaborate on a single LSTM layer in a lower model level comparing with multithread mode. Our design can achieve up to 1.98x speedup in \"multi-programming\" mode, a 1.91x speedup in \"multithreading\" mode and a 1.88x speedup in \"helper-core\" mode over the single-core design.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MCSoC.2019.00042","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 7
Abstract
Recurrent Neural Networks (RNNs) have continued to facilitate rapid progress in a variety of academic and industrial fields, though their complexity continues to make efficient deployment difficult; when the RNN model size is not properly matched to hardware resources, performance suffers from hardware under-utilization. In this work, we explore model-level parallelism for LSTM-RNN accelerators at different levels of the model using a multi-core design. The proposed multi-core design operates in three computing modes: multi-programming mode, in which independent models are executed on separate cores; multithreading mode, in which parallelism among the layers of a single LSTM model is exploited and scheduled; and helper-core mode, in which cores collaborate on a single LSTM layer, i.e., at a lower level of the model than in multithreading mode. Our design achieves up to a 1.98x speedup in "multi-programming" mode, a 1.91x speedup in "multithreading" mode, and a 1.88x speedup in "helper-core" mode over a single-core design.
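To make the three computing modes concrete, the sketch below illustrates one plausible reading of them in plain NumPy: independent models per core (multi-programming), layers of one model staged across cores (multithreading), and cores splitting the gate projection of a single layer by rows (helper-core). All names, core counts, and dimensions here are illustrative assumptions, not the paper's actual accelerator design or scheduling policy.

```python
# Minimal NumPy sketch of the three computing modes described in the abstract.
# Core count, sizes, and function names are assumptions for illustration only.
import numpy as np

NUM_CORES = 4
HIDDEN = 64


def lstm_cell(x, h, c, W, U, b):
    """One LSTM step; the four gates are slices of a fused (4*HIDDEN) projection."""
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    i, f, o = (1 / (1 + np.exp(-v)) for v in (i, f, o))
    g = np.tanh(g)
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new


def make_layer():
    W = np.random.randn(4 * HIDDEN, HIDDEN) * 0.1
    U = np.random.randn(4 * HIDDEN, HIDDEN) * 0.1
    return W, U, np.zeros(4 * HIDDEN)


def run_layer(xs, layer):
    W, U, b = layer
    h = c = np.zeros(HIDDEN)
    outputs = []
    for x in xs:
        h, c = lstm_cell(x, h, c, W, U, b)
        outputs.append(h)
    return outputs


# Mode 1: multi-programming -- each core executes an independent model.
def multi_programming(inputs_per_model):
    models = [make_layer() for _ in range(NUM_CORES)]
    return [run_layer(xs, m) for xs, m in zip(inputs_per_model, models)]


# Mode 2: multithreading -- layers of one model are assigned to different cores;
# core k consumes the per-timestep outputs of core k-1 (stages overlap in hardware).
def multithreading(xs):
    stream = xs
    for layer in (make_layer() for _ in range(NUM_CORES)):
        stream = run_layer(stream, layer)
    return stream


# Mode 3: helper-core -- cores collaborate on a single layer by splitting the
# gate projection row-wise, then concatenating the partial results.
def helper_core_step(x, h, c, layer):
    W, U, b = layer
    row_chunks = np.array_split(np.arange(4 * HIDDEN), NUM_CORES)
    z = np.concatenate([W[r] @ x + U[r] @ h + b[r] for r in row_chunks])
    i, f, o, g = np.split(z, 4)
    i, f, o = (1 / (1 + np.exp(-v)) for v in (i, f, o))
    g = np.tanh(g)
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new


if __name__ == "__main__":
    seq = [np.random.randn(HIDDEN) for _ in range(8)]
    multi_programming([seq] * NUM_CORES)
    multithreading(seq)
    h, c = helper_core_step(seq[0], np.zeros(HIDDEN), np.zeros(HIDDEN), make_layer())
    print("toy run of all three modes complete, h shape:", h.shape)
```

The intent of the sketch is only to show where the parallelism lives in each mode: across whole models, across layers of one model, or within a single layer's matrix-vector work.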