Exploring the Programmability for Deep Learning Processors: from Architecture to Tensorization
Chixiao Chen, Huwan Peng, Xindi Liu, Hongwei Ding, C. R. Shi
2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), pp. 1-6, June 2018
DOI: 10.1145/3195970.3196049
Citations: 3
Abstract
This paper presents an instruction and Fabric Programmable Neuron Array (iFPNA) architecture, its 28nm CMOS chip prototype, and a compiler for the on-chip acceleration of a variety of deep neural networks (DNNs), including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and fully connected (FC) networks. The iFPNA architecture combines instruction-level programmability, as in an Instruction Set Architecture (ISA), with logic-level reconfigurability, as in a Field-Programmable Gate Array (FPGA), in a sliced structure for scalability. Four dataflow models, namely weight stationary, input stationary, row stationary, and tunnel stationary, are described as abstractions of the data and computational dependences of various DNNs. The iFPNA compiler partitions a large DNN into smaller networks, each of which is mapped to, optimized for, and code-generated for the underlying iFPNA processor using one of the four dataflow models or a mixture of them. Experimental results show that state-of-the-art large CNNs, RNNs, and FC networks can be mapped to the iFPNA processor with near-ASIC performance.
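To make the dataflow terminology concrete, the weight-stationary scheme mentioned above can be sketched in a few lines: each processing element (PE) pins one weight for the whole computation while inputs stream past it and partial sums accumulate across PEs. This is a toy 1-D sketch for illustration only; the iFPNA paper's actual PE array, ISA, and the tunnel-stationary flow are not modeled here.

```python
import numpy as np

def conv1d_weight_stationary(inputs, weights):
    """1-D convolution computed in a weight-stationary style.

    Each 'PE' (loop iteration k) holds a single weight tap fixed for the
    entire run; the input stream slides past it, and partial sums are
    accumulated across PEs. Illustrative sketch, not the iFPNA design.
    """
    K = len(weights)
    N = len(inputs) - K + 1          # number of valid output positions
    out = np.zeros(N)
    for k, w in enumerate(weights):  # one PE per weight tap, weight stays put
        out += w * inputs[k:k + N]   # inputs stream past the stationary weight
    return out
```

The other stationary schemes differ only in which operand is pinned in the PE: input stationary holds an input activation fixed and streams weights, while row stationary keeps a row of the computation resident to balance reuse of weights, inputs, and partial sums.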