LTNN: An energy-efficient machine learning accelerator on 3D CMOS-RRAM for layer-wise tensorized neural network
Hantao Huang, Leibin Ni, Hao Yu
2017 30th IEEE International System-on-Chip Conference (SOCC), September 2017
DOI: 10.1109/SOCC.2017.8226058
Citations: 7
Abstract
Energy-efficient machine learning requires effective construction of the neural network during training. This paper introduces a tensorized formulation of the neural network during training, such that the weight matrices can be significantly compressed. The tensorized neural network maps naturally onto a 3D CMOS-RRAM-based accelerator, with significant bandwidth boosting from vertical I/O connections, so that high throughput and low power are achieved simultaneously. Simulation results on the MNIST benchmark show that the proposed accelerator achieves 1.294× speed-up, 2.393× energy efficiency, and 7.59× area saving compared to a 3D CMOS-ASIC implementation. Moreover, the proposed accelerator achieves 370.64 GOPS throughput and 1055.95 GOPS/W energy efficiency, equivalent to 7.661 TOPS/W for an uncompressed neural network. In addition, 142× model compression is achieved by tensorization with acceptable accuracy loss.
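The abstract does not spell out the factorization, but "tensorized neural network" in this literature typically refers to tensor-train (TT) compression of fully-connected weight matrices. The sketch below (an illustrative assumption, not the paper's implementation; all function names are mine) reshapes a dense weight matrix into a higher-order tensor and compresses it with the standard TT-SVD procedure, showing how the parameter count drops when the TT ranks are capped.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose `tensor` into tensor-train cores via sequential truncated SVD.

    Core k has shape (r_{k-1}, n_k, r_k) with r_0 = r_d = 1; ranks are
    capped at `max_rank`, which is where the compression comes from.
    """
    shape = tensor.shape
    d = len(shape)
    cores, r = [], 1
    mat = tensor.reshape(shape[0], -1)
    for k in range(d - 1):
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        rk = min(max_rank, S.size)
        cores.append(U[:, :rk].reshape(r, shape[k], rk))
        mat = S[:rk, None] * Vt[:rk]          # carry the remainder forward
        if k < d - 2:
            mat = mat.reshape(rk * shape[k + 1], -1)
        r = rk
    cores.append(mat.reshape(r, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into the full tensor."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=1)  # contract rank dimension
    return full.reshape([c.shape[1] for c in cores])

# Toy fully-connected weight matrix, reshaped into a 4-way tensor.
rng = np.random.default_rng(0)
W = rng.standard_normal((32, 32))
T = W.reshape(4, 8, 8, 4)

# Rank cap high enough for exact recovery (sanity check).
cores = tt_svd(T, max_rank=64)
W_hat = tt_reconstruct(cores).reshape(32, 32)
err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)

# Capping the ranks trades accuracy for parameters: 288 vs. 1024 here.
cores_small = tt_svd(T, max_rank=4)
compressed_params = sum(c.size for c in cores_small)
print(f"exact-recovery error {err:.2e}, "
      f"compressed params {compressed_params} vs {W.size}")
```

For this 32×32 toy layer the rank-4 TT format stores 288 parameters instead of 1024 (about 3.6×); the far larger layers and aggressive rank truncation used in the paper are what make ratios like the reported 142× attainable.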