An Energy-Efficient and Scalable Deep Learning/Inference Processor With Tetra-Parallel MIMD Architecture for Big Data Applications
Seongwook Park, Junyoung Park, Kyeongryeol Bong, Dongjoo Shin, Jinmook Lee, Sungpill Choi, H. Yoo
IEEE Transactions on Biomedical Circuits and Systems, vol. 9, pp. 838-848, December 2015
DOI: 10.1109/TBCAS.2015.2504563 (https://doi.org/10.1109/TBCAS.2015.2504563)
Cited by: 37
Abstract
Deep learning algorithms are widely used in pattern recognition applications such as text recognition, object recognition, and action recognition because of their best-in-class recognition accuracy compared with hand-crafted and shallow-learning-based algorithms. However, the long learning time caused by their complex structure has so far limited their use to high-cost servers or many-core GPU platforms. At the same time, demand for customized pattern recognition on personal devices will grow as more deep learning applications are developed. This paper presents an SoC implementation that enables deep learning applications to run on low-cost platforms such as mobile or portable devices. Unlike conventional works that adopt massively parallel architectures, this work adopts a task-flexible architecture and exploits multiple forms of parallelism to cover the complex functions of the convolutional deep belief network, one of the most popular deep learning/inference algorithms. We implement the most energy-efficient deep learning and inference processor reported for wearable systems. The 2.5 mm × 4.0 mm deep learning/inference processor is fabricated in a 65 nm 8-metal CMOS technology for a battery-powered platform with real-time deep inference and deep learning operation. It consumes 185 mW average power and 213.1 mW peak power at a 200 MHz operating frequency and a 1.2 V supply voltage. It achieves 411.3 GOPS peak performance and 1.93 TOPS/W energy efficiency, which is 2.07× higher than the state of the art.
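As a quick sanity check, the reported 1.93 TOPS/W efficiency is consistent with dividing the peak performance by the peak power; whether that is how the paper defines the metric is an assumption here. A minimal sketch in Python:

```python
# Sanity check on the reported figures (assumes 1.93 TOPS/W = peak throughput / peak power;
# the paper may define the efficiency metric differently).

peak_performance_gops = 411.3   # reported peak throughput, GOPS
peak_power_w = 0.2131           # reported peak power, 213.1 mW
avg_power_w = 0.185             # reported average power, 185 mW

peak_efficiency_tops_per_w = (peak_performance_gops / 1000.0) / peak_power_w
print(f"Peak efficiency: {peak_efficiency_tops_per_w:.2f} TOPS/W")   # -> 1.93

# Hypothetical alternative: efficiency relative to average power instead of peak power.
avg_efficiency_tops_per_w = (peak_performance_gops / 1000.0) / avg_power_w
print(f"Efficiency vs. average power: {avg_efficiency_tops_per_w:.2f} TOPS/W")  # -> 2.22
```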
Journal Description:
The IEEE Transactions on Biomedical Circuits and Systems addresses areas at the crossroads of Circuits and Systems and the Life Sciences. The main emphasis is on microelectronic issues in a wide range of applications found in the life sciences, physical sciences, and engineering. The primary goal of the journal is to bridge the unique scientific and technical activities of the Circuits and Systems Society to a wide variety of related areas, such as:
• Bioelectronics
• Implantable and wearable electronics, such as cochlear and retinal prostheses, motor control, etc.
• Biotechnology sensor circuits, integrated systems, and networks
• Micropower imaging technology
• BioMEMS
• Lab-on-chip
• Bio-nanotechnology
• Organic semiconductors
• Biomedical engineering
• Genomics and proteomics
• Neuromorphic engineering
• Smart sensors
• Low-power micro- and nanoelectronics
• Mixed-mode system-on-chip
• Wireless technology
• Gene circuits and molecular circuits
• Systems biology
• Brain science and engineering: neuro-informatics, neural prostheses, cognitive engineering, brain-computer interfaces
• Healthcare: information technology for biomedical, epidemiology, and other related life-science applications
General, theoretical, and application-oriented papers in the above technical areas with a Circuits and Systems perspective are encouraged for publication in TBioCAS. Of special interest are biomedical-oriented papers with a Circuits and Systems angle.