Neural network accelerator design with resistive crossbars: Opportunities and challenges

IBM Journal of Research and Development · Impact factor 1.3 · Zone 4 (Computer Science) · JCR Q1 (Computer Science) · Published 2019-10-14 · DOI: 10.1147/JRD.2019.2947011

S. Jain, A. Ankit, I. Chakraborty, T. Gokmen, M. Rasch, W. Haensch, K. Roy, A. Raghunathan

IBM Journal of Research and Development, vol. 63, no. 6, pp. 10:1-10:13, 2019. Publisher page: https://ieeexplore.ieee.org/document/8865106/

Citations: 13

Abstract

Deep neural networks (DNNs) achieve best-known accuracies in many machine learning tasks involved in image, voice, and natural language processing and are being used in an ever-increasing range of applications. However, their algorithmic benefits are accompanied by extremely high computation and storage costs, sparking intense efforts in optimizing the design of computing platforms for DNNs. Today, graphics processing units (GPUs) and specialized digital CMOS accelerators represent the state-of-the-art in DNN hardware, with near-term efforts focusing on approximate computing through reduced precision. However, the ever-increasing complexities of DNNs and the data they process have fueled an active interest in alternative hardware fabrics that can deliver the next leap in efficiency. Resistive crossbars designed using emerging nonvolatile memory technologies have emerged as a promising candidate building block for future DNN hardware fabrics since they can natively execute massively parallel vector-matrix multiplications (the dominant compute kernel in DNNs) in the analog domain within the memory arrays. Leveraging in-memory computing and dense storage, resistive-crossbar-based systems cater to both the high computation and storage demands of complex DNNs and promise energy efficiency beyond current DNN accelerators by mitigating data transfer and memory bottlenecks. However, several design challenges need to be addressed to enable their adoption. For example, the overheads of peripheral circuits (analog-to-digital converters and digital-to-analog converters) and other components (scratchpad memories and on-chip interconnect) may significantly diminish the efficiency benefits at the system level. Additionally, the analog crossbar computations are intrinsically subject to noise due to a range of device- and circuit-level nonidealities, potentially leading to lower accuracy at the application level. 
In this article, we highlight the prospects for designing hardware accelerators for neural networks using resistive crossbars. We also underscore the key open challenges and some possible approaches to address them.
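To make the abstract's central mechanism concrete, here is a minimal, pure-Python sketch (not the authors' model or simulator) of how a resistive crossbar computes a vector-matrix multiplication in the analog domain: inputs become row voltages, weights become conductances, and each column sums currents by Kirchhoff's current law, I[j] = Σ_i V[i]·G[i][j]. It assumes ideal linear devices, signed weights stored as differential conductance pairs (G+ minus G-), signed read voltages, ideal uniform DAC/ADC quantizers, and multiplicative Gaussian conductance variation as a stand-in for device nonidealities. The names `crossbar_vmm`, `quantize`, `g_max`, `v_read`, `g_sigma` and all numeric values are hypothetical illustrations, not device data.

```python
import random


def quantize(value, bits, full_scale):
    """Symmetric uniform quantizer (an idealized DAC/ADC), bits >= 2:
    clip to [-full_scale, full_scale], then round to 2**bits - 1 levels."""
    levels = 2 ** (bits - 1) - 1          # positive levels, e.g. 127 for 8 bits
    step = full_scale / levels
    clipped = max(-full_scale, min(full_scale, value))
    return round(clipped / step) * step


def reference_vmm(x, W):
    """Exact digital result y[j] = sum_i x[i] * W[i][j]."""
    return [sum(x[i] * W[i][j] for i in range(len(W))) for j in range(len(W[0]))]


def crossbar_vmm(x, W, *, g_max=1e-4, v_read=0.2, dac_bits=None, adc_bits=None,
                 g_sigma=0.0, rng=None):
    """Approximate reference_vmm(x, W) the way an analog crossbar would.

    Rows are driven by voltages proportional to x, each weight is a
    differential conductance pair, and each column current is the
    Kirchhoff sum of per-cell Ohm's-law currents V[i] * G[i][j].
    """
    if rng is None:
        rng = random.Random(0)
    rows, cols = len(W), len(W[0])
    x_max = max(abs(v) for v in x) or 1.0
    w_max = max(abs(w) for row in W for w in row) or 1.0
    g_scale = g_max / w_max                      # siemens per unit of weight

    # DAC stage: optionally quantize the inputs, then encode them as voltages.
    xq = [quantize(v, dac_bits, x_max) if dac_bits else v for v in x]
    V = [v * v_read / x_max for v in xq]

    def program(g):
        # Hypothetical device variation: G * (1 + N(0, g_sigma)), clipped at 0 S.
        return max(0.0, g * (1.0 + rng.gauss(0.0, g_sigma))) if g_sigma else g

    i_full_scale = rows * v_read * g_max         # worst-case |column current|
    y = []
    for j in range(cols):
        current = 0.0
        for i in range(rows):
            g_plus = program(max(W[i][j], 0.0) * g_scale)
            g_minus = program(max(-W[i][j], 0.0) * g_scale)
            current += V[i] * (g_plus - g_minus)  # Ohm's law, summed on the column
        if adc_bits:                              # ADC stage on the column output
            current = quantize(current, adc_bits, i_full_scale)
        # Undo the voltage and conductance scaling to return to weight units.
        y.append(current * x_max / (v_read * g_scale))
    return y


if __name__ == "__main__":
    gen = random.Random(42)
    W = [[gen.uniform(-1, 1) for _ in range(8)] for _ in range(64)]
    x = [gen.uniform(-1, 1) for _ in range(64)]
    exact = reference_vmm(x, W)
    cases = [("ideal", {}),
             ("8-bit DAC and ADC", {"dac_bits": 8, "adc_bits": 8}),
             ("4-bit ADC", {"adc_bits": 4}),
             ("5% conductance variation", {"g_sigma": 0.05})]
    for label, kwargs in cases:
        approx = crossbar_vmm(x, W, **kwargs)
        err = max(abs(a - e) for a, e in zip(approx, exact))
        print(f"{label:>26}: max |error| = {err:.4f}")
```

In this toy setup the ADC full scale is set to the worst-case column current (rows × v_read × g_max), a deliberately simple choice. Because typical column sums are much smaller than that bound, a low-precision ADC leaves most of its levels unused and many outputs round toward zero, which illustrates why the abstract treats converter precision and overhead, together with device-level noise, as system-level concerns.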




Source journal

IBM Journal of Research and Development (Engineering & Technology: Computer Hardware)

Self-citation rate: 0.00%

Review time: 6-12 weeks

Journal description: The IBM Journal of Research and Development is a peer-reviewed technical journal, published bimonthly, which features the work of authors in the science, technology, and engineering of information systems. Papers are written for the worldwide scientific research and development community and for knowledgeable professionals. Submissions are welcome from the IBM technical community and from non-IBM authors on topics relevant to the scientific and technical content of the Journal.
