A Hybrid Optical-Electrical Analog Deep Learning Accelerator Using Incoherent Optical Signals

ACM Journal on Emerging Technologies in Computing Systems, pages 1-24 · IF 2.1 · Zone 4, Computer Science · JCR Q3 (Computer Science, Hardware & Architecture) · Pub Date: 2023-02-17 · DOI: 10.1145/3584183

Mingdai Yang, Qiuwen Lou, R. Rajaei, M. Jokar, Junyi Qiu, Yuming Liu, Aditi Udupa, F. Chong, J. Dallesasse, Milton Feng, L. Goddard, X. S. Hu, Yanjing Li


Citation count: 0

Abstract

Optical deep learning (DL) accelerators have attracted significant interest due to their latency and power advantages. In this article, we focus on incoherent optical designs. A significant challenge is that there is no known way to efficiently perform single-wavelength accumulation (a key operation required for DL workloads) using incoherent optical signals. Therefore, we devise a hybrid approach in which accumulation is done in the electrical domain and multiplication is performed in the optical domain. The key technology enabler of our design is the transistor laser, which performs electrical-to-optical and optical-to-electrical conversions efficiently. Through detailed design and evaluation, along with a comprehensive benchmarking study against state-of-the-art RRAM-based designs, we derive the following key results: (1) For a four-layer multilayer perceptron network, our design achieves 115× and 17.11× improvements in latency and energy, respectively, compared to the RRAM-based design. We can take full advantage of the speed and energy benefits of the optical technology because the inference task can be entirely mapped onto our design. (2) For a complex workload (ResNet50), weight reprogramming is needed, and intermediate results must be stored to and re-fetched from memory. In this case, for the same area, our design still outperforms the RRAM-based design by 15.92× in inference latency and 8.99× in energy.
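The split described in the abstract (multiplication in the optical domain, accumulation in the electrical domain) can be pictured with a purely functional sketch. This is not the authors' implementation: the function names `optical_multiply`, `electrical_accumulate`, and `hybrid_layer` are hypothetical, and the model ignores analog noise, signal conversion, and every device-level effect. It only shows how a fully connected layer decomposes into a multiply stage followed by a separate accumulate stage.

```python
# Illustrative model only: the two stages are plain arithmetic stand-ins,
# not a simulation of transistor lasers or any real hardware.

def optical_multiply(inputs, weights):
    """Element-wise products, standing in for the optical multiplication stage."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return [x * w for x, w in zip(inputs, weights)]


def electrical_accumulate(products):
    """Sum of partial products, standing in for electrical-domain accumulation."""
    total = 0.0
    for p in products:
        total += p
    return total


def hybrid_layer(inputs, weight_rows):
    """One fully connected layer: each output is multiply, then accumulate."""
    return [
        electrical_accumulate(optical_multiply(inputs, row))
        for row in weight_rows
    ]


if __name__ == "__main__":
    # Two inputs, two output neurons (hypothetical values).
    print(hybrid_layer([1.0, 2.0], [[0.5, 0.25], [1.0, -1.0]]))
```

Keeping the two stages as separate functions mirrors the abstract's point that only the accumulation step has to leave the optical domain.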



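Source journal

ACM Journal on Emerging Technologies in Computing Systems (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 4.80

Self-citation rate: 4.50%

Articles published: not listed

Review time: 3 months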

Journal description: The Journal of Emerging Technologies in Computing Systems invites submissions of original technical papers describing research and development in emerging technologies in computing systems. Major economic and technical challenges are expected to impede the continued scaling of semiconductor devices. This has resulted in the search for alternative mechanical, biological/biochemical, nanoscale electronic, asynchronous, and quantum computing and sensor technologies. As the underlying nanotechnologies continue to evolve in the labs of chemists, physicists, and biologists, it has become imperative for computer scientists and engineers to translate the potential of the basic building blocks (analogous to the transistor) emerging from these labs into information systems. Their design will face multiple challenges: the inherent (un)reliability due to the self-assembly nature of the fabrication processes for nanotechnologies, the complexity due to the sheer volume of nanodevices that will have to be integrated for complex functionality, and the need to integrate these new nanotechnologies with silicon devices in the same system. The journal provides comprehensive coverage of innovative work in the specification, design analysis, simulation, verification, testing, and evaluation of computing systems constructed out of emerging technologies and advanced semiconductors.