Title: Efficient O-type mapping and routing of large-scale neural networks to torus-based ONoCs
Authors: Qiuyan Yao; Daqing Meng; Hui Yang; Nan Feng; Jie Zhang
DOI: 10.1364/JOCN.525666
Journal: Journal of Optical Communications and Networking, Vol. 16, No. 9, pp. 918-928
Published: 2024-08-26
URL: https://ieeexplore.ieee.org/document/10646889/
Cited by: 0
Abstract
The rapid development of artificial intelligence has accelerated the arrival of the era of large models. Large models based on artificial neural networks typically have millions to billions of parameters, and their training and inference place strict demands on hardware, especially at the chip level, in terms of interconnection bandwidth, processing speed, and latency. The optical network-on-chip (ONoC) is an emerging interconnection technology that connects IP cores through a network of optical waveguides. Owing to advantages such as low loss, high throughput, and low delay, this communication mode has gradually become a key technology for improving the efficiency of large models. ONoCs have already been used to reduce the interconnection complexity of neural network accelerators: neural network models are reshaped and mapped onto the processing elements of the ONoC, which then communicate at high speed on chip. In this paper, we first propose a torus-based O-type mapping strategy that efficiently maps neuron groups onto the chip. We then design a low-congestion arbitrator based on array congestion information and present a multi-path low-congestion routing algorithm, named TMLA, that alleviates array congestion and disperses the routing pressure across paths. Results demonstrate that the proposed mapping and routing scheme reduces the average network delay without additional loss at relatively high injection rates, providing a valuable reference for research on neural network acceleration.
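The abstract does not give the details of the TMLA algorithm, but the general idea of congestion-aware multi-path routing on a torus can be sketched generically. The snippet below is a hypothetical illustration, not the paper's method: it computes the minimal-path output directions on a 2D torus (exploiting wrap-around links) and, where more than one minimal direction exists, picks the one with the lowest reported congestion. All function names and the congestion-table format are assumptions for illustration only.

```python
# Hypothetical sketch of congestion-aware minimal routing on a 2D torus.
# This is NOT the paper's TMLA algorithm (its details are not given in the
# abstract); it only illustrates choosing among minimal-path directions by
# lowest link congestion, which is the general technique the abstract names.

def torus_offset(src, dst, size):
    """Signed shortest wrap-around offset from src to dst on a ring of `size`."""
    d = (dst - src) % size
    return d if d <= size // 2 else d - size

def candidate_ports(src, dst, width, height):
    """Minimal-path output directions (+x/-x/+y/-y) from src toward dst."""
    dx = torus_offset(src[0], dst[0], width)
    dy = torus_offset(src[1], dst[1], height)
    ports = []
    if dx > 0: ports.append("+x")
    if dx < 0: ports.append("-x")
    if dy > 0: ports.append("+y")
    if dy < 0: ports.append("-y")
    return ports

def select_port(src, dst, width, height, congestion):
    """Pick the minimal-path port with the lowest congestion value.

    `congestion` maps (node, port) -> load metric (e.g., buffer occupancy);
    missing entries are treated as unloaded.
    """
    ports = candidate_ports(src, dst, width, height)
    if not ports:
        return None  # already at the destination node
    return min(ports, key=lambda p: congestion.get((src, p), 0))

# Example on a 4x4 torus: from (0, 0) to (3, 2) the minimal paths go -x
# (via the wrap-around link) or +y; the less congested port wins.
cong = {((0, 0), "-x"): 5, ((0, 0), "+y"): 1}
print(select_port((0, 0), (3, 2), 4, 4, cong))  # prints "+y"
```

Spreading packets over multiple minimal paths in this way is what lets such schemes disperse routing pressure instead of saturating a single dimension-ordered route.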
Journal Overview
The scope of the Journal includes advances in the state of the art of optical networking science, technology, and engineering. Both theoretical contributions (including new techniques, concepts, analyses, and economic studies) and practical contributions (including optical networking experiments, prototypes, and new applications) are encouraged. Subareas of interest include the architecture and design of optical networks, optical network survivability and security, software-defined optical networking, elastic optical networks, data- and control-plane advances, network-management-related innovation, and optical access networks. Enabling technologies and their applications are suitable topics only if the results are shown to directly impact optical networking beyond simple point-to-point networks.