Yukuan Yang;Qihang Fan;Tianyi Yan;Jing Pei;Guoqi Li
Network Group Partition and Core Placement Optimization for Neuromorphic Multi-Core and Multi-Chip Systems
IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 8, no. 6, pp. 3966-3981. Published 2024-04-01. DOI: 10.1109/TETCI.2024.3379165
https://ieeexplore.ieee.org/document/10487993/
Citation count: 0
Abstract
Neuromorphic chips with multi-core architectures are considered highly promising for the next generation of artificial intelligence (AI) chips because they avoid the memory wall effect. Deploying deep neural networks (DNNs) on these chips requires two stages: network partition and core placement. For network partition, existing schemes are mostly manual or address only single-layer, small-scale networks. For core placement, to the best of our knowledge, no prior work has completely solved the clock-level communication deadlock problem that commonly arises in applications of neuromorphic multi-core and multi-chip (NMCMC) systems. To address these issues, which affect the operating and deployment efficiency of NMCMC systems, we formulate network group partition as an optimization problem for the first time and propose a search-based network group partition scheme to solve it. A clock-level multi-chip simulator is established to completely avoid deadlock during core placement optimization. In addition, a region constrained simulated annealing (RCSA) algorithm is proposed to improve the efficiency of core placement optimization. Finally, an automated toolchain for efficiently deploying DNNs on NMCMC systems is developed by integrating the proposed network group partition and core placement schemes. Experiments show that, compared with existing manual schemes, the proposed group partition scheme uses 22.25%, 17.77%, and 14.80% fewer cores, improves memory utilization by 9.44%, 7.96%, and 5.16%, and yields more balanced communication and computation loads on ResNet-18, ResNet-34, and ResNet-50, respectively. Moreover, the RCSA-based core placement optimization is more efficient, requiring far fewer optimization steps, and achieves 9.52%, 11.91%, and 27.52% higher throughput than sequential core placement without deadlock on ResNet-18, ResNet-34, and ResNet-50, respectively. This work paves the way for applying NMCMC systems to real-world scenarios and reaching more powerful machine intelligence.
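To make the core placement stage more concrete, the following is a minimal sketch of simulated annealing over a 2D core mesh in which each proposed swap is restricted to a nearby region. It is not the authors' RCSA algorithm: the abstract does not define the region constraint, so the `radius` parameter, the traffic-weighted Manhattan-distance cost, and all function names here are illustrative assumptions. In particular, the paper evaluates placements with a clock-level simulator that rules out deadlock, which this proxy cost does not model.

```python
import math
import random


def comm_cost(placement, traffic):
    """Proxy cost (assumption): traffic volume times Manhattan hop distance."""
    total = 0
    for (a, b), volume in traffic.items():
        (xa, ya), (xb, yb) = placement[a], placement[b]
        total += volume * (abs(xa - xb) + abs(ya - yb))
    return total


def region_constrained_sa(n_cores, width, traffic, radius=2,
                          steps=5000, t0=5.0, alpha=0.999, seed=0):
    """Anneal a placement, swapping only cores within `radius` hops.

    placement[i] is the (x, y) mesh position of logical core i.
    """
    rng = random.Random(seed)
    # Start from a sequential (row-major) placement.
    placement = [(i % width, i // width) for i in range(n_cores)]
    cost = comm_cost(placement, traffic)
    best, best_cost = list(placement), cost
    temperature = t0

    for _ in range(steps):
        a = rng.randrange(n_cores)
        xa, ya = placement[a]
        # Region constraint (hypothetical form): only nearby swap partners.
        candidates = [
            b for b in range(n_cores)
            if b != a
            and abs(placement[b][0] - xa) + abs(placement[b][1] - ya) <= radius
        ]
        if candidates:
            b = rng.choice(candidates)
            placement[a], placement[b] = placement[b], placement[a]
            new_cost = comm_cost(placement, traffic)
            delta = new_cost - cost
            # Metropolis rule: always accept improvements, sometimes accept worse.
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = list(placement), cost
            else:
                # Rejected: undo the swap.
                placement[a], placement[b] = placement[b], placement[a]
        temperature *= alpha  # geometric cooling schedule

    return best, best_cost


if __name__ == "__main__":
    # Toy traffic pattern between 16 logical cores on a 4x4 mesh (made up).
    demo_traffic = {(i, (i * 7 + 3) % 16): 1 + i % 3
                    for i in range(16) if i != (i * 7 + 3) % 16}
    layout, final_cost = region_constrained_sa(16, 4, demo_traffic)
    print(final_cost, layout)
```

Restricting swap partners to a local region shrinks the move set per step, which is one plausible reading of how a region constraint could reduce the number of optimization steps; replacing `comm_cost` with a simulator-derived throughput measure would bring the sketch closer to the setting the abstract describes.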
About the journal:
The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys.
TETCI is an electronic-only publication. TETCI publishes six issues per year.
Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. A few illustrative examples are glial cell networks, computational neuroscience, brain-computer interfaces, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for the IoT and Smart-X technologies.