Fanxi Yang;Yuhan He;Jinqiao Yang;Anqin Xiao;Lufei Fan;Ning Ma;Li-Rong Zheng;Zhuo Zou
CorTile: A Scalable Neuromorphic Processing Core for Cortical Simulation With Hybrid-Mode Router and TCAM
IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 71, no. 12, pp. 5432-5444. Journal Article, published 2024-08-02.
DOI: 10.1109/TCSI.2024.3431036
URL: https://ieeexplore.ieee.org/document/10621024/
Impact factor: 5.2 (JCR Q1, Engineering, Electrical & Electronic)
Citations: 0
Abstract
In neuromorphic processors, simulating large-scale Spiking Neural Networks (SNNs) for cortical models demands a significant increase in communication traffic and memory capacity when the sparsity of connections is not exploited. This paper therefore proposes CorTile, a scalable neuromorphic processing core designed for cortical simulation. A hybrid-mode router supports a Remote Unicast and Local Broadcast (RULB) routing method, which leverages the high local connectivity and low distal connectivity observed in cortical models. Compared to conventional routing methods, this approach reduces average router load by 36.7%, peak load by 40.7%, average link traffic by 51.2%, and peak traffic by 41.7%. Additionally, the proposed Ternary Content Addressable Memory (TCAM)-based Sparse Connection Memory (TSCM) architecture yields an 87.1% reduction in area and a 62.7% reduction in power consumption. Together, these approaches decrease communication traffic and replace the quadratic growth in memory requirements with linear growth, making the design scalable. CorTile is simulated in a UMC 40-nm CMOS process, occupies an area of 5.15 mm², and supports a maximum of 8k neurons and 64M synapses. Evaluated on a typical macaque cortex model, it consumes 8.25 mW, with the router operating at 200 MHz and the other modules at 100 MHz, and achieves an average router load of 12.33 Mpackets/s and a peak link traffic of 21.16 MB/s. Because the core can be tiled into many-core processors, it paves the way for chiplet and multi-chip integration toward a brain-scale neuromorphic computing system.
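The abstract does not describe the internals of TSCM, so the following is only a generic sketch of the two ideas it names: ternary (value/mask) matching as used in any TCAM, and the difference between quadratic and linear growth in connection storage. The names `TernaryEntry`, `lookup`, `dense_bits`, and `sparse_entries`, the `target` payload, and the fixed `fan_in` parameter are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TernaryEntry:
    """One TCAM row: compare only the bits where mask is 1 (hypothetical layout)."""
    value: int   # bit pattern to compare against
    mask: int    # 1 = compare this bit, 0 = don't care
    target: int  # assumed payload: index of the local neuron that receives the spike

    def matches(self, key: int) -> bool:
        # Masked equality is the defining operation of a ternary CAM.
        return (key & self.mask) == (self.value & self.mask)


def lookup(table: List[TernaryEntry], source_id: int) -> List[int]:
    """Return targets of all rows matching source_id.

    Hardware compares every row in parallel; this loop only models the result.
    """
    return [entry.target for entry in table if entry.matches(source_id)]


def dense_bits(n_neurons: int) -> int:
    """Storage for an all-to-all connection bitmap: grows as n^2."""
    return n_neurons * n_neurons


def sparse_entries(n_neurons: int, fan_in: int) -> int:
    """Storage when each neuron keeps a fixed number of entries: grows as n."""
    return n_neurons * fan_in


if __name__ == "__main__":
    table = [
        TernaryEntry(value=0b1000, mask=0b1100, target=7),  # matches any 10xx source
        TernaryEntry(value=0b0101, mask=0b1111, target=3),  # matches exactly 0101
    ]
    print(lookup(table, 0b1011))
    print(lookup(table, 0b0101))

    # Doubling the neuron count quadruples the dense bitmap
    # but only doubles the fixed-fan-in sparse store.
    for n in (8192, 16384):
        print(n, dense_bits(n), sparse_entries(n, fan_in=64))
```

The don't-care bits are what let a single row stand for a whole group of source neurons, which is one plausible way a sparse connection table could stay compact; whether CorTile uses its mask bits this way is an assumption here, not something the abstract states.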
About the journal:
TCAS I publishes regular papers in the field specified by the theory, analysis, design, and practical implementations of circuits, and the application of circuit techniques to systems and to signal processing. Included is the whole spectrum from basic scientific theory to industrial applications. The field of interest covered includes:
- Circuits: Analog, Digital and Mixed Signal Circuits and Systems
- Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems
- Software for Analog-and-Logic Circuits and Systems
- Control aspects of Circuits and Systems