TCAM-GNN: A TCAM-Based Data Processing Strategy for GNN Over Sparse Graphs
Yu-Pang Wang, Wei-Chen Wang, Yuan-Hao Chang, Chieh-Lin Tsai, Tei-Wei Kuo, Chun-Feng Wu, Chien-Chung Ho, Han-Wen Hu
IEEE Transactions on Emerging Topics in Computing, vol. 12, no. 3, pp. 891-904 (Journal Article). Published 2023-11-02. DOI: 10.1109/TETC.2023.3328008
https://ieeexplore.ieee.org/document/10305502/
Impact factor: 5.1 (JCR Q1, Computer Science, Information Systems)
Citations: 0
Abstract
Graph neural networks (GNNs) have recently become an active research topic for processing non-Euclidean data, since the data in many popular application domains, such as social networks, recommendation systems, and computer vision, are naturally modeled as graphs. Previous GNN accelerators commonly use a hybrid architecture to handle the “hybrid computing pattern” of GNN training. However, the hybrid architecture suffers from poor utilization of hardware resources, mainly because the workload shifts dynamically between the different phases of a GNN. To address this, existing GNN accelerators adopt a unified structure with numerous processing elements and high-bandwidth memory. Yet the large amount of data movement between processor and memory can heavily degrade the performance of such accelerators on real-world graphs. Processing-in-memory architectures, such as the ReRAM-based crossbar, are therefore a promising way to reduce the memory overhead of GNN training. In this work, we present TCAM-GNN, a novel TCAM-based data processing strategy that enables high-throughput and energy-efficient GNN training on a ReRAM-based crossbar architecture. We propose several hardware co-designed data structures and placement methods to fully exploit the parallelism of GNN training. In addition, we propose a dynamic fixed-point formatting approach to resolve the precision issue, and an adaptive data reuse policy that improves the data locality of graph features through a bootstrapping batch sampling approach. Overall, TCAM-GNN improves computing performance by 4.25× and energy efficiency by 9.11× on average compared with the neural network accelerators.
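As a rough illustration of why a ternary content-addressable memory (TCAM) suits sparse graphs, the sketch below emulates a ternary search in software. It is not the data structure or placement method of the paper, which the abstract does not describe. The edge encoding, the 4-bit vertex-ID width, and all function names are assumptions made for this example. Each stored row holds one edge, and a search key whose destination field is all "don't care" (X) bits matches every edge leaving a given source vertex, which a hardware TCAM would resolve by comparing all rows in parallel.

```python
from typing import List, Tuple

BITS = 4  # hypothetical vertex-ID width for this toy example


def encode_edge(src: int, dst: int) -> str:
    """Store one edge as a bit string: source ID followed by destination ID."""
    return format(src, f"0{BITS}b") + format(dst, f"0{BITS}b")


def ternary_match(entry: str, key: str) -> bool:
    """A row matches when every key bit equals the stored bit or is 'X'."""
    return all(k == "X" or k == e for e, k in zip(entry, key))


def tcam_search(rows: List[str], key: str) -> List[int]:
    """Return indices of all matching rows.

    Software checks rows one by one; a hardware TCAM compares them in parallel.
    """
    return [i for i, row in enumerate(rows) if ternary_match(row, key)]


def neighbors(rows: List[str], src: int) -> List[int]:
    """Find all destinations of `src` by masking the destination field."""
    key = format(src, f"0{BITS}b") + "X" * BITS
    return [int(rows[i][BITS:], 2) for i in tcam_search(rows, key)]


# A small sparse graph stored as an edge table (illustrative data only).
edges: List[Tuple[int, int]] = [(0, 1), (0, 3), (1, 2), (3, 0), (3, 2)]
table = [encode_edge(s, d) for s, d in edges]

if __name__ == "__main__":
    print(neighbors(table, 0))
    print(neighbors(table, 3))
```

Storing only the edges that exist, rather than a dense adjacency matrix, keeps the table size proportional to the number of edges, which is the usual motivation for content-addressed lookup on sparse graphs.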
About the journal:
IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Some examples of emerging topics in computing include: IT for Green, Synthetic and organic computing structures and systems, Advanced analytics, Social/occupational computing, Location-based/client computer systems, Morphic computer design, Electronic game systems, & Health-care IT.