Communication is a critical bottleneck for GPUs, manifesting as energy and performance overheads due to network-on-chip (NoC) delay and congestion. While many algorithms exhibit locality among thread blocks and accessed data, modern GPUs lack the interface to exploit this locality: GPU thread blocks are mapped to cores obliviously. In this work, we explore a simple extension to the conventional GPU programming interface to enable control over the spatial placement of data and threads, yielding new opportunities for aggressive locality optimizations within a GPU kernel. Across 7 workloads that can take advantage of these optimizations, for a 32 (or 128) SM GPU: we achieve a 1.28× (1.54×) speedup and 35% (44%) reduction in NoC traffic, compared to baseline non-spatial GPUs.
SPGPU: Spatially Programmed GPU. Shizhuo Zhu; Illia Shkirko; Jacob Levinson; Zhengrong Wang; Tony Nowatzki. IEEE Computer Architecture Letters, vol. 23, no. 2, pp. 223-226. Published 2024-11-14. DOI: 10.1109/LCA.2024.3499339
Pub Date: 2024-11-04. DOI: 10.1109/LCA.2024.3483840
Navnil Choudhury;Chao Lu;Kanad Basu
Noisy Intermediate-Scale Quantum (NISQ) computers are impeded by constraints such as limited qubit count and susceptibility to noise, hindering the progression towards fault-tolerant quantum computing for intricate and practical applications. To augment the computational capabilities of quantum computers, research is gravitating towards qudits featuring more than two energy levels. This paper presents the inaugural examination of the repercussions of errors in qudit circuits. Subsequently, we introduce an innovative qudit-based assertion framework aimed at automatically detecting and reporting errors and warnings during the quantum circuit design and compilation process. Our proposed framework, when subjected to evaluation on existing quantum computing platforms, can detect both new and existing bugs with up to 100% coverage of the bugs mentioned in this paper.
Quantum Assertion Scheme for Assuring Qudit Robustness. Navnil Choudhury; Chao Lu; Kanad Basu. IEEE Computer Architecture Letters, vol. 23, no. 2, pp. 247-250. Published 2024-11-04. DOI: 10.1109/LCA.2024.3483840
Pub Date: 2024-10-22. DOI: 10.1109/LCA.2024.3484648
Hyungkyu Ham;Wonhyuk Yang;Yunseon Shin;Okkyun Woo;Guseul Heo;Sangyeop Lee;Jongse Park;Gwangsun Kim
As DNNs (Deep Neural Networks) demand ever more compute and memory, designing efficient and performant NPUs (Neural Processing Units) has become more important. However, existing architectural NPU simulators lack support for high-speed simulation, multi-core modeling, multi-tenant scenarios, detailed DRAM/NoC modeling, and/or different deep learning frameworks. To address these limitations, this work proposes ONNXim