WER: Maximizing Parallelism of Irregular Graph Applications Through GPU Warp EqualizeR

En-Ming Huang, Bo Wun Cheng, Meng-Hsien Lin, Chun-Yi Lee, Tsung-Tai Yeh
2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 201-206. Published 2024-01-22.
DOI: 10.1109/ASP-DAC58780.2024.10473955
Citation count: 0

Abstract

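Irregular graphs are becoming increasingly prevalent across a broad spectrum of data analysis applications. Despite their versatility, the inherent complexity and irregularity of these graphs often result in underutilization of Single Instruction, Multiple Data (SIMD) resources when they are processed on Graphics Processing Units (GPUs). This underutilization stems from two primary issues: inactive threads and intra-warp load imbalance. Both leave threads idle, waste SIMD resources, reduce throughput, and increase program execution time. To address these challenges, we introduce Warp EqualizeR (WER), a framework designed to optimize the utilization of SIMD resources on a GPU when processing irregular graphs. WER combines a software API with a specifically tailored hardware microarchitecture. This synergistic approach enables workload redistribution in irregular graphs, allowing WER to improve SIMD lane utilization and better harness the SIMD resources within a GPU. Our experimental results over seven different graph applications indicate that WER yields a geometric mean speedup of 2.52× and 1.47× over the baseline GPU and existing state-of-the-art methodologies, respectively.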
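The two causes of underutilization named above can be made concrete with a small Python model. It is a sketch under stated assumptions, not WER's software API or microarchitecture: one thread per vertex, each thread loops over its vertex's edges, and a warp keeps issuing loop iterations until its busiest lane finishes. Lanes with no vertex stand in for inactive threads, and a single high-degree vertex stands in for intra-warp load imbalance. The helpers `warp_steps`, `lane_utilization`, and `equalize` are hypothetical names, and `equalize` is a plain even split of edge work across the 32 lanes, assumed here only as a stand-in for workload redistribution.

```python
# Toy model of SIMD lane utilization in a vertex-centric graph kernel.
# Assumptions (not from the paper): one thread per vertex, each thread loops
# over its vertex's edges, and a warp keeps issuing until its busiest lane is
# done. The redistribution below is a software-only stand-in, not WER itself.
from typing import List

WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs


def warp_steps(lane_work: List[int]) -> int:
    """Loop iterations the warp issues: the heaviest lane decides."""
    return max(lane_work, default=0)


def lane_utilization(lane_work: List[int], warp_size: int = WARP_SIZE) -> float:
    """Useful lane-slots divided by issued lane-slots (warp_size * steps)."""
    steps = warp_steps(lane_work)
    if steps == 0:
        return 0.0
    return sum(lane_work) / (warp_size * steps)


def equalize(lane_work: List[int], warp_size: int = WARP_SIZE) -> List[int]:
    """Hypothetical redistribution: spread the warp's total edge work
    evenly over all lanes, including lanes that were previously idle."""
    base, extra = divmod(sum(lane_work), warp_size)
    return [base + (1 if lane < extra else 0) for lane in range(warp_size)]


if __name__ == "__main__":
    # One degree-100 vertex, 15 degree-1 vertices, 16 lanes with no vertex.
    work = [100] + [1] * 15 + [0] * 16
    balanced = equalize(work)
    print(warp_steps(work), f"{lane_utilization(work):.3f}")          # 100 0.036
    print(warp_steps(balanced), f"{lane_utilization(balanced):.3f}")  # 4 0.898
```

In this model the imbalanced warp issues 100 iterations at about 3.6% lane utilization, while spreading its 115 edges over all 32 lanes cuts that to 4 iterations at about 90%. The model ignores the cost of moving edges between lanes, so it illustrates the opportunity rather than the speedups reported for WER.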