gpgpu上均衡片上网络的流量感知频率缩放

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS) Pub Date : 2014-12-01 DOI:10.1109/PADSW.2014.7097795

Chiao-Yun Tu, Yuan-Ying Chang, C. King, Chien-Ting Chen, Tai-Yuan Wang

{"title":"gpgpu上均衡片上网络的流量感知频率缩放","authors":"Chiao-Yun Tu, Yuan-Ying Chang, C. King, Chien-Ting Chen, Tai-Yuan Wang","doi":"10.1109/PADSW.2014.7097795","DOIUrl":null,"url":null,"abstract":"General-purpose computing on graphics processing units (GPGPU) can provide orders of magnitude more computing power than general purpose processors (CPU) for highly parallel applications. For such parallel applications, the memory traffic pattern of GPGPUs behaves considerably different from that of CPUs. This gives rise to opportunities for optimizing the on-chip interconnection network (NoC) of GPGPUs. In this work, we first investigate the characteristics of GPGPU memory traffic of typical benchmarks and categorize the memory traffic patterns. Different traffic patterns require different throughput in the request and reply paths of the NoC to match the network load. To meet this requirement, we examine the feasibility of scaling the network frequency dynamically to balance the throughput of the request and reply networks. The decision is guided by monitoring some shader cores to identify the memory traffic pattern. Performance evaluation shows that this dynamic frequency tuning design can achieve up to 27% improvement in terms of execution speedup compared to a baseline setting and 7.4% improvement on average.","PeriodicalId":421740,"journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)","volume":"206 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Traffic-aware frequency scaling for balanced on-chip networks on GPGPUs\",\"authors\":\"Chiao-Yun Tu, Yuan-Ying Chang, C. King, Chien-Ting Chen, Tai-Yuan Wang\",\"doi\":\"10.1109/PADSW.2014.7097795\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"General-purpose computing on graphics processing units (GPGPU) can provide orders of magnitude more computing power than general purpose processors (CPU) for highly parallel applications. For such parallel applications, the memory traffic pattern of GPGPUs behaves considerably different from that of CPUs. This gives rise to opportunities for optimizing the on-chip interconnection network (NoC) of GPGPUs. In this work, we first investigate the characteristics of GPGPU memory traffic of typical benchmarks and categorize the memory traffic patterns. Different traffic patterns require different throughput in the request and reply paths of the NoC to match the network load. To meet this requirement, we examine the feasibility of scaling the network frequency dynamically to balance the throughput of the request and reply networks. The decision is guided by monitoring some shader cores to identify the memory traffic pattern. Performance evaluation shows that this dynamic frequency tuning design can achieve up to 27% improvement in terms of execution speedup compared to a baseline setting and 7.4% improvement on average.\",\"PeriodicalId\":421740,\"journal\":{\"name\":\"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)\",\"volume\":\"206 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PADSW.2014.7097795\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PADSW.2014.7097795","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 4

摘要

图形处理单元(GPGPU)上的通用计算可以为高度并行应用程序提供比通用处理器(CPU)多几个数量级的计算能力。对于这种并行应用程序，gpgpu的内存流量模式与cpu的内存流量模式有很大的不同。这为优化gpgpu的片上互连网络(NoC)提供了机会。在这项工作中，我们首先研究了典型基准测试的GPGPU内存流量特征，并对内存流量模式进行了分类。不同的流量模式要求NoC的请求和应答路径的吞吐量不同，以匹配网络负载。为了满足这一需求，我们研究了动态扩展网络频率以平衡请求和应答网络吞吐量的可行性。这个决定是通过监控一些着色器内核来识别内存流量模式来指导的。性能评估表明，与基线设置相比，这种动态频率调优设计在执行加速方面可以提高27%，平均提高7.4%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Traffic-aware frequency scaling for balanced on-chip networks on GPGPUs

General-purpose computing on graphics processing units (GPGPU) can provide orders of magnitude more computing power than general purpose processors (CPU) for highly parallel applications. For such parallel applications, the memory traffic pattern of GPGPUs behaves considerably different from that of CPUs. This gives rise to opportunities for optimizing the on-chip interconnection network (NoC) of GPGPUs. In this work, we first investigate the characteristics of GPGPU memory traffic of typical benchmarks and categorize the memory traffic patterns. Different traffic patterns require different throughput in the request and reply paths of the NoC to match the network load. To meet this requirement, we examine the feasibility of scaling the network frequency dynamically to balance the throughput of the request and reply networks. The decision is guided by monitoring some shader cores to identify the memory traffic pattern. Performance evaluation shows that this dynamic frequency tuning design can achieve up to 27% improvement in terms of execution speedup compared to a baseline setting and 7.4% improvement on average.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

自引率

0.00%

发文量

期刊最新文献

Optimal bandwidth allocation with dynamic multi-path routing for non-critical traffic in AFDX networks Sensor-free corner shape detection by wireless networks Accelerated variance reduction methods on GPU Fault-Tolerant bi-directional communications in web-based applications Performance analysis of HPC applications with irregular tree data structures