Design space explorations for streaming accelerators using Streaming Architectural Simulator

M. Shafiq, M. Pericàs, N. Navarro, E. Ayguadé
Proceedings of 2013 10th International Bhurban Conference on Applied Sciences & Technology (IBCAST), published 2013-05-02. DOI: 10.1109/IBCAST.2013.6512151 (https://doi.org/10.1109/IBCAST.2013.6512151)

Abstract

Design space explorations for streaming accelerators using Streaming Architectural Simulator
In recent years, streaming accelerators such as GPUs have emerged as an effective step towards parallel computing. The wish-list for these devices spans from support for thousands of small cores to a nature very close to general-purpose computing. This makes the design space for future accelerators containing thousands of parallel streaming cores very large, which complicates choosing the right architectural configuration for next-generation devices. However, accurate design space exploration tools developed for massively parallel architectures can ease this task. The main objectives of this work are twofold: (i) we present a complete environment for a trace-driven simulator named SArcs (Streaming Architectural Simulator) for streaming accelerators; (ii) we use our simulation tool-chain for design space explorations of GPU-like streaming architectures. Our design space explorations for different architectural aspects of a GPU-like device are made with reference to a baseline established for NVIDIA's Fermi architecture (GPU Tesla C2050). The explored aspects include the performance effects of variations in the configurations of the Streaming Multiprocessors (SMs), the global memory bandwidth, the channels between the SMs and the memory hierarchy, and the cache hierarchy. The explorations are performed using application kernels from Vector Reduction, 2D-Convolution, Matrix-Matrix Multiplication, and 3D-Stencil. Results show that the configuration of computational resources in the current Fermi GPU device can deliver higher performance given further improvement in the global memory bandwidth of the same device.
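To make the idea of sweeping SM count against global memory bandwidth concrete, the following is a minimal sketch of a design space sweep. It is not SArcs and is not trace-driven: it swaps in a simple roofline-style bound (attainable throughput is the smaller of the compute peak and bandwidth times arithmetic intensity). The names `Config`, `attainable_gflops`, and `sweep` are hypothetical, and every number is an illustrative placeholder loosely sized like a Fermi-class device, not a value from the paper.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    """One hypothetical point in the design space."""
    num_sms: int            # number of streaming multiprocessors
    gflops_per_sm: float    # assumed peak compute per SM (GFLOP/s)
    mem_bw_gbs: float       # global memory bandwidth (GB/s)


def attainable_gflops(cfg: Config, flops_per_byte: float) -> float:
    """Roofline bound: min(compute peak, bandwidth * arithmetic intensity)."""
    compute_peak = cfg.num_sms * cfg.gflops_per_sm
    memory_peak = cfg.mem_bw_gbs * flops_per_byte
    return min(compute_peak, memory_peak)


def sweep(sm_counts, bandwidths, gflops_per_sm, flops_per_byte):
    """Evaluate every (SM count, bandwidth) pair for one kernel."""
    return {
        (sms, bw): attainable_gflops(Config(sms, gflops_per_sm, bw), flops_per_byte)
        for sms, bw in product(sm_counts, bandwidths)
    }


# Illustrative numbers only; none of them are taken from the paper.
results = sweep(
    sm_counts=[14, 28],
    bandwidths=[144.0, 288.0],
    gflops_per_sm=73.6,
    flops_per_byte=0.25,  # assumed intensity of a streaming, reduction-like kernel
)
for (sms, bw), gflops in sorted(results.items()):
    print(f"SMs={sms:3d}  BW={bw:6.1f} GB/s  ->  {gflops:6.1f} GFLOP/s")
```

With these assumed values the kernel is memory-bound, so doubling the SM count leaves the bound unchanged while doubling the bandwidth doubles it. A trace-driven simulator would capture far more detail (caches, channels, scheduling); the sketch only shows the shape of such a sweep.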