Energy Efficient HPC on Embedded SoCs: Optimization Techniques for Mali GPU

Ivan Grasso, Petar Radojkovic, Nikola Rajovic, Isaac Gelado, Alex Ramírez
{"title":"Energy Efficient HPC on Embedded SoCs: Optimization Techniques for Mali GPU","authors":"Ivan Grasso, Petar Radojkovic, Nikola Rajovic, Isaac Gelado, Alex Ramírez","doi":"10.1109/IPDPS.2014.24","DOIUrl":null,"url":null,"abstract":"A lot of effort from academia and industry has been invested in exploring the suitability of low-power embedded technologies for HPC. Although state-of-the-art embedded systems-on-chip (SoCs) inherently contain GPUs that could be used for HPC, their performance and energy capabilities have never been evaluated. Two reasons contribute to the above. Primarily, embedded GPUs until now, have not supported 64-bit floating point arithmetic - a requirement for HPC. Secondly, embedded GPUs did not provide support for parallel programming languages such as OpenCL and CUDA. However, the situation is changing, and the latest GPUs integrated in embedded SoCs do support 64-bit floating point precision and parallel programming models. In this paper, we analyze performance and energy advantages of embedded GPUs for HPC. In particular, we analyze ARM Mali-T604 GPU - the first embedded GPUs with OpenCL Full Profile support. We identify, implement and evaluate software optimization techniques for efficient utilization of the ARM Mali GPU Compute Architecture. Our results show that, HPC benchmarks running on the ARM Mali-T604 GPU integrated into Exynos 5250 SoC, on average, achieve speed-up of 8.7X over a single Cortex-A15 core, while consuming only 32% of the energy. Overall results show that embedded GPUs have performance and energy qualities that make them candidates for future HPC systems.","PeriodicalId":309291,"journal":{"name":"2014 IEEE 28th International Parallel and Distributed Processing Symposium","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"57","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE 28th International Parallel and Distributed Processing Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS.2014.24","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 57

Abstract

A lot of effort from academia and industry has been invested in exploring the suitability of low-power embedded technologies for HPC. Although state-of-the-art embedded systems-on-chip (SoCs) inherently contain GPUs that could be used for HPC, their performance and energy capabilities have never been evaluated. Two reasons contribute to the above. Primarily, embedded GPUs until now, have not supported 64-bit floating point arithmetic - a requirement for HPC. Secondly, embedded GPUs did not provide support for parallel programming languages such as OpenCL and CUDA. However, the situation is changing, and the latest GPUs integrated in embedded SoCs do support 64-bit floating point precision and parallel programming models. In this paper, we analyze performance and energy advantages of embedded GPUs for HPC. In particular, we analyze ARM Mali-T604 GPU - the first embedded GPUs with OpenCL Full Profile support. We identify, implement and evaluate software optimization techniques for efficient utilization of the ARM Mali GPU Compute Architecture. Our results show that, HPC benchmarks running on the ARM Mali-T604 GPU integrated into Exynos 5250 SoC, on average, achieve speed-up of 8.7X over a single Cortex-A15 core, while consuming only 32% of the energy. Overall results show that embedded GPUs have performance and energy qualities that make them candidates for future HPC systems.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于嵌入式soc的高效高性能计算:Mali GPU的优化技术
学术界和工业界已经投入了大量的精力来探索低功耗嵌入式技术对高性能计算的适用性。虽然最先进的嵌入式片上系统(soc)固有地包含可用于HPC的gpu,但它们的性能和能源能力从未被评估过。有两个原因导致了上述情况。到目前为止,嵌入式gpu主要不支持64位浮点运算——这是高性能计算的一个要求。其次,嵌入式gpu不支持并行编程语言,如OpenCL和CUDA。然而,情况正在发生变化,集成在嵌入式soc中的最新gpu确实支持64位浮点精度和并行编程模型。本文分析了用于高性能计算的嵌入式gpu在性能和能耗方面的优势。我们特别分析了ARM Mali-T604 GPU——第一个支持OpenCL Full Profile的嵌入式GPU。我们确定,实施和评估有效利用ARM Mali GPU计算架构的软件优化技术。我们的研究结果表明,在集成到Exynos 5250 SoC中的ARM Mali-T604 GPU上运行的HPC基准测试,平均而言,在单个Cortex-A15内核上实现了8.7倍的加速,而消耗的能量仅为32%。总体结果表明,嵌入式gpu具有性能和能源质量,使其成为未来高性能计算系统的候选者。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Improving the Performance of CA-GMRES on Multicores with Multiple GPUs Multi-resource Real-Time Reader/Writer Locks for Multiprocessors Energy-Efficient Time-Division Multiplexed Hybrid-Switched NoC for Heterogeneous Multicore Systems Scaling Irregular Applications through Data Aggregation and Software Multithreading Heterogeneity-Aware Workload Placement and Migration in Distributed Sustainable Datacenters
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1