Increasing multicore system efficiency through intelligent bandwidth shifting

Víctor Jiménez, A. Buyuktosunoglu, P. Bose, F. O'Connell, F. Cazorla, M. Valero
{"title":"Increasing multicore system efficiency through intelligent bandwidth shifting","authors":"Víctor Jiménez, A. Buyuktosunoglu, P. Bose, F. O'Connell, F. Cazorla, M. Valero","doi":"10.1109/HPCA.2015.7056020","DOIUrl":null,"url":null,"abstract":"Memory bandwidth is a crucial resource in computing systems. Current CMP/SMT processors have a significant number of cores and they can run many threads concurrently. This large thread count adds high pressure to the memory bus, which demands high bandwidth to service memory requests from the cores. Hardware data prefetching is a well-known technique for hiding memory latency. Due to its speculative nature, however, in some situations prefetching does not effectively work, wasting memory bandwidth and polluting the caches. Data prefetching efficiency depends on the prefetching algorithm. It also depends on the characteristics of the applications running on the system. In this paper we propose an online bandwidth shifting mechanism that dynamically assigns bandwidth to applications according to their prefetch efficiency. This mechanism maximizes the utilization of memory bandwidth, thereby improving system performance and/or reducing memory power consumption. To the best of our knowledge, this solution is the first to not require hardware support. We evaluate the benefits of using our bandwidth shifting mechanism on a real system - the IBM POWER7. We obtain speedups in the order of 10-20% (in one instance, speedup exceeds 1.6X). Our mechanism does not generate a significant degree of unfairness among the applications. 
In many cases individual thread performance increases by 10-35%, while virtually no thread experiences a slowdown larger than 5%.","PeriodicalId":6593,"journal":{"name":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","volume":"55 1","pages":"39-50"},"PeriodicalIF":0.0000,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2015.7056020","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 21

Abstract

Memory bandwidth is a crucial resource in computing systems. Current CMP/SMT processors have a significant number of cores and can run many threads concurrently. This large thread count puts high pressure on the memory bus, which demands high bandwidth to service memory requests from the cores. Hardware data prefetching is a well-known technique for hiding memory latency. Due to its speculative nature, however, prefetching sometimes fails to work effectively, wasting memory bandwidth and polluting the caches. Data prefetching efficiency depends on the prefetching algorithm; it also depends on the characteristics of the applications running on the system. In this paper we propose an online bandwidth shifting mechanism that dynamically assigns bandwidth to applications according to their prefetch efficiency. This mechanism maximizes the utilization of memory bandwidth, thereby improving system performance and/or reducing memory power consumption. To the best of our knowledge, this solution is the first that does not require hardware support. We evaluate the benefits of using our bandwidth shifting mechanism on a real system, the IBM POWER7. We obtain speedups on the order of 10-20% (in one instance, the speedup exceeds 1.6X). Our mechanism does not generate a significant degree of unfairness among the applications. In many cases individual thread performance increases by 10-35%, while virtually no thread experiences a slowdown larger than 5%.
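The core idea in the abstract — periodically measure each application's prefetch efficiency and shift memory bandwidth toward applications that prefetch usefully — can be sketched as a simple control loop. This is an illustrative sketch only, not the paper's implementation: the class names, the efficiency thresholds, and the discrete "prefetch level" knob are all assumptions (on POWER7 the actual knob would be a hardware prefetch-depth setting).

```python
# Hypothetical sketch of an online bandwidth-shifting loop: each
# interval, applications with low prefetch efficiency have their
# prefetch aggressiveness reduced (reclaiming wasted bandwidth),
# while highly efficient ones are allowed to prefetch more deeply.
from dataclasses import dataclass

@dataclass
class App:
    name: str
    useful_prefetches: int   # prefetched lines later referenced by demand loads
    total_prefetches: int    # all lines prefetched during the interval
    prefetch_level: int = 3  # aggressiveness knob: 0 (off) .. 3 (deepest)

    @property
    def efficiency(self) -> float:
        # Fraction of prefetched lines that were actually useful.
        if self.total_prefetches == 0:
            return 0.0
        return self.useful_prefetches / self.total_prefetches

def shift_bandwidth(apps, low=0.3, high=0.7):
    """One interval of the control loop: demote inefficient prefetchers,
    promote efficient ones. Thresholds are illustrative assumptions."""
    for app in apps:
        if app.efficiency < low and app.prefetch_level > 0:
            app.prefetch_level -= 1   # stop wasting bus bandwidth
        elif app.efficiency > high and app.prefetch_level < 3:
            app.prefetch_level += 1   # grant more prefetch bandwidth
    return {a.name: a.prefetch_level for a in apps}

apps = [App("streaming", 900, 1000), App("pointer-chasing", 50, 1000)]
print(shift_bandwidth(apps))  # {'streaming': 3, 'pointer-chasing': 2}
```

In a real deployment the per-interval counts would come from performance-monitoring counters, and the knob would be the processor's prefetch-depth control rather than an integer field, but the decision logic follows the same measure-then-shift pattern the abstract describes.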