Comparative evaluation of latency reducing and tolerating techniques

Anoop Gupta, J. Hennessy, K. Gharachorloo, T. Mowry, W. Weber
{"title":"减少和容忍延迟技术的比较评价","authors":"Anoop Gupta, J. Hennessy, K. Gharachorloo, T. Mowry, W. Weber","doi":"10.1145/115953.115978","DOIUrl":null,"url":null,"abstract":"Techniques that can cope with the large latency of memory accesses are essential for achieving high processor utilization in large-scale shared-memory multiprocessors. In this paper, we consider four architectural techniques that address the latency problem: (i) hardware coherent caches, (ii) relaxed memory consistency, (iii) softwareconuolled prefetching, and (iv) multiple-context suppon. We some studies of benefits of the individual techniques have been done, no Study evaluates all of the techniques within a consistent framework. This paper attempts to remedy this by providing a comprehensive evaluation of the benefits of the four techniques, both individually and in combinations, using a consistent set of architectural assumptions. The results in this paper have been obtained using detailed simulations of a large-scale shared-memory multiprocessor. Our results show that caches and relaxed consistency UNformly improve performance. The improvements due to prefetching and multiple contexts are sizeable, but are much more applicationdependent. Combinations of the various techniques generally amin better performance than each one on its own. Overall, we show that using suitahle combinations of the techniques, performance can be improved by 4 to 7 dmes","PeriodicalId":187095,"journal":{"name":"[1991] Proceedings. The 18th Annual International Symposium on Computer Architecture","volume":"148 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1991-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"229","resultStr":"{\"title\":\"Comparative evaluation of latency reducing and tolerating techniques\",\"authors\":\"Anoop Gupta, J. Hennessy, K. Gharachorloo, T. Mowry, W. Weber\",\"doi\":\"10.1145/115953.115978\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Techniques that can cope with the large latency of memory accesses are essential for achieving high processor utilization in large-scale shared-memory multiprocessors. In this paper, we consider four architectural techniques that address the latency problem: (i) hardware coherent caches, (ii) relaxed memory consistency, (iii) softwareconuolled prefetching, and (iv) multiple-context suppon. We some studies of benefits of the individual techniques have been done, no Study evaluates all of the techniques within a consistent framework. This paper attempts to remedy this by providing a comprehensive evaluation of the benefits of the four techniques, both individually and in combinations, using a consistent set of architectural assumptions. The results in this paper have been obtained using detailed simulations of a large-scale shared-memory multiprocessor. Our results show that caches and relaxed consistency UNformly improve performance. The improvements due to prefetching and multiple contexts are sizeable, but are much more applicationdependent. Combinations of the various techniques generally amin better performance than each one on its own. Overall, we show that using suitahle combinations of the techniques, performance can be improved by 4 to 7 dmes\",\"PeriodicalId\":187095,\"journal\":{\"name\":\"[1991] Proceedings. 
The 18th Annual International Symposium on Computer Architecture\",\"volume\":\"148 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1991-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"229\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"[1991] Proceedings. The 18th Annual International Symposium on Computer Architecture\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/115953.115978\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"[1991] Proceedings. The 18th Annual International Symposium on Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/115953.115978","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 229

Abstract

Techniques that can cope with the large latency of memory accesses are essential for achieving high processor utilization in large-scale shared-memory multiprocessors. In this paper, we consider four architectural techniques that address the latency problem: (i) hardware coherent caches, (ii) relaxed memory consistency, (iii) software-controlled prefetching, and (iv) multiple-context support. While some studies of the benefits of the individual techniques have been done, no study evaluates all of the techniques within a consistent framework. This paper attempts to remedy this by providing a comprehensive evaluation of the benefits of the four techniques, both individually and in combination, using a consistent set of architectural assumptions. The results in this paper have been obtained using detailed simulations of a large-scale shared-memory multiprocessor. Our results show that caches and relaxed consistency uniformly improve performance. The improvements due to prefetching and multiple contexts are sizeable, but are much more application-dependent. Combinations of the various techniques generally attain better performance than each one on its own. Overall, we show that using suitable combinations of the techniques, performance can be improved by 4 to 7 times.
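As a concrete illustration of technique (iii), the sketch below shows what software-controlled prefetching looks like at the source level. It is only an illustrative example, not code from the paper: the paper evaluates prefetching through detailed simulation of a shared-memory multiprocessor. The sketch assumes the GCC/Clang __builtin_prefetch intrinsic, and PREFETCH_AHEAD is a hypothetical tuning parameter.

/* Minimal sketch of software-controlled prefetching.
 * Illustrative only; PREFETCH_AHEAD is a hypothetical tuning distance. */
#include <stddef.h>

#define PREFETCH_AHEAD 16  /* elements to fetch ahead of the current index */

double sum_with_prefetch(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Issue a non-binding prefetch for data needed in a later iteration,
         * overlapping its memory latency with the current computation. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], /*rw=*/0, /*locality=*/1);
        sum += a[i];
    }
    return sum;
}

Because the prefetch is non-binding (it only warms the cache and does not change program semantics), issuing it a fixed distance ahead of use lets memory latency overlap with useful computation, which is the latency-tolerance effect the paper measures.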