大规模多处理器中缓存一致性的编译器和硬件支持:设计考虑和性能研究

23rd Annual International Symposium on Computer Architecture (ISCA'96) Pub Date : 1996-05-15 DOI:10.1145/232973.233002

L. Choi, P. Yew

{"title":"大规模多处理器中缓存一致性的编译器和硬件支持:设计考虑和性能研究","authors":"L. Choi, P. Yew","doi":"10.1145/232973.233002","DOIUrl":null,"url":null,"abstract":"In this paper, we study a hardware-supported, compiler directed (HSCD) cache coherence scheme, which can be implemented on a large-scale multiprocessor using off-the-shelf microprocessors, such as the Cray T3D. It can be adapted to various cache organizations, including multi-word cache lines and byte-addressable architectures. Several system related issues, including critical sections, inter-thread communication, and task migration have also been addressed. The cost of the required hardware support is small and proportional to the cache size. The necessary compiler algorithms, including intra- and interprocedural array data-flow analysis, have been implemented on the Polaris compiler [17].From our simulation study using the Perfect Club benchmarks, we found that, in spite of the conservative analysis made by the compiler, the performance of the proposed HSCD scheme can be comparable to that of a full-map hardware directory scheme. With its comparable performance and reduced hardware cost, the scheme can be a viable alternative for large-scale multiprocessors, such as the Cray T3D, that rely on users to maintain data coherence.","PeriodicalId":415354,"journal":{"name":"23rd Annual International Symposium on Computer Architecture (ISCA'96)","volume":"2017 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1996-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":"{\"title\":\"Compiler and Hardware Support for Cache Coherence in Large-Scale Multiprocessors: Design Considerations and Performance Study\",\"authors\":\"L. Choi, P. Yew\",\"doi\":\"10.1145/232973.233002\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we study a hardware-supported, compiler directed (HSCD) cache coherence scheme, which can be implemented on a large-scale multiprocessor using off-the-shelf microprocessors, such as the Cray T3D. It can be adapted to various cache organizations, including multi-word cache lines and byte-addressable architectures. Several system related issues, including critical sections, inter-thread communication, and task migration have also been addressed. The cost of the required hardware support is small and proportional to the cache size. The necessary compiler algorithms, including intra- and interprocedural array data-flow analysis, have been implemented on the Polaris compiler [17].From our simulation study using the Perfect Club benchmarks, we found that, in spite of the conservative analysis made by the compiler, the performance of the proposed HSCD scheme can be comparable to that of a full-map hardware directory scheme. With its comparable performance and reduced hardware cost, the scheme can be a viable alternative for large-scale multiprocessors, such as the Cray T3D, that rely on users to maintain data coherence.\",\"PeriodicalId\":415354,\"journal\":{\"name\":\"23rd Annual International Symposium on Computer Architecture (ISCA'96)\",\"volume\":\"2017 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1996-05-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"19\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"23rd Annual International Symposium on Computer Architecture (ISCA'96)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/232973.233002\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"23rd Annual International Symposium on Computer Architecture (ISCA'96)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/232973.233002","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 19

摘要

在本文中，我们研究了一种硬件支持的编译器导向(HSCD)缓存一致性方案，该方案可以使用现成的微处理器(如Cray T3D)在大规模多处理器上实现。它可以适应各种缓存组织，包括多字缓存线和字节可寻址架构。还讨论了几个系统相关问题，包括临界区、线程间通信和任务迁移。所需硬件支持的成本很小，并且与缓存大小成正比。必要的编译算法，包括程序内和程序间的数组数据流分析，已经在Polaris编译器上实现[17]。从我们使用Perfect Club基准测试的模拟研究中，我们发现，尽管编译器进行了保守的分析，但所提出的HSCD方案的性能可以与全映射硬件目录方案相媲美。由于具有相当的性能和更低的硬件成本，该方案可以成为大型多处理器(如Cray T3D)的可行替代方案，这些处理器依赖于用户来保持数据一致性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Compiler and Hardware Support for Cache Coherence in Large-Scale Multiprocessors: Design Considerations and Performance Study

In this paper, we study a hardware-supported, compiler directed (HSCD) cache coherence scheme, which can be implemented on a large-scale multiprocessor using off-the-shelf microprocessors, such as the Cray T3D. It can be adapted to various cache organizations, including multi-word cache lines and byte-addressable architectures. Several system related issues, including critical sections, inter-thread communication, and task migration have also been addressed. The cost of the required hardware support is small and proportional to the cache size. The necessary compiler algorithms, including intra- and interprocedural array data-flow analysis, have been implemented on the Polaris compiler [17].From our simulation study using the Perfect Club benchmarks, we found that, in spite of the conservative analysis made by the compiler, the performance of the proposed HSCD scheme can be comparable to that of a full-map hardware directory scheme. With its comparable performance and reduced hardware cost, the scheme can be a viable alternative for large-scale multiprocessors, such as the Cray T3D, that rely on users to maintain data coherence.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

23rd Annual International Symposium on Computer Architecture (ISCA'96)

自引率

0.00%

发文量

期刊最新文献

Memory Bandwidth Limitations of Future Microprocessors Missing the Memory Wall: The Case for Processor/Memory Integration Instruction Prefetching of Systems Codes with Layout Optimized for Reduced Cache Misses STiNG: A CC-NUMA Computer System for the Commercial Marketplace High-Bandwidth Address Translation for Multiple-Issue Processors