The effect of sharing on the cache and bus performance of parallel programs

ASPLOS III Pub Date : 1989-04-01 DOI:10.1145/70082.68206

S. Eggers, R. Katz


Citations: 155

Abstract

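Bus bandwidth ultimately limits the performance, and therefore the scale, of bus-based, shared memory multiprocessors. Previous studies have extrapolated from uniprocessor measurements and simulations to estimate the performance of these machines. In this study, we use traces of parallel programs to evaluate the cache and bus performance of shared memory multiprocessors in which coherency is maintained by a write-invalidate protocol. In particular, we analyze the effect of sharing overhead on cache miss ratio and bus utilization.

Our studies show that parallel programs incur substantially higher miss ratios and bus utilization than comparable uniprocessor programs. The sharing component of these metrics increases proportionally with both cache and block size, and for some cache configurations it determines both their magnitude and trend. The amount of overhead depends on the memory reference pattern to the shared data. Programs that exhibit good per-processor locality perform better than those with fine-grain sharing. This suggests that parallel software writers and better compiler technology can improve program performance through better memory organization of shared data.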
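The contrast between per-processor locality and fine-grain sharing can be illustrated with a toy trace-driven simulation. The sketch below is not the authors' simulator. It assumes infinite caches (so every miss after the first touch is caused by an invalidation), a simplified write-invalidate rule, and a synthetic trace. The names `simulate`, `make_trace`, and the two layout labels are hypothetical and chosen only for this example.

```python
def simulate(trace, num_procs, block_words):
    """Count misses and invalidations under a simplified write-invalidate rule.

    Assumption: caches are infinite, so a block leaves a cache only when
    another processor writes to it.
    """
    caches = [set() for _ in range(num_procs)]  # block ids held by each processor
    misses = 0
    invalidations = 0
    for proc, op, addr in trace:
        block = addr // block_words
        if block not in caches[proc]:
            misses += 1  # cold miss or invalidation miss
            caches[proc].add(block)
        if op == "w":
            # A write removes every other cached copy of the block.
            for other in range(num_procs):
                if other != proc and block in caches[other]:
                    caches[other].discard(block)
                    invalidations += 1
    return misses, invalidations


def make_trace(num_procs, words_per_proc, rounds, layout):
    """Build a synthetic trace in which each processor updates only its own words.

    layout == "interleaved": word i of processor p is at address i * num_procs + p,
    so words owned by different processors share a block (fine-grain sharing).
    layout == "blocked": processor p owns one contiguous region
    (per-processor locality).
    """
    trace = []
    for _ in range(rounds):
        for i in range(words_per_proc):
            for p in range(num_procs):
                if layout == "interleaved":
                    addr = i * num_procs + p
                else:
                    addr = p * words_per_proc + i
                trace.append((p, "r", addr))
                trace.append((p, "w", addr))
    return trace


if __name__ == "__main__":
    for layout in ("interleaved", "blocked"):
        trace = make_trace(num_procs=4, words_per_proc=16, rounds=2, layout=layout)
        misses, invalidations = simulate(trace, num_procs=4, block_words=4)
        print(layout, "miss ratio:", misses / len(trace),
              "invalidations:", invalidations)
```

In this toy setup both layouts perform the same number of reads and writes per processor. Only the placement of the data differs, which is the kind of memory organization choice the abstract attributes to software writers and compilers.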




Source

ASPLOS III


Other papers from this venue

Program optimization for instruction caches

A message driven OR-parallel machine

Reference history, page size, and migration daemons in local/remote architectures

An analysis of 8086 instruction set usage in MS DOS programs

Available instruction-level parallelism for superscalar and superpipelined machines