MGS: A Multigrain Shared Memory System

D. Yeung, J. Kubiatowicz, A. Agarwal
{"title":"MGS: A Multigrain Shared Memory System","authors":"D. Yeung, J. Kubiatowicz, A. Agarwal","doi":"10.1145/232973.232980","DOIUrl":null,"url":null,"abstract":"Parallel workstations, each comprising 10-100 processors, promise cost-effective general-purpose multiprocessing. This paper explores the coupling of such small- to medium-scale shared memory multiprocessors through software over a local area network to synthesize larger shared memory systems. We call these systems Distributed Scalable Shared-memory Multiprocessors (DSSMPs).This paper introduces the design of a shared memory system that uses multiple granularities of sharing, and presents an implementation on the Alewife multiprocessor, called MGS. Multigrain shared memory enables the collaboration of hardware and software shared memory, and is effective at exploiting a form of locality called multigrain locality. The system provides efficient support for fine-grain cache-line sharing, and resorts to coarse-grain page-level sharing only when locality is violated. A framework for characterizing application performance on DSSMPs is also introduced.Using MGS, an in-depth study of several shared memory applications is conducted to understand the behavior of DSSMPs. We find that unmodified shared memory applications can exploit multigrain sharing. Keeping the number of processors fixed, applications execute up to 85% faster when each DSSMP node is a multiprocessor as opposed to a uniprocessor. We also show that tightly-coupled multiprocessors hold a significant performance advantage over DSSMPs on unmodified applications. However, a best-effort implementation of a kernel from one of the applications allows a DSSMP to almost match the performance of a tightly-coupled multiprocessor.","PeriodicalId":415354,"journal":{"name":"23rd Annual International Symposium on Computer Architecture (ISCA'96)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1996-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"77","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"23rd Annual International Symposium on Computer Architecture (ISCA'96)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/232973.232980","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 77

Abstract

Parallel workstations, each comprising 10-100 processors, promise cost-effective general-purpose multiprocessing. This paper explores the coupling of such small- to medium-scale shared memory multiprocessors through software over a local area network to synthesize larger shared memory systems. We call these systems Distributed Scalable Shared-memory Multiprocessors (DSSMPs).This paper introduces the design of a shared memory system that uses multiple granularities of sharing, and presents an implementation on the Alewife multiprocessor, called MGS. Multigrain shared memory enables the collaboration of hardware and software shared memory, and is effective at exploiting a form of locality called multigrain locality. The system provides efficient support for fine-grain cache-line sharing, and resorts to coarse-grain page-level sharing only when locality is violated. A framework for characterizing application performance on DSSMPs is also introduced.Using MGS, an in-depth study of several shared memory applications is conducted to understand the behavior of DSSMPs. We find that unmodified shared memory applications can exploit multigrain sharing. Keeping the number of processors fixed, applications execute up to 85% faster when each DSSMP node is a multiprocessor as opposed to a uniprocessor. We also show that tightly-coupled multiprocessors hold a significant performance advantage over DSSMPs on unmodified applications. However, a best-effort implementation of a kernel from one of the applications allows a DSSMP to almost match the performance of a tightly-coupled multiprocessor.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
MGS:一个多粒共享内存系统
并行工作站,每个包含10-100个处理器,承诺经济高效的通用多处理。本文探讨了这种中小规模的共享内存多处理器通过软件在局域网上的耦合来合成更大的共享内存系统。我们称这些系统为分布式可扩展共享内存多处理器(dssmp)。本文介绍了一种采用多粒度共享的共享内存系统的设计,并给出了在Alewife多处理器(MGS)上的实现。多粒共享内存支持硬件和软件共享内存的协作,并且有效地利用了一种称为多粒局部性的局部性形式。该系统为细粒度缓存行共享提供了有效的支持,并且仅在违反局部性时才采用粗粒度页面级共享。本文还介绍了dssmp上应用性能表征的框架。利用MGS,对几种共享内存应用程序进行了深入研究,以了解dssmp的行为。我们发现未经修改的共享内存应用程序可以利用多粒共享。保持固定的处理器数量,当每个DSSMP节点都是多处理器时,应用程序的执行速度比单处理器快85%。我们还表明,在未经修改的应用程序中,紧密耦合的多处理器比dssmp具有显著的性能优势。但是,一个应用程序的内核的最佳实现允许DSSMP几乎与紧密耦合的多处理器的性能相匹配。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Memory Bandwidth Limitations of Future Microprocessors Missing the Memory Wall: The Case for Processor/Memory Integration Instruction Prefetching of Systems Codes with Layout Optimized for Reduced Cache Misses STiNG: A CC-NUMA Computer System for the Commercial Marketplace High-Bandwidth Address Translation for Multiple-Issue Processors
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1