Termination detection for fine-grained message-passing architectures

Matthew Naylor, S. Moore, A. Mokhov, David B. Thomas, J. Beaumont, Shane T. Fleming, A. T. Markettos, Thomas Bytheway, Andrew D. Brown
Published in: 2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2020-07-01
DOI: 10.1109/ASAP49362.2020.00012 (https://doi.org/10.1109/ASAP49362.2020.00012)
Citations: 6

Abstract

Barrier primitives provided by standard parallel programming APIs are the primary means by which applications implement global synchronisation. Typically these primitives are fully-committed to synchronisation in the sense that, once a barrier is entered, synchronisation is the only way out. For message-passing applications, this raises the question of what happens when a message arrives at a thread that already resides in a barrier. Without a satisfactory answer, barriers do not interact with message-passing in any useful way.

In this paper, we propose a new refutable barrier primitive that combines with message-passing to form a simple, expressive, efficient, well-defined API. It has a clear semantics based on termination detection, and supports the development of both globally-synchronous and asynchronous parallel applications.

To evaluate the new primitive, we implement it in a prototype large-scale message-passing machine with 49,152 RISC-V threads distributed over 48 FPGAs. We show that hardware support for the primitive leads to a highly-efficient implementation, capable of synchronisation rates that are an order-of-magnitude higher than what is achievable in software. Using the primitive, we implement synchronous and asynchronous versions of a range of applications, observing that each version can have significant advantages over the other, depending on the application. Therefore, a barrier primitive supporting both styles can greatly assist the development of parallel programs.
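
The abstract does not show the paper's actual API, so the following is only a toy, single-process Python model of the idea it describes: a thread may vote to synchronise, an arriving message withdraws ("refutes") that vote, and the barrier releases only once every thread has voted and no message is pending, which is the termination-detection condition. All names here (`Sim`, `enter_barrier`, `deliver`, `terminated`, `run`) are hypothetical and chosen for illustration; the round-robin scheduling and the token-passing workload are likewise assumptions, not the authors' design.

```python
from collections import deque


class Sim:
    """Toy model of a refutable barrier (hypothetical names, not the paper's API)."""

    def __init__(self, n):
        self.inbox = [deque() for _ in range(n)]  # per-thread pending messages
        self.in_barrier = [False] * n             # has thread i voted to synchronise?

    def send(self, dst, msg):
        self.inbox[dst].append(msg)

    def enter_barrier(self, tid):
        # The thread has nothing to do and votes for global synchronisation.
        self.in_barrier[tid] = True

    def deliver(self, tid):
        """Deliver one pending message, if any; arrival refutes the thread's vote."""
        if not self.inbox[tid]:
            return None
        self.in_barrier[tid] = False
        return self.inbox[tid].popleft()

    def terminated(self):
        # Release only when all threads have voted and no message is pending.
        return all(self.in_barrier) and not any(self.inbox)


def run(n=3, hops=3):
    """Pass a token backwards around a ring; count rounds and refuted votes."""
    sim = Sim(n)
    sim.send(0, hops)  # the token carries its remaining hop budget
    rounds = refuted = 0
    while not sim.terminated():
        for tid in range(n):
            was_waiting = sim.in_barrier[tid]
            msg = sim.deliver(tid)
            if msg is None:
                sim.enter_barrier(tid)
            else:
                if was_waiting:
                    refuted += 1  # this thread was pulled back out of the barrier
                if msg > 0:
                    sim.send((tid - 1) % n, msg - 1)
        rounds += 1
    return rounds, refuted
```

In this sketch a thread that has already voted is pulled back out of the barrier whenever the token reaches it, so synchronisation completes only after the token's hop budget is exhausted and every inbox is empty. A conventional, fully-committed barrier has no counterpart to the `deliver` step clearing the vote.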