An On-the-Fly Method to Exchange Vector Clocks in Distributed-Memory Programs

Simon Schwitanski, Felix Tomski, Joachim Protze, C. Terboven, Matthias S. Müller
{"title":"An On-the-Fly Method to Exchange Vector Clocks in Distributed-Memory Programs","authors":"Simon Schwitanski, Felix Tomski, Joachim Protze, C. Terboven, Matthias S. Müller","doi":"10.1109/IPDPSW55747.2022.00093","DOIUrl":null,"url":null,"abstract":"Vector clocks are logical timestamps used in correctness tools to analyze the happened-before relation between events in parallel program executions. In particular, race detectors use them to find concurrent conflicting memory accesses, and replay tools use them to reproduce or find alternative execution paths. To record the happened-before relation with vector clocks, tool developers have to consider the different synchronization concepts of a programming model, e.g., barriers, locks, or message exchanges. Especially in distributed-memory programs, various concepts result in explicit and implicit synchronization between processes. Previously implemented vector clock exchanges are often specific to a single programming model, and a translation to other programming models is not trivial. Consequently, analyses relying on the vector clock exchange remain model-specific. This paper proposes an abstraction layer for on-the-fly vector clock exchanges for distributed-memory programs. Based on the programming models MPI, OpenSHMEM, and GASPI, we define common synchronization primitives and explain how model-specific procedures map to our model-agnostic abstraction layer. The exchange model is general enough also to support synchronization concepts of other parallel programming models. We present our implementation of the vector clock abstraction layer based on the Generic Tool Infrastructure with translators for MPI and OpenSHMEM. In an overhead study using the SPEC MPI 2007 benchmarks, the slowdown of the implemented vector clock exchange ranges from 1.1x to 12.6x for runs with up to 768 processes.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW55747.2022.00093","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Vector clocks are logical timestamps used in correctness tools to analyze the happened-before relation between events in parallel program executions. In particular, race detectors use them to find concurrent conflicting memory accesses, and replay tools use them to reproduce or find alternative execution paths. To record the happened-before relation with vector clocks, tool developers have to consider the different synchronization concepts of a programming model, e.g., barriers, locks, or message exchanges. Especially in distributed-memory programs, various concepts result in explicit and implicit synchronization between processes. Previously implemented vector clock exchanges are often specific to a single programming model, and a translation to other programming models is not trivial. Consequently, analyses relying on the vector clock exchange remain model-specific. This paper proposes an abstraction layer for on-the-fly vector clock exchanges for distributed-memory programs. Based on the programming models MPI, OpenSHMEM, and GASPI, we define common synchronization primitives and explain how model-specific procedures map to our model-agnostic abstraction layer. The exchange model is general enough also to support synchronization concepts of other parallel programming models. We present our implementation of the vector clock abstraction layer based on the Generic Tool Infrastructure with translators for MPI and OpenSHMEM. In an overhead study using the SPEC MPI 2007 benchmarks, the slowdown of the implemented vector clock exchange ranges from 1.1x to 12.6x for runs with up to 768 processes.
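For context, the abstract relies on the standard vector clock update rules for tracking the happened-before relation: each process increments its own clock component on a local or send event, and on a receive it merges the incoming clock component-wise before incrementing. The following is a minimal illustrative C++ sketch of those generic rules only; it is not the paper's abstraction layer or the Generic Tool Infrastructure API, and the type VectorClock and its methods tick(), merge(), and happenedBefore() are hypothetical names.

    // Illustrative vector clock sketch (hypothetical names, not the paper's API).
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    struct VectorClock {
        std::vector<unsigned long> c;   // one component per process
        std::size_t self;               // rank of the owning process

        VectorClock(std::size_t nprocs, std::size_t rank)
            : c(nprocs, 0), self(rank) {}

        // Local event or message send: advance this process's own component.
        void tick() { ++c[self]; }

        // Message receive: take the component-wise maximum of both clocks,
        // then advance the receiver's own component.
        void merge(const VectorClock& other) {
            for (std::size_t i = 0; i < c.size(); ++i)
                c[i] = std::max(c[i], other.c[i]);
            tick();
        }

        // This clock's event happened before b's event iff every component is
        // less than or equal to b's and at least one is strictly smaller.
        bool happenedBefore(const VectorClock& b) const {
            bool strictlyLess = false;
            for (std::size_t i = 0; i < c.size(); ++i) {
                if (c[i] > b.c[i]) return false;
                if (c[i] < b.c[i]) strictlyLess = true;
            }
            return strictlyLess;
        }
    };

Under these rules, a send ticks the sender's clock and ships it with the message, the matching receive merges it, and two events are concurrent exactly when neither happenedBefore the other, which is what race detectors test for.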