Distributed Explicit State Space Exploration with State Reconstruction for RDMA Networks
Sami Evangelista, L. Petrucci, L. Kristensen
2022 26th International Conference on Engineering of Complex Computer Systems (ICECCS), published 2022-03-01
DOI: 10.1109/ICECCS54210.2022.00021 (https://doi.org/10.1109/ICECCS54210.2022.00021)
Citation count: 0
Abstract
The inherent computational complexity of validating and verifying concurrent systems calls for exploiting parallel and distributed computing architectures. We present a new distributed algorithm for state space exploration of concurrent systems on computing clusters. Our algorithm relies on Remote Direct Memory Access (RDMA) for low-latency transfer of states between computing elements, and on state reconstruction trees for compact representation of states on the computing elements themselves. To distribute states between computing elements, we propose the concept of state stealing. We have implemented the algorithm using the OpenSHMEM API for RDMA and evaluated it experimentally on the Grid'5000 testbed with a set of benchmark models. The experimental results show that our algorithm scales well with the number of available computing elements, and that our state stealing mechanism generally provides a balanced workload distribution.
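The abstract does not spell out how a state reconstruction tree is organised. As a rough illustration of the general idea only, the following single-process Python sketch stores each visited state as a (parent, event) pair rather than as a full state vector, and rebuilds the full state on demand by replaying events from the initial state. The toy two-counter model and the names `successors`, `apply_event`, `reconstruct`, and `explore` are hypothetical and are not taken from the paper; the sketch involves no RDMA, no distribution across computing elements, and no state stealing.

```python
from collections import deque


def successors(state):
    """Hypothetical toy model: two counters, each incremented modulo 3."""
    x, y = state
    return [("inc_x", ((x + 1) % 3, y)), ("inc_y", (x, (y + 1) % 3))]


def apply_event(state, event):
    """Execute one named event on a full state."""
    for name, nxt in successors(state):
        if name == event:
            return nxt
    raise ValueError(f"event {event!r} is not enabled in {state!r}")


def reconstruct(node, tree, initial):
    """Rebuild the full state of a tree node by replaying events from the root."""
    events = []
    while tree[node] is not None:      # walk up to the root (node 0)
        node, event = tree[node]
        events.append(event)
    state = initial
    for event in reversed(events):     # replay in root-to-node order
        state = apply_event(state, event)
    return state


def explore(initial):
    """Breadth-first exploration that never stores full state vectors."""
    tree = [None]                      # node id -> (parent id, event); root is None
    buckets = {hash(initial): [0]}     # state hash -> ids of nodes with that hash
    queue = deque([0])
    while queue:
        node = queue.popleft()
        state = reconstruct(node, tree, initial)
        for event, nxt in successors(state):
            candidates = buckets.setdefault(hash(nxt), [])
            # Hash collisions are resolved by reconstructing each candidate.
            if any(reconstruct(c, tree, initial) == nxt for c in candidates):
                continue               # state already visited
            tree.append((node, event))
            candidates.append(len(tree) - 1)
            queue.append(len(tree) - 1)
    return tree


tree = explore((0, 0))
print(len(tree), "states stored as (parent, event) pairs")
```

The design trade-off this sketch exposes is memory against time: each stored node costs only a parent reference and an event identifier, but every lookup may require replaying a path from the root.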