The fast and the capacious: memory-efficient multi-GPU accelerated explicit state space exploration with GPUexplore 3.0

Anton Wijs, Muhammad Osama
Journal: Frontiers in High Performance Computing
Published: 2024-03-13
DOI: 10.3389/fhpcp.2024.1285349 (https://doi.org/10.3389/fhpcp.2024.1285349)

Abstract

The GPU acceleration of explicit state space exploration for explicit-state model checking has been the subject of previous research, but to date the tools have been limited in their applicability and practical use. To our knowledge, we are the first to use a novel tree database for GPUs. This tree database allows high-performance, memory-efficient storage of states in the form of binary trees. Besides the tree compression this enables, we also propose two new hashing schemes, compact-cuckoo and compact multiple-functions. These schemes enable the use of Cleary compression to compactly store tree roots. Besides an in-depth discussion of the tree database algorithms, we present the input language and workflow of our tool, called GPUexplore 3.0. Finally, we explain how the algorithms can be extended to exploit multiple GPUs that reside on the same machine. Experiments show single-GPU processing speeds of up to 144 million states per second, compared to 20 million states per second achieved by 32-core LTSmin. In the multi-GPU setting, workload and storage distributions are optimal, and performance frequently even improves when the number of GPUs is increased. Overall, a logarithmic acceleration of up to 1.9× was achieved with four GPUs, compared to what was achieved with one and two GPUs. We believe that a linear speedup can be easily accomplished with faster P2P communications between the GPUs.
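
To illustrate the general idea of storing states as binary trees, the following is a minimal CPU-side sketch in Python. It is not the GPUexplore 3.0 implementation: the class `TreeDatabase` and its methods are hypothetical names chosen for illustration, the tables are plain Python dictionaries rather than GPU hash tables, and neither Cleary compression nor the compact-cuckoo and compact multiple-functions schemes are modelled. The sketch assumes fixed-length state vectors of at least two integers. It only shows how recursively splitting a state vector into halves lets states that share parts also share stored tree nodes, with roots kept in a separate table.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]  # a pair of child references (or raw values at the leaves)


class TreeDatabase:
    """Toy tree table (hypothetical, for illustration only)."""

    def __init__(self, state_len: int) -> None:
        if state_len < 2:
            raise ValueError("sketch assumes state vectors of length >= 2")
        self.state_len = state_len
        self._inner_index: Dict[Node, int] = {}  # non-root node -> id
        self._inner: List[Node] = []             # id -> non-root node
        self._roots: Set[Node] = set()           # roots live in their own table

    def _fold(self, part: List[int]) -> int:
        """Fold part of a state into one reference, reusing existing nodes."""
        if len(part) == 1:
            return part[0]  # a single element is stored as-is
        mid = (len(part) + 1) // 2
        node = (self._fold(part[:mid]), self._fold(part[mid:]))
        node_id = self._inner_index.get(node)
        if node_id is None:
            node_id = len(self._inner)
            self._inner_index[node] = node_id
            self._inner.append(node)
        return node_id

    def _unfold(self, ref: int, length: int) -> List[int]:
        """Inverse of _fold for a sub-vector of the given length."""
        if length == 1:
            return [ref]
        mid = (length + 1) // 2
        left, right = self._inner[ref]
        return self._unfold(left, mid) + self._unfold(right, length - mid)

    def insert(self, state: List[int]) -> Tuple[Node, bool]:
        """Store a state; return (root, True if the state was not seen before)."""
        if len(state) != self.state_len:
            raise ValueError("unexpected state vector length")
        mid = (self.state_len + 1) // 2
        root = (self._fold(state[:mid]), self._fold(state[mid:]))
        is_new = root not in self._roots
        self._roots.add(root)
        return root, is_new

    def lookup(self, root: Node) -> List[int]:
        """Reconstruct the full state vector from its root."""
        mid = (self.state_len + 1) // 2
        return (self._unfold(root[0], mid)
                + self._unfold(root[1], self.state_len - mid))

    def inner_node_count(self) -> int:
        return len(self._inner)
```

In this sketch, inserting two states that differ only in their last element adds a single new non-root node for the second state, because the subtree covering the unchanged half is already present. A state counts as new exactly when its root is absent from the root table, which is why the roots are kept apart from the other nodes.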