The fast and the capacious: memory-efficient multi-GPU accelerated explicit state space exploration with GPUexplore 3.0

Frontiers in High Performance Computing. Pub Date: 2024-03-13. DOI: 10.3389/fhpcp.2024.1285349

Anton Wijs, Muhammad Osama



Abstract

GPU acceleration of explicit state space exploration for explicit-state model checking has been the subject of previous research, but to date the resulting tools have been limited in their applicability and practical use. To our knowledge, we are the first to use a tree database on GPUs. This novel tree database allows high-performance, memory-efficient storage of states in the form of binary trees. In addition to the tree compression this enables, we propose two new hashing schemes, compact-cuckoo and compact multiple-functions, which make it possible to use Cleary compression to store tree roots compactly. We present an in-depth discussion of the tree database algorithms, along with the input language and workflow of our tool, GPUexplore 3.0. Finally, we explain how the algorithms can be extended to exploit multiple GPUs that reside on the same machine. Experiments show single-GPU processing speeds of up to 144 million states per second, compared to 20 million states per second achieved by 32-core LTSmin. In the multi-GPU setting, workload and storage distributions are optimal, and performance frequently even improves as the number of GPUs increases. Overall, a logarithmic acceleration of up to 1.9× was achieved with four GPUs, compared to what was achieved with one and two GPUs. We believe that a linear speedup can be easily accomplished with faster P2P communication between the GPUs.
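The idea of storing states as binary trees can be illustrated with a small CPU-side sketch. This is not the GPUexplore 3.0 implementation: the class `TreeDatabase`, its method names, the use of Python dictionaries in place of GPU hash tables, and the restriction to state vectors whose length is a power of two are all assumptions made for illustration. The sketch also omits Cleary compression and the compact-cuckoo and compact multiple-functions schemes. It shows only the core effect of tree compression: states that share sub-vectors share the corresponding tree nodes.

```python
from typing import Dict, List, Sequence, Set, Tuple

Node = Tuple[int, int]


class TreeDatabase:
    """Toy tree database (illustrative only, not the GPUexplore 3.0 code).

    A state vector is split into pairs; each pair becomes a tree node.
    Non-root nodes are deduplicated in one shared table and referred to
    by integer id, so common sub-vectors are stored once. Roots are kept
    in a separate set, which is what decides whether a state is new.
    """

    def __init__(self) -> None:
        self._index: Dict[Node, int] = {}  # non-root node -> id
        self._nodes: List[Node] = []       # id -> non-root node
        self._roots: Set[Node] = set()     # one root per stored state

    def _intern(self, node: Node) -> int:
        """Return the id of a non-root node, storing it if it is unseen."""
        node_id = self._index.get(node)
        if node_id is None:
            node_id = len(self._nodes)
            self._nodes.append(node)
            self._index[node] = node_id
        return node_id

    def insert(self, state: Sequence[int]) -> Tuple[Node, bool]:
        """Store a state; return its root and whether the state was new."""
        n = len(state)
        if n < 2 or n & (n - 1) != 0:
            # Assumption of this sketch: fixed, power-of-two vector length.
            raise ValueError("state length must be a power of two >= 2")
        level = list(state)
        while len(level) > 2:
            # Replace each adjacent pair by the id of its (shared) node.
            level = [self._intern((level[i], level[i + 1]))
                     for i in range(0, len(level), 2)]
        root = (level[0], level[1])
        is_new = root not in self._roots
        self._roots.add(root)
        return root, is_new

    def reconstruct(self, root: Node, length: int) -> List[int]:
        """Expand a root back into the state vector of the given length."""
        level = [root[0], root[1]]
        while len(level) < length:
            level = [x for node_id in level for x in self._nodes[node_id]]
        return level

    def node_count(self) -> int:
        """Number of stored non-root nodes."""
        return len(self._nodes)


db = TreeDatabase()
root_a, new_a = db.insert([1, 2, 3, 4, 5, 6, 7, 8])
root_b, new_b = db.insert([1, 2, 3, 4, 5, 6, 7, 9])  # differs in one element
_, new_again = db.insert([1, 2, 3, 4, 5, 6, 7, 8])   # already stored
```

In this example the two states differ only in their last element, so the second insertion adds two new non-root nodes instead of six; the remaining nodes are shared with the first state.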
