PPB-MCTS：新型分布式内存并行部分后向传播蒙特卡洛树搜索算法

IF 3.4 3区计算机科学 Q1 COMPUTER SCIENCE, THEORY & METHODS Journal of Parallel and Distributed Computing Pub Date : 2024-06-26 DOI:10.1016/j.jpdc.2024.104944

Yashar Naderzadeh , Daniel Grosu , Ratna Babu Chinnam

{"title":"PPB-MCTS：新型分布式内存并行部分后向传播蒙特卡洛树搜索算法","authors":"Yashar Naderzadeh , Daniel Grosu , Ratna Babu Chinnam","doi":"10.1016/j.jpdc.2024.104944","DOIUrl":null,"url":null,"abstract":"<div><p>Monte-Carlo Tree Search (MCTS) is an adaptive and heuristic tree-search algorithm designed to uncover sub-optimal actions at each decision-making point. This method progressively constructs a search tree by gathering samples throughout its execution. Predominantly applied within the realm of gaming, MCTS has exhibited exceptional achievements. Additionally, it has displayed promising outcomes when employed to solve NP-hard combinatorial optimization problems. MCTS has been adapted for distributed-memory parallel platforms. The primary challenges associated with distributed-memory parallel MCTS are the substantial communication overhead and the necessity to balance the computational load among various processes. In this work, we introduce a novel distributed-memory parallel MCTS algorithm with partial backpropagations, referred to as <em>Parallel Partial-Backpropagation MCTS</em> (<span>PPB-MCTS</span>). Our design approach aims to significantly reduce the communication overhead while maintaining, or even slightly improving, the performance in the context of combinatorial optimization problems. To address the communication overhead challenge, we propose a strategy involving transmitting an additional backpropagation message. This strategy avoids attaching an information table to the communication messages exchanged by the processes, thus reducing the communication overhead. Furthermore, this approach contributes to enhancing the decision-making accuracy during the selection phase. The load balancing issue is also effectively addressed by implementing a shared transposition table among the parallel processes. Furthermore, we introduce two primary methods for managing duplicate states within distributed-memory parallel MCTS, drawing upon techniques utilized in addressing duplicate states within sequential MCTS. Duplicate states can transform the conventional search tree into a Directed Acyclic Graph (DAG). To evaluate the performance of our proposed parallel algorithm, we conduct an extensive series of experiments on solving instances of the Job-Shop Scheduling Problem (JSSP) and the Weighted Set-Cover Problem (WSCP). These problems are recognized for their complexity and classified as NP-hard combinatorial optimization problems with considerable relevance within industrial applications. The experiments are performed on a cluster of computers with many cores. The empirical results highlight the enhanced scalability of our algorithm compared to that of the existing distributed-memory parallel MCTS algorithms. As the number of processes increases, our algorithm demonstrates increased rollout efficiency while maintaining an improved load balance across processes.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"193 ","pages":"Article 104944"},"PeriodicalIF":3.4000,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PPB-MCTS: A novel distributed-memory parallel partial-backpropagation Monte Carlo tree search algorithm\",\"authors\":\"Yashar Naderzadeh , Daniel Grosu , Ratna Babu Chinnam\",\"doi\":\"10.1016/j.jpdc.2024.104944\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Monte-Carlo Tree Search (MCTS) is an adaptive and heuristic tree-search algorithm designed to uncover sub-optimal actions at each decision-making point. This method progressively constructs a search tree by gathering samples throughout its execution. Predominantly applied within the realm of gaming, MCTS has exhibited exceptional achievements. Additionally, it has displayed promising outcomes when employed to solve NP-hard combinatorial optimization problems. MCTS has been adapted for distributed-memory parallel platforms. The primary challenges associated with distributed-memory parallel MCTS are the substantial communication overhead and the necessity to balance the computational load among various processes. In this work, we introduce a novel distributed-memory parallel MCTS algorithm with partial backpropagations, referred to as <em>Parallel Partial-Backpropagation MCTS</em> (<span>PPB-MCTS</span>). Our design approach aims to significantly reduce the communication overhead while maintaining, or even slightly improving, the performance in the context of combinatorial optimization problems. To address the communication overhead challenge, we propose a strategy involving transmitting an additional backpropagation message. This strategy avoids attaching an information table to the communication messages exchanged by the processes, thus reducing the communication overhead. Furthermore, this approach contributes to enhancing the decision-making accuracy during the selection phase. The load balancing issue is also effectively addressed by implementing a shared transposition table among the parallel processes. Furthermore, we introduce two primary methods for managing duplicate states within distributed-memory parallel MCTS, drawing upon techniques utilized in addressing duplicate states within sequential MCTS. Duplicate states can transform the conventional search tree into a Directed Acyclic Graph (DAG). To evaluate the performance of our proposed parallel algorithm, we conduct an extensive series of experiments on solving instances of the Job-Shop Scheduling Problem (JSSP) and the Weighted Set-Cover Problem (WSCP). These problems are recognized for their complexity and classified as NP-hard combinatorial optimization problems with considerable relevance within industrial applications. The experiments are performed on a cluster of computers with many cores. The empirical results highlight the enhanced scalability of our algorithm compared to that of the existing distributed-memory parallel MCTS algorithms. As the number of processes increases, our algorithm demonstrates increased rollout efficiency while maintaining an improved load balance across processes.</p></div>\",\"PeriodicalId\":54775,\"journal\":{\"name\":\"Journal of Parallel and Distributed Computing\",\"volume\":\"193 \",\"pages\":\"Article 104944\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2024-06-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Parallel and Distributed Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0743731524001084\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Parallel and Distributed Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0743731524001084","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}

引用次数: 0

摘要

蒙特卡洛树搜索（Monte-Carlo Tree Search，MCTS）是一种自适应的启发式树搜索算法，旨在发现每个决策点的次优行动。这种方法在执行过程中通过收集样本逐步构建搜索树。MCTS 主要应用于游戏领域，取得了卓越的成就。此外，在解决 NP 难度的组合优化问题时，它也取得了可喜的成果。MCTS 适用于分布式内存并行平台。与分布式内存并行 MCTS 相关的主要挑战是巨大的通信开销和在不同进程间平衡计算负载的必要性。在这项工作中，我们介绍了一种新型分布式内存并行 MCTS 算法，该算法采用部分反向传播，被称为并行部分反向传播 MCTS（PPB-MCTS）。我们的设计方法旨在大幅降低通信开销，同时保持甚至略微提高组合优化问题的性能。为解决通信开销难题，我们提出了一种涉及传输额外反向传播信息的策略。这种策略避免了在进程交换的通信信息中附加信息表，从而减少了通信开销。此外，这种方法还有助于提高选择阶段的决策准确性。通过在并行进程间实施共享转置表，负载平衡问题也得到了有效解决。此外，我们还借鉴了处理顺序 MCTS 中重复状态的技术，介绍了在分布式内存并行 MCTS 中管理重复状态的两种主要方法。重复状态会将传统的搜索树转化为有向无环图（DAG）。为了评估我们提出的并行算法的性能，我们在求解工作车间调度问题（JSSP）和加权集合覆盖问题（WSCP）的实例时进行了一系列广泛的实验。这些问题的复杂性是公认的，被归类为 NP-硬组合优化问题，在工业应用中具有相当大的相关性。实验在多核计算机集群上进行。实证结果表明，与现有的分布式内存并行 MCTS 算法相比，我们的算法具有更强的可扩展性。随着进程数量的增加，我们的算法显示出更高的扩展效率，同时保持了各进程之间更好的负载平衡。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

PPB-MCTS: A novel distributed-memory parallel partial-backpropagation Monte Carlo tree search algorithm

Monte-Carlo Tree Search (MCTS) is an adaptive and heuristic tree-search algorithm designed to uncover sub-optimal actions at each decision-making point. This method progressively constructs a search tree by gathering samples throughout its execution. Predominantly applied within the realm of gaming, MCTS has exhibited exceptional achievements. Additionally, it has displayed promising outcomes when employed to solve NP-hard combinatorial optimization problems. MCTS has been adapted for distributed-memory parallel platforms. The primary challenges associated with distributed-memory parallel MCTS are the substantial communication overhead and the necessity to balance the computational load among various processes. In this work, we introduce a novel distributed-memory parallel MCTS algorithm with partial backpropagations, referred to as Parallel Partial-Backpropagation MCTS (PPB-MCTS). Our design approach aims to significantly reduce the communication overhead while maintaining, or even slightly improving, the performance in the context of combinatorial optimization problems. To address the communication overhead challenge, we propose a strategy involving transmitting an additional backpropagation message. This strategy avoids attaching an information table to the communication messages exchanged by the processes, thus reducing the communication overhead. Furthermore, this approach contributes to enhancing the decision-making accuracy during the selection phase. The load balancing issue is also effectively addressed by implementing a shared transposition table among the parallel processes. Furthermore, we introduce two primary methods for managing duplicate states within distributed-memory parallel MCTS, drawing upon techniques utilized in addressing duplicate states within sequential MCTS. Duplicate states can transform the conventional search tree into a Directed Acyclic Graph (DAG). To evaluate the performance of our proposed parallel algorithm, we conduct an extensive series of experiments on solving instances of the Job-Shop Scheduling Problem (JSSP) and the Weighted Set-Cover Problem (WSCP). These problems are recognized for their complexity and classified as NP-hard combinatorial optimization problems with considerable relevance within industrial applications. The experiments are performed on a cluster of computers with many cores. The empirical results highlight the enhanced scalability of our algorithm compared to that of the existing distributed-memory parallel MCTS algorithms. As the number of processes increases, our algorithm demonstrates increased rollout efficiency while maintaining an improved load balance across processes.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Parallel and Distributed Computing 工程技术-计算机：理论方法

CiteScore

10.30

自引率

2.60%

发文量

172

审稿时长

12 months

期刊介绍： This international journal is directed to researchers, engineers, educators, managers, programmers, and users of computers who have particular interests in parallel processing and/or distributed computing. The Journal of Parallel and Distributed Computing publishes original research papers and timely review articles on the theory, design, evaluation, and use of parallel and/or distributed computing systems. The journal also features special issues on these topics; again covering the full range from the design to the use of our targeted systems.