Monte Carlo tree search for dynamic shortest‐path interdiction
Authors: Alexey A. Bochkarev, J. Cole Smith
Journal: Networks
Publication date: 2024-07-10
DOI: 10.1002/net.22243 (https://doi.org/10.1002/net.22243)
We present a reinforcement learning‐based heuristic for a two‐player interdiction game called the dynamic shortest path interdiction problem (DSPI). The DSPI involves an evader and an interdictor who take turns in the problem, with the interdictor selecting a set of arcs to attack and the evader choosing an arc to traverse at each step of the game. Our model employs the Monte Carlo tree search framework to learn a policy for the players using randomized roll‐outs. This policy is stored as an asymmetric game tree and can be further refined as the game unfolds. We leverage alpha–beta pruning and existing bounding schemes in the literature to prune suboptimal branches. Our numerical experiments demonstrate that the prescribed approach yields near‐optimal solutions in many cases and allows for flexibility in balancing solution quality and computational effort.
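To make the turn structure and the idea of a randomized roll-out concrete, here is a minimal Python sketch. It is not the authors' implementation. The toy network, the arc costs and delays, the budget, and the names `ARCS`, `BUDGET`, `rollout`, and `estimate_cost` are all hypothetical, and the rule that an attacked arc costs its base cost plus a fixed delay is an assumption not stated in the abstract. Both players move uniformly at random, which is the kind of playout that Monte Carlo tree search averages over. The sketch builds no search tree and does no pruning.

```python
import random

# Hypothetical toy instance: arc -> (base cost, extra delay once attacked).
ARCS = {
    ("s", "a"): (1, 3),
    ("s", "b"): (2, 1),
    ("a", "t"): (1, 4),
    ("b", "t"): (2, 2),
}
BUDGET = 1  # assumed cap on the total number of arcs the interdictor may attack


def rollout(rng, source="s", target="t", budget=BUDGET):
    """Play one game with uniformly random moves; return the evader's total cost."""
    node, attacked, total = source, set(), 0
    while node != target:
        # Interdictor's turn: attack a random set of not-yet-attacked arcs,
        # limited by the remaining budget (the set may be empty).
        free = sorted(arc for arc in ARCS if arc not in attacked)
        k = rng.randint(0, min(budget, len(free)))
        attacked.update(rng.sample(free, k))
        budget -= k
        # Evader's turn: traverse a random arc leaving the current node.
        arc = rng.choice([a for a in ARCS if a[0] == node])
        cost, delay = ARCS[arc]
        total += cost + (delay if arc in attacked else 0)
        node = arc[1]
    return total


def estimate_cost(n_rollouts=1000, seed=0):
    """Average the evader's cost over many random roll-outs from the source."""
    rng = random.Random(seed)
    return sum(rollout(rng) for _ in range(n_rollouts)) / n_rollouts


if __name__ == "__main__":
    print(estimate_cost())
```

A tree-search method would replace the uniformly random choices near the root with choices guided by statistics gathered from earlier roll-outs, and would fall back to random play only below the portion of the game tree built so far.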
About the journal:
Network problems are pervasive in our modern technological society, as witnessed by our reliance on physical networks that provide power, communication, and transportation. As well, a number of processes can be modeled using logical networks, as in the scheduling of interdependent tasks, the dating of archaeological artifacts, or the compilation of subroutines comprising a large computer program. Networks provide a common framework for posing and studying problems that often have wider applicability than their originating context.
The goal of this journal is to provide a central forum for the distribution of timely information about network problems, their design and mathematical analysis, as well as efficient algorithms for carrying out optimization on networks. The nonstandard modeling of diverse processes using networks and network concepts is also of interest. Consequently, the disciplines that are useful in studying networks are varied, including applied mathematics, operations research, computer science, discrete mathematics, and economics.
Networks publishes material on the analytic modeling of problems using networks, the mathematical analysis of network problems, the design of computationally efficient network algorithms, and innovative case studies of successful network applications. We do not typically publish works that fall in the realm of pure graph theory (without significant algorithmic and modeling contributions) or papers that deal with engineering aspects of network design. Since the audience for this journal is then necessarily broad, articles that impact multiple application areas or that creatively use new or existing methodologies are especially appropriate. We seek to publish original, well-written research papers that make a substantive contribution to the knowledge base. In addition, tutorial and survey articles are welcomed. All manuscripts are carefully refereed.