Monte Carlo tree search for dynamic shortest-path interdiction

IF 1.6 | Zone 4 (Computer Science) | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Networks | Pub Date: 2024-07-10 | DOI: 10.1002/net.22243
Alexey A. Bochkarev, J. Cole Smith
Citations: 0

Abstract

We present a reinforcement learning‐based heuristic for a two‐player interdiction game called the dynamic shortest path interdiction problem (DSPI). The DSPI involves an evader and an interdictor who take turns in the problem, with the interdictor selecting a set of arcs to attack and the evader choosing an arc to traverse at each step of the game. Our model employs the Monte Carlo tree search framework to learn a policy for the players using randomized roll‐outs. This policy is stored as an asymmetric game tree and can be further refined as the game unfolds. We leverage alpha–beta pruning and existing bounding schemes in the literature to prune suboptimal branches. Our numerical experiments demonstrate that the prescribed approach yields near‐optimal solutions in many cases and allows for flexibility in balancing solution quality and computational effort.
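To make the ingredients named in the abstract concrete, the following is a minimal sketch of Monte Carlo tree search with randomized roll-outs on a toy interdiction game, with a small alpha-beta search as an exact reference. It is not the authors' implementation. The graph, the costs, the budget of two attacks, the rule that the interdictor may attack only arcs leaving the evader's current node, the UCT selection rule, and all names (`ARCS`, `moves`, `step`, `simulate`, `minimax`) are assumptions made for illustration; the bounding schemes mentioned in the abstract are not modeled.

```python
import math
import random
from itertools import combinations

# Hypothetical toy instance: arc -> (base cost, extra cost once interdicted).
ARCS = {
    ("s", "a"): (1, 3), ("s", "b"): (2, 1), ("a", "b"): (1, 1),
    ("a", "t"): (2, 2), ("b", "t"): (2, 4),
}
SOURCE, TARGET, BUDGET = "s", "t", 2

# A state is (evader node, frozenset of interdicted arcs, player to move).
def moves(state):
    node, hit, player = state
    out = [arc for arc in ARCS if arc[0] == node]
    if player == "E":                      # evader: pick one outgoing arc
        return out
    free = [arc for arc in out if arc not in hit]
    left = BUDGET - len(hit)               # interdictor: any affordable subset
    return [frozenset(c) for k in range(min(left, len(free)) + 1)
            for c in combinations(free, k)]

def step(state, move):
    """Apply a move; return (next state, cost paid by the evader)."""
    node, hit, player = state
    if player == "I":
        return (node, hit | move, "E"), 0.0
    base, extra = ARCS[move]
    return (move[1], hit, "I"), float(base + (extra if move in hit else 0))

class Node:
    def __init__(self, state):
        self.state, self.children = state, {}   # children: move -> Node
        self.visits, self.total = 0, 0.0        # total: sum of sampled costs-to-go

def rollout(state, rng):
    """Play uniformly random moves until the evader reaches TARGET."""
    cost = 0.0
    while state[0] != TARGET:
        state, c = step(state, rng.choice(moves(state)))
        cost += c
    return cost

def simulate(node, rng, c_uct=2.0):
    """One MCTS iteration below `node`; returns the sampled cost-to-go."""
    state = node.state
    if state[0] == TARGET:
        value = 0.0
    else:
        untried = [m for m in moves(state) if m not in node.children]
        if untried:                             # expansion, then random roll-out
            move = rng.choice(untried)
            nxt, cost = step(state, move)
            child = node.children[move] = Node(nxt)
            tail = rollout(nxt, rng)
            child.visits, child.total = 1, tail
            value = cost + tail
        else:                                   # selection by UCT
            sign = 1.0 if state[2] == "I" else -1.0   # interdictor maximizes cost
            def score(m):
                ch = node.children[m]
                mean = step(state, m)[1] + ch.total / ch.visits
                return sign * mean + c_uct * math.sqrt(math.log(node.visits) / ch.visits)
            move = max(node.children, key=score)
            value = step(state, move)[1] + simulate(node.children[move], rng, c_uct)
    node.visits += 1
    node.total += value
    return value

def minimax(state, alpha=-math.inf, beta=math.inf):
    """Exact game value via alpha-beta search; only viable on toy instances."""
    if state[0] == TARGET:
        return 0.0
    maximizing = state[2] == "I"
    best = -math.inf if maximizing else math.inf
    for m in moves(state):
        nxt, cost = step(state, m)
        # The child's value excludes `cost`, so the search window shifts by it.
        val = cost + minimax(nxt, alpha - cost, beta - cost)
        if maximizing:
            best = max(best, val)
            alpha = max(alpha, best)
        else:
            best = min(best, val)
            beta = min(beta, best)
        if alpha >= beta:                       # remaining siblings cannot matter
            break
    return best

rng = random.Random(0)
root = Node((SOURCE, frozenset(), "I"))
for _ in range(2000):
    simulate(root, rng)
best_move = max(root.children, key=lambda m: root.children[m].visits)
print("MCTS first interdiction:", sorted(best_move))
print("MCTS root estimate:", root.total / root.visits)
print("Exact value (alpha-beta):", minimax(root.state))
```

In this sketch the tree grows only along branches that the selection rule keeps revisiting, which is one way a stored game tree can become asymmetric, and the iteration count is the knob that trades solution quality against computational effort. The alpha-beta value serves only as a reference for the toy instance; exhaustive search of this kind would not scale to realistic networks.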
Source journal
Networks (Engineering & Technology: Computer Science, Hardware)
CiteScore: 4.40
Self-citation rate: 9.50%
Articles published: 46
Review time: 12 months
About the journal: Network problems are pervasive in our modern technological society, as witnessed by our reliance on physical networks that provide power, communication, and transportation. As well, a number of processes can be modeled using logical networks, as in the scheduling of interdependent tasks, the dating of archaeological artifacts, or the compilation of subroutines comprising a large computer program. Networks provide a common framework for posing and studying problems that often have wider applicability than their originating context. The goal of this journal is to provide a central forum for the distribution of timely information about network problems, their design and mathematical analysis, as well as efficient algorithms for carrying out optimization on networks. The nonstandard modeling of diverse processes using networks and network concepts is also of interest. Consequently, the disciplines that are useful in studying networks are varied, including applied mathematics, operations research, computer science, discrete mathematics, and economics. Networks publishes material on the analytic modeling of problems using networks, the mathematical analysis of network problems, the design of computationally efficient network algorithms, and innovative case studies of successful network applications. We do not typically publish works that fall in the realm of pure graph theory (without significant algorithmic and modeling contributions) or papers that deal with engineering aspects of network design. Since the audience for this journal is then necessarily broad, articles that impact multiple application areas or that creatively use new or existing methodologies are especially appropriate. We seek to publish original, well-written research papers that make a substantive contribution to the knowledge base. In addition, tutorial and survey articles are welcomed. All manuscripts are carefully refereed.