MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale

Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov, Alexey Skrynnik
DOI: arxiv-2409.00134 · arXiv - CS - Multiagent Systems · Published 2024-08-29 (Journal Article)

Citations: 0

Abstract

Multi-agent pathfinding (MAPF) is a challenging computational problem that typically requires finding collision-free paths for multiple agents in a shared environment. Solving MAPF optimally is NP-hard, yet efficient solutions are critical for numerous applications, including automated warehouses and transportation systems. Recently, learning-based approaches to MAPF have gained attention, particularly those leveraging deep reinforcement learning. Following current trends in machine learning, we have created a foundation model for MAPF problems called MAPF-GPT. Using imitation learning, we have trained a policy on a set of pre-collected sub-optimal expert trajectories; the policy can generate actions under partial observability without additional heuristics, reward functions, or communication with other agents. The resulting MAPF-GPT model demonstrates zero-shot learning abilities when solving MAPF problem instances that were not present in the training dataset. We show that MAPF-GPT notably outperforms the current best-performing learnable MAPF solvers on a diverse range of problem instances and is computationally efficient at inference time.
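To make the "collision-free paths" requirement in the abstract concrete, the sketch below checks a joint plan for the two standard MAPF conflict types: two agents occupying the same cell at the same time step (vertex conflict) and two agents swapping cells between consecutive steps (edge conflict). This is a generic illustration of the problem definition, not code from the MAPF-GPT paper; the function name and path encoding are our own.

```python
def has_conflict(paths):
    """Check a joint MAPF plan for vertex and edge conflicts.

    paths: list of agent paths, each a list of (row, col) cells,
    one cell per time step. Agents wait at their goal after arriving.
    """
    horizon = max(len(p) for p in paths)
    # Pad shorter paths: an agent stays at its last cell once it reaches the goal.
    padded = [p + [p[-1]] * (horizon - len(p)) for p in paths]
    for t in range(horizon):
        cells = [p[t] for p in padded]
        if len(set(cells)) < len(cells):  # vertex conflict: shared cell at step t
            return True
        if t > 0:
            for i in range(len(padded)):
                for j in range(i + 1, len(padded)):
                    # edge conflict: agents i and j swap cells between t-1 and t
                    if (padded[i][t] == padded[j][t - 1]
                            and padded[j][t] == padded[i][t - 1]):
                        return True
    return False

# Two agents crossing a grid without interfering -> valid plan:
ok = has_conflict([[(0, 0), (0, 1), (0, 2)],
                   [(2, 0), (1, 0), (0, 0)]])   # False

# Two agents swapping adjacent cells -> edge conflict:
bad = has_conflict([[(0, 0), (0, 1)],
                    [(0, 1), (0, 0)]])          # True
```

A MAPF solver (learned or classical) must output a set of paths for which this check returns False; optimal solvers additionally minimize a cost such as makespan or sum of path lengths.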