KnowPC: Knowledge-Driven Programmatic Reinforcement Learning for Zero-shot Coordination

Yin Gu, Qi Liu, Zhi Li, Kai Zhang
{"title":"KnowPC: Knowledge-Driven Programmatic Reinforcement Learning for Zero-shot Coordination","authors":"Yin Gu, Qi Liu, Zhi Li, Kai Zhang","doi":"arxiv-2408.04336","DOIUrl":null,"url":null,"abstract":"Zero-shot coordination (ZSC) remains a major challenge in the cooperative AI\nfield, which aims to learn an agent to cooperate with an unseen partner in\ntraining environments or even novel environments. In recent years, a popular\nZSC solution paradigm has been deep reinforcement learning (DRL) combined with\nadvanced self-play or population-based methods to enhance the neural policy's\nability to handle unseen partners. Despite some success, these approaches\nusually rely on black-box neural networks as the policy function. However,\nneural networks typically lack interpretability and logic, making the learned\npolicies difficult for partners (e.g., humans) to understand and limiting their\ngeneralization ability. These shortcomings hinder the application of\nreinforcement learning methods in diverse cooperative scenarios.We suggest to\nrepresent the agent's policy with an interpretable program. Unlike neural\nnetworks, programs contain stable logic, but they are non-differentiable and\ndifficult to optimize.To automatically learn such programs, we introduce\nKnowledge-driven Programmatic reinforcement learning for zero-shot Coordination\n(KnowPC). We first define a foundational Domain-Specific Language (DSL),\nincluding program structures, conditional primitives, and action primitives. A\nsignificant challenge is the vast program search space, making it difficult to\nfind high-performing programs efficiently. To address this, KnowPC integrates\nan extractor and an reasoner. The extractor discovers environmental transition\nknowledge from multi-agent interaction trajectories, while the reasoner deduces\nthe preconditions of each action primitive based on the transition knowledge.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"74 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.04336","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Zero-shot coordination (ZSC) remains a major challenge in the cooperative AI field, which aims to learn an agent to cooperate with an unseen partner in training environments or even novel environments. In recent years, a popular ZSC solution paradigm has been deep reinforcement learning (DRL) combined with advanced self-play or population-based methods to enhance the neural policy's ability to handle unseen partners. Despite some success, these approaches usually rely on black-box neural networks as the policy function. However, neural networks typically lack interpretability and logic, making the learned policies difficult for partners (e.g., humans) to understand and limiting their generalization ability. These shortcomings hinder the application of reinforcement learning methods in diverse cooperative scenarios.We suggest to represent the agent's policy with an interpretable program. Unlike neural networks, programs contain stable logic, but they are non-differentiable and difficult to optimize.To automatically learn such programs, we introduce Knowledge-driven Programmatic reinforcement learning for zero-shot Coordination (KnowPC). We first define a foundational Domain-Specific Language (DSL), including program structures, conditional primitives, and action primitives. A significant challenge is the vast program search space, making it difficult to find high-performing programs efficiently. To address this, KnowPC integrates an extractor and an reasoner. The extractor discovers environmental transition knowledge from multi-agent interaction trajectories, while the reasoner deduces the preconditions of each action primitive based on the transition knowledge.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
KnowPC:知识驱动的程序化强化学习,实现零距离协调
零次协调(Zero-shot coordination,ZSC)仍然是合作人工智能领域的一大挑战,其目的是学习代理如何在训练环境甚至新环境中与未曾谋面的伙伴合作。近年来,一种流行的 ZSC 解决范例是将深度强化学习(DRL)与高级自我游戏或基于种群的方法相结合,以增强神经策略处理未知伙伴的能力。尽管取得了一些成功,但这些方法通常依赖黑盒神经网络作为策略函数。然而,神经网络通常缺乏可解释性和逻辑性,使得伙伴(如人类)难以理解所学习的策略,并限制了其泛化能力。我们建议用可解释的程序来表示代理的策略。与神经网络不同,程序包含稳定的逻辑,但它们是无差别的,难以优化。为了自动学习这样的程序,我们引入了知识驱动的零次协调程序强化学习(Knowledge-driven Programmatic reinforcement learning for zero-shot Coordination,KnowPC)。我们首先定义了一种基础性的特定领域语言(DSL),包括程序结构、条件基元和动作基元。巨大的程序搜索空间是一项重大挑战,它使得高效地找到高性能程序变得十分困难。为了解决这个问题,KnowPC 集成了提取器和推理器。提取器从多代理交互轨迹中发现环境转换知识,推理器则根据转换知识推导出每个动作基元的前提条件。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Abductive explanations of classifiers under constraints: Complexity and properties Explaining Non-monotonic Normative Reasoning using Argumentation Theory with Deontic Logic Towards Explainable Goal Recognition Using Weight of Evidence (WoE): A Human-Centered Approach A Metric Hybrid Planning Approach to Solving Pandemic Planning Problems with Simple SIR Models Neural Networks for Vehicle Routing Problem
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1