Improving Multi-Agent Reinforcement Learning for Beer Game by Reward Design Based on Payment Mechanism

Masaaki Hori, Toshihiro Matsui
International Journal of Smart Computing and Artificial Intelligence, published 2023-01-01. DOI: 10.52731/ijscai.v7.i2.789 (https://doi.org/10.52731/ijscai.v7.i2.789)

Abstract

Supply chain management aims to maximize profits among supply chain partners by managing the flow of information and products. Multiagent reinforcement learning, studied in artificial intelligence research, has been applied to supply chain management. The beer game is an example problem in supply chain management and has also been studied as a cooperation problem in multiagent systems. A previous study applied SRDQN, a solution method based on deep reinforcement learning and reward shaping, to the beer game. Introducing a single SRDQN reinforcement learning agent as a participant in the beer game reduced the cost of beer inventory. However, that study did not address multiagent reinforcement learning because of the difficulty of cooperation among agents. To address the multiagent case, we apply RDPM, a reward shaping technique based on mechanism design, to SRDQN and improve the cooperative policies in multiagent reinforcement learning. Furthermore, we propose two reward design methods that modify the state value function design in RDPM to handle various consumer demands for beer in the supply chain. We then empirically evaluate the effectiveness of the proposed approaches.
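For readers unfamiliar with the beer game, the following is a minimal sketch of its inventory dynamics and of why per-agent and shared costs can pull in different directions. It assumes the classic four-stage serial chain (retailer, wholesaler, distributor, manufacturer) and instantaneous order transmission. The cost coefficients, lead time, initial inventory, and all names (`Stage`, `step`, `team_reward`) are illustrative assumptions rather than values from the paper. The shared reward shown is a plain negative sum of costs; it is not the RDPM or SRDQN reward design, whose details the abstract does not give.

```python
from collections import deque
from dataclasses import dataclass, field

HOLDING_COST = 0.5    # assumed cost per unit held in stock per period
BACKORDER_COST = 1.0  # assumed cost per unit of unfilled demand per period
LEAD_TIME = 2         # assumed shipping delay in periods


@dataclass
class Stage:
    """One supply chain participant (hypothetical helper for illustration)."""
    inventory: int = 12
    backlog: int = 0
    # Shipments in transit to this stage, oldest first.
    pipeline: deque = field(default_factory=lambda: deque([0] * LEAD_TIME))

    def receive(self) -> None:
        """Take delivery of the shipment that has finished its lead time."""
        self.inventory += self.pipeline.popleft()

    def fill(self, demand: int) -> int:
        """Ship as much of the new demand plus backlog as stock allows."""
        owed = demand + self.backlog
        shipped = min(owed, self.inventory)
        self.inventory -= shipped
        self.backlog = owed - shipped
        return shipped

    def cost(self) -> float:
        """Holding cost on stock plus penalty on backlog for this period."""
        return HOLDING_COST * self.inventory + BACKORDER_COST * self.backlog


def step(chain: list, customer_demand: int, orders: list) -> list:
    """Advance one period and return each stage's cost.

    orders[i] is the quantity stage i requests from its upstream neighbour;
    the last stage's order is treated as production with unlimited supply.
    """
    for stage in chain:
        stage.receive()
    incoming = customer_demand
    for i, stage in enumerate(chain):
        shipped = stage.fill(incoming)
        if i > 0:
            chain[i - 1].pipeline.append(shipped)  # goods travel downstream
        incoming = orders[i]                        # order travels upstream
    chain[-1].pipeline.append(orders[-1])
    return [stage.cost() for stage in chain]


def team_reward(costs: list) -> float:
    """A simple shared reward: the negative total cost of the chain."""
    return -sum(costs)


chain = [Stage() for _ in range(4)]
costs = step(chain, customer_demand=4, orders=[4, 4, 4, 4])
print(costs, team_reward(costs))
```

In this sketch each agent's own reward would be the negative of its entry in `costs`, so an agent can lower its own cost by ordering less and pushing backlog onto its neighbours. Reward design for the multiagent case has to counteract exactly this kind of incentive.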