Improving Multi-Agent Reinforcement Learning for Beer Game by Reward Design Based on Payment Mechanism

International Journal of Smart Computing and Artificial Intelligence. Pub Date: 2023-01-01. DOI: 10.52731/ijscai.v7.i2.789

Masaaki Hori, Toshihiro Matsui


Citations: 0

Abstract

Supply chain management aims to maximize profits among supply chain partners by managing the flow of information and products. Multiagent reinforcement learning, a technique from artificial intelligence research, has been applied to supply chain management. The beer game is an example problem in supply chain management and has also been studied as a cooperation problem in multiagent systems. In a previous study, SRDQN, a solution method based on deep reinforcement learning and reward shaping, was applied to the beer game. Introducing a single reinforcement learning agent with SRDQN as a participant in the beer game reduced the cost of beer inventory. However, that study did not address multiagent reinforcement learning because of the difficulty of cooperation among agents. To address the multiagent case, we apply RDPM, a reward shaping technique based on mechanism design, to SRDQN and improve cooperative policies in multiagent reinforcement learning. Furthermore, we propose two reward design methods that modify the design of the state value function in RDPM to handle varied consumer demand for beer in the supply chain. We then empirically evaluate the effectiveness of the proposed approaches.
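As a rough illustration of the inventory cost the abstract refers to, the sketch below models one beer-game participant that receives a shipment, serves demand, and pays for stock on hand and for unfilled orders. The `Stage` class, the `team_reward` helper, and the cost coefficients are illustrative assumptions, not taken from the paper; neither SRDQN nor RDPM is implemented here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Stage:
    """One beer-game participant, e.g. a retailer or a wholesaler (hypothetical model)."""
    inventory: int = 0          # units currently on hand
    backlog: int = 0            # unfilled demand carried over from earlier periods
    holding_cost: float = 0.5   # assumed cost per unit held per period
    backlog_cost: float = 1.0   # assumed cost per unit of backlog per period

    def step(self, received: int, demand: int) -> float:
        """Receive a shipment, serve demand plus backlog, and return this period's cost."""
        self.inventory += received
        owed = demand + self.backlog
        shipped = min(self.inventory, owed)
        self.inventory -= shipped
        self.backlog = owed - shipped
        return self.holding_cost * self.inventory + self.backlog_cost * self.backlog


def team_reward(costs: Sequence[float]) -> float:
    """Cooperative reward signal: the negative of the total chain cost for one period."""
    return -sum(costs)
```

A reward shaping method would add a further term to each agent's reward on top of a cost signal of this kind; how RDPM defines that term is specified in the paper, not here.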


