{"title":"Improving Multi-Agent Reinforcement Learning for Beer Game by Reward Design Based on Payment Mechanism","authors":"Masaaki Hori, Toshihiro Matsui","doi":"10.52731/ijscai.v7.i2.789","DOIUrl":null,"url":null,"abstract":"Supply chain management aims to maximize profits among supply chain partners by managing the flow of information and products. Multiagent reinforcement learning in artificial intelligence research fields has been applied to supply chain management. The beer game is an example problem in supply chain management and has also been studied as a cooperation problem in multiagent systems. In the previous study, a solution method SRDQN that is based on deep reinforcement learning and reward shaping has been applied to the beer game. By introducing a single reinforcement learning agent with SRDQN as a participant in the beer game, the cost of beer inventory was reduced. However, the previous study has not addressed the case of multiagent reinforcement learning due to the difficulties in cooperation among agents. To address the multiagent cases, we apply a reward shaping technique RDPM based on mechanism design to SRDQN and improve cooperative policies in multiagent reinforcement learning. Furthermore, we propose two reward design methods with modifications to the state value function designs in RDPM to address various consumer demands for beers in the supply chain. And then we empirically evaluate the effectiveness of the proposed approaches.","PeriodicalId":495454,"journal":{"name":"International journal of smart computing and artificial intelligence","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International journal of smart computing and artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.52731/ijscai.v7.i2.789","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Supply chain management aims to maximize profits among supply chain partners by managing the flow of information and products. Multiagent reinforcement learning in artificial intelligence research fields has been applied to supply chain management. The beer game is an example problem in supply chain management and has also been studied as a cooperation problem in multiagent systems. In the previous study, a solution method SRDQN that is based on deep reinforcement learning and reward shaping has been applied to the beer game. By introducing a single reinforcement learning agent with SRDQN as a participant in the beer game, the cost of beer inventory was reduced. However, the previous study has not addressed the case of multiagent reinforcement learning due to the difficulties in cooperation among agents. To address the multiagent cases, we apply a reward shaping technique RDPM based on mechanism design to SRDQN and improve cooperative policies in multiagent reinforcement learning. Furthermore, we propose two reward design methods with modifications to the state value function designs in RDPM to address various consumer demands for beers in the supply chain. And then we empirically evaluate the effectiveness of the proposed approaches.