Risk-averse supply chain management via robust reinforcement learning

Impact Factor: 3.9 · CAS Zone 2 (Engineering Technology) · JCR Q2 (Computer Science, Interdisciplinary Applications) · Computers & Chemical Engineering · Pub Date: 2024-11-06 · DOI: 10.1016/j.compchemeng.2024.108912
Jing Wang, Christopher L.E. Swartz, Kai Huang
{"title":"Risk-averse supply chain management via robust reinforcement learning","authors":"Jing Wang ,&nbsp;Christopher L.E. Swartz ,&nbsp;Kai Huang","doi":"10.1016/j.compchemeng.2024.108912","DOIUrl":null,"url":null,"abstract":"<div><div>Classical reinforcement learning (RL) may suffer performance degradation when the environment deviates from training conditions, limiting its application in risk-averse supply chain management. This work explores using robust RL in supply chain operations to hedge against environment inconsistencies and changes. Two robust RL algorithms, <span><math><mover><mrow><mi>Q</mi></mrow><mrow><mo>ˆ</mo></mrow></mover></math></span>-learning and <span><math><mi>β</mi></math></span>-pessimistic <span><math><mi>Q</mi></math></span>-learning, are examined against conventional <span><math><mi>Q</mi></math></span>-learning and a baseline order-up-to inventory policy. Furthermore, this work extends RL applications from forward to closed-loop supply chains. Two case studies are conducted using a supply chain simulator developed with agent-based modeling. The results show that <span><math><mi>Q</mi></math></span>-learning can outperform the baseline policy under normal conditions, but notably degrades under environment deviations. By comparison, the robust RL models tend to make more conservative inventory decisions to avoid large shortage penalties. Specifically, fine-tuned <span><math><mi>β</mi></math></span>-pessimistic <span><math><mi>Q</mi></math></span>-learning can achieve good performance under normal conditions and maintain robustness against moderate environment inconsistencies, making it suitable for risk-averse decision-making.</div></div>","PeriodicalId":286,"journal":{"name":"Computers & Chemical Engineering","volume":"192 ","pages":"Article 108912"},"PeriodicalIF":3.9000,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Chemical Engineering","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0098135424003302","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

Classical reinforcement learning (RL) may suffer performance degradation when the environment deviates from training conditions, limiting its application in risk-averse supply chain management. This work explores using robust RL in supply chain operations to hedge against environment inconsistencies and changes. Two robust RL algorithms, Q̂-learning and β-pessimistic Q-learning, are examined against conventional Q-learning and a baseline order-up-to inventory policy. Furthermore, this work extends RL applications from forward to closed-loop supply chains. Two case studies are conducted using a supply chain simulator developed with agent-based modeling. The results show that Q-learning can outperform the baseline policy under normal conditions, but notably degrades under environment deviations. By comparison, the robust RL models tend to make more conservative inventory decisions to avoid large shortage penalties. Specifically, fine-tuned β-pessimistic Q-learning can achieve good performance under normal conditions and maintain robustness against moderate environment inconsistencies, making it suitable for risk-averse decision-making.
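The abstract does not reproduce the update rules compared in the paper. As a rough illustration of the kind of risk-averse update that β-pessimistic Q-learning performs, the sketch below follows the standard formulation of that algorithm, in which the bootstrap target mixes the best-case and worst-case next-state action values with weight β, together with a generic order-up-to baseline. The function names, the tabular setting, and all parameter values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Hedged sketch: standard beta-pessimistic Q-learning update on a tabular Q.
# With beta = 0 this reduces to ordinary Q-learning; beta > 0 biases the
# bootstrap target toward the worst-case action value in the next state,
# which is what drives the more conservative inventory decisions described
# in the abstract.
def beta_pessimistic_update(Q, s, a, reward, s_next,
                            alpha=0.1, gamma=0.99, beta=0.2):
    best = np.max(Q[s_next])    # optimistic (classical) bootstrap value
    worst = np.min(Q[s_next])   # pessimistic bootstrap value
    target = reward + gamma * ((1.0 - beta) * best + beta * worst)
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Generic order-up-to baseline: order whatever is needed to raise the
# inventory position back to a fixed base-stock level (illustrative only).
def order_up_to(inventory_position, base_stock_level):
    return max(0, base_stock_level - inventory_position)
```

With β tuned between 0 and 1, the update interpolates between classical Q-learning and a fully worst-case target, which is consistent with the abstract's observation that a fine-tuned β trades some nominal performance for robustness to environment deviations.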
Source journal: Computers & Chemical Engineering (Engineering Technology – Chemical Engineering)
CiteScore: 8.70 · Self-citation rate: 14.00% · Articles per year: 374 · Review time: 70 days
Journal description: Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.