Risk-averse supply chain management via robust reinforcement learning

IF 3.9 | CAS Tier 2 (Engineering & Technology) | JCR Q2 (Computer Science, Interdisciplinary Applications) | Computers & Chemical Engineering | Publication date: 2024-11-06 | DOI: 10.1016/j.compchemeng.2024.108912
Jing Wang, Christopher L.E. Swartz, Kai Huang
{"title":"通过稳健强化学习规避风险的供应链管理","authors":"Jing Wang ,&nbsp;Christopher L.E. Swartz ,&nbsp;Kai Huang","doi":"10.1016/j.compchemeng.2024.108912","DOIUrl":null,"url":null,"abstract":"<div><div>Classical reinforcement learning (RL) may suffer performance degradation when the environment deviates from training conditions, limiting its application in risk-averse supply chain management. This work explores using robust RL in supply chain operations to hedge against environment inconsistencies and changes. Two robust RL algorithms, <span><math><mover><mrow><mi>Q</mi></mrow><mrow><mo>ˆ</mo></mrow></mover></math></span>-learning and <span><math><mi>β</mi></math></span>-pessimistic <span><math><mi>Q</mi></math></span>-learning, are examined against conventional <span><math><mi>Q</mi></math></span>-learning and a baseline order-up-to inventory policy. Furthermore, this work extends RL applications from forward to closed-loop supply chains. Two case studies are conducted using a supply chain simulator developed with agent-based modeling. The results show that <span><math><mi>Q</mi></math></span>-learning can outperform the baseline policy under normal conditions, but notably degrades under environment deviations. By comparison, the robust RL models tend to make more conservative inventory decisions to avoid large shortage penalties. Specifically, fine-tuned <span><math><mi>β</mi></math></span>-pessimistic <span><math><mi>Q</mi></math></span>-learning can achieve good performance under normal conditions and maintain robustness against moderate environment inconsistencies, making it suitable for risk-averse decision-making.</div></div>","PeriodicalId":286,"journal":{"name":"Computers & Chemical Engineering","volume":"192 ","pages":"Article 108912"},"PeriodicalIF":3.9000,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Risk-averse supply chain management via robust reinforcement learning\",\"authors\":\"Jing Wang ,&nbsp;Christopher L.E. Swartz ,&nbsp;Kai Huang\",\"doi\":\"10.1016/j.compchemeng.2024.108912\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Classical reinforcement learning (RL) may suffer performance degradation when the environment deviates from training conditions, limiting its application in risk-averse supply chain management. This work explores using robust RL in supply chain operations to hedge against environment inconsistencies and changes. Two robust RL algorithms, <span><math><mover><mrow><mi>Q</mi></mrow><mrow><mo>ˆ</mo></mrow></mover></math></span>-learning and <span><math><mi>β</mi></math></span>-pessimistic <span><math><mi>Q</mi></math></span>-learning, are examined against conventional <span><math><mi>Q</mi></math></span>-learning and a baseline order-up-to inventory policy. Furthermore, this work extends RL applications from forward to closed-loop supply chains. Two case studies are conducted using a supply chain simulator developed with agent-based modeling. The results show that <span><math><mi>Q</mi></math></span>-learning can outperform the baseline policy under normal conditions, but notably degrades under environment deviations. By comparison, the robust RL models tend to make more conservative inventory decisions to avoid large shortage penalties. 
Specifically, fine-tuned <span><math><mi>β</mi></math></span>-pessimistic <span><math><mi>Q</mi></math></span>-learning can achieve good performance under normal conditions and maintain robustness against moderate environment inconsistencies, making it suitable for risk-averse decision-making.</div></div>\",\"PeriodicalId\":286,\"journal\":{\"name\":\"Computers & Chemical Engineering\",\"volume\":\"192 \",\"pages\":\"Article 108912\"},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2024-11-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers & Chemical Engineering\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0098135424003302\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Chemical Engineering","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0098135424003302","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

Classical reinforcement learning (RL) may suffer performance degradation when the environment deviates from training conditions, limiting its application in risk-averse supply chain management. This work explores using robust RL in supply chain operations to hedge against environment inconsistencies and changes. Two robust RL algorithms, Q̂-learning and β-pessimistic Q-learning, are examined against conventional Q-learning and a baseline order-up-to inventory policy. Furthermore, this work extends RL applications from forward to closed-loop supply chains. Two case studies are conducted using a supply chain simulator developed with agent-based modeling. The results show that Q-learning can outperform the baseline policy under normal conditions, but notably degrades under environment deviations. By comparison, the robust RL models tend to make more conservative inventory decisions to avoid large shortage penalties. Specifically, fine-tuned β-pessimistic Q-learning can achieve good performance under normal conditions and maintain robustness against moderate environment inconsistencies, making it suitable for risk-averse decision-making.
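The abstract contrasts conventional Q-learning with β-pessimistic Q-learning but does not spell out the update rules. The sketch below is an illustrative Python rendering of the commonly used β-pessimistic formulation, in which the bootstrapped next-state value blends the best-case and worst-case action values with weight β. It is not taken from the paper; the table dimensions, learning rate, discount factor, and β value are placeholder assumptions.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Standard Q-learning update: bootstrap on the best next-state action value."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def beta_pessimistic_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95, beta=0.3):
    """beta-pessimistic Q-learning update: blend best- and worst-case next-state values.

    beta = 0 recovers standard Q-learning; larger beta weights the worst-case
    action more heavily, yielding more conservative (risk-averse) value estimates.
    """
    blended = (1.0 - beta) * np.max(Q[s_next]) + beta * np.min(Q[s_next])
    target = r + gamma * blended
    Q[s, a] += alpha * (target - Q[s, a])

# Toy usage with hypothetical dimensions: 5 inventory states, 3 order actions.
Q = np.zeros((5, 3))
beta_pessimistic_update(Q, s=2, a=1, r=-4.0, s_next=3)
```

This blending is what makes the policy lean toward orders that avoid large shortage penalties, consistent with the more conservative inventory decisions reported in the abstract.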
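The baseline against which the RL agents are compared is described only as an order-up-to inventory policy. For reference, here is a minimal sketch of the textbook order-up-to (base-stock) rule: whenever the inventory position drops below a target level, order the shortfall. The target level and the example numbers are hypothetical, not values from the paper's case studies.

```python
def order_up_to(inventory_position: float, target_level: float) -> float:
    """Order-up-to (base-stock) policy: order the shortfall below the target level."""
    return max(0.0, target_level - inventory_position)

# Example: on-hand plus pipeline inventory is 40 units against a target of 100,
# so the policy places an order for 60 units.
print(order_up_to(inventory_position=40.0, target_level=100.0))  # 60.0
```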
Source journal: Computers & Chemical Engineering (Engineering & Technology – Chemical Engineering)
CiteScore: 8.70
Self-citation rate: 14.00%
Articles per year: 374
Review time: 70 days
Journal description: Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.
Latest articles in this journal
- The bullwhip effect, market competition and standard deviation ratio in two parallel supply chains
- CADET-Julia: Efficient and versatile, open-source simulator for batch chromatography in Julia
- Computer aided formulation design based on molecular dynamics simulation: Detergents with fragrance
- Model-based real-time optimization in continuous pharmaceutical manufacturing
- Risk-averse supply chain management via robust reinforcement learning