Deep Reinforcement Learning for a Multi-Objective Online Order Batching Problem

M. Beeks, Reza Refaei Afshar, Yingqian Zhang, R. Dijkman, Claudy van Dorst, S. D. Looijer
International Conference on Automated Planning and Scheduling, published 2022-06-13
DOI: 10.1609/icaps.v32i1.19829 (https://doi.org/10.1609/icaps.v32i1.19829)
Citation count: 3

Abstract

On-time delivery and low service costs are two important performance metrics in warehousing operations. This paper proposes a Deep Reinforcement Learning (DRL) based approach to the online Order Batching and Sequence Problem (OBSP) that optimizes both objectives. To learn how to balance the trade-off between the two objectives, we introduce a Bayesian optimization framework that shapes the reward function of the DRL agent, so that the influence of each objective on learning is adjusted to the environment at hand. We compare our approach with several heuristics on problem instances of real-world size, in which thousands of orders arrive dynamically per hour. We show that the Proximal Policy Optimization (PPO) algorithm with Bayesian optimization outperforms the heuristics in all tested scenarios on both objectives. In addition, it finds different weights for the reward-function components in different scenarios, which indicates that it can learn how to set the importance of the two objectives under different environments. We also provide a policy analysis of the learned DRL agent, in which a decision tree is used to infer decision rules and make the DRL approach interpretable.
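
The reward-shaping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `StepOutcome`, `shaped_reward`, `evaluate_weights`, and `search_weights` are hypothetical, the weighted-sum form of the reward is an assumption, the toy scoring function stands in for training and evaluating a PPO agent in a warehouse simulator, and plain random search stands in for Bayesian optimization.

```python
import random
from dataclasses import dataclass


@dataclass
class StepOutcome:
    """Hypothetical per-step signals from a warehouse simulator."""
    tardy_orders: int    # orders that missed their deadline in this step
    service_cost: float  # service cost incurred in this step


def shaped_reward(outcome: StepOutcome, w_tardy: float, w_cost: float) -> float:
    """Assumed weighted-sum reward: both terms are penalties."""
    return -(w_tardy * outcome.tardy_orders + w_cost * outcome.service_cost)


def evaluate_weights(w_tardy: float, w_cost: float) -> float:
    """Toy placeholder for 'train an agent with these weights, then score it'.

    A real evaluation would run training and measure both objectives.
    Here the score simply peaks at an arbitrary point (0.7, 0.3).
    """
    return -((w_tardy - 0.7) ** 2 + (w_cost - 0.3) ** 2)


def search_weights(n_trials: int, seed: int = 0) -> tuple[float, float, float]:
    """Outer loop over reward weights.

    Random search is used as a stand-in; a Bayesian optimizer would
    propose each new weight pair based on the scores seen so far.
    """
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        w_tardy = rng.random()
        w_cost = rng.random()
        score = evaluate_weights(w_tardy, w_cost)
        if best is None or score > best[2]:
            best = (w_tardy, w_cost, score)
    return best


if __name__ == "__main__":
    print(shaped_reward(StepOutcome(tardy_orders=2, service_cost=3.0), 1.0, 0.5))
    print(search_weights(n_trials=200, seed=0))
```

In a real setting each call to `evaluate_weights` would presumably be a full training run, which is the usual motivation for a sample-efficient optimizer over the random search used here.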