On the Hardness of Learning from Censored and Nonstationary Demand

Gábor Lugosi, Mihalis G. Markakis, Gergely Neu
INFORMS Journal on Optimization. Published 2023-09-28. DOI: 10.1287/ijoo.2022.0017
Citations: 1

Abstract

We consider a repeated newsvendor problem in which the inventory manager has no prior information about the demand and can access only censored/sales data. In analogy to multiarmed bandit problems, the manager needs to simultaneously “explore” and “exploit” with inventory decisions in order to minimize the cumulative cost. Our goal is to understand the hardness of the problem disentangled from any probabilistic assumptions on the demand sequence (importantly, independence or time stationarity) and, correspondingly, to develop policies that perform well with respect to the regret criterion. We design a cost estimator that is tailored to the special structure of the censoring problem, and we show that, if coupled with the classic exponentially weighted forecaster, it achieves optimal scaling of the expected regret (up to logarithmic factors) with respect to both the number of time periods and available actions. This result also leads to two important insights: both the benefit from “information stalking” and the cost of censoring are negligible, at least in terms of the regret. We demonstrate the flexibility of our technique by combining it with the fixed share forecaster to provide strong guarantees in terms of tracking regret, a powerful notion of regret that uses a large class of time-varying action sequences as a benchmark. Numerical experiments suggest that the resulting policy outperforms existing policies (that are tailored to or facilitated by time stationarity) on nonstationary demand models with time-varying noise, trend, and seasonality components. Finally, we consider the “combinatorial” version of the repeated newsvendor problem, that is, single-warehouse, multiretailer inventory management of a perishable product. We extend the proposed approach so that, again, it achieves near-optimal performance in terms of the regret.

Funding: G. Lugosi was supported by the Spanish Ministry of Economy, Industry and Competitiveness [Grant MTM2015-67304-P (AEI/FEDER, UE)]. M. G. Markakis was supported by the Spanish Ministry of Economy and Competitiveness [Grant ECO2016-75905-R (AEI/FEDER, UE)] and a Juan de la Cierva fellowship as well as the Spanish Ministry of Science and Innovation through a Ramón y Cajal fellowship. G. Neu was supported by the UPFellows Fellowship (Marie Curie COFUND program) [Grant 600387].

Supplemental Material: The e-companion is available at https://doi.org/10.1287/ijoo.2022.0017.
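
To make the setting described in the abstract concrete, the following is a minimal sketch of exponential weighting over a finite grid of order levels when only sales are observed. It is an illustration under stated assumptions, not the authors' estimator or policy: the names `surrogate_costs` and `run_policy`, the per-unit holding cost `h`, the lost-sales cost `b`, the learning rate `eta`, and the mixing rate `alpha` are all hypothetical. The sketch relies on the identity h(a - d)^+ + b(d - a)^+ = (h + b)(a - d)^+ - ba + bd: the term bd is shared by all order levels, and for every level a at or below the placed order, (a - d)^+ can be computed from sales alone. Costs of levels above the placed order stay unobserved, so the observed ones are importance-weighted by the probability of ordering at least a.

```python
import numpy as np


def surrogate_costs(actions, sales, h, b):
    # (h + b) * (a - sales)^+ - b * a equals the true newsvendor cost
    # h * (a - d)^+ + b * (d - a)^+ minus b * d, for every a <= order level.
    # Adding b * max(actions) only keeps the values nonnegative.
    return (h + b) * np.maximum(actions - sales, 0.0) - b * actions + b * actions.max()


def run_policy(demand, actions, h=1.0, b=2.0, eta=0.05, alpha=0.0, seed=0):
    """Exponential weights with censored feedback (hypothetical sketch).

    alpha = 0 gives plain exponential weighting; alpha > 0 mixes in the
    uniform distribution after each update (fixed-share style).
    Assumes nonnegative demand and a positive largest order level.
    """
    rng = np.random.default_rng(seed)
    actions = np.sort(np.asarray(actions, dtype=float))
    n = len(actions)
    scale = (h + b) * actions.max()  # upper bound on the surrogate costs
    p = np.full(n, 1.0 / n)          # sampling distribution over order levels
    total_cost = 0.0
    for d in demand:
        i = rng.choice(n, p=p)
        q = actions[i]
        sales = min(d, q)            # the only feedback the policy uses
        # True cost, recorded for evaluation only (it needs the full demand).
        total_cost += h * max(q - d, 0.0) + b * max(d - q, 0.0)
        # tail[j] = probability that the order level is at least actions[j].
        tail = np.cumsum(p[::-1])[::-1]
        est = np.zeros(n)
        costs = surrogate_costs(actions, sales, h, b) / scale
        est[: i + 1] = costs[: i + 1] / tail[: i + 1]  # levels <= q are observed
        v = p * np.exp(-eta * est)
        v /= v.sum()
        p = (1.0 - alpha) * v + alpha / n
    return total_cost, p
```

Calling `run_policy(demand, actions)` with a demand sequence and a grid of order levels returns the cumulative true cost and the final sampling distribution. A small positive `alpha` keeps every order level at probability at least `alpha / n`, which is what lets the distribution move when the demand pattern changes.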