Boxiao Chen, D. Simchi-Levi, Yining Wang, Yuanshuo Zhou
{"title":"固定订货成本和不完全需求信息下的动态定价与库存控制","authors":"Boxiao Chen, D. Simchi-Levi, Yining Wang, Yuanshuo Zhou","doi":"10.2139/ssrn.3632475","DOIUrl":null,"url":null,"abstract":"We consider the periodic review dynamic pricing and inventory control problem with fixed ordering cost. Demand is random and price dependent, and unsatisfied demand is backlogged. With complete demand information, the celebrated [Formula: see text] policy is proved to be optimal, where s and S are the reorder point and order-up-to level for ordering strategy, and [Formula: see text], a function of on-hand inventory level, characterizes the pricing strategy. In this paper, we consider incomplete demand information and develop online learning algorithms whose average profit approaches that of the optimal [Formula: see text] with a tight [Formula: see text] regret rate. A number of salient features differentiate our work from the existing online learning researches in the operations management (OM) literature. First, computing the optimal [Formula: see text] policy requires solving a dynamic programming (DP) over multiple periods involving unknown quantities, which is different from the majority of learning problems in OM that only require solving single-period optimization questions. It is hence challenging to establish stability results through DP recursions, which we accomplish by proving uniform convergence of the profit-to-go function. The necessity of analyzing action-dependent state transition over multiple periods resembles the reinforcement learning question, considerably more difficult than existing bandit learning algorithms. Second, the pricing function [Formula: see text] is of infinite dimension, and approaching it is much more challenging than approaching a finite number of parameters as seen in existing researches. The demand-price relationship is estimated based on upper confidence bound, but the confidence interval cannot be explicitly calculated due to the complexity of the DP recursion. Finally, because of the multiperiod nature of [Formula: see text] policies the actual distribution of the randomness in demand plays an important role in determining the optimal pricing strategy [Formula: see text], which is unknown to the learner a priori. In this paper, the demand randomness is approximated by an empirical distribution constructed using dependent samples, and a novel Wasserstein metric-based argument is employed to prove convergence of the empirical distribution. This paper was accepted by J. George Shanthikumar, big data analytics.","PeriodicalId":150569,"journal":{"name":"IO: Theory eJournal","volume":"322 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Dynamic Pricing and Inventory Control with Fixed Ordering Cost and Incomplete Demand Information\",\"authors\":\"Boxiao Chen, D. Simchi-Levi, Yining Wang, Yuanshuo Zhou\",\"doi\":\"10.2139/ssrn.3632475\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We consider the periodic review dynamic pricing and inventory control problem with fixed ordering cost. Demand is random and price dependent, and unsatisfied demand is backlogged. 
With complete demand information, the celebrated [Formula: see text] policy is proved to be optimal, where s and S are the reorder point and order-up-to level for ordering strategy, and [Formula: see text], a function of on-hand inventory level, characterizes the pricing strategy. In this paper, we consider incomplete demand information and develop online learning algorithms whose average profit approaches that of the optimal [Formula: see text] with a tight [Formula: see text] regret rate. A number of salient features differentiate our work from the existing online learning researches in the operations management (OM) literature. First, computing the optimal [Formula: see text] policy requires solving a dynamic programming (DP) over multiple periods involving unknown quantities, which is different from the majority of learning problems in OM that only require solving single-period optimization questions. It is hence challenging to establish stability results through DP recursions, which we accomplish by proving uniform convergence of the profit-to-go function. The necessity of analyzing action-dependent state transition over multiple periods resembles the reinforcement learning question, considerably more difficult than existing bandit learning algorithms. Second, the pricing function [Formula: see text] is of infinite dimension, and approaching it is much more challenging than approaching a finite number of parameters as seen in existing researches. The demand-price relationship is estimated based on upper confidence bound, but the confidence interval cannot be explicitly calculated due to the complexity of the DP recursion. Finally, because of the multiperiod nature of [Formula: see text] policies the actual distribution of the randomness in demand plays an important role in determining the optimal pricing strategy [Formula: see text], which is unknown to the learner a priori. In this paper, the demand randomness is approximated by an empirical distribution constructed using dependent samples, and a novel Wasserstein metric-based argument is employed to prove convergence of the empirical distribution. This paper was accepted by J. George Shanthikumar, big data analytics.\",\"PeriodicalId\":150569,\"journal\":{\"name\":\"IO: Theory eJournal\",\"volume\":\"322 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-06-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IO: Theory eJournal\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2139/ssrn.3632475\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IO: Theory eJournal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2139/ssrn.3632475","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 9
Abstract
Dynamic Pricing and Inventory Control with Fixed Ordering Cost and Incomplete Demand Information
We consider the periodic-review dynamic pricing and inventory control problem with fixed ordering cost. Demand is random and price dependent, and unsatisfied demand is backlogged. With complete demand information, the celebrated (s, S, p) policy is proved to be optimal, where s and S are the reorder point and the order-up-to level of the ordering strategy, and p, a function of the on-hand inventory level, characterizes the pricing strategy. In this paper, we consider incomplete demand information and develop online learning algorithms whose average profit approaches that of the optimal (s, S, p) policy with a tight Õ(√T) regret rate. A number of salient features differentiate our work from existing online learning research in the operations management (OM) literature. First, computing the optimal (s, S, p) policy requires solving a dynamic program (DP) over multiple periods involving unknown quantities, unlike the majority of learning problems in OM, which require solving only single-period optimization problems. It is hence challenging to establish stability results through DP recursions, which we accomplish by proving uniform convergence of the profit-to-go function. The need to analyze action-dependent state transitions over multiple periods makes the problem resemble reinforcement learning, which is considerably more difficult than existing bandit learning settings. Second, the pricing function p is infinite dimensional, so learning it is much more challenging than learning the finite number of parameters considered in existing research. The demand-price relationship is estimated using upper confidence bounds, but the confidence intervals cannot be computed explicitly because of the complexity of the DP recursion. Finally, because of the multiperiod nature of (s, S, p) policies, the actual distribution of the demand randomness, which is unknown to the learner a priori, plays an important role in determining the optimal pricing strategy p. In this paper, the demand randomness is approximated by an empirical distribution constructed from dependent samples, and a novel Wasserstein-metric-based argument is employed to prove convergence of the empirical distribution. This paper was accepted by J. George Shanthikumar, big data analytics.
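To make the policy class concrete, the sketch below simulates one selling horizon under a fixed (s, S, p) policy with backlogged demand. Everything in it, including the linear demand curve, the cost parameters, and the pricing rule p(x), is an illustrative assumption for exposition; it is not the paper's model calibration or its learning algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed cost and policy parameters (hypothetical, for illustration only).
s, S = 20.0, 100.0      # reorder point and order-up-to level
K, c = 50.0, 2.0        # fixed and per-unit ordering costs
h, b = 0.5, 1.5         # per-period holding and backlogging costs

def p(x):
    """Hypothetical pricing function p(x): price rises as inventory falls."""
    return 10.0 - 0.02 * min(max(x, 0.0), S)

def demand(price):
    """Assumed linear demand curve with additive Gaussian noise."""
    return max(0.0, 30.0 - 2.0 * price + rng.normal(0.0, 3.0))

x, profit = 0.0, 0.0    # on-hand inventory (negative means backlog)
for t in range(52):
    if x < s:                        # (s, S) rule: order up to S when below s
        profit -= K + c * (S - x)
        x = S
    price = p(x)                     # price depends on post-ordering inventory
    d = demand(price)
    profit += price * d              # backlogged demand is eventually served
    x -= d
    profit -= h * max(x, 0.0) + b * max(-x, 0.0)

print(f"Total profit over 52 periods: {profit:.1f}")
```

With complete demand information the learner could optimize s, S, and p(·) directly; the paper's algorithms must instead learn the demand-price relationship and the noise distribution while controlling regret.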
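The last point, approximating the unknown demand noise with an empirical distribution and controlling the error in Wasserstein distance, can also be illustrated numerically. In one dimension, the 1-Wasserstein distance between two distributions equals the L1 distance between their quantile functions, which the snippet below approximates on a quantile grid. The Gaussian noise and the i.i.d. sample sizes are assumed purely for illustration; the paper's samples are dependent, which is exactly what makes its convergence argument nontrivial.

```python
import numpy as np

rng = np.random.default_rng(1)

# Large reference draw standing in for the true (unknown) noise distribution.
reference = rng.normal(0.0, 3.0, size=100_000)
q = np.linspace(0.0, 1.0, 201)       # quantile grid for the W1 approximation

for n in (10, 100, 1_000, 10_000):
    sample = rng.normal(0.0, 3.0, size=n)  # stand-in for observed residuals
    # W1 between empirical distributions ~= mean |quantile difference|.
    w1 = np.mean(np.abs(np.quantile(sample, q) - np.quantile(reference, q)))
    print(f"n={n:>6}: approximate W1 distance = {w1:.3f}")
```

The printed distances shrink as n grows, mirroring the role the empirical-distribution convergence plays in the paper's regret analysis.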