Multi-Modal Dynamic Pricing
Yining Wang, Boxiao Chen, D. Simchi-Levi
Other Topics Engineering Research eJournal, published 2019-11-18
DOI: 10.2139/ssrn.3489355 (https://doi.org/10.2139/ssrn.3489355)
We consider a stylized problem of dynamic pricing of a single product with demand learning. The candidate prices lie in a wide price interval, and the demand functions are modeled nonparametrically, with only smoothness regularity conditions imposed. One important aspect of our model is that the expected reward function may be non-convex and indeed multi-modal, which leads to many conceptual and technical challenges. Our proposed algorithm is inspired by both the Upper-Confidence-Bound (UCB) algorithm for multi-armed bandits and the Optimism-in-Face-of-Uncertainty (OFU) principle arising from linear contextual bandits. Through rigorous regret analysis, we demonstrate that our proposed algorithm achieves optimal worst-case regret over a wide range of smooth function classes. More specifically, for k-times smooth functions and T selling periods, the regret of our proposed algorithm is O(T^{(k+1)/(2k+1)}), which is shown to be optimal via information-theoretic lower bounds. We also show that in special cases such as strongly concave or infinitely smooth reward functions, our algorithm achieves O(\sqrt{T}) regret, matching the optimal regret established in previous works. Finally, we present numerical simulations that verify the effectiveness of our method.
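
To make the stated rate concrete, the short sketch below evaluates the exponent (k+1)/(2k+1) for a few smoothness levels k. It only does the arithmetic implied by the bound in the abstract and is not the authors' algorithm. The function names `regret_exponent` and `regret_scale` are hypothetical, and constants and any logarithmic factors are ignored.

```python
from fractions import Fraction


def regret_exponent(k: int) -> Fraction:
    """Exponent a in the O(T^a) regret bound for k-times smooth rewards.

    Hypothetical helper: it only evaluates (k+1)/(2k+1) from the abstract.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    return Fraction(k + 1, 2 * k + 1)


def regret_scale(k: int, horizon: int) -> float:
    """T^((k+1)/(2k+1)), ignoring constants and logarithmic factors."""
    return horizon ** float(regret_exponent(k))


if __name__ == "__main__":
    # The exponent decreases toward 1/2 as the smoothness level k grows.
    for k in (1, 2, 4, 16, 256):
        a = regret_exponent(k)
        print(f"k={k:>3}  exponent={a}  (about {float(a):.4f})")
```

The gap between the exponent and 1/2 is exactly 1/(2(2k+1)), so it shrinks to zero as k grows. This is consistent with the O(\sqrt{T}) regret stated for infinitely smooth reward functions.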