Policy Optimization Using Semiparametric Models for Dynamic Pricing

Jianqing Fan, Yongyi Guo, Mengxin Yu
CompSciRN: Other Machine Learning (Topic). Published 2021-09-13. DOI: 10.2139/ssrn.3922825 (https://doi.org/10.2139/ssrn.3922825)
Citations: 14

Abstract

In this paper, we study the contextual dynamic pricing problem in which the market value of a product is linear in its observed features plus some market noise. Products are sold one at a time, and only a binary response indicating the success or failure of a sale is observed. Our model setting is similar to \cite{JN19}, except that we expand the demand curve to a semiparametric model and need to learn both the parametric and nonparametric components dynamically. We propose a dynamic statistical learning and decision-making policy that combines semiparametric estimation from a generalized linear model with an unknown link and online decision making to minimize regret (maximize revenue). Under mild conditions, we show that for a market noise c.d.f. $F(\cdot)$ with an $m$-th order derivative, our policy achieves a regret upper bound of $\tilde{\mathcal{O}}_{d}(T^{\frac{2m+1}{4m-1}})$ for $m\geq 2$, where $T$ is the time horizon and $\tilde{\mathcal{O}}_{d}$ denotes an order that hides logarithmic terms and the dependence on the feature dimension $d$. The upper bound is further reduced to $\tilde{\mathcal{O}}_{d}(\sqrt{T})$ if $F$ is super smooth, that is, if its Fourier transform decays exponentially. In terms of dependence on the horizon $T$, these upper bounds are close to $\Omega(\sqrt{T})$, the lower bound that holds when the market noise distribution belongs to a parametric class. We further generalize these results to the case where the product features are dynamically dependent and satisfy some strong mixing conditions.
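
The setting and the stated rate can be made concrete with a small sketch. The Python below is illustrative only and is not the policy proposed in the paper: it assumes a sale succeeds when the posted price does not exceed the market value, and it plugs in a logistic noise c.d.f. purely for demonstration (in the paper, $F$ is unknown and must be learned). The names `regret_exponent`, `expected_revenue`, and `simulate_sale` are hypothetical helpers introduced here.

```python
import math
import random
from fractions import Fraction


def regret_exponent(m: int) -> Fraction:
    """Exponent of T in the stated bound, (2m+1)/(4m-1), valid for m >= 2."""
    if m < 2:
        raise ValueError("the stated bound requires m >= 2")
    return Fraction(2 * m + 1, 4 * m - 1)


def logistic_cdf(z: float) -> float:
    """Assumed noise c.d.f. F, used for illustration only."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_value(theta, x) -> float:
    """Noise-free part of the market value: the inner product theta . x."""
    return sum(t * xi for t, xi in zip(theta, x))


def expected_revenue(price: float, theta, x, cdf=logistic_cdf) -> float:
    """price * P(sale), with P(sale) = 1 - F(price - theta . x).

    Assumes a sale happens when price <= theta . x + noise.
    """
    return price * (1.0 - cdf(price - linear_value(theta, x)))


def simulate_sale(price: float, theta, x, rng: random.Random) -> int:
    """Return the binary feedback: 1 if the product sells, 0 otherwise."""
    # Draw logistic noise by inverting the c.d.f.; clamp u away from 0 and 1.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    noise = math.log(u / (1.0 - u))
    return 1 if linear_value(theta, x) + noise >= price else 0


if __name__ == "__main__":
    for m in (2, 3, 10, 100):
        print(m, regret_exponent(m), float(regret_exponent(m)))
    rng = random.Random(0)
    theta, x = [0.5, 1.0], [1.0, 0.5]
    print(expected_revenue(1.0, theta, x), simulate_sale(1.0, theta, x, rng))
```

For $m = 2$ the exponent is $5/7$. Since $\frac{2m+1}{4m-1} - \frac{1}{2} = \frac{3}{2(4m-1)}$, the exponent stays strictly above $1/2$ and approaches it as $m$ grows, which is consistent with the $\sqrt{T}$ rate stated for super smooth $F$.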