Constrained Dual-Level Bandit for Personalized Impression Regulation in Online Ranking Systems

Zhao Li, Junshuai Song, Zehong Hu, Zhen Wang, Jun Gao
ACM Transactions on Knowledge Discovery from Data (TKDD). Published 2021-07-21. DOI: 10.1145/3461340 (https://doi.org/10.1145/3461340). Citations: 3

Abstract

Impression regulation plays an important role in various online ranking systems. For example, e-commerce ranking systems must meet local commercial demands on pre-labeled target items, such as cultivating fresh items and counteracting fraudulent items, while maximizing global revenue. However, local impression regulation may cause “butterfly effects” on the global scale: in e-commerce, fluctuations in price preference under initial conditions (overpriced or underpriced items) may produce significantly different outcomes, degrading the shopping experience and causing economic losses for platforms. To prevent such effects, some researchers define their regulation objectives with global constraints and use a page-level contextual bandit. This requires all items on a page to share the same regulation action and therefore cannot regulate the impressions of individual items. To address this problem, in this article we propose a personalized impression regulation method that makes regulation decisions directly for each user-item pair. Specifically, we model the regulation problem as a Constrained Dual-level Bandit (CDB) problem, in which the local regulation actions and reward signals are at the item level, while the global effect constraint on platform impressions can be calculated only at the page level. To handle these asynchronous signals, we first expand the page-level constraint to the item level and then derive the policy update as a second-order cone optimization problem. CDB approaches the optimal policy by iteratively solving this optimization problem. Experiments are performed on both offline and online datasets, and the results demonstrate, theoretically and empirically, that CDB outperforms state-of-the-art algorithms.
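The abstract's key structural point is that actions and rewards live at the item level, while the constraint signal is observed only once per page. The following minimal Python sketch illustrates that signal structure under stated assumptions. It is not CDB: the paper derives the policy update as a second-order cone program, whereas this sketch swaps in a simple primal-dual (Lagrangian) heuristic with epsilon-greedy exploration. The class `PrimalDualItemBandit`, the `simulate` helper, the three boost levels, the reward and cost values, the budget, and the equal attribution of the page cost to each item are all hypothetical.

```python
import random


class PrimalDualItemBandit:
    """Hypothetical sketch, NOT the CDB algorithm.

    Item-level actions under a page-level average-cost constraint, handled by
    a Lagrange multiplier instead of the paper's second-order cone program.
    """

    def __init__(self, n_actions, budget, step=0.05, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.budget = budget      # assumed cap on the page-level average cost
        self.step = step          # assumed dual step size
        self.epsilon = epsilon    # assumed exploration rate
        self.lam = 0.0            # Lagrange multiplier (price of the constraint)
        self.rng = random.Random(seed)
        self.counts = [0] * n_actions
        self.reward_sum = [0.0] * n_actions
        self.cost_sum = [0.0] * n_actions

    def _mean(self, sums, a):
        return sums[a] / self.counts[a] if self.counts[a] else 0.0

    def choose(self):
        """Pick a regulation action for one user-item pair."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        # Lagrangian score: estimated reward minus priced estimated cost.
        return max(
            range(self.n_actions),
            key=lambda a: self._mean(self.reward_sum, a)
            - self.lam * self._mean(self.cost_sum, a),
        )

    def update_page(self, actions, item_rewards, page_cost):
        """Rewards arrive per item; the cost arrives once per page."""
        for a, r in zip(actions, item_rewards):
            self.counts[a] += 1
            self.reward_sum[a] += r
            # Crude hypothetical expansion: every item is credited with the page cost.
            self.cost_sum[a] += page_cost
        # Dual ascent: raise the price when the page exceeds the budget, floor at zero.
        self.lam = max(0.0, self.lam + self.step * (page_cost - self.budget))


def simulate(n_pages=2000, page_size=10, seed=1):
    """Toy environment (assumed): action 0 = no boost, 1 = mild, 2 = strong."""
    true_reward = [0.2, 0.5, 0.8]
    true_cost = [0.0, 0.3, 0.9]
    env = random.Random(seed)
    bandit = PrimalDualItemBandit(n_actions=3, budget=0.4)
    page_costs = []
    for _ in range(n_pages):
        actions = [bandit.choose() for _ in range(page_size)]
        rewards = [1.0 if env.random() < true_reward[a] else 0.0 for a in actions]
        # The learner sees only the page-level average cost, not per-item costs.
        page_cost = sum(true_cost[a] for a in actions) / page_size
        bandit.update_page(actions, rewards, page_cost)
        page_costs.append(page_cost)
    return bandit, page_costs
```

Calling `simulate()` returns the trained sketch and the per-page costs. The multiplier `lam` grows while pages exceed the budget, which steers item-level choices toward cheaper actions. Equal attribution is the simplest possible way to push a page-level signal down to items; the paper's own item-level expansion is not reproduced here.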
Latest articles in the journal
Machine Learning-based Short-term Rainfall Prediction from Sky Data
Incremental Feature Spaces Learning with Label Scarcity
Multi-objective Learning to Overcome Catastrophic Forgetting in Time-series Applications
Combining Filtering and Cross-Correlation Efficiently for Streaming Time Series
Segment-Wise Time-Varying Dynamic Bayesian Network with Graph Regularization