Deriving User- and Content-specific Rewards for Contextual Bandits

The World Wide Web Conference, Pub Date: 2019-05-13, DOI: 10.1145/3308558.3313592

Paolo Dragone, Rishabh Mehrotra, M. Lalmas


Citations: 15

Abstract

Bandit algorithms have gained increased attention in recommender systems, as they provide effective and scalable recommendations. These algorithms use reward functions, usually based on a numeric variable such as click-through rate, as the basis for optimization. On a popular music streaming service, a contextual bandit algorithm is used to decide which content to recommend to users, where the reward function is a binarization of a numeric variable that defines success based on a static threshold of user streaming time: 1 if the user streamed for at least 30 seconds and 0 otherwise. We explore alternative methods to provide a more informed reward function, based on the assumption that the streaming time distribution depends heavily on the type of user and the type of content being streamed. To automatically extract user and content groups from streaming data, we employ "co-clustering", an unsupervised learning technique that simultaneously extracts clusters of rows and columns from a co-occurrence matrix. The streaming distributions within the co-clusters are then used to define rewards specific to each co-cluster. Our proposed co-clustering-based reward functions lead to an improvement of over 25% in expected stream rate, compared to the standard binarized rewards.
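The contrast between the static reward and a co-cluster-specific reward can be illustrated with a minimal Python sketch. Only the 30-second rule comes from the abstract. Everything else is an assumption made for illustration: the co-cluster assignments are taken as given (they could come from any co-clustering algorithm), the names `user_cluster`, `content_cluster`, and `events` are hypothetical, and using the median streaming time of each co-cluster as its threshold is just one possible way to derive a reward from the streaming distribution, not necessarily the rule used in the paper.

```python
from collections import defaultdict
from statistics import median

STATIC_THRESHOLD_S = 30.0  # from the abstract: success means at least 30 seconds streamed


def static_reward(stream_seconds):
    """Baseline binarized reward: 1 if the user streamed >= 30 s, else 0."""
    return 1 if stream_seconds >= STATIC_THRESHOLD_S else 0


def fit_cocluster_thresholds(events, user_cluster, content_cluster):
    """Assumed rule: one threshold per co-cluster, set to the median
    streaming time observed inside that (user group, content group) cell."""
    times = defaultdict(list)
    for user, item, seconds in events:
        times[(user_cluster[user], content_cluster[item])].append(seconds)
    return {cell: median(values) for cell, values in times.items()}


def cocluster_reward(stream_seconds, user, item, thresholds,
                     user_cluster, content_cluster):
    """Binarize against the threshold of the co-cluster the pair falls in.
    Cells with no observed data fall back to the static 30 s threshold."""
    cell = (user_cluster[user], content_cluster[item])
    threshold = thresholds.get(cell, STATIC_THRESHOLD_S)
    return 1 if stream_seconds >= threshold else 0


# Hypothetical co-cluster assignments and streaming log (user, item, seconds).
user_cluster = {"u1": 0, "u2": 0, "u3": 1}
content_cluster = {"sleep_playlist": 0, "pop_playlist": 1}
events = [
    ("u1", "sleep_playlist", 600.0),
    ("u2", "sleep_playlist", 1200.0),
    ("u1", "sleep_playlist", 900.0),
    ("u3", "pop_playlist", 20.0),
    ("u3", "pop_playlist", 40.0),
    ("u3", "pop_playlist", 60.0),
]
thresholds = fit_cocluster_thresholds(events, user_cluster, content_cluster)
```

In this toy data, a 45-second stream counts as a success under the static rule, but under the co-cluster rule it is a success only for the group whose typical streams are short. This is the kind of user- and content-dependence the abstract argues a single static threshold cannot capture.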
