Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation

arXiv - CS - Information Retrieval | Pub Date: 2024-09-11 | DOI: arxiv-2409.07416

Luo Ji, Gao Liu, Mingyang Yin, Hongxia Yang, Jingren Zhou


Citations: 0

Abstract

Modern listwise recommendation systems need to consider both long-term user perceptions and short-term interest shifts. Reinforcement learning can be applied to recommendation to study this problem, but it suffers from a large search space, sparse user feedback, and long interaction latency. Motivated by recent progress in hierarchical reinforcement learning, we propose a novel framework called mccHRL that provides different levels of temporal abstraction for listwise recommendation. Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy by modeling the process as a sequential decision-making problem. We argue that such a framework has a well-defined decomposition of the outra-session context and the intra-session context, which are encoded by the high-level and low-level agents, respectively. To verify this argument, we implement both a simulator-based environment and an experiment based on an industrial dataset. Results show a significant performance improvement for our method over several well-known baselines. Data and code have been made public.
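The two-level split described in the abstract can be illustrated with a toy sketch. This is not the mccHRL implementation: the class names, the moving-average update, the greedy scoring rule, and the `step` and `repeat_penalty` parameters are all assumptions made for illustration. It only shows the general shape of the idea, with a high-level component that updates a user-perception vector once per session and a low-level component that fills a list one item at a time while looking at what it has already placed in that list.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product, used as a stand-in similarity measure."""
    return sum(x * y for x, y in zip(a, b))


class HighLevelAgent:
    """Hypothetical high-level agent: tracks slowly changing user
    perception across sessions (the outra-session context)."""

    def __init__(self, dim: int, step: float = 0.5) -> None:
        self.perception: Vector = [0.0] * dim
        self.step = step  # assumed smoothing rate, not taken from the paper

    def update(self, session_feedback: Vector) -> Vector:
        # An exponential moving average stands in for a learned policy.
        self.perception = [
            (1 - self.step) * p + self.step * f
            for p, f in zip(self.perception, session_feedback)
        ]
        return self.perception


class LowLevelAgent:
    """Hypothetical low-level agent: builds one list item by item,
    conditioned on the perception vector and on the items already
    chosen in this session (the intra-session context)."""

    def __init__(self, item_embeddings: List[Vector],
                 repeat_penalty: float = 0.5) -> None:
        self.items = item_embeddings
        self.repeat_penalty = repeat_penalty  # assumed diversity weight

    def recommend(self, perception: Vector, list_size: int) -> List[int]:
        chosen: List[int] = []
        for _ in range(min(list_size, len(self.items))):
            best, best_score = -1, float("-inf")
            for idx, emb in enumerate(self.items):
                if idx in chosen:
                    continue
                # Affinity to the perception vector, minus similarity
                # to items already placed in the list.
                overlap = sum(dot(emb, self.items[c]) for c in chosen)
                score = dot(perception, emb) - self.repeat_penalty * overlap
                if score > best_score:
                    best, best_score = idx, score
            chosen.append(best)
        return chosen


# Toy usage: one finished session updates perception, then a list is built.
items = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
high = HighLevelAgent(dim=2)
low = LowLevelAgent(items)
perception = high.update([1.0, 0.2])
slate = low.recommend(perception, list_size=3)
print(perception, slate)
```

In this sketch the sequential nature of the low-level decision shows up in the `overlap` term: the score of each candidate depends on the items selected earlier in the same list, so the list is not just a top-k ranking against the perception vector.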


