RLPS: A Reinforcement Learning–Based Framework for Personalized Search

Jing Yao, Zhicheng Dou, Jun Xu, Jirong Wen
{"title":"RLPS: A Reinforcement Learning–Based Framework for Personalized Search","authors":"Jing Yao, Zhicheng Dou, Jun Xu, Jirong Wen","doi":"10.1145/3446617","DOIUrl":null,"url":null,"abstract":"Personalized search is a promising way to improve search qualities by taking user interests into consideration. Recently, machine learning and deep learning techniques have been successfully applied to search result personalization. Most existing models simply regard the personal search history as a static set of user behaviors and learn fixed ranking strategies based on all the recorded data. Though improvements have been achieved, the essence that the search process is a sequence of interactions between the search engine and user is ignored. The user’s interests may dynamically change during the search process, therefore, it would be more helpful if a personalized search model could track the whole interaction process and adjust its ranking strategy continuously. In this article, we adapt reinforcement learning to personalized search and propose a framework, referred to as RLPS. It utilizes a Markov Decision Process (MDP) to track sequential interactions between the user and search engine, and continuously update the underlying personalized ranking model with the user’s real-time feedback to learn the user’s dynamic interests. Within this framework, we implement two models: the listwise RLPS-L and the hierarchical RLPS-H. RLPS-L interacts with users and trains the ranking model with document lists, while RLPS-H improves model training by designing a layered structure and introducing document pairs. In addition, we also design a feedback-aware personalized ranking component to capture the user’s feedback, which impacts the user interest profile for the next query. Significant improvements over existing personalized search models are observed in the experiments on the public AOL search log and a commercial log.","PeriodicalId":6934,"journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"24 1","pages":"1 - 29"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Information Systems (TOIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3446617","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Personalized search is a promising way to improve search quality by taking user interests into consideration. Recently, machine learning and deep learning techniques have been successfully applied to search result personalization. Most existing models simply regard the personal search history as a static set of user behaviors and learn fixed ranking strategies from all the recorded data. Though these models achieve improvements, they ignore the fact that the search process is, in essence, a sequence of interactions between the search engine and the user. The user’s interests may change dynamically during the search process; a personalized search model would therefore be more helpful if it could track the whole interaction process and continuously adjust its ranking strategy. In this article, we adapt reinforcement learning to personalized search and propose a framework, referred to as RLPS. It uses a Markov Decision Process (MDP) to track sequential interactions between the user and the search engine, and continuously updates the underlying personalized ranking model with the user’s real-time feedback to learn the user’s dynamic interests. Within this framework, we implement two models: the listwise RLPS-L and the hierarchical RLPS-H. RLPS-L interacts with users and trains the ranking model with document lists, while RLPS-H improves model training with a layered structure and document pairs. In addition, we design a feedback-aware personalized ranking component to capture the user’s feedback, which shapes the user interest profile for the next query. Experiments on the public AOL search log and a commercial search log show significant improvements over existing personalized search models.
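
To make the MDP formulation concrete, the sketch below walks through one search session as an interaction loop: the state is the user's interest profile built from past queries and clicks, the action is a personalized ranking of the candidate documents, and the reward is derived from the user's click feedback. Everything here (the `State` class, the heuristic `rank` function, the reciprocal-rank reward, the toy session) is an illustrative assumption, not the paper's implementation; RLPS learns the ranking policy with reinforcement learning rather than the hand-written rule used in this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    """MDP state: the user interest profile, here just a (query, clicks) history."""
    history: list = field(default_factory=list)

def rank(state: State, candidates: list) -> list:
    # Action: rank candidates, boosting documents the user clicked before.
    # A hand-written stand-in for the learned personalized ranking model.
    clicked = {doc for _, clicks in state.history for doc in clicks}
    return sorted(candidates, key=lambda d: (d not in clicked, candidates.index(d)))

def reward(ranking: list, clicks: set) -> float:
    # Reward from real-time feedback: reciprocal rank of the first clicked document.
    return next((1.0 / pos for pos, d in enumerate(ranking, 1) if d in clicks), 0.0)

# Toy session: each step is (query, candidate documents, documents the user clicks).
session = [
    ("jaguar", ["car_review", "animal_facts", "os_notes"], {"animal_facts"}),
    ("jaguar habitat", ["car_specs", "animal_facts", "zoo_guide"], {"animal_facts"}),
]

state = State()
for query, candidates, clicks in session:
    ranking = rank(state, candidates)       # take an action in the current state
    r = reward(ranking, clicks)             # observe the reward for this interaction
    state.history.append((query, clicks))   # state transition: the profile is updated
    print(f"{query}: {ranking} reward={r:.2f}")
    # In RLPS, the ranking model would be updated here using r
    # (with document lists in RLPS-L, document pairs in a layered structure in RLPS-H).
```

On the second query the previously clicked document is already ranked first, which is exactly the behavior the reward signal is meant to reinforce once a learned policy replaces the heuristic.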