Hands-on Reinforcement Learning for Recommender Systems - From Bandits to SlateQ to Offline RL with Ray RLlib

Proceedings of the 16th ACM Conference on Recommender Systems. Pub Date: 2022-09-18. DOI: 10.1145/3523227.3547370

Christy D. Bergman, Kourosh Hakhamaneshi

Abstract

Reinforcement learning (RL) is gaining traction as a complement to supervised learning for recommender systems (RecSys) because it can handle sequential decision-making problems with delayed rewards. Recent advances in offline reinforcement learning, off-policy evaluation, and more scalable, performant system design that can run code in parallel have made RL more tractable for real-time RecSys use cases. This tutorial introduces RLlib [9], a comprehensive open-source Python RL framework built for production workloads. RLlib is built on top of open-source Ray [8], an easy-to-use distributed computing framework for Python that can handle complex, heterogeneous applications. Ray and RLlib run on compute clusters on any cloud without vendor lock-in. Using Colab notebooks, you will leave this tutorial with a complete, working example of parallelized Python RL code for RecSys, built with RLlib and available in a GitHub repo.
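As a rough illustration of the simplest setting named in the title (bandits), the sketch below simulates an epsilon-greedy recommender in plain Python. It does not use RLlib or Ray and is not taken from the tutorial materials; the function name `run_epsilon_greedy`, the click probabilities, and all parameter values are hypothetical choices made only for this example.

```python
import random


def run_epsilon_greedy(click_probs, steps=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit that recommends one item per step.

    click_probs holds the (assumed, simulated) click probability of each item.
    Returns per-item recommendation counts, estimated click rates, and total clicks.
    """
    rng = random.Random(seed)
    n_items = len(click_probs)
    counts = [0] * n_items      # how often each item was recommended
    values = [0.0] * n_items    # running mean of observed clicks per item
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: recommend a uniformly random item.
            item = rng.randrange(n_items)
        else:
            # Exploit: recommend the item with the highest estimated click rate.
            item = max(range(n_items), key=lambda i: values[i])

        # Simulated user feedback: 1 for a click, 0 otherwise.
        reward = 1 if rng.random() < click_probs[item] else 0

        # Incremental update of the running mean for the chosen item.
        counts[item] += 1
        values[item] += (reward - values[item]) / counts[item]
        total_reward += reward

    return counts, values, total_reward


if __name__ == "__main__":
    counts, values, total = run_epsilon_greedy([0.05, 0.10, 0.30])
    print("recommendation counts:", counts)
    print("estimated click rates:", [round(v, 3) for v in values])
    print("total clicks:", total)
```

A bandit like this treats every recommendation as independent and learns only from immediate clicks. Methods for delayed rewards, such as the SlateQ and offline RL approaches named in the title, address the sequential case that this sketch leaves out.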
