Xiaocong Chen, Siyu Wang, Lianyong Qi, Yong Li, Lina Yao
{"title":"基于反事实数据增强的内在动机强化学习推荐","authors":"Xiaocong Chen, Siyu Wang, Lianyong Qi, Yong Li, Lina Yao","doi":"10.1007/s11280-023-01187-7","DOIUrl":null,"url":null,"abstract":"<p>Deep reinforcement learning (DRL) has shown promising results in modeling dynamic user preferences in RS in recent literature. However, training a DRL agent in the sparse RS environment poses a significant challenge. This is because the agent must balance between exploring informative user-item interaction trajectories and using existing trajectories for policy learning, a known exploration and exploitation trade-off. This trade-off greatly affects the recommendation performance when the environment is sparse. In DRL-based RS, balancing exploration and exploitation is even more challenging as the agent needs to deeply explore informative trajectories and efficiently exploit them in the context of RS. To address this issue, we propose a novel intrinsically motivated reinforcement learning (IMRL) method that enhances the agent’s capability to explore informative interaction trajectories in the sparse environment. We further enrich these trajectories via an adaptive counterfactual augmentation strategy with a customised threshold to improve their efficiency in exploitation. Our approach is evaluated on six offline datasets and three online simulation platforms, demonstrating its superiority over existing state-of-the-art methods. The extensive experiments show that our IMRL method outperforms other methods in terms of recommendation performance in the sparse RS environment.</p>","PeriodicalId":501180,"journal":{"name":"World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Intrinsically motivated reinforcement learning based recommendation with counterfactual data augmentation\",\"authors\":\"Xiaocong Chen, Siyu Wang, Lianyong Qi, Yong Li, Lina Yao\",\"doi\":\"10.1007/s11280-023-01187-7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Deep reinforcement learning (DRL) has shown promising results in modeling dynamic user preferences in RS in recent literature. However, training a DRL agent in the sparse RS environment poses a significant challenge. This is because the agent must balance between exploring informative user-item interaction trajectories and using existing trajectories for policy learning, a known exploration and exploitation trade-off. This trade-off greatly affects the recommendation performance when the environment is sparse. In DRL-based RS, balancing exploration and exploitation is even more challenging as the agent needs to deeply explore informative trajectories and efficiently exploit them in the context of RS. To address this issue, we propose a novel intrinsically motivated reinforcement learning (IMRL) method that enhances the agent’s capability to explore informative interaction trajectories in the sparse environment. We further enrich these trajectories via an adaptive counterfactual augmentation strategy with a customised threshold to improve their efficiency in exploitation. Our approach is evaluated on six offline datasets and three online simulation platforms, demonstrating its superiority over existing state-of-the-art methods. 
The extensive experiments show that our IMRL method outperforms other methods in terms of recommendation performance in the sparse RS environment.</p>\",\"PeriodicalId\":501180,\"journal\":{\"name\":\"World Wide Web\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-07-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"World Wide Web\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1007/s11280-023-01187-7\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"World Wide Web","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s11280-023-01187-7","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Intrinsically motivated reinforcement learning based recommendation with counterfactual data augmentation
Deep reinforcement learning (DRL) has recently shown promising results in modeling dynamic user preferences in recommender systems (RS). However, training a DRL agent in a sparse RS environment poses a significant challenge: the agent must balance exploring informative user-item interaction trajectories against exploiting existing trajectories for policy learning, the well-known exploration-exploitation trade-off. This trade-off strongly affects recommendation performance when the environment is sparse, and in DRL-based RS it is even harder to manage, because the agent needs to explore informative trajectories deeply while exploiting them efficiently. To address this issue, we propose a novel intrinsically motivated reinforcement learning (IMRL) method that strengthens the agent's ability to explore informative interaction trajectories in the sparse environment. We further enrich these trajectories via an adaptive counterfactual augmentation strategy with a customised threshold, improving their exploitation efficiency. Our approach is evaluated on six offline datasets and three online simulation platforms, and the extensive experiments show that IMRL outperforms existing state-of-the-art methods in recommendation performance in sparse RS environments.
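To make the two ideas in the abstract concrete, below is a minimal, generic Python sketch, not the authors' IMRL implementation: it shows (1) adding an intrinsic exploration bonus to the extrinsic environment reward, here a simple count-based bonus chosen purely for illustration, and (2) augmenting trajectories with counterfactual item swaps filtered by a similarity threshold. All names and parameters (`CountBasedBonus`, `beta`, `threshold`, the toy similarity function) are illustrative assumptions, not details taken from the paper.

```python
# Generic illustration only; the paper's IMRL method and its adaptive threshold
# are not specified here, so every class/parameter below is a placeholder.
import math
from collections import defaultdict


class CountBasedBonus:
    """Simple count-based intrinsic reward: rarely visited states get a larger bonus."""

    def __init__(self, beta: float = 0.1):
        self.beta = beta
        self.visits = defaultdict(int)

    def __call__(self, state) -> float:
        self.visits[state] += 1
        return self.beta / math.sqrt(self.visits[state])


def shaped_reward(extrinsic: float, state, bonus: CountBasedBonus) -> float:
    """Total reward = extrinsic (environment) reward + intrinsic exploration bonus."""
    return extrinsic + bonus(state)


def augment_with_counterfactuals(trajectory, candidate_items, similarity, threshold=0.8):
    """Create counterfactual copies of a trajectory by swapping in similar items.

    Only swaps whose similarity to the original item meets `threshold` are kept,
    mirroring the idea of a threshold controlling augmentation quality.
    """
    augmented = []
    for i, (state, item, reward) in enumerate(trajectory):
        for candidate in candidate_items:
            if candidate != item and similarity(item, candidate) >= threshold:
                new_traj = list(trajectory)
                new_traj[i] = (state, candidate, reward)  # counterfactual interaction
                augmented.append(new_traj)
    return augmented


if __name__ == "__main__":
    bonus = CountBasedBonus(beta=0.1)
    print(shaped_reward(1.0, state="user42:item7", bonus=bonus))  # first visit: larger bonus
    print(shaped_reward(1.0, state="user42:item7", bonus=bonus))  # repeat visit: smaller bonus

    toy_traj = [("s0", "item_a", 1.0), ("s1", "item_b", 0.0)]
    sim = lambda a, b: 0.9 if {a, b} == {"item_a", "item_c"} else 0.1
    print(augment_with_counterfactuals(toy_traj, ["item_a", "item_b", "item_c"], sim))
```

In this toy run, the first visit to a state yields a larger shaped reward than a repeat visit, and only the swap of `item_a` for the sufficiently similar `item_c` survives the threshold, which is the general intuition behind combining exploration bonuses with threshold-gated counterfactual augmentation.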