Improving Bandit Learning Via Heterogeneous Information Networks: Algorithms and Applications

ACM Transactions on Knowledge Discovery from Data (TKDD), Pub Date: 2022-03-15, DOI: 10.1145/3522590

Xiaoying Zhang, Hong Xie, John C.S. Lui


Citations: 0

Abstract

Contextual bandits are an invaluable tool for balancing the exploration vs. exploitation tradeoff in applications such as online recommendation. In many applications, heterogeneous information networks (HINs) provide rich side information for contextual bandits, such as different types of attributes and relationships among users and items. In this article, we propose the first HIN-assisted contextual bandit framework, which utilizes a given HIN to assist contextual bandit learning. The proposed framework uses meta-paths in the HIN to extract rich relations among users and items for the contextual bandit. The main challenge is how to leverage these relations, since users’ preferences over items, the target of our online learning, are closely related to users’ preferences over meta-paths. However, it is unknown which meta-path a user prefers. Thus, both preferences must be learned online while balancing the exploration vs. exploitation tradeoff. We propose the HIN-assisted upper confidence bound (HUCB) algorithm to address this challenge. For each meta-path, the HUCB algorithm employs an independent base bandit algorithm to handle online item recommendations by leveraging the relationships captured in that meta-path. A bandit master then learns users’ preferences over meta-paths and dynamically combines the base bandit algorithms, again balancing exploration and exploitation. We theoretically prove that the HUCB algorithm achieves performance similar to that of the optimal algorithm, in which each user is served according to their true preference over meta-paths (assuming the optimal algorithm knows this preference). Moreover, we prove that by leveraging the HIN, the HUCB algorithm achieves a smaller regret upper bound than the baseline algorithm that does not use the HIN. Experimental results on a synthetic dataset, as well as on real datasets from LastFM and Yelp, demonstrate the fast learning speed of the HUCB algorithm.
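The two-level structure described in the abstract (one base bandit per meta-path, plus a master that learns which meta-path to trust) can be illustrated with a small sketch. This is not the authors' HUCB algorithm: it swaps in a standard LinUCB learner as each base bandit and a standard UCB1 learner as the master, and the names `LinUCB`, `MasterUCB1`, `simulate`, the random per-meta-path item features, and the reward model are all hypothetical choices made only for illustration.

```python
import math
import random


class LinUCB:
    """Base bandit (assumed here to be LinUCB) over one meta-path's item features."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha
        # Inverse of the ridge-regularised design matrix, initialised to the identity.
        self.A_inv = [[float(i == j) for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _matvec(self, x):
        return [sum(r * v for r, v in zip(row, x)) for row in self.A_inv]

    def score(self, x):
        ax = self._matvec(x)            # A^{-1} x
        theta = self._matvec(self.b)    # ridge estimate A^{-1} b
        mean = sum(t * v for t, v in zip(theta, x))
        width = math.sqrt(max(sum(a * v for a, v in zip(ax, x)), 0.0))
        return mean + self.alpha * width

    def select(self, features):
        return max(range(len(features)), key=lambda i: self.score(features[i]))

    def update(self, x, reward):
        ax = self._matvec(x)
        denom = 1.0 + sum(a * v for a, v in zip(ax, x))
        # Sherman-Morrison rank-one update of A^{-1} after adding x x^T to A.
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= ax[i] * ax[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


class MasterUCB1:
    """Master bandit (assumed here to be UCB1) over meta-paths."""

    def __init__(self, n_paths):
        self.counts = [0] * n_paths
        self.sums = [0.0] * n_paths

    def select(self):
        for p, c in enumerate(self.counts):
            if c == 0:                  # try every meta-path once first
                return p
        t = sum(self.counts)
        return max(
            range(len(self.counts)),
            key=lambda p: self.sums[p] / self.counts[p]
            + math.sqrt(2.0 * math.log(t) / self.counts[p]),
        )

    def update(self, p, reward):
        self.counts[p] += 1
        self.sums[p] += reward


def simulate(rounds=2000, n_items=5, n_paths=2, seed=0):
    rng = random.Random(seed)
    theta_true = [0.1, 0.5, 0.3]        # hypothetical user preference vector
    dim = len(theta_true)
    # Hypothetical item features, one table per meta-path.
    feats = [[[rng.random() for _ in range(dim)] for _ in range(n_items)]
             for _ in range(n_paths)]
    master = MasterUCB1(n_paths)
    bases = [LinUCB(dim) for _ in range(n_paths)]
    total = 0.0
    for _ in range(rounds):
        p = master.select()                    # master picks a meta-path
        item = bases[p].select(feats[p])       # its base bandit picks an item
        # Assumption: the click probability is linear in meta-path 0's features.
        prob = sum(t * v for t, v in zip(theta_true, feats[0][item]))
        reward = 1.0 if rng.random() < prob else 0.0
        bases[p].update(feats[p][item], reward)
        master.update(p, reward)
        total += reward
    return master.counts, total


if __name__ == "__main__":
    print(simulate())
```

Calling `simulate()` returns how often the master chose each meta-path together with the cumulative reward. Only the base bandit that made the recommendation is updated in each round, which keeps the base learners independent, as the abstract describes; how HUCB actually combines the base bandits is specified in the paper, not here.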


