Improving Bandit Learning Via Heterogeneous Information Networks: Algorithms and Applications

Xiaoying Zhang, Hong Xie, John C.S. Lui
ACM Transactions on Knowledge Discovery from Data (TKDD). Published 2022-03-15. DOI: 10.1145/3522590 (https://doi.org/10.1145/3522590)

Abstract

Contextual bandits are a valuable tool for balancing the exploration vs. exploitation tradeoff in applications such as online recommendation. In many applications, heterogeneous information networks (HINs) provide rich side information for contextual bandits, such as different types of attributes and relationships among users and items. In this article, we propose the first HIN-assisted contextual bandit framework, which uses a given HIN to assist contextual bandit learning. The proposed framework uses meta-paths in the HIN to extract rich relations among users and items for the contextual bandit. The main challenge is how to leverage these relations: users' preferences over items, the target of our online learning, are closely related to users' preferences over meta-paths, yet it is unknown which meta-path a user prefers. Thus, both preferences need to be learned online while balancing exploration against exploitation. We propose the HIN-assisted upper confidence bound (HUCB) algorithm to address this challenge. For each meta-path, the HUCB algorithm employs an independent base bandit algorithm that handles online item recommendation by leveraging the relationships captured in that meta-path. A bandit master then learns users' preferences over meta-paths and dynamically combines the base bandit algorithms, again balancing exploration against exploitation. We prove that the HUCB algorithm achieves performance similar to that of an optimal algorithm that serves each user according to the user's true preference over meta-paths (assuming the optimal algorithm knows this preference). Moreover, we prove that, by leveraging the HIN, the HUCB algorithm achieves a smaller regret upper bound than a baseline algorithm that does not use the HIN. Experimental results on a synthetic dataset and on real datasets from LastFM and Yelp demonstrate the fast learning speed of the HUCB algorithm.
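
The two-level structure described in the abstract (one base bandit per meta-path, plus a master that chooses among them) can be illustrated with a minimal sketch. This is not the HUCB algorithm from the article: the abstract gives no update rules, so the sketch assumes plain LinUCB for each base bandit and UCB1 for the master, serves a single user, and updates only the base bandit that was chosen. All names (LinUCBBase, UCB1Master, run, informative_path) and the synthetic reward model are hypothetical.

```python
import numpy as np


class LinUCBBase:
    """Assumed base bandit: standard LinUCB over one meta-path's item features."""

    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)      # regularised Gram matrix
        self.b = np.zeros(dim)    # reward-weighted feature sum
        self.alpha = alpha        # exploration weight

    def select(self, feats):
        # feats has shape (n_items, dim); return the index of the item with the highest UCB.
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", feats, A_inv, feats))
        return int(np.argmax(feats @ theta + self.alpha * bonus))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x


class UCB1Master:
    """Assumed master: UCB1 over base bandits (its constants presume bounded rewards)."""

    def __init__(self, n_bases):
        self.counts = np.zeros(n_bases)
        self.sums = np.zeros(n_bases)

    def select(self):
        untried = np.flatnonzero(self.counts == 0)
        if untried.size:
            return int(untried[0])  # try every base bandit once first
        t = self.counts.sum()
        ucb = self.sums / self.counts + np.sqrt(2 * np.log(t) / self.counts)
        return int(np.argmax(ucb))

    def update(self, k, reward):
        self.counts[k] += 1
        self.sums[k] += reward


def run(features_per_path, theta, informative_path, horizon=500, seed=0):
    """Simulate one user whose rewards depend on a single meta-path's features."""
    rng = np.random.default_rng(seed)
    dim = features_per_path[0].shape[1]
    bases = [LinUCBBase(dim) for _ in features_per_path]
    master = UCB1Master(len(bases))
    total = 0.0
    for _ in range(horizon):
        k = master.select()                             # master picks a meta-path
        item = bases[k].select(features_per_path[k])    # its base bandit picks an item
        # Hypothetical reward model: linear in the informative path's features, plus noise.
        reward = features_per_path[informative_path][item] @ theta + 0.1 * rng.normal()
        bases[k].update(features_per_path[k][item], reward)
        master.update(k, reward)
        total += reward
    return master.counts, total
```

A possible call, with random features standing in for meta-path-derived item representations: `run([np.random.default_rng(1).normal(size=(20, 4)) for _ in range(3)], np.ones(4), informative_path=0)`. The returned counts show how often the master selected each meta-path's base bandit.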