Desheng Cai, Shengsheng Qian, Quan Fang, Jun Hu, Changsheng Xu
Adaptive Anti-Bottleneck Multi-Modal Graph Learning Network for Personalized Micro-video Recommendation
Proceedings of the 30th ACM International Conference on Multimedia, 2022-10-10
DOI: 10.1145/3503161.3548420 (https://doi.org/10.1145/3503161.3548420)
Citations: 6
Abstract
Micro-video recommendation has attracted extensive research attention with the growing popularity of micro-video sharing platforms, and substantial effort has been devoted to the task. Recently, homogeneous GNN-based approaches have used graph convolutional operators, and heterogeneous ones meta-path based similarity measures, to learn meaningful representations of users and micro-videos, showing promising recommendation performance. However, these methods may suffer from three problems: (1) they fail to aggregate information from distant (long-range) nodes; (2) they ignore the varying intensity of users' preferences for different items; and (3) they neglect the similarities among the multi-modal contents of micro-videos. In this paper, we propose a novel Adaptive Anti-Bottleneck Multi-Modal Graph Learning Network for personalized micro-video recommendation. Specifically, we design a collaborative representation learning module and a semantic representation learning module to exploit user-video interaction information and the similarities of micro-videos, respectively. Furthermore, an anti-bottleneck module automatically learns the importance weights of short-range and long-range neighboring nodes to obtain more expressive representations of users and micro-videos. Finally, to account for the varying intensity of users' preferences for different micro-videos, we design an adaptive recommendation loss and optimize it to train the model end to end. We evaluate our method on three real-world datasets, and the results show that the proposed model outperforms the baselines.
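The abstract does not give formulas, so the following is only an illustrative sketch of the two ideas it names: weighting short-range and long-range neighbor information, and scaling a recommendation loss by preference intensity. Everything here is an assumption rather than the authors' method: the function names `combine_hops` and `weighted_bpr_loss` are hypothetical, the dot-product scoring with softmax stands in for whatever weighting the anti-bottleneck module actually learns, and a BPR-style pairwise loss with a per-interaction weight stands in for the adaptive recommendation loss.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_hops(hop_embeddings, query):
    """Combine per-hop neighbor summaries into one vector.

    hop_embeddings[k] is assumed to summarize neighbors at hop k+1.
    Each hop gets an importance weight softmax(query . summary);
    this scoring rule is an assumption, not taken from the paper.
    """
    weights = softmax([dot(query, h) for h in hop_embeddings])
    dim = len(hop_embeddings[0])
    combined = [
        sum(w * h[i] for w, h in zip(weights, hop_embeddings))
        for i in range(dim)
    ]
    return combined, weights


def weighted_bpr_loss(user, pos_item, neg_item, preference_weight):
    """BPR-style pairwise loss scaled by a per-interaction weight.

    -log(sigmoid(margin)) is written as log(1 + exp(-margin)).
    preference_weight is a hypothetical stand-in for preference intensity.
    """
    margin = dot(user, pos_item) - dot(user, neg_item)
    return preference_weight * math.log1p(math.exp(-margin))


# Toy usage: three hop summaries (1-hop, 2-hop, 3-hop) for one node.
hops = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [0.2, 0.8]
combined, weights = combine_hops(hops, query)
loss = weighted_bpr_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], 2.0)
```

In this toy setup the weights sum to one, so a distant hop can dominate the combined representation when its summary matches the query better than the nearby hops do.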