基于多模态注意网络的个性化视觉解释时尚推荐:面向视觉可解释推荐

Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval Pub Date : 2019-07-18 DOI:10.1145/3331184.3331254

Xu Chen, H. Chen, Hongteng Xu, Yongfeng Zhang, Yixin Cao, Zheng Qin, H. Zha

{"title":"基于多模态注意网络的个性化视觉解释时尚推荐:面向视觉可解释推荐","authors":"Xu Chen, H. Chen, Hongteng Xu, Yongfeng Zhang, Yixin Cao, Zheng Qin, H. Zha","doi":"10.1145/3331184.3331254","DOIUrl":null,"url":null,"abstract":"Fashion recommendation has attracted increasing attention from both industry and academic communities. This paper proposes a novel neural architecture for fashion recommendation based on both image region-level features and user review information. Our basic intuition is that: for a fashion image, not all the regions are equally important for the users, i.e., people usually care about a few parts of the fashion image. To model such human sense, we learn an attention model over many pre-segmented image regions, based on which we can understand where a user is really interested in on the image, and correspondingly, represent the image in a more accurate manner. In addition, by discovering such fine-grained visual preference, we can visually explain a recommendation by highlighting some regions of its image. For better learning the attention model, we also introduce user review information as a weak supervision signal to collect more comprehensive user preference. In our final framework, the visual and textual features are seamlessly coupled by a multimodal attention network. Based on this architecture, we can not only provide accurate recommendation, but also can accompany each recommended item with novel visual explanations. We conduct extensive experiments to demonstrate the superiority of our proposed model in terms of Top-N recommendation, and also we build a collectively labeled dataset for evaluating our provided visual explanations in a quantitative manner.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"9 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"176","resultStr":"{\"title\":\"Personalized Fashion Recommendation with Visual Explanations based on Multimodal Attention Network: Towards Visually Explainable Recommendation\",\"authors\":\"Xu Chen, H. Chen, Hongteng Xu, Yongfeng Zhang, Yixin Cao, Zheng Qin, H. Zha\",\"doi\":\"10.1145/3331184.3331254\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Fashion recommendation has attracted increasing attention from both industry and academic communities. This paper proposes a novel neural architecture for fashion recommendation based on both image region-level features and user review information. Our basic intuition is that: for a fashion image, not all the regions are equally important for the users, i.e., people usually care about a few parts of the fashion image. To model such human sense, we learn an attention model over many pre-segmented image regions, based on which we can understand where a user is really interested in on the image, and correspondingly, represent the image in a more accurate manner. In addition, by discovering such fine-grained visual preference, we can visually explain a recommendation by highlighting some regions of its image. For better learning the attention model, we also introduce user review information as a weak supervision signal to collect more comprehensive user preference. In our final framework, the visual and textual features are seamlessly coupled by a multimodal attention network. Based on this architecture, we can not only provide accurate recommendation, but also can accompany each recommended item with novel visual explanations. We conduct extensive experiments to demonstrate the superiority of our proposed model in terms of Top-N recommendation, and also we build a collectively labeled dataset for evaluating our provided visual explanations in a quantitative manner.\",\"PeriodicalId\":20700,\"journal\":{\"name\":\"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval\",\"volume\":\"9 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-07-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"176\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3331184.3331254\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3331184.3331254","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 176

摘要

时尚推荐越来越受到业界和学术界的关注。本文提出了一种基于图像区域特征和用户评论信息的时尚推荐神经网络结构。我们的基本直觉是:对于一个时尚形象来说，并不是所有的区域对用户来说都是同等重要的，也就是说，人们通常只关心时尚形象的一部分。为了模拟人类的这种感觉，我们学习了一个在许多预先分割的图像区域上的注意力模型，基于这个模型，我们可以理解用户在图像上真正感兴趣的地方，并相应地以更准确的方式表示图像。此外，通过发现这种细粒度的视觉偏好，我们可以通过突出显示其图像的某些区域来直观地解释推荐。为了更好地学习注意模型，我们还引入了用户评论信息作为弱监督信号，以收集更全面的用户偏好。在我们的最终框架中，视觉和文本特征通过多模态注意力网络无缝耦合。基于这种架构，我们不仅可以提供准确的推荐，而且还可以为每个推荐的项目提供新颖的视觉解释。我们进行了大量的实验来证明我们提出的模型在Top-N推荐方面的优越性，并且我们还建立了一个集体标记的数据集，以定量的方式评估我们提供的视觉解释。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Personalized Fashion Recommendation with Visual Explanations based on Multimodal Attention Network: Towards Visually Explainable Recommendation

Fashion recommendation has attracted increasing attention from both industry and academic communities. This paper proposes a novel neural architecture for fashion recommendation based on both image region-level features and user review information. Our basic intuition is that: for a fashion image, not all the regions are equally important for the users, i.e., people usually care about a few parts of the fashion image. To model such human sense, we learn an attention model over many pre-segmented image regions, based on which we can understand where a user is really interested in on the image, and correspondingly, represent the image in a more accurate manner. In addition, by discovering such fine-grained visual preference, we can visually explain a recommendation by highlighting some regions of its image. For better learning the attention model, we also introduce user review information as a weak supervision signal to collect more comprehensive user preference. In our final framework, the visual and textual features are seamlessly coupled by a multimodal attention network. Based on this architecture, we can not only provide accurate recommendation, but also can accompany each recommended item with novel visual explanations. We conduct extensive experiments to demonstrate the superiority of our proposed model in terms of Top-N recommendation, and also we build a collectively labeled dataset for evaluating our provided visual explanations in a quantitative manner.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助