Multi-Modal Dialog State Tracking for Interactive Fashion Recommendation

Yaxiong Wu, C. Macdonald, I. Ounis
{"title":"交互式时尚推荐的多模态对话框状态跟踪","authors":"Yaxiong Wu, C. Macdonald, I. Ounis","doi":"10.1145/3523227.3546774","DOIUrl":null,"url":null,"abstract":"Multi-modal interactive recommendation is a type of task that allows users to receive visual recommendations and express natural-language feedback about the recommended items across multiple iterations of interactions. However, such multi-modal dialog sequences (i.e. turns consisting of the system’s visual recommendations and the user’s natural-language feedback) make it challenging to correctly incorporate the users’ preferences across multiple turns. Indeed, the existing formulations of interactive recommender systems suffer from their inability to capture the multi-modal sequential dependencies of textual feedback and visual recommendations because of their use of recurrent neural network-based (i.e., RNN-based) or transformer-based models. To alleviate the multi-modal sequential dependency issue, we propose a novel multi-modal recurrent attention network (MMRAN) model to effectively incorporate the users’ preferences over the long visual dialog sequences of the users’ natural-language feedback and the system’s visual recommendations. Specifically, we leverage a gated recurrent network (GRN) with a feedback gate to separately process the textual and visual representations of natural-language feedback and visual recommendations into hidden states (i.e. representations of the past interactions) for multi-modal sequence combination. In addition, we apply a multi-head attention network (MAN) to refine the hidden states generated by the GRN and to further enhance the model’s ability in dynamic state tracking. Following previous work, we conduct extensive experiments on the Fashion IQ Dresses, Shirts, and Tops & Tees datasets to assess the effectiveness of our proposed model by using a vision-language transformer-based user simulator as a surrogate for real human users. Our results show that our proposed MMRAN model can significantly outperform several existing state-of-the-art baseline models.","PeriodicalId":443279,"journal":{"name":"Proceedings of the 16th ACM Conference on Recommender Systems","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Multi-Modal Dialog State Tracking for Interactive Fashion Recommendation\",\"authors\":\"Yaxiong Wu, C. Macdonald, I. Ounis\",\"doi\":\"10.1145/3523227.3546774\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multi-modal interactive recommendation is a type of task that allows users to receive visual recommendations and express natural-language feedback about the recommended items across multiple iterations of interactions. However, such multi-modal dialog sequences (i.e. turns consisting of the system’s visual recommendations and the user’s natural-language feedback) make it challenging to correctly incorporate the users’ preferences across multiple turns. Indeed, the existing formulations of interactive recommender systems suffer from their inability to capture the multi-modal sequential dependencies of textual feedback and visual recommendations because of their use of recurrent neural network-based (i.e., RNN-based) or transformer-based models. 
To alleviate the multi-modal sequential dependency issue, we propose a novel multi-modal recurrent attention network (MMRAN) model to effectively incorporate the users’ preferences over the long visual dialog sequences of the users’ natural-language feedback and the system’s visual recommendations. Specifically, we leverage a gated recurrent network (GRN) with a feedback gate to separately process the textual and visual representations of natural-language feedback and visual recommendations into hidden states (i.e. representations of the past interactions) for multi-modal sequence combination. In addition, we apply a multi-head attention network (MAN) to refine the hidden states generated by the GRN and to further enhance the model’s ability in dynamic state tracking. Following previous work, we conduct extensive experiments on the Fashion IQ Dresses, Shirts, and Tops & Tees datasets to assess the effectiveness of our proposed model by using a vision-language transformer-based user simulator as a surrogate for real human users. Our results show that our proposed MMRAN model can significantly outperform several existing state-of-the-art baseline models.\",\"PeriodicalId\":443279,\"journal\":{\"name\":\"Proceedings of the 16th ACM Conference on Recommender Systems\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 16th ACM Conference on Recommender Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3523227.3546774\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 16th ACM Conference on Recommender Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3523227.3546774","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2

Abstract

Multi-modal interactive recommendation is a type of task that allows users to receive visual recommendations and express natural-language feedback about the recommended items across multiple iterations of interactions. However, such multi-modal dialog sequences (i.e. turns consisting of the system’s visual recommendations and the user’s natural-language feedback) make it challenging to correctly incorporate the users’ preferences across multiple turns. Indeed, the existing formulations of interactive recommender systems suffer from their inability to capture the multi-modal sequential dependencies of textual feedback and visual recommendations because of their use of recurrent neural network-based (i.e., RNN-based) or transformer-based models. To alleviate the multi-modal sequential dependency issue, we propose a novel multi-modal recurrent attention network (MMRAN) model to effectively incorporate the users’ preferences over the long visual dialog sequences of the users’ natural-language feedback and the system’s visual recommendations. Specifically, we leverage a gated recurrent network (GRN) with a feedback gate to separately process the textual and visual representations of natural-language feedback and visual recommendations into hidden states (i.e. representations of the past interactions) for multi-modal sequence combination. In addition, we apply a multi-head attention network (MAN) to refine the hidden states generated by the GRN and to further enhance the model’s ability in dynamic state tracking. Following previous work, we conduct extensive experiments on the Fashion IQ Dresses, Shirts, and Tops & Tees datasets to assess the effectiveness of our proposed model by using a vision-language transformer-based user simulator as a surrogate for real human users. Our results show that our proposed MMRAN model can significantly outperform several existing state-of-the-art baseline models.
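The abstract does not include implementation details, but the architecture it describes (a feedback-gated recurrent cell over per-turn textual and visual features, followed by multi-head attention that refines the resulting hidden dialog states) can be sketched roughly as below. This is a minimal illustrative reconstruction assuming PyTorch; `FeedbackGatedCell`, its gating formula, and all dimensions are assumptions for exposition, not the authors' published MMRAN code.

```python
# Hypothetical sketch of the MMRAN pipeline described in the abstract.
# The feedback gate and its exact formulation are assumptions, not the
# published model.
import torch
import torch.nn as nn


class FeedbackGatedCell(nn.Module):
    """GRU-style cell with an extra 'feedback gate' that mixes the user's
    text-feedback features with the system's visual-item features per turn
    (one plausible reading of the abstract's GRN component)."""

    def __init__(self, dim: int):
        super().__init__()
        self.gru = nn.GRUCell(dim, dim)               # recurrent state update
        self.feedback_gate = nn.Linear(2 * dim, dim)  # gate over both modalities

    def forward(self, text_x, visual_x, h):
        # Gate decides, per dimension, how much of this turn's input comes
        # from the textual feedback vs. the visual recommendation.
        g = torch.sigmoid(self.feedback_gate(torch.cat([text_x, visual_x], dim=-1)))
        fused = g * text_x + (1 - g) * visual_x
        return self.gru(fused, h)                     # next hidden dialog state


class MMRANSketch(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.cell = FeedbackGatedCell(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text_seq, visual_seq):
        # text_seq, visual_seq: (batch, turns, dim) pre-encoded features.
        b, t, d = text_seq.shape
        h = text_seq.new_zeros(b, d)
        states = []
        for i in range(t):                            # roll over dialog turns
            h = self.cell(text_seq[:, i], visual_seq[:, i], h)
            states.append(h)
        states = torch.stack(states, dim=1)           # (batch, turns, dim)
        # Multi-head attention refines the per-turn hidden states, as the
        # abstract's MAN component suggests.
        refined, _ = self.attn(states, states, states)
        return refined[:, -1]                         # final tracked state


# Smoke test with dummy features: 2 dialogs, 5 turns, 256-d per modality.
model = MMRANSketch()
text = torch.randn(2, 5, 256)
image = torch.randn(2, 5, 256)
state = model(text, image)                            # -> shape (2, 256)
```

In this reading, the feedback gate arbitrates between the two modalities before each recurrent update, and the attention layer lets the final dialog state attend back over all earlier turns, which is how a recurrent backbone could recover the long-range dependencies the abstract says plain RNN- or transformer-based formulations miss.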
Latest articles from this journal:
- Heterogeneous Graph Representation Learning for multi-target Cross-Domain Recommendation
- Imbalanced Data Sparsity as a Source of Unfair Bias in Collaborative Filtering
- Position Awareness Modeling with Knowledge Distillation for CTR Prediction
- Multi-Modal Dialog State Tracking for Interactive Fashion Recommendation
- Denoising Self-Attentive Sequential Recommendation