Multimodal Dialog for Browsing Large Visual Catalogs using Exploration-Exploitation Paradigm in a Joint Embedding Space

Indrani Bhattacharya, Arkabandhu Chowdhury, V. Raykar
{"title":"在联合嵌入空间中使用探索-开发范式浏览大型视觉目录的多模态对话","authors":"Indrani Bhattacharya, Arkabandhu Chowdhury, V. Raykar","doi":"10.1145/3323873.3325036","DOIUrl":null,"url":null,"abstract":"We present a multimodal dialog (MMD) system to assist online customers in visually browsing through large catalogs. Visual browsing allows customers to explore products beyond exact search results. We focus on a slightly asymmetric version of a complete MMD system, in that our agent can understand both text and image queries, but responds only in images. We formulate our problem of \"showing the k best images to a user'', based on the dialog context so far, as sampling from a Gaussian Mixture Model (GMM) in a high dimensional joint multimodal embedding space. The joint embedding space is learned by Common Representation Learning and embeds both the text and the image queries. Our system remembers the context of the dialog, and uses an exploration-exploitation paradigm to assist in visual browsing. We train and evaluate the system on an MMD dataset that we synthesize from large catalog data. Our experiments and preliminary human evaluation show that the system is capable of learning and displaying relevant products with an average cosine similarity of 0.85 to the ground truth results, and is capable of engaging human users.","PeriodicalId":149041,"journal":{"name":"Proceedings of the 2019 on International Conference on Multimedia Retrieval","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Multimodal Dialog for Browsing Large Visual Catalogs using Exploration-Exploitation Paradigm in a Joint Embedding Space\",\"authors\":\"Indrani Bhattacharya, Arkabandhu Chowdhury, V. Raykar\",\"doi\":\"10.1145/3323873.3325036\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present a multimodal dialog (MMD) system to assist online customers in visually browsing through large catalogs. Visual browsing allows customers to explore products beyond exact search results. We focus on a slightly asymmetric version of a complete MMD system, in that our agent can understand both text and image queries, but responds only in images. We formulate our problem of \\\"showing the k best images to a user'', based on the dialog context so far, as sampling from a Gaussian Mixture Model (GMM) in a high dimensional joint multimodal embedding space. The joint embedding space is learned by Common Representation Learning and embeds both the text and the image queries. Our system remembers the context of the dialog, and uses an exploration-exploitation paradigm to assist in visual browsing. We train and evaluate the system on an MMD dataset that we synthesize from large catalog data. 
Our experiments and preliminary human evaluation show that the system is capable of learning and displaying relevant products with an average cosine similarity of 0.85 to the ground truth results, and is capable of engaging human users.\",\"PeriodicalId\":149041,\"journal\":{\"name\":\"Proceedings of the 2019 on International Conference on Multimedia Retrieval\",\"volume\":\"21 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-01-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2019 on International Conference on Multimedia Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3323873.3325036\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2019 on International Conference on Multimedia Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3323873.3325036","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5

Abstract

We present a multimodal dialog (MMD) system to assist online customers in visually browsing through large catalogs. Visual browsing allows customers to explore products beyond exact search results. We focus on a slightly asymmetric version of a complete MMD system, in that our agent can understand both text and image queries, but responds only in images. We formulate our problem of "showing the k best images to a user", based on the dialog context so far, as sampling from a Gaussian Mixture Model (GMM) in a high dimensional joint multimodal embedding space. The joint embedding space is learned by Common Representation Learning and embeds both the text and the image queries. Our system remembers the context of the dialog, and uses an exploration-exploitation paradigm to assist in visual browsing. We train and evaluate the system on an MMD dataset that we synthesize from large catalog data. Our experiments and preliminary human evaluation show that the system is capable of learning and displaying relevant products with an average cosine similarity of 0.85 to the ground truth results, and is capable of engaging human users.
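To make the retrieval step concrete, here is a minimal, hypothetical Python sketch (not the authors' code) of the "show the k best images" step described above: fit a GMM over the dialog-context embeddings in the joint space, split the k display slots between exploitation (sampling near the dominant mixture component) and exploration (sampling from the full mixture), and map each sample to its nearest catalog image by cosine similarity. The embedding dimension, the 2-component mixture, and the exploration fraction are all assumptions; in the paper the embeddings come from Common Representation Learning, whereas here they are random placeholders.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Placeholders for learned joint embeddings (random here; the paper derives
# them with Common Representation Learning over text and image queries).
D = 64                                 # joint embedding dimension (assumed)
catalog = rng.normal(size=(5000, D))   # precomputed catalog image embeddings
context = rng.normal(size=(6, D))      # embeddings of the dialog turns so far

# Model the dialog context as a Gaussian mixture; each component can be read
# as one candidate "intent" of the user.
gmm = GaussianMixture(n_components=2, covariance_type="diag",
                      random_state=0).fit(context)

def show_k_images(k=5, explore_frac=0.4):
    """Pick k catalog images: exploit the dominant intent, explore the rest
    of the mixture. The exploit/explore split is an assumed heuristic."""
    n_explore = max(1, int(round(k * explore_frac)))
    n_exploit = k - n_explore

    # Exploit: draw points near the highest-weight component.
    best = int(np.argmax(gmm.weights_))
    exploit = rng.normal(gmm.means_[best],
                         np.sqrt(gmm.covariances_[best]),
                         size=(n_exploit, D))

    # Explore: draw points from the full mixture, minor components included.
    explore, _ = gmm.sample(n_explore)

    # Map every sampled point to its nearest catalog image by cosine
    # similarity; duplicates are collapsed, so up to k images come back.
    sims = cosine_similarity(np.vstack([exploit, explore]), catalog)
    return np.unique(sims.argmax(axis=1))

print(show_k_images())                 # indices of catalog images to display
```

The reported 0.85 average cosine similarity would be computed in this same joint space, between the displayed products and the ground-truth results.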
Latest articles in this journal:
EAGER Multimodal Multimedia Retrieval with vitrivr
RobustiQ: A Robust ANN Search Method for Billion-scale Similarity Search on GPUs
Improving What Cross-Modal Retrieval Models Learn through Object-Oriented Inter- and Intra-Modal Attention Networks
DeepMarks: A Secure Fingerprinting Framework for Digital Rights Management of Deep Learning Models