Indrani Bhattacharya, Arkabandhu Chowdhury, V. Raykar
{"title":"在联合嵌入空间中使用探索-开发范式浏览大型视觉目录的多模态对话","authors":"Indrani Bhattacharya, Arkabandhu Chowdhury, V. Raykar","doi":"10.1145/3323873.3325036","DOIUrl":null,"url":null,"abstract":"We present a multimodal dialog (MMD) system to assist online customers in visually browsing through large catalogs. Visual browsing allows customers to explore products beyond exact search results. We focus on a slightly asymmetric version of a complete MMD system, in that our agent can understand both text and image queries, but responds only in images. We formulate our problem of \"showing the k best images to a user'', based on the dialog context so far, as sampling from a Gaussian Mixture Model (GMM) in a high dimensional joint multimodal embedding space. The joint embedding space is learned by Common Representation Learning and embeds both the text and the image queries. Our system remembers the context of the dialog, and uses an exploration-exploitation paradigm to assist in visual browsing. We train and evaluate the system on an MMD dataset that we synthesize from large catalog data. Our experiments and preliminary human evaluation show that the system is capable of learning and displaying relevant products with an average cosine similarity of 0.85 to the ground truth results, and is capable of engaging human users.","PeriodicalId":149041,"journal":{"name":"Proceedings of the 2019 on International Conference on Multimedia Retrieval","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Multimodal Dialog for Browsing Large Visual Catalogs using Exploration-Exploitation Paradigm in a Joint Embedding Space\",\"authors\":\"Indrani Bhattacharya, Arkabandhu Chowdhury, V. 
Raykar\",\"doi\":\"10.1145/3323873.3325036\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present a multimodal dialog (MMD) system to assist online customers in visually browsing through large catalogs. Visual browsing allows customers to explore products beyond exact search results. We focus on a slightly asymmetric version of a complete MMD system, in that our agent can understand both text and image queries, but responds only in images. We formulate our problem of \\\"showing the k best images to a user'', based on the dialog context so far, as sampling from a Gaussian Mixture Model (GMM) in a high dimensional joint multimodal embedding space. The joint embedding space is learned by Common Representation Learning and embeds both the text and the image queries. Our system remembers the context of the dialog, and uses an exploration-exploitation paradigm to assist in visual browsing. We train and evaluate the system on an MMD dataset that we synthesize from large catalog data. Our experiments and preliminary human evaluation show that the system is capable of learning and displaying relevant products with an average cosine similarity of 0.85 to the ground truth results, and is capable of engaging human users.\",\"PeriodicalId\":149041,\"journal\":{\"name\":\"Proceedings of the 2019 on International Conference on Multimedia Retrieval\",\"volume\":\"21 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-01-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2019 on International Conference on Multimedia 
Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3323873.3325036\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2019 on International Conference on Multimedia Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3323873.3325036","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Multimodal Dialog for Browsing Large Visual Catalogs using Exploration-Exploitation Paradigm in a Joint Embedding Space
We present a multimodal dialog (MMD) system to assist online customers in visually browsing through large catalogs. Visual browsing allows customers to explore products beyond exact search results. We focus on a slightly asymmetric version of a complete MMD system, in that our agent can understand both text and image queries, but responds only in images. We formulate our problem of "showing the k best images to a user", based on the dialog context so far, as sampling from a Gaussian Mixture Model (GMM) in a high-dimensional joint multimodal embedding space. The joint embedding space is learned by Common Representation Learning and embeds both the text and the image queries. Our system remembers the context of the dialog, and uses an exploration-exploitation paradigm to assist in visual browsing. We train and evaluate the system on an MMD dataset that we synthesize from large catalog data. Our experiments and preliminary human evaluation show that the system is capable of learning and displaying relevant products with an average cosine similarity of 0.85 to the ground truth results, and is capable of engaging human users.
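To make the core formulation concrete, the sketch below illustrates one plausible reading of "showing the k best images" as GMM sampling with an exploration-exploitation mix: an "exploit" component concentrated near the current dialog-context embedding and a broad "explore" component over the catalog. All names, dimensions, and mixture parameters here are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: an 8-D joint embedding space with a small random
# catalog; the real system embeds text and image queries jointly.
DIM, CATALOG_SIZE, K = 8, 500, 5
catalog = rng.normal(size=(CATALOG_SIZE, DIM))  # catalog item embeddings
context = rng.normal(size=DIM)                  # current dialog-context embedding

def sample_gmm(means, scales, weights, n, rng):
    """Draw n samples from a GMM with isotropic per-component covariances."""
    comps = rng.choice(len(means), size=n, p=weights)
    return np.array([rng.normal(means[c], scales[c]) for c in comps])

def top_k_by_cosine(samples, catalog, k):
    """Indices of the k catalog items most cosine-similar to any GMM sample."""
    s = samples / np.linalg.norm(samples, axis=1, keepdims=True)
    c = catalog / np.linalg.norm(catalog, axis=1, keepdims=True)
    best_sim = (s @ c.T).max(axis=0)   # best similarity per catalog item
    return np.argsort(best_sim)[::-1][:k]

# Exploit: tight component around the dialog context.
# Explore: broad component centred on the catalog mean.
means = [context, catalog.mean(axis=0)]
scales = [0.2, 2.0]
weights = [0.8, 0.2]  # exploration-exploitation trade-off (assumed values)

samples = sample_gmm(means, scales, weights, n=50, rng=rng)
shown = top_k_by_cosine(samples, catalog, K)  # the k images shown to the user
print(shown)
```

In a full system the mixture weights and covariances would be updated from dialog feedback, shifting mass between the exploit and explore components as the user's intent sharpens; this sketch fixes them for clarity.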