Multi-modal Generative Models in Recommendation System

Arnau Ramisa, Rene Vidal, Yashar Deldjoo, Zhankui He, Julian McAuley, Anton Korikov, Scott Sanner, Mahesh Sathiamoorthy, Atoosa Kasrizadeh, Silvia Milano, Francesco Ricci
{"title":"Multi-modal Generative Models in Recommendation System","authors":"Arnau Ramisa, Rene Vidal, Yashar Deldjoo, Zhankui He, Julian McAuley, Anton Korikov, Scott Sanner, Mahesh Sathiamoorthy, Atoosa Kasrizadeh, Silvia Milano, Francesco Ricci","doi":"arxiv-2409.10993","DOIUrl":null,"url":null,"abstract":"Many recommendation systems limit user inputs to text strings or behavior\nsignals such as clicks and purchases, and system outputs to a list of products\nsorted by relevance. With the advent of generative AI, users have come to\nexpect richer levels of interactions. In visual search, for example, a user may\nprovide a picture of their desired product along with a natural language\nmodification of the content of the picture (e.g., a dress like the one shown in\nthe picture but in red color). Moreover, users may want to better understand\nthe recommendations they receive by visualizing how the product fits their use\ncase, e.g., with a representation of how a garment might look on them, or how a\nfurniture item might look in their room. Such advanced levels of interaction\nrequire recommendation systems that are able to discover both shared and\ncomplementary information about the product across modalities, and visualize\nthe product in a realistic and informative way. However, existing systems often\ntreat multiple modalities independently: text search is usually done by\ncomparing the user query to product titles and descriptions, while visual\nsearch is typically done by comparing an image provided by the customer to\nproduct images. We argue that future recommendation systems will benefit from a\nmulti-modal understanding of the products that leverages the rich information\nretailers have about both customers and products to come up with the best\nrecommendations. In this chapter we review recommendation systems that use\nmultiple data modalities simultaneously.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"11 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.10993","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Many recommendation systems limit user inputs to text strings or behavior signals such as clicks and purchases, and system outputs to a list of products sorted by relevance. With the advent of generative AI, users have come to expect richer levels of interactions. In visual search, for example, a user may provide a picture of their desired product along with a natural language modification of the content of the picture (e.g., a dress like the one shown in the picture but in red color). Moreover, users may want to better understand the recommendations they receive by visualizing how the product fits their use case, e.g., with a representation of how a garment might look on them, or how a furniture item might look in their room. Such advanced levels of interaction require recommendation systems that are able to discover both shared and complementary information about the product across modalities, and visualize the product in a realistic and informative way. However, existing systems often treat multiple modalities independently: text search is usually done by comparing the user query to product titles and descriptions, while visual search is typically done by comparing an image provided by the customer to product images. We argue that future recommendation systems will benefit from a multi-modal understanding of the products that leverages the rich information retailers have about both customers and products to come up with the best recommendations. In this chapter we review recommendation systems that use multiple data modalities simultaneously.
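The composed image-plus-text query described above (e.g., "a dress like the one in the picture, but in red") can be made concrete with a small retrieval sketch. The following is a minimal illustration, assuming a CLIP-style joint embedding space (here via the sentence-transformers clip-ViT-B-32 checkpoint) and a simple additive fusion of the reference-image and modification-text embeddings; it is a baseline for exposition, not the method of any specific system reviewed in this chapter, and the file paths and product ids are hypothetical.

```python
# Minimal sketch of composed (image + text) product retrieval in a shared
# CLIP-style embedding space. Additive query fusion is an illustrative
# baseline only; model name, paths, and product ids are placeholders.
import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

# CLIP checkpoint that maps images and text into the same embedding space.
model = SentenceTransformer("clip-ViT-B-32")

def embed_image(path: str) -> np.ndarray:
    """Embed a product or query image and L2-normalize it."""
    vec = model.encode(Image.open(path), convert_to_numpy=True)
    return vec / np.linalg.norm(vec)

def embed_text(text: str) -> np.ndarray:
    """Embed a natural-language query or modification and L2-normalize it."""
    vec = model.encode(text, convert_to_numpy=True)
    return vec / np.linalg.norm(vec)

def composed_query(image_path: str, modification: str) -> np.ndarray:
    """Fuse a reference image with a text modification (e.g. 'but in red')."""
    q = embed_image(image_path) + embed_text(modification)
    return q / np.linalg.norm(q)

def rank_products(query: np.ndarray, catalog: dict[str, np.ndarray], k: int = 5):
    """Return the top-k catalog items by cosine similarity to the query."""
    scores = {pid: float(query @ emb) for pid, emb in catalog.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Example usage (hypothetical catalog of product-image embeddings):
# catalog = {"sku-001": embed_image("images/sku-001.jpg"), ...}
# hits = rank_products(
#     composed_query("query_dress.jpg", "the same dress but in red"), catalog)
```

In practice, composed-retrieval systems typically learn the fusion of image and text signals (and often the product-side representations) rather than relying on a fixed additive combination, but the sketch shows how a shared embedding space lets a textual modification and a reference image be compared directly against product images.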
Latest articles in arXiv - CS - Information Retrieval
- Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference
- Retrieve, Annotate, Evaluate, Repeat: Leveraging Multimodal LLMs for Large-Scale Product Retrieval Evaluation
- Active Reconfigurable Intelligent Surface Empowered Synthetic Aperture Radar Imaging
- FLARE: Fusing Language Models and Collaborative Architectures for Recommender Enhancement
- Basket-Enhanced Heterogenous Hypergraph for Price-Sensitive Next Basket Recommendation