FUMMER: A fine-grained self-supervised momentum distillation framework for multimodal recommendation

IF 7.4 · Zone 1 (Management) · Q1 · COMPUTER SCIENCE, INFORMATION SYSTEMS · Information Processing & Management · Pub Date: 2024-05-22 · DOI: 10.1016/j.ipm.2024.103776
Yibiao Wei, Yang Xu, Lei Zhu, Jingwei Ma, Jiangping Huang
Citations: 0

Abstract

The considerable semantic information contained in multimodal data is increasingly appreciated by industry and academia. To effectively leverage multimodal information, existing multimodal recommendation methods mainly build multimodal auxiliary graphs to improve the representation of users and items. However, the weak value density of multimodal data inevitably leads to serious noise issues, making it difficult to effectively exploit valuable information from the multimodal contents. To address this issue, we propose a novel Fine-grained Self-supervised Momentum Distillation Framework (FUMMER) for multimodal recommendations. Specifically, we propose a Transformer-based Fine-grained Feature Extractor (TFFE) and a Momentum Distillation (MoD) structure that incorporates intra- and inter-modal contrastive learning to fully pre-train TFFE for fine-grained feature extraction. Additionally, we design a structure-aware fine-grained contrastive learning module to fully exploit the self-supervised signals from fine-grained structural features. Extensive experiments on three real-world datasets show that our method outperforms state-of-the-art multimodal recommendation methods. Further experiments verify that the fine-grained feature extraction method we propose can serve as a pre-trained model, enhancing the performance of recommendation methods effectively by learning the fine-grained feature representations of items. The code is publicly available at https://github.com/BIAOBIAO12138/FUMMER.
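
The abstract names two generic ingredients, momentum distillation and contrastive learning, without giving their formulas. As a rough illustration only, the sketch below shows what these techniques commonly look like: a teacher whose weights are an exponential moving average of the student's, and an InfoNCE-style contrastive loss in which row i of one set of embeddings should match row i of the other. The function names, the momentum value, and the temperature are all assumptions made for this example; none of it is taken from the FUMMER paper or its repository.

```python
import math

def ema_update(teacher, student, momentum=0.99):
    """Move each teacher weight toward the matching student weight.

    Illustrative momentum update (exponential moving average);
    the momentum value is an assumed hyperparameter.
    """
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def info_nce(anchors, positives, temperature=0.2):
    """Mean InfoNCE loss over a batch.

    anchors[i] should be most similar to positives[i]; every other
    row of positives acts as a negative. Temperature is assumed.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)

# Toy item embeddings from two modalities (hypothetical values).
image_emb = [[1.0, 0.0], [0.0, 1.0]]
text_emb = [[0.9, 0.1], [0.1, 0.9]]

aligned_loss = info_nce(image_emb, text_emb)          # matching pairs
shuffled_loss = info_nce(image_emb, text_emb[::-1])   # mismatched pairs
new_teacher = ema_update([1.0, 0.0], [0.0, 1.0], momentum=0.9)
```

Passing embeddings from two different modalities, as above, corresponds to an inter-modal objective; passing two augmented views of the same modality would give an intra-modal one. In this toy setup the aligned pairs yield a lower loss than the shuffled pairs, which is the behavior a contrastive objective is meant to reward.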

Source journal
Information Processing & Management
Category: Engineering & Technology - Computer Science: Information Systems
CiteScore: 17.00
Self-citation rate: 11.60%
Articles published: 276
Review time: 39 days
Journal description: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.