FUMMER: A fine-grained self-supervised momentum distillation framework for multimodal recommendation

IF 6.9 | CAS Zone 1 (Management) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Information Processing & Management | Pub Date: 2024-09-01 | Epub Date: 2024-05-22 | DOI: 10.1016/j.ipm.2024.103776

Yibiao Wei, Yang Xu, Lei Zhu, Jingwei Ma, Jiangping Huang

Information Processing & Management, Vol. 61, No. 5, Article 103776 (2024).

Citations: 0

Abstract

The considerable semantic information contained in multimodal data is increasingly appreciated by industry and academia. To effectively leverage multimodal information, existing multimodal recommendation methods mainly build multimodal auxiliary graphs to improve the representations of users and items. However, the low value density of multimodal data inevitably leads to serious noise issues, making it difficult to extract valuable information from the multimodal content. To address this issue, we propose a novel Fine-grained Self-supervised Momentum Distillation Framework (FUMMER) for multimodal recommendation. Specifically, we propose a Transformer-based Fine-grained Feature Extractor (TFFE) and a Momentum Distillation (MoD) structure that incorporates intra- and inter-modal contrastive learning to fully pre-train TFFE for fine-grained feature extraction. Additionally, we design a structure-aware fine-grained contrastive learning module to fully exploit the self-supervised signals from fine-grained structural features. Extensive experiments on three real-world datasets show that our method outperforms state-of-the-art multimodal recommendation methods. Further experiments verify that the proposed fine-grained feature extraction method can serve as a pre-trained model that effectively improves the performance of other recommendation methods by learning fine-grained feature representations of items. The code is publicly available at https://github.com/BIAOBIAO12138/FUMMER.
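The abstract does not spell out the MoD objective. As a hedged illustration only, the sketch below shows the generic momentum-distillation pattern popularised by ALBEF: an exponential-moving-average (EMA) "momentum" copy of the encoder, plus an InfoNCE contrastive loss whose one-hot targets are mixed with the momentum model's soft similarity distribution. This is an assumption about the general technique, not the FUMMER implementation; the function names and the hyperparameters `m`, `tau`, and `alpha` are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Row-wise L2 normalization, so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(z):
    # Numerically stable row-wise log-softmax.
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def momentum_update(teacher, student, m=0.995):
    # Hypothetical helper. EMA update: each momentum (teacher) parameter
    # slowly tracks its online (student) counterpart.
    return {name: m * teacher[name] + (1.0 - m) * student[name] for name in teacher}


def momentum_distilled_nce(q, k, q_m, k_m, tau=0.07, alpha=0.4):
    """Hypothetical one-directional InfoNCE from view q to view k
    (e.g. image vs. text features for the same batch of items).
    Row i of q and row i of k form the positive pair; q_m and k_m are
    the same inputs encoded by the momentum (EMA) encoders."""
    logits = l2_normalize(q) @ l2_normalize(k).T / tau        # (B, B) online similarities
    logits_m = l2_normalize(q_m) @ l2_normalize(k_m).T / tau  # (B, B) momentum similarities
    hard = np.eye(q.shape[0])                                 # one-hot: diagonal entries are positives
    soft = np.exp(log_softmax(logits_m))                      # momentum model's soft pseudo-targets
    targets = alpha * soft + (1.0 - alpha) * hard             # mix hard and soft targets
    # Cross-entropy between the mixed targets and the online model's distribution.
    return float(-(targets * log_softmax(logits)).sum(axis=1).mean())


# Usage: paired "image" and "text" features for 8 items (synthetic data).
rng = np.random.default_rng(0)
img = rng.normal(size=(8, 16))
txt = img + 0.1 * rng.normal(size=(8, 16))
aligned = momentum_distilled_nce(img, txt, img, txt)
shuffled = momentum_distilled_nce(img, txt[::-1], img, txt[::-1])
```

Correctly paired views yield a lower loss than mismatched ones. In a full model this loss would typically be symmetrized (q to k plus k to q) and applied both to two views of the same modality (intra-modal) and across modalities (inter-modal), with `momentum_update` refreshing the teacher after each optimizer step. How FUMMER combines these terms is described in the paper, not here.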


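Source journal

Information Processing & Management (Engineering & Technology: Computer Science, Information Systems)

CiteScore: 17.00
Self-citation rate: 11.60%
Publication volume: 276
Review time: 39 days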

About the journal: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.