Yibiao Wei, Yang Xu, Lei Zhu, Jingwei Ma, Jiangping Huang
{"title":"FUMMER:用于多模态推荐的细粒度自监督动量蒸馏框架","authors":"Yibiao Wei , Yang Xu , Lei Zhu , Jingwei Ma , Jiangping Huang","doi":"10.1016/j.ipm.2024.103776","DOIUrl":null,"url":null,"abstract":"<div><p>The considerable semantic information contained in multimodal data is increasingly appreciated by industry and academia. To effectively leverage multimodal information, existing multimodal recommendation methods mainly build multimodal auxiliary graphs to improve the representation of users and items. However, the weak value density of multimodal data inevitably leads to serious noise issues, making it difficult to effectively exploit valuable information from the multimodal contents. To address this issue, we propose a novel <u>F</u>ine-grained Self-s<u>u</u>pervised <u>M</u>o<u>m</u> <u>e</u>ntum Distillation F<u>r</u>amework (FUMMER) for multimodal recommendations. Specifically, we propose a Transformer-based Fine-grained Feature Extractor (TFFE) and a Momentum Distillation (MoD) structure that incorporates intra- and inter-modal contrastive learning to fully pre-train TFFE for fine-grained feature extraction. Additionally, we design a structure-aware fine-grained contrastive learning module to fully exploit the self-supervised signals from fine-grained structural features. Extensive experiments on three real-world datasets show that our method outperforms state-of-the-art multimodal recommendation methods. Further experiments verify that the fine-grained feature extraction method we propose can serve as a pre-trained model, enhancing the performance of recommendation methods effectively by learning the fine-grained feature representations of items. 
The code is publicly available at <span>https://github.com/BIAOBIAO12138/FUMMER</span><svg><path></path></svg>.</p></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":null,"pages":null},"PeriodicalIF":7.4000,"publicationDate":"2024-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"FUMMER: A fine-grained self-supervised momentum distillation framework for multimodal recommendation\",\"authors\":\"Yibiao Wei , Yang Xu , Lei Zhu , Jingwei Ma , Jiangping Huang\",\"doi\":\"10.1016/j.ipm.2024.103776\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>The considerable semantic information contained in multimodal data is increasingly appreciated by industry and academia. To effectively leverage multimodal information, existing multimodal recommendation methods mainly build multimodal auxiliary graphs to improve the representation of users and items. However, the weak value density of multimodal data inevitably leads to serious noise issues, making it difficult to effectively exploit valuable information from the multimodal contents. To address this issue, we propose a novel <u>F</u>ine-grained Self-s<u>u</u>pervised <u>M</u>o<u>m</u> <u>e</u>ntum Distillation F<u>r</u>amework (FUMMER) for multimodal recommendations. Specifically, we propose a Transformer-based Fine-grained Feature Extractor (TFFE) and a Momentum Distillation (MoD) structure that incorporates intra- and inter-modal contrastive learning to fully pre-train TFFE for fine-grained feature extraction. Additionally, we design a structure-aware fine-grained contrastive learning module to fully exploit the self-supervised signals from fine-grained structural features. Extensive experiments on three real-world datasets show that our method outperforms state-of-the-art multimodal recommendation methods. 
Further experiments verify that the fine-grained feature extraction method we propose can serve as a pre-trained model, enhancing the performance of recommendation methods effectively by learning the fine-grained feature representations of items. The code is publicly available at <span>https://github.com/BIAOBIAO12138/FUMMER</span><svg><path></path></svg>.</p></div>\",\"PeriodicalId\":50365,\"journal\":{\"name\":\"Information Processing & Management\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":7.4000,\"publicationDate\":\"2024-05-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Processing & Management\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0306457324001365\",\"RegionNum\":1,\"RegionCategory\":\"管理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Processing & Management","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0306457324001365","RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
FUMMER: A fine-grained self-supervised momentum distillation framework for multimodal recommendation
The considerable semantic information contained in multimodal data is increasingly appreciated by industry and academia. To effectively leverage multimodal information, existing multimodal recommendation methods mainly build multimodal auxiliary graphs to improve the representation of users and items. However, the weak value density of multimodal data inevitably leads to serious noise issues, making it difficult to effectively exploit valuable information from the multimodal contents. To address this issue, we propose a novel Fine-grained Self-supervised Momentum Distillation Framework (FUMMER) for multimodal recommendations. Specifically, we propose a Transformer-based Fine-grained Feature Extractor (TFFE) and a Momentum Distillation (MoD) structure that incorporates intra- and inter-modal contrastive learning to fully pre-train TFFE for fine-grained feature extraction. Additionally, we design a structure-aware fine-grained contrastive learning module to fully exploit the self-supervised signals from fine-grained structural features. Extensive experiments on three real-world datasets show that our method outperforms state-of-the-art multimodal recommendation methods. Further experiments verify that the fine-grained feature extraction method we propose can serve as a pre-trained model, enhancing the performance of recommendation methods effectively by learning the fine-grained feature representations of items. The code is publicly available at https://github.com/BIAOBIAO12138/FUMMER.
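The abstract's two central mechanisms — a momentum-distilled teacher and intra-/inter-modal contrastive learning — can be illustrated in miniature. The sketch below is a generic rendering of these ideas, not the paper's implementation: the EMA coefficient, the InfoNCE temperature, and the function names are assumptions for illustration only.

```python
import numpy as np

def ema_update(teacher, student, momentum=0.995):
    # Momentum distillation in its generic form: the teacher's parameters
    # follow an exponential moving average of the student's parameters.
    # The coefficient 0.995 is a common choice, assumed here.
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k]
            for k in teacher}

def info_nce_loss(anchor, candidates, temperature=0.07):
    # Generic InfoNCE contrastive loss. anchor: (d,) embedding;
    # candidates: (n, d) embeddings, where row 0 is the positive pair
    # (e.g. another view of the same item, intra- or inter-modal).
    a = anchor / np.linalg.norm(anchor)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    logits = (c @ a) / temperature          # scaled cosine similarities
    logits -= logits.max()                  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                # cross-entropy, positive at index 0

# Toy check: one EMA step pulls the teacher toward the student.
teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
teacher = ema_update(teacher, student, momentum=0.9)
print(teacher["w"])  # each entry is 0.9 * 0 + 0.1 * 1 = 0.1
```

In this setting, the slowly moving teacher produces stable targets while the student is trained with the contrastive loss against them; a well-aligned positive yields a lower loss than a mismatched one.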
Journal description:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Its scope encompasses theory, methods, and applications across domains including advertising, business, health, information science, information technology, marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.