Invariant Representation Learning in Multimedia Recommendation with Modality Alignment and Model Fusion.

IF 2.0 | Zone 3: Physics and Astrophysics | JCR Q2: PHYSICS, MULTIDISCIPLINARY | Entropy | Pub Date: 2025-01-10 | DOI: 10.3390/e27010056
Xinghang Hu, Haiteng Zhang
Citations: 0

Abstract

Multimedia recommendation systems aim to accurately predict user preferences from multimodal data. However, existing methods may learn a recommendation model from spurious features, i.e., features that appear to be related to an outcome but have no causal relationship with it, leading to poor generalization ability. While previous approaches have adopted invariant learning to address this issue, they simply concatenate multimodal data without proper alignment, resulting in information loss or redundancy. To overcome these challenges, we propose a framework called M³-InvRL, designed to enhance recommendation system performance through common and modality-specific representation learning, invariant learning, and model merging. Specifically, our approach begins by learning modality-specific representations along with a common representation for each modality. To achieve this, we introduce a novel contrastive loss that aligns representations and imposes mutual information constraints to extract modality-specific features, thereby preventing generalization issues within the same representation space. Next, we generate invariant masks based on the identification of heterogeneous environments to learn invariant representations. Finally, we integrate both invariant-specific and shared invariant representations for each modality to train models and fuse them in the output space, reducing uncertainty and enhancing generalization performance. Experiments on real-world datasets demonstrate the effectiveness of our approach.
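
The abstract names two mechanisms that a small example may make more concrete: aligning representations of the same item across modalities with a contrastive loss, and fusing per-modality models in the output space. The sketch below is illustrative only and is not the authors' implementation. It substitutes the standard InfoNCE objective for the paper's novel contrastive loss, assumes output-space fusion means averaging predicted scores, and omits the mutual information constraints and invariant masks entirely. The names `cosine`, `info_nce`, `fuse_scores`, and the `temperature` parameter are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(visual: List[Vector], textual: List[Vector],
             temperature: float = 0.2) -> float:
    """Standard InfoNCE loss (an assumption, not the paper's loss).

    Row i of `visual` and row i of `textual` describe the same item and form
    the positive pair; every other row of `textual` acts as a negative.
    """
    total = 0.0
    for i, anchor in enumerate(visual):
        logits = [cosine(anchor, t) / temperature for t in textual]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_norm - logits[i]
    return total / len(visual)


def fuse_scores(per_modality_scores: List[List[float]]) -> List[float]:
    """Output-space fusion, assumed here to be a plain average of the item
    scores predicted by each per-modality model."""
    n_models = len(per_modality_scores)
    return [sum(column) / n_models for column in zip(*per_modality_scores)]


if __name__ == "__main__":
    items_visual = [[1.0, 0.0], [0.0, 1.0]]
    aligned_text = [[1.0, 0.0], [0.0, 1.0]]
    shuffled_text = [[0.0, 1.0], [1.0, 0.0]]
    # Aligned pairs should yield a lower loss than mismatched pairs.
    print(info_nce(items_visual, aligned_text))
    print(info_nce(items_visual, shuffled_text))
    # Two hypothetical per-modality models scoring the same two items.
    print(fuse_scores([[0.9, 0.1], [0.5, 0.3]]))
```

In this toy setup the loss is small when each item's two modality vectors point the same way and large when they are mismatched, which is the behavior an alignment objective is meant to reward. Averaging scores is only one possible fusion rule; weighted schemes would fit the same interface.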

Source journal
Entropy
Category: PHYSICS, MULTIDISCIPLINARY
CiteScore: 4.90
Self-citation rate: 11.10%
Articles published: 1580
Review time: 21.05 days
About the journal: Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers and short notes. Its aim is to encourage scientists to publish their theoretical and experimental details as fully as possible. There is no restriction on the length of papers. Where a paper involves computation or experiment, the details must be provided so that the results can be reproduced.
Latest articles in this journal
Failure Lifetime Evaluation Based on Accelerated Generalized Wiener Degradation Process Models with Random Diffusion Coefficients.
Information-Driven Rule Reduction in Belief Rule Bases for Complex System Modeling.
Image Encryption Algorithm Based on a Novel Hyperchaotic Map and 3D Histogram Model.
A Stylometric Analog of the Fermi-Pasta-Ulam-Tsingou Problem: Combination of Human Bias and Long-Range Correlation Creates a Sort of Soliton.
Causal Structure Learning Assumptions Shape Counterfactual Safety: Expert-Guided Constraints vs. Data-Driven DAGs with Probabilistic Logic Twin Networks.