基于模态对齐和模型融合的多媒体推荐中的不变表示学习。

IF 2 3区物理与天体物理 Q2 PHYSICS, MULTIDISCIPLINARY Entropy Pub Date : 2025-01-10 DOI:10.3390/e27010056

Xinghang Hu, Haiteng Zhang

{"title":"基于模态对齐和模型融合的多媒体推荐中的不变表示学习。","authors":"Xinghang Hu, Haiteng Zhang","doi":"10.3390/e27010056","DOIUrl":null,"url":null,"abstract":"Multimedia recommendation systems aim to accurately predict user preferences from multimodal data. However, existing methods may learn a recommendation model from spurious features, i.e., appearing to be related to an outcome but actually having no causal relationship with the outcome, leading to poor generalization ability. While previous approaches have adopted invariant learning to address this issue, they simply concatenate multimodal data without proper alignment, resulting in information loss or redundancy. To overcome these challenges, we propose a framework called M3-InvRL, designed to enhance recommendation system performance through common and modality-specific representation learning, invariant learning, and model merging. Specifically, our approach begins by learning modality-specific representations along with a common representation for each modality. To achieve this, we introduce a novel contrastive loss that aligns representations and imposes mutual information constraints to extract modality-specific features, thereby preventing generalization issues within the same representation space. Next, we generate invariant masks based on the identification of heterogeneous environments to learn invariant representations. Finally, we integrate both invariant-specific and shared invariant representations for each modality to train models and fuse them in the output space, reducing uncertainty and enhancing generalization performance. Experiments on real-world datasets demonstrate the effectiveness of our approach.","PeriodicalId":11694,"journal":{"name":"Entropy","volume":"27 1","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11764824/pdf/","citationCount":"0","resultStr":"{\"title\":\"Invariant Representation Learning in Multimedia Recommendation with Modality Alignment and Model Fusion.\",\"authors\":\"Xinghang Hu, Haiteng Zhang\",\"doi\":\"10.3390/e27010056\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multimedia recommendation systems aim to accurately predict user preferences from multimodal data. However, existing methods may learn a recommendation model from spurious features, i.e., appearing to be related to an outcome but actually having no causal relationship with the outcome, leading to poor generalization ability. While previous approaches have adopted invariant learning to address this issue, they simply concatenate multimodal data without proper alignment, resulting in information loss or redundancy. To overcome these challenges, we propose a framework called M3-InvRL, designed to enhance recommendation system performance through common and modality-specific representation learning, invariant learning, and model merging. Specifically, our approach begins by learning modality-specific representations along with a common representation for each modality. To achieve this, we introduce a novel contrastive loss that aligns representations and imposes mutual information constraints to extract modality-specific features, thereby preventing generalization issues within the same representation space. Next, we generate invariant masks based on the identification of heterogeneous environments to learn invariant representations. Finally, we integrate both invariant-specific and shared invariant representations for each modality to train models and fuse them in the output space, reducing uncertainty and enhancing generalization performance. Experiments on real-world datasets demonstrate the effectiveness of our approach.\",\"PeriodicalId\":11694,\"journal\":{\"name\":\"Entropy\",\"volume\":\"27 1\",\"pages\":\"\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2025-01-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11764824/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Entropy\",\"FirstCategoryId\":\"101\",\"ListUrlMain\":\"https://doi.org/10.3390/e27010056\",\"RegionNum\":3,\"RegionCategory\":\"物理与天体物理\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"PHYSICS, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Entropy","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.3390/e27010056","RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PHYSICS, MULTIDISCIPLINARY","Score":null,"Total":0}

引用次数: 0

摘要

多媒体推荐系统旨在从多模态数据中准确预测用户偏好。然而，现有的方法可能会从虚假的特征中学习推荐模型，即看起来与结果相关，但实际上与结果没有因果关系，导致泛化能力差。虽然以前的方法采用不变学习来解决这个问题，但它们只是简单地连接多模态数据，而没有进行适当的对齐，从而导致信息丢失或冗余。为了克服这些挑战，我们提出了一个名为M3-InvRL的框架，旨在通过通用和特定于模态的表示学习、不变学习和模型合并来提高推荐系统的性能。具体来说，我们的方法首先学习特定于模态的表示以及每种模态的通用表示。为了实现这一点，我们引入了一种新的对比损失，它可以对齐表示并施加相互信息约束来提取特定于模态的特征，从而防止在同一表示空间内出现泛化问题。接下来，我们基于异构环境的识别生成不变掩码，以学习不变表示。最后，我们结合每个模态的特定不变表示和共享不变表示来训练模型并将它们融合到输出空间中，从而减少不确定性并提高泛化性能。在真实数据集上的实验证明了我们方法的有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Invariant Representation Learning in Multimedia Recommendation with Modality Alignment and Model Fusion.

Multimedia recommendation systems aim to accurately predict user preferences from multimodal data. However, existing methods may learn a recommendation model from spurious features, i.e., appearing to be related to an outcome but actually having no causal relationship with the outcome, leading to poor generalization ability. While previous approaches have adopted invariant learning to address this issue, they simply concatenate multimodal data without proper alignment, resulting in information loss or redundancy. To overcome these challenges, we propose a framework called M³-InvRL, designed to enhance recommendation system performance through common and modality-specific representation learning, invariant learning, and model merging. Specifically, our approach begins by learning modality-specific representations along with a common representation for each modality. To achieve this, we introduce a novel contrastive loss that aligns representations and imposes mutual information constraints to extract modality-specific features, thereby preventing generalization issues within the same representation space. Next, we generate invariant masks based on the identification of heterogeneous environments to learn invariant representations. Finally, we integrate both invariant-specific and shared invariant representations for each modality to train models and fuse them in the output space, reducing uncertainty and enhancing generalization performance. Experiments on real-world datasets demonstrate the effectiveness of our approach.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Entropy PHYSICS, MULTIDISCIPLINARY-

CiteScore

4.90

自引率

11.10%

发文量

1580

审稿时长

21.05 days

期刊介绍： Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers and short notes. Our aim is to encourage scientists to publish as much as possible their theoretical and experimental details. There is no restriction on the length of the papers. If there are computation and the experiment, the details must be provided so that the results can be reproduced.