Learning to Rebalance Multi-Modal Optimization by Adaptively Masking Subnetworks

Yang Yang;Hongpeng Pan;Qing-Yuan Jiang;Yi Xu;Jinhui Tang
{"title":"Learning to Rebalance Multi-Modal Optimization by Adaptively Masking Subnetworks","authors":"Yang Yang;Hongpeng Pan;Qing-Yuan Jiang;Yi Xu;Jinhui Tang","doi":"10.1109/TPAMI.2025.3547417","DOIUrl":null,"url":null,"abstract":"Multi-modal learning aims to enhance performance by unifying models from various modalities but often faces the “modality imbalance” problem in real data, leading to a bias towards dominant modalities and neglecting others, thereby limiting its overall effectiveness. To address this challenge, the core idea is to balance the optimization of each modality to achieve a joint optimum. Existing approaches often employ a modal-level control mechanism for adjusting the update of each modal parameter. However, such a global-wise updating mechanism ignores the different importance of each parameter. Inspired by subnetwork optimization, we explore a uniform sampling-based optimization strategy and find it more effective than global-wise updating. According to the findings, we further propose a novel importance sampling-based, element-wise joint optimization method, called <underline>A</u>daptively <underline>M</u>ask <underline>S</u>ubnetworks Considering Modal <underline>S</u>ignificance (AMSS). Specifically, we incorporate mutual information rates to determine the modal significance and employ non-uniform adaptive sampling to select foreground subnetworks from each modality for parameter updates, thereby rebalancing multi-modal learning. Additionally, we demonstrate the reliability of the AMSS strategy through convergence analysis. Building upon theoretical insights, we further enhance the multi-modal mask subnetwork strategy using unbiased estimation, referred to as AMSS+. Extensive experiments reveal the superiority of our approach over comparison methods.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 6","pages":"4553-4566"},"PeriodicalIF":18.6000,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10915567/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Multi-modal learning aims to enhance performance by unifying models from various modalities but often faces the “modality imbalance” problem in real data, leading to a bias towards dominant modalities and neglecting others, thereby limiting its overall effectiveness. To address this challenge, the core idea is to balance the optimization of each modality to achieve a joint optimum. Existing approaches often employ a modal-level control mechanism for adjusting the update of each modal parameter. However, such a global-wise updating mechanism ignores the different importance of each parameter. Inspired by subnetwork optimization, we explore a uniform sampling-based optimization strategy and find it more effective than global-wise updating. According to the findings, we further propose a novel importance sampling-based, element-wise joint optimization method, called Adaptively Mask Subnetworks Considering Modal Significance (AMSS). Specifically, we incorporate mutual information rates to determine the modal significance and employ non-uniform adaptive sampling to select foreground subnetworks from each modality for parameter updates, thereby rebalancing multi-modal learning. Additionally, we demonstrate the reliability of the AMSS strategy through convergence analysis. Building upon theoretical insights, we further enhance the multi-modal mask subnetwork strategy using unbiased estimation, referred to as AMSS+. Extensive experiments reveal the superiority of our approach over comparison methods.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
通过自适应屏蔽子网学会重新平衡多模式优化
多模态学习旨在通过统一各种模态的模型来提高性能,但在实际数据中往往面临“模态失衡”问题,导致偏向主导模态而忽视其他模态,从而限制了其整体有效性。为了应对这一挑战,核心思想是平衡每种模态的优化,以实现联合最优。现有的方法通常采用模型级控制机制来调整每个模态参数的更新。然而,这种全局更新机制忽略了每个参数的不同重要性。受子网优化的启发,我们探索了一种基于均匀抽样的优化策略,发现它比全局更新更有效。根据研究结果,我们进一步提出了一种新的基于重要性采样的元素智能联合优化方法,称为考虑模态显著性的自适应掩码子网(AMSS)。具体来说,我们结合互信息率来确定模态显著性,并采用非均匀自适应采样从每个模态中选择前景子网络进行参数更新,从而重新平衡多模态学习。此外,我们通过收敛分析证明了AMSS策略的可靠性。在理论见解的基础上,我们使用无偏估计进一步增强了多模态掩码子网策略,称为AMSS+。大量的实验表明,我们的方法优于比较方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Spike Camera Optical Flow Estimation Based on Continuous Spike Streams. Bi-C2R: Bidirectional Continual Compatible Representation for Re-Indexing Free Lifelong Person Re-Identification. Modality Equilibrium Matters: Minor-Modality-Aware Adaptive Alternating for Cross-Modal Memory Enhancement. Principled Multimodal Representation Learning. Class-Distribution-Aware Pseudo-Labeling for Semi-Supervised Multi-Label Learning.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1