Learning to Rebalance Multi-Modal Optimization by Adaptively Masking Subnetworks

IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 18.6) · Pub Date: 2025-03-06 · DOI: 10.1109/TPAMI.2025.3547417

Yang Yang;Hongpeng Pan;Qing-Yuan Jiang;Yi Xu;Jinhui Tang

Volume 47, issue 6, pages 4553-4566. https://ieeexplore.ieee.org/document/10915567/

Citations: 0

Abstract

Multi-modal learning aims to enhance performance by unifying models from various modalities but often faces the “modality imbalance” problem in real data, leading to a bias towards dominant modalities and neglecting others, thereby limiting its overall effectiveness. To address this challenge, the core idea is to balance the optimization of each modality to achieve a joint optimum. Existing approaches often employ a modal-level control mechanism for adjusting the update of each modal parameter. However, such a global-wise updating mechanism ignores the different importance of each parameter. Inspired by subnetwork optimization, we explore a uniform sampling-based optimization strategy and find it more effective than global-wise updating. According to the findings, we further propose a novel importance sampling-based, element-wise joint optimization method, called Adaptively Mask Subnetworks Considering Modal Significance (AMSS). Specifically, we incorporate mutual information rates to determine the modal significance and employ non-uniform adaptive sampling to select foreground subnetworks from each modality for parameter updates, thereby rebalancing multi-modal learning. Additionally, we demonstrate the reliability of the AMSS strategy through convergence analysis. Building upon theoretical insights, we further enhance the multi-modal mask subnetwork strategy using unbiased estimation, referred to as AMSS+. Extensive experiments reveal the superiority of our approach over comparison methods.
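The element-wise masking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the sampling rule (inclusion probability proportional to gradient magnitude), the per-modality keep ratios, and the names `sample_mask` and `masked_step` are all assumptions made for illustration. The `unbiased=True` branch uses standard inverse-probability rescaling, which keeps the expected update equal to the full gradient step; whether AMSS+ uses exactly this estimator is not stated in the abstract.

```python
import random


def sample_mask(grad, keep_ratio, rng):
    """Sample a 0/1 mask over parameters (hypothetical rule, not from the paper).

    Each element is kept with probability proportional to |gradient|,
    scaled so that roughly keep_ratio of the elements are kept, capped at 1.
    """
    total = sum(abs(g) for g in grad)
    if total == 0.0:
        probs = [keep_ratio] * len(grad)  # no signal: fall back to uniform sampling
    else:
        probs = [min(1.0, keep_ratio * len(grad) * abs(g) / total) for g in grad]
    mask = [1 if rng.random() < p else 0 for p in probs]
    return mask, probs


def masked_step(params, grad, keep_ratio, lr, rng, unbiased=False):
    """One SGD step that updates only the sampled ("foreground") elements."""
    mask, probs = sample_mask(grad, keep_ratio, rng)
    new_params = []
    for w, g, m, p in zip(params, grad, mask, probs):
        if m:
            # Inverse-probability rescaling makes the expected step equal lr * g.
            scale = 1.0 / p if unbiased else 1.0
            new_params.append(w - lr * scale * g)
        else:
            new_params.append(w)  # masked-out element is left untouched
    return new_params, mask


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical keep ratios; in the paper these would follow from modal significance.
    keep = {"audio": 0.3, "visual": 0.8}
    params = {"audio": [0.2, -0.4, 0.1], "visual": [1.0, 0.5, -0.3]}
    grads = {"audio": [0.9, -0.2, 0.05], "visual": [0.1, -0.3, 0.2]}
    for name in params:
        params[name], mask = masked_step(params[name], grads[name], keep[name], 0.1, rng)
        print(name, mask, params[name])
```

Keeping the keep ratio as an input separates two decisions: how much of each modality's network is updated, and which elements inside that modality are chosen. The abstract ties the first decision to mutual information rates, but it does not specify the mapping, so none is assumed here.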

Source journal: IEEE Transactions on Pattern Analysis and Machine Intelligence

Self-citation rate: 0.00%
