On-the-Fly Modulation for Balanced Multimodal Learning

IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 18.6) | Pub Date: 2024-09-25 | DOI: 10.1109/TPAMI.2024.3468315

Yake Wei, Di Hu, Henghui Du, Ji-Rong Wen

Vol. 47, No. 1, pp. 469-485 | Full text: https://ieeexplore.ieee.org/document/10694738/


Abstract

Multimodal learning is expected to boost model performance by integrating information from different modalities. However, its potential is not fully exploited because the widely used joint training strategy, which has a uniform objective for all modalities, leads to imbalanced and under-optimized uni-modal representations. Specifically, we point out that there often exists a modality with more discriminative information, e.g., vision for playing football and sound for blowing wind. Such a modality can dominate the joint training process, leaving the other modalities significantly under-optimized. To alleviate this problem, we first analyze the under-optimization phenomenon in both the feed-forward and the back-propagation stages of optimization. Then, On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies are proposed to modulate the optimization of each modality by monitoring the discriminative discrepancy between modalities during training. Concretely, OPM weakens the influence of the dominant modality by dropping its feature with a dynamic probability in the feed-forward stage, while OGM attenuates its gradient in the back-propagation stage. In experiments, our methods demonstrate considerable improvement across a variety of multimodal tasks. These simple yet effective strategies enhance performance not only in vanilla and task-oriented multimodal models but also in more complex multimodal tasks, showcasing their effectiveness and flexibility.
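The two strategies described in the abstract can be illustrated with a minimal, dependency-free sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: the discriminative score (summed softmax probability of the true class), the `tanh`-based strength, the `alpha` hyperparameter, and all function names are hypothetical and may differ from the paper's actual definitions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discriminative_score(batch_logits, labels):
    """Assumed proxy for how discriminative a modality is on a batch:
    the summed softmax probability it assigns to the true class."""
    return sum(softmax(l)[y] for l, y in zip(batch_logits, labels))


def modulation_strength(score_a, score_b, alpha=1.0):
    """Return (strength_a, strength_b), each in [0, 1).
    Only the modality with the higher score gets a nonzero strength.
    The tanh form is an illustrative assumption."""
    eps = 1e-12
    ratio_a = score_a / (score_b + eps)
    ratio_b = score_b / (score_a + eps)
    s_a = math.tanh(alpha * (ratio_a - 1.0)) if ratio_a > 1.0 else 0.0
    s_b = math.tanh(alpha * (ratio_b - 1.0)) if ratio_b > 1.0 else 0.0
    return s_a, s_b


def opm_forward(feature, strength, rng):
    """OPM-style step (feed-forward): drop the whole feature,
    i.e. replace it with zeros, with probability `strength`."""
    if rng.random() < strength:
        return [0.0] * len(feature)
    return list(feature)


def ogm_scale(gradient, strength):
    """OGM-style step (back-propagation): shrink the gradient
    of the dominant modality by the factor (1 - strength)."""
    return [g * (1.0 - strength) for g in gradient]


# Toy batch: the audio branch is confident, the visual branch is not.
labels = [0, 1]
audio_logits = [[4.0, 0.0], [0.0, 4.0]]
visual_logits = [[0.2, 0.0], [0.0, 0.2]]

s_audio, s_visual = modulation_strength(
    discriminative_score(audio_logits, labels),
    discriminative_score(visual_logits, labels),
)

rng = random.Random(0)
audio_feature = opm_forward([0.5, -1.2, 0.3], s_audio, rng)  # may be zeroed
audio_grad = ogm_scale([0.1, -0.4], s_audio)                 # attenuated
visual_grad = ogm_scale([0.1, -0.4], s_visual)               # unchanged
```

In this sketch the strength is recomputed on every batch, which is one way to read "on-the-fly": the weaker modality is never modulated, and the dominant one is suppressed more strongly as the gap between the two scores widens.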

