Pub Date: 2026-01-27 | DOI: 10.1016/j.inffus.2026.104188
Wenlong Zhang, Ying Li, Hanhan Du, Yan Wei, Aiqing Fang
Gradient compression can reduce communication overhead. However, current static sparsity techniques may disturb gradient dynamics, resulting in unstable model convergence and reduced feature discriminative ability, whereas transmitting the complete gradient leads to high costs. To address this issue, inspired by nonequilibrium thermodynamics, this paper proposes a Physics-guided Gradient Sparsification Criterion (PGSC). Specifically, we formulate a continuous field equation based on the gradient magnitude distribution, deriving an adaptive decay rule for the sparsification threshold during the training phase. We then dynamically adjust the sparsification threshold according to this rule, effectively addressing the complexity of multimodal features and ensuring consistent information transmission. Our method achieves adaptive co-optimization of gradient compression and model accuracy by establishing a dynamic equilibrium mechanism between gradient dissipation and information entropy. This approach ensures stable convergence rates while preserving the gradient structure of multi-scale features. Extensive experiments on public datasets, including CIFAR-10, MNIST, and FLIR_ADAS_v2, demonstrate significant advantages over competitors such as TopK and quantization compression, while also reducing communication costs.
{"title":"PGSC: A Gradient Sparsification Communication Optimization Criterion for Nonequilibrium Thermodynamics","authors":"Wenlong Zhang, Ying Li, Hanhan Du, Yan Wei, Aiqing Fang","doi":"10.1016/j.inffus.2026.104188","DOIUrl":"https://doi.org/10.1016/j.inffus.2026.104188","url":null,"abstract":"Gradient compression can reduce communication overhead. However, current static sparsity techniques may disturb gradient dynamics, resulting in unstable model convergence and reduced feature discriminative ability, whereas transmitting the complete gradient leads to high costs. To address this issue, inspired by nonequilibrium thermodynamics, this paper proposes a Physics-guided Gradient Sparsification Criterion (PGSC). Specifically, we formulate a continuous field equation based on the gradient magnitude distribution, deriving an adaptive decay rule for the sparsification threshold during the training phase. We then dynamically adjust the sparsification threshold according to this rule, effectively addressing the complexity of multimodal features and ensuring consistent information transmission. Our method achieves adaptive co-optimization of gradient compression and model accuracy by establishing a dynamic equilibrium mechanism between gradient dissipation and information entropy. This approach ensures stable convergence rates while preserving the gradient structure of multi-scale features. Extensive experiments on public datasets, including CIFAR-10, MNIST, and FLIR_ADAS_v2, demonstrate significant advantages over competitors such as TopK and quantization compression, while also reducing communication costs.","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"13 1","pages":""},"PeriodicalIF":18.6,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146047982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-26 | DOI: 10.1016/j.inffus.2026.104182
Peibo Song, Zihao Wang, Jinshuo Zhang, Shujun Fu, Yunfeng Zhang, Wei Wu, Fangxun Bao
Due to anatomical heterogeneity and subtle textures in clinical scenarios, accurate medical image segmentation requires models that capture high-level semantics while preserving fine-grained structural details. However, existing U-shaped networks usually lack a unified perspective that reconciles semantic representation with structural detail. To this end, we present SU-RMT, a U-shaped network that embodies this unified perspective by redesigning the encoder, bottleneck, and skip connection. The encoder employs a Dynamic Spatial Attention (DySA) mechanism to capture global context with spatial priors. The bottleneck introduces a Hybrid Spectral Adaptive (HSA) module to transform abstract semantics into structure-aware features. The first skip connection incorporates a Frequency-Fused (F2) block to enhance boundary details without amplifying noise. Across several medical image segmentation tasks, SU-RMT demonstrates strong performance. The code is available at https://github.com/setsese/SURMTArchive.
Title: SU-RMT: Toward Bridging Semantic Representation and Structural Detail Modeling for Medical Image Segmentation (Information Fusion)
Pub Date: 2026-01-24 | DOI: 10.1016/j.inffus.2026.104184
Xingcan Bao, Jianzhou Feng, Yiru Huo, Huaxiao Qiu, Haoran Yu, Shenyuan Ren, Jiadong Ren
Large language models (LLMs) have demonstrated remarkable performance across diverse natural language processing tasks. However, they still face significant challenges in multi-task continual learning, particularly in dynamic environments where tasks evolve sequentially and resources are constrained. Existing approaches typically learn separate adapter modules for each task, leading to a linear increase in parameters as tasks accumulate and thus hindering scalability and deployment efficiency. In this paper, we propose Controlled Subspace Fusion (CSF), a rehearsal-free and task-agnostic continual learning framework for language models that integrates knowledge across tasks while preventing parameter explosion. CSF introduces a shared low-rank projection subspace to provide a unified representational foundation, thereby enhancing consistency and facilitating cross-task knowledge transfer. In addition, we design an incremental subspace fusion mechanism that adaptively merges new task adapters with previously fused representations, while suppressing redundant parameter growth. As a result, the framework achieves scalable and robust knowledge fusion across sequential tasks. We evaluate CSF on mainstream architectures, including LLaMA and T5, across model scales ranging from 220M to 13B parameters. Experimental results on continual learning benchmarks demonstrate that CSF not only achieves superior average accuracy and parameter efficiency compared to existing approaches, but also provides a scalable and deployment-friendly solution that supports efficient knowledge fusion.
{"title":"Controlled subspace fusion for language model continual learning","authors":"Xingcan Bao , Jianzhou Feng , Yiru Huo , Huaxiao Qiu , Haoran Yu , Shenyuan Ren , Jiadong Ren","doi":"10.1016/j.inffus.2026.104184","DOIUrl":"10.1016/j.inffus.2026.104184","url":null,"abstract":"<div><div>Large language models (LLMs) have demonstrated remarkable performance across diverse natural language processing tasks. However, they still face significant challenges in multi-task continual learning, particularly in dynamic environments where tasks evolve sequentially and resources are constrained. Existing approaches typically learn separate adapter modules for each task, leading to a linear increase in parameters as tasks accumulate and thus hindering scalability and deployment efficiency. In this paper, we propose Controlled Subspace Fusion (CSF), a rehearsal-free and task-agnostic continual learning framework for language models that integrates knowledge across tasks while preventing parameter explosion. CSF introduces a shared low-rank projection subspace to provide a unified representational foundation, thereby enhancing consistency and facilitating cross-task knowledge transfer. In addition, we design an incremental subspace fusion mechanism that adaptively merges new task adapters with previously fused representations, while suppressing redundant parameter growth. As a result, the framework achieves scalable and robust knowledge fusion across sequential tasks. We evaluate CSF on mainstream architectures, including LLaMA and T5, across model scales ranging from 220M to 13B parameters. Experimental results on continual learning benchmarks demonstrate that CSF not only achieves superior average accuracy and parameter efficiency compared to existing approaches, but also provides a scalable and deployment-friendly solution that supports efficient knowledge fusion.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104184"},"PeriodicalIF":15.5,"publicationDate":"2026-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146047984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-24 | DOI: 10.1016/j.inffus.2026.104185
Jin Xu, Hengrun Zhang, Huiqun Yu, Guisheng Fan
Heterogeneity challenges have long been discussed in Federated Learning (FL). Among these challenges, statistical heterogeneity, where non-independent and identically distributed (non-IID) data across clients severely impairs model convergence and performance, remains particularly problematic. While existing batch size optimization strategies effectively address system-level heterogeneity and resource constraints, they inadequately tackle statistical heterogeneity, often simply increasing batch sizes without theoretical justification. Such approaches overlook a critical convergence-generalization dilemma well-established in traditional machine learning: larger batch sizes accelerate convergence but may deteriorate generalization performance beyond critical thresholds, usually termed the “generalization gap”. To bridge this gap in FL, we propose a comprehensive framework with three key contributions. First, we establish a batch size optimization mechanism that balances convergence and generalization objectives through a penalty function, providing mathematically derived closed-form solutions for optimal batch sizes. Second, we design a Stackelberg game-based incentive mechanism that coordinates batch size assignments with resource contributions while ensuring fair reward allocation to maximize individual client utility (defined as the difference between rewards and costs). Third, we develop a two-step verification strategy that detects and mitigates free-riding behaviors while monitoring convergence patterns to terminate ineffective training processes. Extensive experiments on real-world datasets validate our approach, demonstrating significant improvements in both convergence performance and fairness compared to state-of-the-art algorithms. Ablation studies confirm the effectiveness of each component.
{"title":"IDFL: Incentive-driven federated learning with selfish clients","authors":"Jin Xu, Hengrun Zhang, Huiqun Yu, Guisheng Fan","doi":"10.1016/j.inffus.2026.104185","DOIUrl":"10.1016/j.inffus.2026.104185","url":null,"abstract":"<div><div>Heterogeneity challenges have been long discussed in Federated Learning (FL). Among these challenges, statistical heterogeneity, where non-independent and identical (non-IID) data distributions across clients severely impact model convergence and performance, remains particularly problematic. While existing batch size optimization strategies effectively address system-level heterogeneity and resource constraints, they inadequately tackle statistical heterogeneity, often simply increasing batch sizes without theoretical justification. Such approaches overlook a critical convergence-generalization dilemma well-established in traditional machine learning: larger batch sizes accelerate convergence but may deteriorate generalization performance beyond critical thresholds, which is usually termed “generalization gap”. To bridge this gap in FL, we propose a comprehensive framework with three key contributions. First, we establish a batch size optimization mechanism that balances convergence and generalization objectives through a penalty function, providing mathematically derived closed-form solutions for optimal batch sizes. Second, we design a Stackelberg game-based incentive mechanism that coordinates batch size assignments with resource contributions while ensuring fair reward allocation to maximize individual client utility (defined as the difference between rewards and costs). Third, we develop a two-step verification strategy that detects and mitigates free-riding behaviors while monitoring convergence patterns to terminate ineffective training processes. Extensive experiments on real-world datasets validate our approach, demonstrating significant improvements in both convergence performance and fairness compared to state-of-the-art algorithms. Ablation studies confirm the effectiveness of each component.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104185"},"PeriodicalIF":15.5,"publicationDate":"2026-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146047983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-23 | DOI: 10.1016/j.inffus.2026.104173
Muhammad Umar Khan, Girija Chetty, Stefanos Gkikas, Manolis Tsiknakis, Roland Goecke, Raul Fernandez-Rojas
Reliable pain assessment is crucial in clinical practice, yet it remains a challenge because self-report-based assessment is inherently subjective. In this work, we introduce GIAFormer, a deep learning framework designed to provide an objective measure of multilevel pain by jointly analysing Electrodermal Activity (EDA) and functional Near-Infrared Spectroscopy (fNIRS) signals. By combining the complementary information from autonomic and cortical responses, the proposed model aims to capture both physiological and neural aspects of pain. GIAFormer integrates a Gradient-Infused Attention (GIA) module with a Transformer. The GIA module enhances signal representation by fusing the physiological signals with their temporal gradients and applying spatial attention to highlight inter-channel dependencies. The Transformer component follows, enabling the model to learn long-range temporal relationships. The framework was evaluated on the AI4Pain dataset comprising 65 subjects using a leave-one-subject-out validation protocol. GIAFormer achieved an accuracy of 90.51% and outperformed recent state-of-the-art approaches. These findings highlight the potential of gradient-aware attention and multimodal fusion for interpretable, non-invasive, and generalisable pain assessment suitable for clinical and real-world applications.
{"title":"GIAFormer: A Gradient-Infused Attention and Transformer for Pain Assessment with EDA-fNIRS Fusion","authors":"Muhammad Umar Khan , Girija Chetty , Stefanos Gkikas , Manolis Tsiknakis , Roland Goecke , Raul Fernandez-Rojas","doi":"10.1016/j.inffus.2026.104173","DOIUrl":"10.1016/j.inffus.2026.104173","url":null,"abstract":"<div><div>Reliable pain assessment is crucial in clinical practice, yet it remains a challenge because self-report-based assessment is inherently subjective. In this work, we introduce GIAFormer, a deep learning framework designed to provide an objective measure of multilevel pain by jointly analysing Electrodermal Activity (EDA) and functional Near-Infrared Spectroscopy (fNIRS) signals. By combining the complementary information from autonomic and cortical responses, the proposed model aims to capture both physiological and neural aspects of pain. GIAFormer integrates a Gradient-Infused Attention (GIA) module with a Transformer. The GIA module enhances signal representation by fusing the physiological signals with their temporal gradients and applying spatial attention to highlight inter-channel dependencies. The Transformer component follows, enabling the model to learn long-range temporal relationships. The framework was evaluated on the AI4Pain dataset comprising 65 subjects using a leave-one-subject-out validation protocol. GIAFormer achieved an accuracy of 90.51% and outperformed recent state-of-the-art approaches. These findings highlight the potential of gradient-aware attention and multimodal fusion for interpretable, non-invasive, and generalisable pain assessment suitable for clinical and real-world applications.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104173"},"PeriodicalIF":15.5,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146033287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-23 | DOI: 10.1016/j.inffus.2026.104174
Zhijing Huang, Wen-Jue He, Baotian Hu, Zheng Zhang
Due to its strong capacity for integrating heterogeneous multi-source information, multimodal sentiment analysis (MSA) has achieved remarkable progress in affective computing. However, existing methods typically adopt symmetric fusion strategies that treat all modalities equally, overlooking their inherent performance disparities: some modalities excel at discriminative representation, while others carry underutilized supportive cues. This limitation leads to insufficient exploration of cross-modal complementary correlations. To address this issue, we propose a novel Grading-Inspired Complementary Enhancing (GCE) framework for MSA, one of the first attempts to conduct dynamic assessment for knowledge transfer in progressive multimodal fusion and cooperation. Specifically, based on cross-modal interaction, a task-aware grading mechanism categorizes modality-pair associations into dominant (high-performing) and supplementary (low-performing) branches according to their task performance. Accordingly, a relation filtering module selectively identifies trustworthy information from the dominant branch to enhance consistency exploration in supplementary modality pairs with minimized redundancy. Afterwards, a weight adaptation module dynamically adjusts the guiding weight of individual samples for adaptability and generalization. Extensive experiments conducted on three benchmark datasets show that the proposed GCE approach outperforms state-of-the-art MSA methods. Our code is available at https://github.com/hka-7/GCEforMSA.
Title: Grading-Inspired Complementary Enhancing for Multimodal Sentiment Analysis (Information Fusion)
Pub Date: 2026-01-23 | DOI: 10.1016/j.inffus.2026.104183
Xinyu Xiang, Xuying Wu, Shengxiang Li, Qinglong Yan, Tong Zou, Hao Zhang, Jiayi Ma
Existing adversarial perturbation attacks for visual object trackers mainly focus on the RGB modality, and adversarial perturbation of RGB-T trackers remains unexplored. To address this gap, we propose an Intra-modal excavation and Cross-modal collusion adversarial perturbation attack algorithm (ICAttack) for RGB-T tracking. First, we establish a novel intra-modal adversarial clues excavation (ImAE) paradigm: by leveraging the unique distribution properties of each modality as a prior, we independently extract attack cues for each modality from a common noise space. Building upon this, we develop a cross-modal adversarial collusion (CmAC) strategy, which enables implicit and dynamic interaction between the adversarial tokens of the two modalities. This interaction facilitates negotiation and collaboration, achieving a synergistic attack gain for RGB-T trackers that surpasses the effect of a single-modality attack. This process, from intra-modal excavation to cross-modal collusion, creates a progressive and systematic attack framework for RGB-T trackers. In addition, by introducing a spatial adversarial intensity control module and a precise response disruption loss, we further enhance both the stealthiness and the precision of our adversarial perturbations. The control module reduces attack strength in less critical areas to improve stealth, while the disruption loss applies a small mask to the tracker's brightest semantic response region, concentrating the perturbation to precisely interfere with the tracker's target awareness. Extensive evaluations of attack performance on different state-of-the-art victim RGB-T trackers demonstrate the advantages of ICAttack in terms of the specificity and effectiveness of cross-modal attacks. Moreover, we offer a user-friendly interface to promote the practical deployment of adversarial perturbations. Our code is publicly available at https://github.com/Xinyu-Xiang/ICAttack.
{"title":"Adversarial perturbation for RGB-T tracking via intra-modal excavation and cross-modal collusion","authors":"Xinyu Xiang , Xuying Wu , Shengxiang Li , Qinglong Yan , Tong Zou , Hao Zhang , Jiayi Ma","doi":"10.1016/j.inffus.2026.104183","DOIUrl":"10.1016/j.inffus.2026.104183","url":null,"abstract":"<div><div>Existing adversarial perturbation attack for visual object trackers mainly focuses on RGB modality, yet research on RGB-T trackers’ adversarial perturbation remains unexplored. To address this gap, we propose an <strong>I</strong>ntra-modal excavation and <strong>C</strong>ross-modal collusion adversarial perturbation attack algorithm (ICAttack) for RGB-T Tracking. Firstly, we establish a novel intra-modal adversarial clues excavation (ImAE) paradigm. By leveraging the unique distribution properties of each modality as a prior, we independently extract the attack cues of different modalities from the public noise space. Building upon this, we develop a cross-modal adversarial collusion (CmAC) strategy, which enables implicit and dynamic interaction between the adversarial tokens of two modalities. This interaction facilitates negotiation and collaboration, achieving a synergistic attack gain for RGB-T trackers that surpasses the effect of a single-modality attack. The above process, from intra-modal excavation to cross-modal collusion, creates a progressive and systematic attack framework for RGB-T trackers. Besides, by introducing the spatial adversarial intensity control module and precise response disruption loss, we further enhance both the attack stealthiness and precision of our adversarial perturbations. The control module reduces attack strength in less critical areas to improve stealth. The disruption loss uses a small mask on the tracker’s brightest semantic response region, concentrating the perturbation to interfere with the tracker’s target awareness precisely. Extensive evaluations of attack performances in different SOTA victimized RGB-T trackers demonstrate the advantages of ICAttack in terms of specificity and effectiveness of cross-modal attacks. Moreover, we offer a user-friendly interface to promote the practical deployment of adversarial perturbations. Our code is publicly available at <span><span>https://github.com/Xinyu-Xiang/ICAttack</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104183"},"PeriodicalIF":15.5,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146033285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-23 | DOI: 10.1016/j.inffus.2026.104186
Yongcai Chen, Qinghua Zhang, Xinfa Shi, Lei Zhang
With the development of deep learning techniques, intelligent engineering tasks are moving into real-world application. However, performance in real conditions often declines due to scarce data or subtle, easily confused patterns. Although vision-language models with prompt learning offer a way to adapt without retraining the backbone, these approaches still suffer from overfitting in low-data regimes or from limited prompt expressiveness. To address these challenges, we propose a novel framework, PromptMix, that jointly considers semantic prompt learning, multimodal information fusion, and the alignment between pre-trained and domain-specific data. Specifically, PromptMix integrates three key components: (1) a Modality-Agnostic Shared Representation module that constructs a shared latent space to mitigate the distribution discrepancies between pre-trained and target data, (2) an LLM-Aided Prompt Evolution mechanism that semantically enriches and iteratively refines learnable context prompts, and (3) a Cross-Attentive Adapter that enhances multimodal information fusion and robustness under low-sample conditions. Experiments on seven datasets, including six public benchmarks and one custom industrial dataset, demonstrate that PromptMix effectively enhances vision-language model adaptability, improves semantic representations, and achieves robust generalization in both base-to-novel and few-shot learning scenarios, delivering superior performance in engineering applications with limited labeled data.
{"title":"PromptMix: LLM-aided prompt learning for generalizing vision-language models","authors":"Yongcai Chen , Qinghua Zhang , Xinfa Shi , Lei Zhang","doi":"10.1016/j.inffus.2026.104186","DOIUrl":"10.1016/j.inffus.2026.104186","url":null,"abstract":"<div><div>Intelligent engineering tasks step into real application with the development of deep learning techniques. However, performance in real conditions often falls into decline caused by scarce data, or subtle, easily confused patterns. Although vision-language models with prompt learning provide a new way for learning without retraining the backbone, these approaches still suffer from problems of overfitting under low-data regimes or poor expressive ability of prompts. To address these challenges, we propose a novel framework <em>PromptMix</em> that jointly considers semantic prompt learning, multimodal information fusion, and the alignment between pre-trained and domain-specific data. Specifically, PromptMix integrates three key components: (1) a <em>Modality-Agnostic Shared Representation</em> module to construct a shared latent space that mitigates the distribution discrepancies between pre-trained and target data, (2) a <em>LLM-Aided Prompt Evolution</em> mechanism to semantically enrich and iteratively refine learnable context prompts, and (3) a <em>Cross-Attentive Adapter</em> to enhance multimodal information fusion and robustness under low-sample conditions. Experiments on seven datasets, including six public benchmarks and one custom industrial dataset, demonstrate that PromptMix effectively enhances vision-language model adaptability, improves semantic representations, and achieves robust generalization under both base-to-novel and few-shot learning scenarios, delivering superior performance in engineering applications with limited labeled data.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104186"},"PeriodicalIF":15.5,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146033288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}