GC-Fed: Gradient centralized federated learning with partial client participation
Pub Date: 2026-07-01 | Epub Date: 2026-01-13 | DOI: 10.1016/j.inffus.2026.104148
Jungwon Seo , Ferhat Ozgur Catak , Chunming Rong , Kibeom Hong , Minhoe Kim
Federated Learning (FL) enables privacy-preserving multi-source information fusion (MSIF) but suffers from client drift in highly heterogeneous data settings. Many existing approaches mitigate drift by providing clients with common reference points, typically derived from past information, to align objectives or gradient directions. However, under severe partial participation, such history-dependent references may become unreliable, as the set of client data distributions participating in each round can vary drastically. To overcome this limitation, we propose a method that mitigates client drift without relying on past information by constraining the update space through Gradient Centralization (GC). Specifically, we introduce Local GC and Global GC, which apply GC at the local and global update stages, respectively, and further present GC-Fed, a hybrid formulation that generalizes both. Theoretical analysis and extensive experiments on benchmark FL tasks demonstrate that GC-Fed effectively alleviates client drift and achieves up to a 20% accuracy improvement under data-heterogeneous and partial-participation conditions.
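For orientation, Gradient Centralization itself is a one-line re-centering of each weight gradient. The following minimal sketch shows how such an operation could plug into the two stages named above, assuming a plain FedAvg-style loop; the function names, flags, and aggregation details are illustrative assumptions, not the authors’ implementation.

```python
import torch

def centralize(grad: torch.Tensor) -> torch.Tensor:
    """Gradient Centralization: subtract, per output channel, the mean
    taken over all remaining dimensions of a weight gradient."""
    if grad.dim() > 1:
        dims = tuple(range(1, grad.dim()))
        return grad - grad.mean(dim=dims, keepdim=True)
    return grad  # biases and other 1-D parameters are left untouched

def local_step(model, loss, lr, use_local_gc=True):
    """One client-side SGD step; Local GC centralizes each weight gradient."""
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            g = centralize(p.grad) if use_local_gc else p.grad
            p -= lr * g

def server_aggregate(global_params, client_deltas, use_global_gc=True):
    """FedAvg-style aggregation; Global GC re-centers the averaged update."""
    with torch.no_grad():
        for name, p in global_params.items():
            delta = torch.stack([d[name] for d in client_deltas]).mean(dim=0)
            p += centralize(delta) if use_global_gc else delta
```

A GC-Fed-style hybrid would apply one variant to some stages or layers and the other elsewhere; the paper specifies the actual combination.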
Unleashing Mamba’s expressive power: A non-tradeoff approach to spatio-temporal forecasting
Pub Date: 2026-07-01 | Epub Date: 2026-01-22 | DOI: 10.1016/j.inffus.2026.104172
Zhiqi Shao , Ze Wang , Haoning Xi , Michael G.H. Bell , Xusheng Yao , D. Glenn Geers , Junbin Gao
Real-time spatiotemporal forecasting, particularly in traffic systems, requires balancing computational cost and predictive accuracy, a challenge that conventional methods struggle to address effectively. In this work, we propose a non-trade-off framework called Spatial-Temporal Selective State Space (ST-Mamba), which leverages two key components to achieve both efficiency and accuracy concurrently. The Spatial-Temporal Mixer (ST-Mixer) dynamically fuses spatial and temporal features to capture complex dependencies, and the STF-Mamba layer incorporates Mamba’s selective state-space formulation to capture long-range dynamics efficiently. Beyond empirical improvements, we address a critical gap in the literature by presenting a theoretical analysis of ST-Mamba’s expressive power. Specifically, we establish its ability to approximate a broad class of Transformer architectures and formally demonstrate its equivalence to at least two consecutive attention layers within the same framework. This result highlights ST-Mamba’s capacity to capture long-range dependencies while reducing computational overhead, reinforcing its theoretical and practical advantages over conventional transformer-based models. Through extensive evaluations on real-world traffic datasets, ST-Mamba demonstrates a 61.11% reduction in runtime alongside a 0.67% improvement in predictive performance compared to leading approaches, underscoring its potential to set a new benchmark for real-time spatiotemporal forecasting.
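As a point of reference for the selective state-space formulation mentioned above, the toy layer below implements a diagonal selective scan in the spirit of Mamba, where input-dependent gates make the recurrence selective. It is a didactic approximation with names of our choosing; ST-Mamba’s ST-Mixer fusion and hardware-efficient scan are not reproduced.

```python
import torch
import torch.nn as nn

class ToySelectiveSSM(nn.Module):
    """Didactic diagonal selective state-space layer:
    h_t = a_t * h_{t-1} + b_t * x_t,  y_t = c_t * h_t,
    where a_t, b_t, c_t are produced from the input (the 'selective' part)."""
    def __init__(self, dim: int):
        super().__init__()
        self.gates = nn.Linear(dim, 3 * dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim)
        a, b, c = self.gates(x).chunk(3, dim=-1)
        a = torch.sigmoid(a)            # decay in (0, 1) keeps the scan stable
        h = torch.zeros_like(x[:, 0])
        ys = []
        for t in range(x.size(1)):      # cost is linear in sequence length
            h = a[:, t] * h + b[:, t] * x[:, t]
            ys.append(c[:, t] * h)
        return torch.stack(ys, dim=1)
```

An ST-Mixer-style front end would fuse the sensor (spatial) axis into `dim` before this scan; the paper’s equivalence result to stacked attention layers concerns the full architecture, not this toy.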
PromptMix: LLM-aided prompt learning for generalizing vision-language models
Pub Date: 2026-07-01 | Epub Date: 2026-01-23 | DOI: 10.1016/j.inffus.2026.104186
Yongcai Chen , Qinghua Zhang , Xinfa Shi , Lei Zhang
Deep learning techniques have brought intelligent engineering tasks into real-world application. However, performance under real conditions often degrades owing to scarce data or subtle, easily confused patterns. Although vision-language models with prompt learning offer a way to adapt without retraining the backbone, these approaches still suffer from overfitting in low-data regimes or from the poor expressive ability of prompts. To address these challenges, we propose PromptMix, a novel framework that jointly considers semantic prompt learning, multimodal information fusion, and the alignment between pre-trained and domain-specific data. Specifically, PromptMix integrates three key components: (1) a Modality-Agnostic Shared Representation module that constructs a shared latent space to mitigate distribution discrepancies between pre-trained and target data, (2) an LLM-Aided Prompt Evolution mechanism that semantically enriches and iteratively refines learnable context prompts, and (3) a Cross-Attentive Adapter that enhances multimodal information fusion and robustness under low-sample conditions. Experiments on seven datasets, including six public benchmarks and one custom industrial dataset, demonstrate that PromptMix effectively enhances vision-language model adaptability, improves semantic representations, and achieves robust generalization in both base-to-novel and few-shot learning scenarios, delivering superior performance in engineering applications with limited labeled data.
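As a rough sketch of what learnable context prompts plus an LLM-aided refinement step could look like: the context vectors below follow the standard CoOp recipe, while `evolve_prompts`, its `llm_suggest`/`embed` arguments, and the blending factor are hypothetical stand-ins for the paper’s evolution mechanism.

```python
import torch
import torch.nn as nn

class LearnableContextPrompt(nn.Module):
    """CoOp-style learnable context: n_ctx trainable vectors are prepended
    to each class-name embedding before the (frozen) text encoder."""
    def __init__(self, n_ctx: int, embed_dim: int):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, embed_dim) * 0.02)

    def forward(self, class_embeds: torch.Tensor) -> torch.Tensor:
        # class_embeds: (n_classes, n_tokens, embed_dim)
        ctx = self.ctx.unsqueeze(0).expand(class_embeds.size(0), -1, -1)
        return torch.cat([ctx, class_embeds], dim=1)

def evolve_prompts(prompt, llm_suggest, embed, alpha=0.1):
    """Hypothetical LLM-aided evolution step: nudge the learned context
    toward the embedding of an LLM-proposed textual description, then
    resume gradient training."""
    suggestion = embed(llm_suggest())        # assumed shape: (n_ctx, embed_dim)
    with torch.no_grad():
        prompt.ctx.lerp_(suggestion, alpha)  # blend suggestion into context
```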
Adversarial perturbation for RGB-T tracking via intra-modal excavation and cross-modal collusion
Pub Date: 2026-07-01 | Epub Date: 2026-01-23 | DOI: 10.1016/j.inffus.2026.104183
Xinyu Xiang , Xuying Wu , Shengxiang Li , Qinglong Yan , Tong Zou , Hao Zhang , Jiayi Ma
Existing adversarial perturbation attacks for visual object trackers mainly focus on the RGB modality, while adversarial perturbations for RGB-T trackers remain unexplored. To address this gap, we propose an Intra-modal excavation and Cross-modal collusion adversarial perturbation attack algorithm (ICAttack) for RGB-T tracking. First, we establish a novel intra-modal adversarial clue excavation (ImAE) paradigm. By leveraging the unique distribution properties of each modality as a prior, we independently extract attack cues for each modality from a shared noise space. Building upon this, we develop a cross-modal adversarial collusion (CmAC) strategy that enables implicit and dynamic interaction between the adversarial tokens of the two modalities. This interaction facilitates negotiation and collaboration, achieving a synergistic attack gain on RGB-T trackers that surpasses the effect of single-modality attacks. This process, from intra-modal excavation to cross-modal collusion, forms a progressive and systematic attack framework for RGB-T trackers. In addition, by introducing a spatial adversarial intensity control module and a precise response disruption loss, we further enhance both the stealthiness and the precision of our adversarial perturbations. The control module reduces attack strength in less critical areas to improve stealth, while the disruption loss applies a small mask to the tracker’s brightest semantic response region, concentrating the perturbation to precisely interfere with the tracker’s target awareness. Extensive evaluations against different state-of-the-art victim RGB-T trackers demonstrate the advantages of ICAttack in terms of the specificity and effectiveness of cross-modal attacks. Moreover, we offer a user-friendly interface to promote the practical deployment of adversarial perturbations. Our code is publicly available at https://github.com/Xinyu-Xiang/ICAttack.
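For context, the generic two-stream loop that modality-paired perturbation attacks build on might look as follows: a plain joint PGD sketch with an assumed `tracker_loss` callable. ICAttack’s ImAE excavation, CmAC collusion, intensity control, and masked disruption loss are not modeled here.

```python
import torch

def rgbt_pgd(tracker_loss, rgb, tir, eps=8 / 255, alpha=2 / 255, steps=10):
    """Generic two-stream PGD: one perturbation per modality, both ascending
    a shared attack loss so the modalities are optimized jointly."""
    d_rgb = torch.zeros_like(rgb, requires_grad=True)
    d_tir = torch.zeros_like(tir, requires_grad=True)
    for _ in range(steps):
        loss = tracker_loss(rgb + d_rgb, tir + d_tir)  # assumed callable
        g_rgb, g_tir = torch.autograd.grad(loss, (d_rgb, d_tir))
        with torch.no_grad():
            for d, g in ((d_rgb, g_rgb), (d_tir, g_tir)):
                d += alpha * g.sign()   # ascend: maximize the attack loss
                d.clamp_(-eps, eps)     # keep the perturbation imperceptible
    return d_rgb.detach(), d_tir.detach()
```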
Multiple channel access and power control for discount-average weighting criterion over multi-sensor and Markovian fading environments
Pub Date: 2026-07-01 | DOI: 10.1016/j.inffus.2026.104191
Yunbo Song , Jianrong Zhao , Kangkai Zheng , Ticao Jiao
This paper investigates the joint design of multiple channel access and power control for multi-sensor remote estimation. Smart sensors with energy constraints transmit their local estimates over shared Markovian fading channels. A novel discount-average weighting criterion (DAWC) is introduced over the infinite horizon; unlike traditional criteria that focus on a single aspect, it balances immediate and long-term transmission performance. We formulate the co-design problem, including channel selection and power allocation, as a Markov decision process (MDP) under the DAWC. The existence of an ϵ-optimal policy is established for ergodic MDPs via a model-checking method, and a switch-like optimal transmission policy is derived from the set of randomized Markov strategies. Further, we prove the existence of an ϵ-s-optimal policy, an ultimately deterministic policy, for general MDPs. An elaborately devised algorithm generates optimal transmission decisions through a forward iterative approach. Finally, a turbofan engine speed regulation example demonstrates the advantages of these results.
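The abstract does not reproduce the criterion itself. Purely as an orienting assumption, not the paper’s definition, a discount-average weighting could take the form of a convex combination of the standard discounted and long-run average objectives:

```latex
% Assumed illustrative form of a discount-average weighting criterion:
J_{\lambda}(\pi) \;=\; \lambda\, \mathbb{E}^{\pi}\!\Big[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\Big]
\;+\; (1-\lambda)\, \liminf_{T\to\infty} \frac{1}{T}\, \mathbb{E}^{\pi}\!\Big[\sum_{t=0}^{T-1} r_{t}\Big],
\qquad \lambda \in [0,1],\; \gamma \in (0,1).
```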
Large multimodal models for low-resource languages: A survey
Pub Date: 2026-07-01 | Epub Date: 2026-01-27 | DOI: 10.1016/j.inffus.2026.104189
Marian Lupaşcu, Ana-Cristina Rogoz, Mihai Sorin Stupariu, Radu Tudor Ionescu
In this survey, we systematically analyze techniques used to adapt large multimodal models (LMMs) for low-resource (LR) languages, examining approaches ranging from visual enhancement and data creation to cross-modal transfer and fusion strategies. Through a comprehensive analysis of 117 studies across 96 LR languages, we identify key patterns in how researchers tackle the challenges of limited data and computational resources. We categorize works into resource-oriented and method-oriented contributions, further dividing contributions into relevant sub-categories. We compare method-oriented contributions in terms of performance and efficiency, discussing benefits and limitations of representative studies. We find that visual information often serves as a crucial bridge for improving model performance in LR settings, though significant challenges remain in areas such as hallucination mitigation and computational efficiency. In summary, we provide researchers with a clear understanding of current approaches and remaining challenges in making LMMs more accessible to speakers of LR (understudied) languages. We complement our survey with an open-source repository available at: https://github.com/marianlupascu/LMM4LRL-Survey.
Adaptive virtual anchors for efficient and stable clustering over large multi-view attributed graphs
Pub Date: 2026-07-01 | Epub Date: 2026-01-28 | DOI: 10.1016/j.inffus.2026.104190
Mengyao Li , Zhibang Yang , Xu Zhou , Joey Tianyi Zhou , Quanqing Xu , Chuanhui Yang , Kenli Li , Keqin Li
Multi-view attributed graphs (MVAG) are well-known for their ability to model complex networks and relationships, providing diverse yet complementary information for finding a consensus partition suitable for all views. Abundant methods exist for clustering over multi-view attributed graphs. However, most of them are not suitable for large-scale graphs due to high complexity. Moreover, while existing anchor-based methods can effectively accelerate clustering, they mainly focus on either attribute information or graph structure during anchor selection, and some suffer from stability issues. Inspired by this, in this paper, we propose the adaptive virtual anchor clustering method (AVAC) to boost clustering performance while keeping results stable. In particular, we first introduce adaptive virtual anchors for multi-view attributed graphs, which are learned and generated from the graphs adaptively. After that, we connect anchor learning and anchor graph construction closely and cyclically to learn virtual anchors dynamically and make them capture the real data distribution and topology information more accurately. Last but not least, we design a five-block coordinate descent method with proven convergence to further optimize our virtual anchors so that they are more representative of the existing nodes. Extensive experiments on both real and synthetic datasets demonstrate the effectiveness, efficiency, and stability of our method. Compared to state-of-the-art approaches, the AVAC algorithm consistently yields stable results with a significant improvement in accuracy, and achieves a speedup of 1.8 times on public large-scale datasets. The source code is available at https://github.com/lmyfree/AVAC.
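As a toy illustration of the anchor-graph machinery such methods build on: a fixed k-nearest construction, naive view averaging, and a heuristic anchor update stand in for the paper’s learned virtual anchors and five-block coordinate descent, and all views are assumed to share one feature space.

```python
import numpy as np

def anchor_graph(X, anchors, k=5):
    """Row-stochastic anchor graph Z: each node is linked to its k nearest
    anchors with inverse-distance weights (a common construction)."""
    d = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)   # (n, m)
    Z = np.zeros_like(d)
    nearest = np.argsort(d, axis=1)[:, :k]
    for i, nn_idx in enumerate(nearest):
        w = 1.0 / (d[i, nn_idx] + 1e-8)
        Z[i, nn_idx] = w / w.sum()
    return Z

def alternate(X_views, anchors, iters=10, lr=0.1):
    """Toy alternating loop: rebuild per-view anchor graphs, fuse them by
    averaging, then move each anchor toward its weighted node mean."""
    for _ in range(iters):
        Z = sum(anchor_graph(X, anchors) for X in X_views) / len(X_views)
        weights = Z.sum(axis=0, keepdims=True).T + 1e-8        # (m, 1)
        target = (Z.T @ X_views[0]) / weights                  # first view only
        anchors += lr * (target - anchors)
    return anchors, Z
```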
A novel knowledge distillation and hybrid explainability approach for phenology stage classification from multi-source time series
Pub Date: 2026-07-01 | Epub Date: 2026-01-16 | DOI: 10.1016/j.inffus.2026.104158
Naeem Ullah , Andrés Manuel Chacón-Maldonado , Francisco Martínez-Álvarez , Ivanoe De Falco , Giovanna Sannino
Accurate phenological stage classification is crucial for addressing global challenges to food security posed by climate change, water scarcity, and land degradation. It enables precision agriculture by optimizing key interventions such as irrigation, fertilization, and pest control. While deep learning offers powerful tools, existing methods face four key limitations: reliance on narrow features and models, limited long-term forecasting capability, computational inefficiency, and opaque, unvalidated explanations. To overcome these limitations, this paper presents a deep learning framework for phenology classification that utilizes multi-source time series data from satellite imagery, meteorological stations, and field observations. The approach emphasizes temporal consistency, spatial adaptability, computational efficiency, and explainability. A feature engineering pipeline extracts temporal dynamics via lag features, rolling statistics, Fourier transforms, and seasonal encodings. Feature selection combines incremental strategies with classical filter, wrapper, and embedded methods. Deep learning models across multiple paradigms (feedforward, recurrent, convolutional, and attention-based) are benchmarked on multi-horizon forecasting tasks. To reduce model complexity while preserving performance where possible, the framework employs knowledge distillation, transferring predictive knowledge from complex teacher models to compact, deployable student models. For model interpretability, a new Hybrid SHAP-Association Rule Explainability approach is proposed, integrating model-driven and data-driven explanations. Agreement between the two views is quantified using trust metrics (precision@k, coverage, and Jaccard similarity) with a retraining-based validation mechanism. Experiments on phenology data from Andalusia demonstrate high accuracy, strong generalizability, trustworthy explanations, and resource-efficient phenology monitoring in agricultural systems.
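The feature engineering pipeline is the most directly reproducible piece. A minimal pandas sketch follows, with the column name, window sizes, and harmonic orders chosen only for illustration.

```python
import numpy as np
import pandas as pd

def temporal_features(df: pd.DataFrame, col: str = "ndvi") -> pd.DataFrame:
    """Sketch of the pipeline described above: lag features, rolling
    statistics, Fourier terms, and seasonal encodings. Requires a
    DatetimeIndex; the 'ndvi' column is an illustrative assumption."""
    out = df.copy()
    for lag in (1, 2, 4):                           # lag features
        out[f"{col}_lag{lag}"] = out[col].shift(lag)
    for w in (3, 6):                                # rolling statistics
        out[f"{col}_mean{w}"] = out[col].rolling(w).mean()
        out[f"{col}_std{w}"] = out[col].rolling(w).std()
    doy = out.index.dayofyear.to_numpy()            # seasonal position
    for k in (1, 2):                                # Fourier harmonics
        out[f"sin{k}"] = np.sin(2 * np.pi * k * doy / 365.25)
        out[f"cos{k}"] = np.cos(2 * np.pi * k * doy / 365.25)
    return out.dropna()
```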
Lifting wavelet transform-guided network with histogram attention for liver segmentation in CT scans
Pub Date: 2026-07-01 | Epub Date: 2026-01-16 | DOI: 10.1016/j.inffus.2026.104153
Huaxiang Liu , Wei Sun , Youyao Fu , Shiqing Zhang , Jie Jin , Jiangxiong Fang , Binliang Wang
Accurate liver segmentation in computed tomography (CT) scans is crucial for the diagnosis of hepatocellular carcinoma and for surgical planning; however, manual delineation is laborious and prone to operator variability. Existing deep learning methods frequently sacrifice precise boundary delineation when expanding receptive fields, or fail to leverage frequency-domain cues that encode global shape, while conventional attention mechanisms are less effective on low-contrast images. To address these challenges, we introduce LWT-Net, a novel network guided by a trainable lifting wavelet transform and equipped with a frequency-split histogram attention mechanism for enhanced liver segmentation. LWT-Net embeds the trainable lifting wavelet transform within an encoder-decoder framework to hierarchically decompose features into low-frequency components that capture global structure and high-frequency bands that preserve edge and texture details. A complementary inverse lifting stage reconstructs high-resolution features while maintaining spatial consistency. The frequency-spatial fusion module, driven by a histogram-based attention mechanism, performs histogram-guided feature reorganization across global and local bins, while employing self-attention to capture long-range dependencies and prioritize anatomically significant regions. Comprehensive evaluations on the LiTS2017, WORD, and FLARE22 datasets confirm LWT-Net’s superior performance, achieving mean Dice similarity coefficients of 95.96%, 97.15%, and 95.97%, respectively.
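A single 1-D lifting step with learnable predict/update operators shows the split/predict/update structure that a trainable lifting wavelet transform generalizes. The sketch is ours; LWT-Net’s in-network 2-D version and its attention coupling are not shown.

```python
import torch
import torch.nn as nn

class LiftingStep1D(nn.Module):
    """One split/predict/update lifting step with learnable operators:
    even samples predict the odd ones, and the residual (high-frequency
    detail) then updates the evens (low-frequency approximation)."""
    def __init__(self, channels: int):
        super().__init__()
        self.predict = nn.Conv1d(channels, channels, 3, padding=1)
        self.update = nn.Conv1d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor):
        # x: (batch, channels, length), with even length
        even, odd = x[..., ::2], x[..., 1::2]       # split
        detail = odd - self.predict(even)           # high-frequency band
        approx = even + self.update(detail)         # low-frequency band
        return approx, detail

    def inverse(self, approx, detail):
        """Exact reconstruction: undo update, undo predict, re-interleave."""
        even = approx - self.update(detail)
        odd = detail + self.predict(even)
        return torch.stack([even, odd], dim=-1).flatten(-2)
```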
Dual-layer prompt ensembles: Leveraging system- and user-level instructions for robust LLM-based query expansion and rank fusion
Pub Date: 2026-07-01 | Epub Date: 2026-01-17 | DOI: 10.1016/j.inffus.2026.104160
Minghan Li , Ercong Nie , Huiping Huang , Xinxuan Lv , Guodong Zhou
Large Language Models (LLMs) show strong potential for query expansion (QE), but their effectiveness is highly sensitive to prompt design. This paper investigates whether exploiting the system-user prompt distinction in chat-based LLMs improves QE, and how multiple expansions should be combined. We propose Dual-Layer Prompt Ensembles, which pair a behavioural system prompt with varied user prompts to generate diverse expansions, and aggregate their BM25-ranked lists using lightweight SU-RankFusion schemes. Experiments on six heterogeneous datasets show that dual-layer prompting consistently outperforms strong single-prompt baselines. For example, on Touche-2020 a dual-layer configuration improves nDCG@10 from 0.4177 (QE-CoT) to 0.4696, and SU-RankFusion further raises it to 0.4797. On Robust04 and DBPedia, SU-RankFusion improves nDCG@10 over BM25 by 24.7% and 25.5%, respectively, with similar gains on NFCorpus, FiQA, and TREC-COVID. These results demonstrate that system-user prompt ensembles are effective for QE, and that simple fusion transforms prompt-level diversity into stable retrieval improvements.
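A compact sketch of the dual-layer idea, using reciprocal rank fusion as a stand-in for the paper’s SU-RankFusion schemes; the `llm` and `bm25_search` interfaces and the prompt texts are assumptions.

```python
from collections import defaultdict

SYSTEM = "You are a search assistant; expand queries with precise, on-topic terms."
USER_TEMPLATES = [
    "List synonyms and related terms for: {q}",
    "Write a short passage answering: {q}",
    "Name key entities and concepts behind: {q}",
]

def dual_layer_expansions(llm, query):
    """One behavioural system prompt paired with varied user prompts
    yields diverse expansions of the same query."""
    return [llm(system=SYSTEM, user=t.format(q=query)) for t in USER_TEMPLATES]

def fuse_rankings(ranked_lists, k=60):
    """Reciprocal rank fusion over the per-expansion BM25 runs."""
    scores = defaultdict(float)
    for run in ranked_lists:
        for rank, doc_id in enumerate(run):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Usage sketch (llm and bm25_search are assumed interfaces):
# runs = [bm25_search(query + " " + e) for e in dual_layer_expansions(llm, query)]
# final_ranking = fuse_rankings(runs)
```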