Pub Date: 2026-07-01 | Epub Date: 2026-01-20 | DOI: 10.1016/j.inffus.2026.104164
Yunfei Guo
The proliferation of urban big data presents unprecedented opportunities for understanding cities, yet the analytical methods to harness this data are often fragmented and domain-specific. Existing predictive models in urban computing are typically highly specialized, creating analytical silos that inhibit knowledge transfer and are difficult to adapt across domains such as public safety, housing, and transport. This paper confronts this critical gap by developing a generalizable, multimodal spatio-temporal deep learning framework engineered for both high predictive performance and interpretability, capable of mastering diverse urban prediction tasks without architectural modification. The hybrid architecture fuses a Multi-Head Graph Convolutional Network (GCN) for spatial diffusion, a Long Short-Term Memory (LSTM) network for temporal dynamics, and a learnable Gating Mechanism that weights the influence of the spatial graph versus static external features. To validate this generalizability, the framework was tested on three distinct urban domains in London: crime forecasting, housing price estimation, and transport network demand. The model outperformed traditional baselines (ARIMA, XGBoost) and state-of-the-art deep learning models (TabNet, TFT). Moreover, the framework moves beyond prediction to explanation by incorporating attention mechanisms and permutation feature importance analysis.
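The abstract does not give the gate's functional form; a minimal sketch of one plausible version, a sigmoid gate producing a per-dimension convex combination of GCN-derived spatial features and encoded static features (all names, shapes, and weights here are illustrative assumptions, not the authors' implementation), might look like:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(h_spatial, h_static, W, b):
    # Gate g in (0, 1) decides, per dimension, how much the spatial-graph
    # features outweigh the static external features.
    z = np.concatenate([h_spatial, h_static])
    g = sigmoid(W @ z + b)
    return g * h_spatial + (1.0 - g) * h_static

rng = np.random.default_rng(0)
d = 4
h_spatial = rng.normal(size=d)          # e.g. GCN output for one zone (illustrative)
h_static = rng.normal(size=d)           # e.g. encoded census/land-use features
W = rng.normal(size=(d, 2 * d)) * 0.1   # gate weights, learned in practice
b = np.zeros(d)
fused = gated_fusion(h_spatial, h_static, W, b)
```

Because the gate output lies strictly in (0, 1), each fused coordinate stays between the two input coordinates, which is what makes the learned weighting directly interpretable.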
Title: "Multimodal spatio-temporal fusion: A generalizable GCN-LSTM with attention framework for urban application" (Information Fusion, vol. 131, Article 104164)
Pub Date: 2026-07-01 | Epub Date: 2026-01-29 | DOI: 10.1016/j.inffus.2026.104193
Shunlei Li , Longsen Gao , Jin Wang , Chang Che , Xi Xiao , Jiuwen Cao , Yingbai Hu , Hamid Reza Karimi
Teaching robots dexterous skills from human videos remains challenging due to the reliance on low-level trajectory imitation, which fails to generalize across object types, spatial layouts, and manipulator configurations. We propose Graph-Fused Vision-Language-Action (GF-VLA), a framework that enables dual-arm robotic systems to perform task-level reasoning and execution directly from RGB(-D) human demonstrations. GF-VLA first extracts Shannon-information-based cues to identify hands and objects with the highest task relevance, then encodes these cues into temporally ordered scene graphs that capture both hand-object and object-object interactions. These graphs are fused with a language-conditioned transformer that generates hierarchical behavior trees and interpretable Cartesian motion commands. To improve execution efficiency in bimanual settings, we further introduce a cross-hand selection policy that infers optimal gripper assignment without explicit geometric reasoning. We evaluate GF-VLA on four structured dual-arm block assembly tasks involving symbolic shape construction and spatial generalization. Experimental results show that the information-theoretic scene representation achieves over 95% graph accuracy and 93% subtask segmentation accuracy, supporting the LLM planner in generating reliable and human-readable task policies. When executed by the dual-arm robot, these policies yield 94% grasp success, 89% placement accuracy, and 90% overall task success across stacking, letter-building, and geometric reconfiguration scenarios, demonstrating strong generalization and robustness across diverse spatial and semantic variations.
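The Shannon-information-based cue selection and temporally ordered scene graphs are only described at a high level here; the hypothetical sketch below uses sequence entropy as a crude task-relevance proxy and a plain per-step edge list as the graph (all track names and states are invented for illustration):

```python
import math
from collections import Counter

def shannon_entropy(labels):
    # Entropy (bits) of a discrete observation sequence; higher entropy is
    # used here as a crude proxy for task-relevant activity.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Hypothetical per-frame contact states for three tracked entities.
tracks = {
    "red_block":  ["free", "grasped", "grasped", "placed", "free"],
    "blue_block": ["free", "free", "free", "free", "free"],
    "left_hand":  ["open", "closing", "closed", "open", "open"],
}
# Rank entities by activity; a static object scores zero.
ranked = sorted(tracks, key=lambda k: shannon_entropy(tracks[k]), reverse=True)

# A temporally ordered scene graph: (subject, relation, object) edges per step.
scene_graphs = [
    {"t": 1, "edges": [("left_hand", "grasps", "red_block")]},
    {"t": 3, "edges": [("red_block", "on_top_of", "blue_block")]},
]
```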
Title: "Information-theoretic graph fusion with vision-language-action model for policy reasoning and dual robotic control" (Information Fusion, vol. 131, Article 104193)
Pub Date: 2026-07-01 | Epub Date: 2026-01-29 | DOI: 10.1016/j.inffus.2026.104195
Hulin Kuang , Bin Hu , Shuai Yang , Dongcui Wang , Guanghua Luo , Weihua Liao , Wu Qiu , Shulin Liu , Jianxin Wang
Acute ischemic stroke (AIS) outcome prediction is crucial for treatment decisions. However, AIS outcome prediction is challenging due to the combined influence of lesion characteristics, vascular status, and other health conditions. In this study, we introduce a vision-language model with a Siamese bilateral difference network and a text-guided image feature enhancement module for predicting AIS outcomes (e.g., the modified Rankin Scale, mRS) on CT angiography. In the Siamese bilateral difference network, built by fine-tuning the foundation model LVM-Med, we design an interactive Transformer fine-tuning encoder and a vision-question-answering-guided bilateral difference awareness module, which generates bilateral difference text via image-text pair question answering as a prompt to enhance the extracted brain vascular difference features. Additionally, in the text-guided image feature enhancement module, we propose a text feature extraction module to extract patient phrase-level and inter-phrase embeddings from clinical notes, and employ a multi-scale image-text interaction module to obtain fine-grained phrase-enhanced and coarse-grained phrase-context-aware image attention features. We validate our model on the public ISLES2024 dataset, a private dataset A, and an external AIS dataset. It achieves accuracies of 81.11%, 83.05%, and 80.00% and AUCs of 80.06%, 85.48%, and 82.62% for 90-day mRS prediction on the three datasets, respectively, outperforming several state-of-the-art methods and demonstrating its generalization ability. Moreover, the proposed method can be effectively extended to glaucoma visual field progression prediction, which is also related to vascular differences and clinical notes.
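One common way to realize a bilateral (left-right) difference signal of the kind the Siamese branch presumably exploits is to mirror the volume across the sagittal midline and subtract; this is a generic sketch under that assumption, not the paper's actual network:

```python
import numpy as np

def bilateral_difference(volume):
    # Left-right asymmetry map: subtract the volume from its mirror across
    # the sagittal midline (last axis assumed to be the left-right axis).
    mirrored = np.flip(volume, axis=-1)
    return volume - mirrored

vol = np.zeros((2, 2, 4))
vol[..., 0] = 1.0          # hypothetical one-sided signal (e.g. reduced filling)
diff = bilateral_difference(vol)
asymmetry_score = float(np.abs(diff).mean())   # 0 for a perfectly symmetric brain
```

A healthy, symmetric input yields an all-zero map, so downstream features concentrate on pathological asymmetry.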
Title: "Vision-language model with siamese bilateral difference network and text-guided image feature enhancement for acute ischemic stroke outcome prediction on CT angiography" (Information Fusion, vol. 131, Article 104195)
Pub Date: 2026-07-01 | Epub Date: 2026-01-16 | DOI: 10.1016/j.inffus.2026.104155
Daniel M. Jimenez-Gutierrez , Yelizaveta Falkouskaya , José L. Hernandez-Ramos , Aris Anagnostopoulos , Ioannis Chatzigiannakis , Andrea Vitaletti
Federated Learning (FL) is an emerging distributed machine learning paradigm enabling multiple clients to train a global model collaboratively without sharing their raw data. While FL enhances data privacy by design, it remains vulnerable to various security and privacy threats. This survey provides a comprehensive overview of 203 papers on state-of-the-art attacks and the defense mechanisms developed to address them, categorizing the latter into security-enhancing and privacy-preserving techniques. Security-enhancing methods aim to improve FL robustness against malicious behaviors such as Byzantine attacks, poisoning, and Sybil attacks, while privacy-preserving techniques focus on protecting sensitive data through cryptographic approaches, differential privacy, and secure aggregation. We critically analyze the strengths and limitations of existing methods, highlight the trade-offs between privacy, security, and model performance, and discuss the implications of non-IID data distributions for the effectiveness of these defenses. Furthermore, we identify open research challenges and future directions, including the need for scalable, adaptive, and energy-efficient solutions operating in dynamic and heterogeneous FL environments. Our survey aims to guide researchers and practitioners in developing robust and privacy-preserving FL systems, fostering advances that safeguard the integrity and confidentiality of collaborative learning frameworks.
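As a concrete taste of the security-enhancing methods surveyed, the coordinate-wise median is a classic Byzantine-robust replacement for mean-based FedAvg aggregation; the toy example below (synthetic two-parameter updates, one malicious client) shows why the mean is dragged arbitrarily far while the median survives:

```python
import numpy as np

def fedavg(updates):
    # Standard mean aggregation: a single outlier can shift it arbitrarily.
    return np.mean(updates, axis=0)

def coordinate_median(updates):
    # Byzantine-robust alternative: the coordinate-wise median tolerates a
    # minority of arbitrarily corrupted client updates.
    return np.median(updates, axis=0)

honest = [np.array([1.0, 2.0]), np.array([1.1, 1.9]), np.array([0.9, 2.1])]
poisoned = honest + [np.array([100.0, -100.0])]   # one malicious client

mean_agg = fedavg(poisoned)             # badly corrupted
median_agg = coordinate_median(poisoned)  # stays near the honest consensus
```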
Title: "On the security and privacy of federated learning: A survey with attacks, defenses, frameworks, applications, and future directions" (Information Fusion, vol. 131, Article 104155)
Pub Date: 2026-07-01 | Epub Date: 2026-01-18 | DOI: 10.1016/j.inffus.2026.104124
Gabriel Oduori , Chaira Cocco , Payam Sajadi , Francesco Pilla
Data fusion (DF) addresses the challenge of integrating heterogeneous data sources to improve decision-making and inference. Although DF has been widely explored, no prior systematic review has specifically focused on its application to low-cost sensor (LCS) data in environmental monitoring. To address this gap, we conduct a systematic literature review (SLR) following the PRISMA framework, synthesising findings from 82 peer-reviewed articles. The review addresses three key questions: (1) What fusion methodologies are employed in conjunction with LCS data? (2) In what environmental contexts are these methods applied? (3) What are the methodological challenges and research gaps? Our analysis reveals that geostatistical and machine learning approaches dominate current practice, with air quality monitoring emerging as the primary application domain. Additionally, artificial intelligence (AI)-based methods are increasingly used to integrate spatial, temporal, and multimodal data. However, limitations persist in uncertainty quantification, validation standards, and the generalisability of fusion frameworks. This review provides a comprehensive synthesis of current techniques and outlines key directions for future research, including the development of robust, uncertainty-aware fusion methods and broader application to less-studied environmental variables.
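A building block underlying many of the geostatistical fusion methods the review covers is inverse-variance (precision) weighting of co-located readings; the sketch below, with made-up PM2.5 numbers for a reference monitor and two low-cost sensors, shows how the fused estimate favors the more precise instrument and how the fused variance never exceeds the best input's:

```python
import numpy as np

def precision_weighted_fusion(values, variances):
    # Fuse independent noisy readings of the same quantity by
    # inverse-variance weighting (a basic Kalman/geostatistics building block).
    w = 1.0 / np.asarray(variances, dtype=float)
    fused = float(np.sum(w * np.asarray(values, dtype=float)) / np.sum(w))
    fused_var = float(1.0 / np.sum(w))
    return fused, fused_var

# Hypothetical PM2.5 readings (ug/m3): reference monitor, then two low-cost sensors.
fused, var = precision_weighted_fusion([12.0, 15.0, 10.0], [1.0, 9.0, 9.0])
```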
Title: "Data fusion for low-cost sensors: A systematic literature review" (Information Fusion, vol. 131, Article 104124)
Pub Date: 2026-07-01 | Epub Date: 2026-01-29 | DOI: 10.1016/j.inffus.2026.104198
Dawei Zhang , Chenglin Sang , Tianyi Lyu
Knee osteoarthritis (KOA) is a common degenerative joint disease, and accurate diagnosis and severity grading are crucial for effective treatment. Although deep learning techniques based on X-rays or magnetic resonance imaging (MRI) have greatly improved diagnostic accuracy, two-dimensional images often cannot fully capture the complex three-dimensional morphology and texture changes associated with KOA. To address these challenges, we propose a shape-aware osteoarthritis diagnostic network, a novel bidirectional cross-modal fusion framework that integrates 3D point clouds and MRI sequences. The framework consists of three parts: (1) a local-relation-aware dynamic graph convolutional neural network (CNN) that extracts complex geometric features from point clouds representing the surfaces of knee-joint bones and cartilage; (2) a sequence aggregation method for MRI that combines a 2D CNN for spatial feature extraction with a self-attention mechanism across slice sequences; and (3) a bidirectional cross-modal fusion module that conducts in-depth interactive feature learning between the geometric domain of the point clouds and the texture and spatiotemporal domain of the MRI, enabling the two modalities to refine and enhance each other's representations. Extensive experiments on the large Osteoarthritis Initiative (OAI) cohort show that our model achieves state-of-the-art performance. Its accuracy on the challenging 5-level Kellgren-Lawrence (KL) classification is 0.73, an improvement of approximately 23.7% over the 0.59 achieved by using 3D shape features alone in the ShapeMed-Knee benchmark. Furthermore, its AUC for binary OA diagnosis is 0.95, significantly better than existing unimodal and multimodal baselines.
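The bidirectional cross-modal fusion module is described only at a high level; a minimal sketch of one standard realization, two-way dot-product attention with residual connections between point-cloud and MRI token sets (token counts and dimensions invented, no learned projections), is:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_cross_attention(pc_feats, mri_feats):
    # Each modality attends to the other: point-cloud tokens gather MRI
    # context and vice versa, so both representations are mutually enhanced.
    scores = pc_feats @ mri_feats.T                    # (n_pc, n_mri) affinities
    pc_enh = pc_feats + softmax(scores, axis=1) @ mri_feats
    mri_enh = mri_feats + softmax(scores.T, axis=1) @ pc_feats
    return pc_enh, mri_enh

rng = np.random.default_rng(1)
pc = rng.normal(size=(5, 8))    # 5 point-cloud tokens, feature dim 8
mri = rng.normal(size=(3, 8))   # 3 MRI slice tokens, same dim
pc_enh, mri_enh = bidirectional_cross_attention(pc, mri)
```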
Title: "Shape-aware osteoarthritis network: Bidirectional fusion of MRI and 3D point clouds for knee osteoarthritis diagnosis" (Information Fusion, vol. 131, Article 104198)
Pub Date: 2026-07-01 | Epub Date: 2026-01-23 | DOI: 10.1016/j.inffus.2026.104174
Zhijing Huang , Wen-Jue He , Baotian Hu, Zheng Zhang
Due to its strong capacity for integrating heterogeneous multi-source information, multimodal sentiment analysis (MSA) has achieved remarkable progress in affective computing. However, existing methods typically adopt symmetric fusion strategies that treat all modalities equally, overlooking their inherent performance disparities: some modalities excel at discriminative representation, while others carry underutilized supportive cues. This limitation leads to insufficient exploration of cross-modal complementary correlations. To address this issue, we propose a novel Grading-Inspired Complementary Enhancing (GCE) framework for MSA, one of the first attempts to conduct dynamic assessment for knowledge transfer in progressive multimodal fusion and cooperation. Specifically, based on cross-modal interaction, a task-aware grading mechanism categorizes modality-pair associations into dominant (high-performing) and supplementary (low-performing) branches according to their task performance. Accordingly, a relation filtering module selectively identifies trustworthy information from the dominant branch to enhance consistency exploration in supplementary modality pairs with minimized redundancy. Afterwards, a weight adaptation module dynamically adjusts the guiding weight of individual samples for adaptability and generalization. Extensive experiments conducted on three benchmark datasets show that our proposed GCE approach outperforms state-of-the-art MSA methods. Our code is available at https://github.com/hka-7/GCEforMSA.
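The task-aware grading mechanism can be illustrated with a toy threshold rule: score each modality pair on a held-out task metric and split at the mean (the scores and the mean-threshold choice below are illustrative assumptions, not the paper's exact criterion):

```python
def grade_modality_pairs(pair_scores, threshold=None):
    # Split modality pairs into dominant vs. supplementary branches by
    # held-out task performance; threshold defaults to the mean score.
    if threshold is None:
        threshold = sum(pair_scores.values()) / len(pair_scores)
    dominant = {p for p, s in pair_scores.items() if s >= threshold}
    supplementary = set(pair_scores) - dominant
    return dominant, supplementary

# Hypothetical validation accuracies for three modality pairs.
scores = {("text", "audio"): 0.81, ("text", "video"): 0.78, ("audio", "video"): 0.62}
dominant, supplementary = grade_modality_pairs(scores)
```

The dominant branch would then supply the trustworthy signals used to guide the supplementary pairs.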
Title: "Grading-inspired complementary enhancing for multimodal sentiment analysis" (Information Fusion, vol. 131, Article 104174)
Pub Date: 2026-07-01 | Epub Date: 2026-01-20 | DOI: 10.1016/j.inffus.2026.104165
Shengyingjie Liu , Jianxin Li , Qian Wan , Bo He , Zhijun Huang , Qing Li
Programming education is essential for equipping individuals with digital literacy skills and developing the problem-solving abilities necessary for success in the modern workforce. In online programming tutoring systems, knowledge tracing (KT) techniques are crucial for programming prediction, as they monitor user performance and model user cognition. However, both universal and programming-specific KT methods depend on traditional state-driven paradigms that indirectly predict programming outcomes based on users' knowledge states. This does not align with the core objective of programming prediction, which is to determine whether the submitted code can solve the question. To address this, we present code-driven feature fusion KT (CFKT), which integrates large language models (LLMs) and encoders for both individualized and common code features. It consists of two modules: pass prediction and code prediction. The pass prediction module leverages an LLM to incorporate semantic information from the question and code through embeddings, extracting key features that determine code correctness through proxy tasks and effectively narrowing the solution space with vectorization. The code prediction module integrates a user's historical data and data from other users through feature fusion blocks, allowing accurate predictions for submitted code and effectively mitigating the cold-start problem. Experiments on multiple real-world public programming datasets demonstrate that CFKT significantly outperforms existing baseline methods.
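As a highly simplified stand-in for the pass prediction module, one can blend a semantic match score between question and code embeddings with the user's historical pass rate; the toy embeddings and the linear blending rule below are illustrative assumptions, not CFKT's actual design:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_pass(code_emb, question_emb, user_hist_rate, alpha=0.5):
    # Blend a code-question semantic match score (mapped from [-1, 1] to
    # [0, 1]) with the user's historical pass rate; both terms stand in for
    # the paper's LLM-derived features and feature fusion blocks.
    match = (cosine(code_emb, question_emb) + 1.0) / 2.0
    return alpha * match + (1.0 - alpha) * user_hist_rate

q = np.array([1.0, 0.0, 1.0])      # hypothetical question embedding
code = np.array([1.0, 0.1, 0.9])   # hypothetical submission embedding
p = predict_pass(code, q, user_hist_rate=0.6)
```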
{"title":"Code-driven programming prediction enhanced by LLM with a feature fusion approach","authors":"Shengyingjie Liu , Jianxin Li , Qian Wan , Bo He , Zhijun Huang , Qing Li","doi":"10.1016/j.inffus.2026.104165","DOIUrl":"10.1016/j.inffus.2026.104165","url":null,"abstract":"<div><div>Programming education is essential for equipping individuals with digital literacy skills and developing the problem-solving abilities necessary for success in the modern workforce. In online programming tutoring systems, knowledge tracing (KT) techniques are crucial for programming prediction, as they monitor user performance and model user cognition. However, both universal and programming-specific knowledge transfer methods depend on traditional state-driven paradigms that indirectly predict programming outcomes based on users’ knowledge states. It does not align with the core objective of programming prediction, which is to determine whether submitted code can solve the question. To address this, we present the code-driven feature fusion KT (CFKT), which integrates large language models (LLM) and encoders for both individualized and common code features. It consists of two modules: pass prediction and code prediction. The pass prediction module leverages LLM to incorporate semantic information from the question and code through embedding, extracting key features that determine code correctness through proxy tasks and effectively narrowing the solution space with vectorization. The code prediction module integrates user historical data and data from other users through feature fusion blocks, allowing for accurate predictions of submitted code and effectively mitigating the cold start problem. 
Experiments on multiple real-world public programming datasets demonstrate that CFKT significantly outperforms existing baseline methods.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104165"},"PeriodicalIF":15.5,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146014543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
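The pass-prediction idea described above (fusing LLM embeddings of the question and the submitted code, then scoring correctness) can be sketched minimally as follows. This is an illustration only: the embedding dimensions, random stand-in embeddings, and the single logistic layer are hypothetical, not CFKT's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_and_predict(q_emb, c_emb, W, b):
    """Toy pass-prediction head: concatenate question and code
    embeddings (stand-ins for LLM embeddings) and apply a single
    logistic layer to score whether the code would pass."""
    x = np.concatenate([q_emb, c_emb])
    z = W @ x + b
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid -> pass probability in (0, 1)

d = 8                              # hypothetical embedding size
q = rng.normal(size=d)             # stand-in question embedding
c = rng.normal(size=d)             # stand-in code embedding
W = rng.normal(size=(1, 2 * d)) * 0.1
b = np.zeros(1)
p = fuse_and_predict(q, c, W, b)
print(float(p[0]))
```

In a real system the two embeddings would come from an LLM encoder and the head would be trained on pass/fail labels; the sketch only shows the fusion-then-score shape of the module.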
Pub Date: 2026-07-01 | Epub Date: 2026-01-30 | DOI: 10.1016/j.inffus.2026.104194
Yunlong He, Fei Chen, Hanlin Zhang, Jia Yu
When personalized federated learning meets crowdsourced label annotation, it can potentially form a complete ecosystem spanning large-scale data labeling, model training on massive numbers of devices, and flexible service for diverse end users. In practice, however, crowdsourced annotators rarely follow a uniform annotation convention and instead label data in their own way. Even when they share a consistent perception of the underlying categories, their label annotations can still be expressed in various ways. This situation is especially serious in the federated learning scenario, where the diverse label expressions are kept locally on distributed clients for privacy reasons and can hardly be unified. In this work, we propose CrowdFed, a systematic solution for crowdsourced federated learning systems with an underlying label representation skew issue. Specifically, a global model is trained through federated learning for global categorical alignment, and personalized layers are learned through an auxiliary network on each client for local representation alignment. Furthermore, a category-level similarity matching strategy is presented to align inconsistent label representations between local categories and global categories. Evaluated on four benchmark datasets, the proposed strategy demonstrates its superiority in terms of system efficiency and cost.
{"title":"Crowdsourced federated learning with inconsistent label representation","authors":"Yunlong He, Fei Chen, Hanlin Zhang, Jia Yu","doi":"10.1016/j.inffus.2026.104194","DOIUrl":"10.1016/j.inffus.2026.104194","url":null,"abstract":"<div><div>When personalized federated learning meets crowdsourced label annotation, it can potentially form a complete ecosystem from large-scale data labeling, through model training in massive devices, toward flexible service for diverse end users. Actually, most common crowdsourced annotators can hardly follow a uniform annotation regulation and make the annotations in their own way. Even though they can share the cognitive consistency on the perception, the label annotation can still be expressed in various ways. This situation can be specifically serious in the federated learning scenario, in which the diverse label expressions are always kept locally in distributed clients for privacy concerns and can hardly be unified. In this work, we are motivated to propose CrowdFed, a systematic solution for crowdsourced federated learning systems with underlying label representation skew issue. Specifically, the global model is trained through federated learning for global categorical alignment, and the personalized layers are learned through an auxiliary network in each client for local representation alignment. Furthermore, a category-level similarity matching strategy is presented for the alignment of inconsistent label representations between the local category and the global category. 
Evaluated by four benchmark datasets, our proposed strategy proves its superiority in terms of system efficiency and cost.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104194"},"PeriodicalIF":15.5,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146089494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
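The category-level similarity matching step described above can be illustrated with a minimal sketch: map each client-local label representation to its most similar global category by cosine similarity. The prototype vectors and the plain argmax matching here are assumptions for illustration, not CrowdFed's exact procedure.

```python
import numpy as np

def match_categories(local_protos, global_protos):
    """Map each client-local label representation to the most similar
    global category via cosine similarity (a simplified stand-in for
    category-level similarity matching)."""
    ln = local_protos / np.linalg.norm(local_protos, axis=1, keepdims=True)
    gn = global_protos / np.linalg.norm(global_protos, axis=1, keepdims=True)
    sim = ln @ gn.T            # (n_local, n_global) cosine similarities
    return sim.argmax(axis=1)  # best-matching global category per local label

# Toy prototypes: local labels 0 and 1 are noisy copies of global 2 and 0.
g = np.eye(3)
l = np.array([[0.1, 0.0, 0.9],
              [0.9, 0.1, 0.0]])
mapping = match_categories(l, g)
print(mapping)  # → [2 0]
```

A real deployment would derive the prototypes from learned feature representations rather than hand-set vectors, but the matching shape is the same: similarity matrix, then per-local-label assignment.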
Pub Date: 2026-07-01 | Epub Date: 2026-01-17 | DOI: 10.1016/j.inffus.2026.104157
Xunqi Zhou , Zhenqi Zhang , Zifeng Wu , Qianming Wang , Jing Teng , Jinlong Liu , Yongjie Zhai
In intelligent vehicle damage assessment, component recognition faces challenges such as significant intra-class variability and minimal inter-class differences, which hinder detection, as well as occlusions and ambiguous boundaries, which complicate segmentation. We generalize these problems into three core aspects: inter-object relational modeling, semantic-detail information balancing, and occlusion-aware decoupling. To this end, we propose the Adaptive Regularized Topological Segmentation (ARTSeg) network, comprising three complementary modules: Inter-Class Graph Constraint (ICGC), Constrained Detail Feature Backtracking (CDFB), and Topological Decoupling Segmentation (TDS). Each module is purposefully designed, integrated in a progressive structure, and synergistically reinforces the others to enhance overall performance. Specifically, ICGC clusters intra-class features and establishes implicit topological constraints among categories during feature extraction, enabling the model to better capture inter-class relationships and improve detection representation. Subsequently, CDFB evaluates the impact of channel-wise feature information within each candidate region on segmentation accuracy and computational cost, dynamically selecting appropriate feature resolutions for individual instances while balancing the demands of detection and segmentation tasks. Finally, TDS introduces topological associations between occluded and occluding regions at the feature level and decouples them at the task level, explicitly modeling generalized occlusion regions and enhancing segmentation performance. We quantitatively and qualitatively evaluate ARTSeg on a 59-category vehicle component dataset constructed for insurance damage assessment, achieving notable improvements in addressing the aforementioned problems. Experiments on two public datasets, DSMLR and Carparts, further validate the generalization capability of the proposed method. 
Results indicate that ARTSeg provides practical guidance for component recognition in intelligent vehicle damage assessment.
{"title":"An adaptive regularized topological segmentation network integrating inter-class relations and occlusion information for vehicle component recognition","authors":"Xunqi Zhou , Zhenqi Zhang , Zifeng Wu , Qianming Wang , Jing Teng , Jinlong Liu , Yongjie Zhai","doi":"10.1016/j.inffus.2026.104157","DOIUrl":"10.1016/j.inffus.2026.104157","url":null,"abstract":"<div><div>In intelligent vehicle damage assessment, component recognition faces challenges such as significant intra-class variability and minimal inter-class differences, which hinder detection, as well as occlusions and ambiguous boundaries, which complicate segmentation. We generalize these problems into three core aspects: inter-object relational modeling, semantic-detail information balancing, and occlusion-aware decoupling. To this end, we propose the Adaptive Regularized Topological Segmentation (ARTSeg) network, comprising three complementary modules: Inter-Class Graph Constraint (ICGC), Constrained Detail Feature Backtracking (CDFB), and Topological Decoupling Segmentation (TDS). Each module is purposefully designed, integrated in a progressive structure, and synergistically reinforces the others to enhance overall performance. Specifically, ICGC clusters intra-class features and establishes implicit topological constraints among categories during feature extraction, enabling the model to better capture inter-class relationships and improve detection representation. Subsequently, CDFB evaluates the impact of channel-wise feature information within each candidate region on segmentation accuracy and computational cost, dynamically selecting appropriate feature resolutions for individual instances while balancing the demands of detection and segmentation tasks. Finally, TDS introduces topological associations between occluded and occluding regions at the feature level and decouples them at the task level, explicitly modeling generalized occlusion regions and enhancing segmentation performance. 
We quantitatively and qualitatively evaluate ARTSeg on a 59-category vehicle component dataset constructed for insurance damage assessment, achieving notable improvements in addressing the aforementioned problems. Experiments on two public datasets, DSMLR and Carparts, further validate the generalization capability of the proposed method. Results indicate that ARTSeg provides practical guidance for component recognition in intelligent vehicle damage assessment.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104157"},"PeriodicalIF":15.5,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
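The intra-class clustering with inter-class constraints that ICGC performs can be sketched as a simple loss: pull features toward their class centroid and keep distinct class centroids at least a margin apart. This is a hedged simplification with toy 2-D features and a hypothetical margin, not the paper's actual graph-based formulation.

```python
import numpy as np

def icgc_like_loss(feats, labels, margin=1.0):
    """Toy constraint in the spirit of ICGC: intra-class compactness
    (mean distance to class centroid) plus a hinge penalty when two
    class centroids are closer than `margin`."""
    classes = np.unique(labels)
    cents = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    intra = np.mean([np.linalg.norm(feats[labels == c] - cents[i], axis=1).mean()
                     for i, c in enumerate(classes)])
    inter, pairs = 0.0, 0
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            d = np.linalg.norm(cents[i] - cents[j])
            inter += max(0.0, margin - d)  # hinge: penalize nearby centroids
            pairs += 1
    return intra + inter / max(pairs, 1)

# Two tight, well-separated clusters: hinge term vanishes, loss = intra term.
feats = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
labels = np.array([0, 0, 1, 1])
loss = icgc_like_loss(feats, labels)
print(round(loss, 3))
```

Minimizing such a term during feature extraction encourages compact per-class clusters with separated centroids, which is the effect the abstract attributes to ICGC at a high level.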