
Latest publications from Information Fusion

Multimodal spatio-temporal fusion: A generalizable GCN-LSTM with attention framework for urban application
IF 15.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-20 | DOI: 10.1016/j.inffus.2026.104164
Yunfei Guo
The proliferation of urban big data presents unprecedented opportunities for understanding cities, yet the analytical methods to harness this data are often fragmented and domain-specific. Existing predictive models in urban computing are typically highly specialized, creating analytical silos that inhibit knowledge transfer and are difficult to adapt across domains such as public safety, housing and transport. This paper confronts this critical gap by developing a generalizable, multimodal spatio-temporal deep learning framework engineered for both high predictive performance and interpretability, which is capable of mastering diverse urban prediction tasks without architectural modification. The hybrid architecture fuses a Multi-Head Graph Convolutional Network (GCN) for spatial diffusion, a Long Short-Term Memory (LSTM) network for temporal dynamics, and a learnable Gating Mechanism that weights the influence of spatial graph versus static external features. To validate this generalizability, the framework was tested on three distinct urban domains in London: crime forecasting, housing price estimation and transport network demand. The model outperformed traditional baselines (ARIMA, XGBoost) and state-of-the-art deep learning models (TabNet, TFT). Moreover, the framework moves beyond prediction to explanation by incorporating attention mechanisms and permutation feature importance analysis.
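The learnable gating step described above can be sketched in a few lines. This is a minimal illustration of a gate weighting spatial-graph features against static external features, not the paper's implementation; the shapes and the parameters `W` and `b` are hypothetical, and in practice they would be trained end-to-end with the GCN and LSTM branches:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(spatial_feat, external_feat, W, b):
    # Gate g in (0, 1) decides, per dimension, how much the spatial-graph
    # branch contributes versus the static external-feature branch.
    z = np.concatenate([spatial_feat, external_feat])
    g = sigmoid(W @ z + b)
    return g * spatial_feat + (1.0 - g) * external_feat

rng = np.random.default_rng(0)
d = 4                                        # hypothetical feature dimension
spatial = rng.normal(size=d)                 # stand-in for GCN output
external = rng.normal(size=d)                # stand-in for static external features
W = rng.normal(scale=0.1, size=(d, 2 * d))   # would be learned during training
b = np.zeros(d)
fused = gated_fusion(spatial, external, W, b)
```

Because the gate produces a per-dimension convex combination, each fused value stays between the corresponding spatial and external values.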
Citations: 0
Information-theoretic graph fusion with vision-language-action model for policy reasoning and dual robotic control
IF 15.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-29 | DOI: 10.1016/j.inffus.2026.104193
Shunlei Li, Longsen Gao, Jin Wang, Chang Che, Xi Xiao, Jiuwen Cao, Yingbai Hu, Hamid Reza Karimi
Teaching robots dexterous skills from human videos remains challenging due to the reliance on low-level trajectory imitation, which fails to generalize across object types, spatial layouts, and manipulator configurations. We propose Graph-Fused Vision-Language-Action (GF-VLA), a framework that enables dual-arm robotic systems to perform task-level reasoning and execution directly from RGB(-D) human demonstrations. GF-VLA first extracts Shannon-information-based cues to identify hands and objects with the highest task relevance, then encodes these cues into temporally ordered scene graphs that capture both hand-object and object-object interactions. These graphs are fused with a language-conditioned transformer that generates hierarchical behavior trees and interpretable Cartesian motion commands. To improve execution efficiency in bimanual settings, we further introduce a cross-hand selection policy that infers optimal gripper assignment without explicit geometric reasoning. We evaluate GF-VLA on four structured dual-arm block assembly tasks involving symbolic shape construction and spatial generalization. Experimental results show that the information-theoretic scene representation achieves over 95% graph accuracy and 93% subtask segmentation, supporting the LLM planner in generating reliable and human-readable task policies. When executed by the dual-arm robot, these policies yield 94% grasp success, 89% placement accuracy, and 90% overall task success across stacking, letter-building, and geometric reconfiguration scenarios, demonstrating strong generalization and robustness across diverse spatial and semantic variations.
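The Shannon-information-based cue extraction can be illustrated with a toy entropy score over per-frame interaction labels. The abstract does not give the exact scoring rule, so the labels below and the use of raw entropy as the score are illustrative assumptions only:

```python
import math
from collections import Counter

def shannon_entropy(observations):
    # Shannon entropy in bits of a sequence of discrete observations.
    n = len(observations)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(observations).values())

# Hypothetical per-frame interaction labels for two objects in a demonstration.
obj_a_frames = ["grasp", "grasp", "grasp", "place"]   # structured pattern
obj_b_frames = ["grasp", "idle", "place", "idle"]     # more varied pattern
score_a = shannon_entropy(obj_a_frames)
score_b = shannon_entropy(obj_b_frames)
```

How GF-VLA maps such information measures to task relevance is not specified in the abstract; the sketch only shows the entropy computation itself.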
Citations: 0
Vision-language model with siamese bilateral difference network and text-guided image feature enhancement for acute ischemic stroke outcome prediction on CT angiography
IF 15.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-29 | DOI: 10.1016/j.inffus.2026.104195
Hulin Kuang, Bin Hu, Shuai Yang, Dongcui Wang, Guanghua Luo, Weihua Liao, Wu Qiu, Shulin Liu, Jianxin Wang
Acute ischemic stroke (AIS) outcome prediction is crucial for treatment decisions. However, AIS outcome prediction is challenging due to the combined influence of lesion characteristics, vascular status, and other health conditions. In this study, we introduce a vision-language model with a Siamese bilateral difference network and a text-guided image feature enhancement module for predicting AIS outcome (e.g., modified Rankin Scale, mRS) on CT angiography. In the Siamese bilateral difference network, based on fine-tuning the foundation model LVM-Med, we design an interactive Transformer fine-tuning encoder and a vision question answering guided bilateral difference awareness module, which generates bilateral difference text via image-text pair question answering as a prompt to enhance the extracted brain vascular difference features. Additionally, in the text-guided image feature enhancement module, we propose a text feature extraction module to extract patient phrase-level and inter-phrase embeddings from clinical notes, and employ a multi-scale image-text interaction module to obtain fine-grained phrase-enhanced image attention feature and coarse-grained phrase context-aware image attention feature. We validate our model on the public ISLES2024 dataset, a private dataset A, and an external AIS dataset. It achieves accuracies of 81.11%, 83.05%, and 80.00% and AUCs of 80.06%, 85.48% and 82.62% for 90-day mRS prediction on the 3 datasets, respectively, outperforming several state-of-the-art methods and demonstrating its generalization ability. Moreover, the proposed method can be effectively extended to glaucoma visual field progression prediction, which is also related to vascular differences and clinical notes.
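The siamese bilateral-difference idea (a shared encoder applied to an image and to its left-right mirror, so that symmetric anatomy cancels and hemispheric asymmetry stands out) can be sketched on a toy 2-D array. This is a guess at the general mechanism, not the paper's network, which operates on 3-D CTA volumes with a fine-tuned foundation-model encoder:

```python
import numpy as np

def encode(x, W):
    # Shared ("siamese") encoder: the same weights serve both views.
    return np.tanh(W @ x.ravel())

def bilateral_difference(image, W):
    # Difference between the encoding of an image and of its left-right
    # mirror; for perfectly symmetric anatomy the difference vanishes.
    mirrored = image[:, ::-1]
    return encode(image, W) - encode(mirrored, W)

rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(8, 16))     # hypothetical encoder weights
symmetric = np.array([[1., 2., 2., 1.],
                      [0., 3., 3., 0.],
                      [4., 5., 5., 4.],
                      [1., 1., 1., 1.]])    # toy left-right symmetric "scan"
diff_sym = bilateral_difference(symmetric, W)   # zero: no asymmetry signal
```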
Citations: 0
On the security and privacy of federated learning: A survey with attacks, defenses, frameworks, applications, and future directions
IF 15.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-16 | DOI: 10.1016/j.inffus.2026.104155
Daniel M. Jimenez-Gutierrez, Yelizaveta Falkouskaya, José L. Hernandez-Ramos, Aris Anagnostopoulos, Ioannis Chatzigiannakis, Andrea Vitaletti
Federated Learning (FL) is an emerging distributed machine learning paradigm enabling multiple clients to train a global model collaboratively without sharing their raw data. While FL enhances data privacy by design, it remains vulnerable to various security and privacy threats. This survey provides a comprehensive overview of 203 papers on the state-of-the-art attacks and defense mechanisms developed to address these challenges, categorizing them into security-enhancing and privacy-preserving techniques. Security-enhancing methods aim to improve FL robustness against malicious behaviors such as Byzantine attacks, poisoning, and Sybil attacks. Privacy-preserving techniques, in turn, focus on protecting sensitive data through cryptographic approaches, differential privacy, and secure aggregation. We critically analyze the strengths and limitations of existing methods, highlight the trade-offs between privacy, security, and model performance, and discuss the implications of non-IID data distributions for the effectiveness of these defenses. Furthermore, we identify open research challenges and future directions, including the need for scalable, adaptive, and energy-efficient solutions operating in dynamic and heterogeneous FL environments. Our survey aims to guide researchers and practitioners in developing robust, privacy-preserving FL systems and to foster advances that safeguard the integrity and confidentiality of collaborative learning frameworks.
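One classic Byzantine-robust defense of the kind such surveys cover, coordinate-wise median aggregation, replaces the plain FedAvg mean so that a minority of poisoned updates cannot drag the aggregate arbitrarily far. A minimal sketch with toy client updates (values hypothetical):

```python
import numpy as np

def median_aggregate(client_updates):
    # Coordinate-wise median of client updates: a standard Byzantine-robust
    # alternative to the plain FedAvg mean.
    return np.median(np.stack(client_updates), axis=0)

honest = [np.array([1.0, 2.0]), np.array([1.1, 1.9]), np.array([0.9, 2.1])]
poisoned = honest + [np.array([100.0, -100.0])]   # one malicious client
robust = median_aggregate(poisoned)               # stays near honest values
naive = np.mean(np.stack(poisoned), axis=0)       # dragged far off by attacker
```

With three honest clients and one attacker, the median stays near the honest cluster while the mean is pulled by the outlier, illustrating the robustness/efficiency trade-off the survey discusses.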
Citations: 0
Data fusion for low-cost sensors: A systematic literature review
IF 15.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-18 | DOI: 10.1016/j.inffus.2026.104124
Gabriel Oduori, Chaira Cocco, Payam Sajadi, Francesco Pilla
Data fusion (DF) addresses the challenge of integrating heterogeneous data sources to improve decision-making and inference. Although DF has been widely explored, no prior systematic review has specifically focused on its application to low-cost sensor (LCS) data in environmental monitoring. To address this gap, we conduct a systematic literature review (SLR) following the PRISMA framework, synthesising findings from 82 peer-reviewed articles. The review addresses three key questions: (1) What fusion methodologies are employed in conjunction with LCS data? (2) In what environmental contexts are these methods applied? (3) What are the methodological challenges and research gaps? Our analysis reveals that geostatistical and machine learning approaches dominate current practice, with air quality monitoring emerging as the primary application domain. Additionally, artificial intelligence (AI)-based methods are increasingly used to integrate spatial, temporal, and multimodal data. However, limitations persist in uncertainty quantification, validation standards, and the generalisability of fusion frameworks. This review provides a comprehensive synthesis of current techniques and outlines key directions for future research, including the development of robust, uncertainty-aware fusion methods and broader application to less-studied environmental variables.
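A canonical fusion step in this setting is inverse-variance weighting: combining noisy readings of the same quantity (say, a low-cost sensor and a reference instrument) with weights 1/variance gives the minimum-variance linear estimate. A minimal sketch with hypothetical values, not drawn from any reviewed study:

```python
import numpy as np

def inverse_variance_fusion(readings, variances):
    # Weight each reading by 1/variance; the result is the minimum-variance
    # unbiased linear combination, with variance 1 / sum(1/var_i).
    w = 1.0 / np.asarray(variances, dtype=float)
    fused = float(np.sum(w * np.asarray(readings, dtype=float)) / np.sum(w))
    return fused, float(1.0 / np.sum(w))

# Hypothetical temperature readings: a noisy low-cost sensor (variance 4.0)
# and a better reference instrument (variance 1.0).
fused, fused_var = inverse_variance_fusion([21.0, 19.0], [4.0, 1.0])
```

The fused value leans toward the lower-variance reference, and the fused variance is smaller than either input's: the quantified-uncertainty behavior the review calls for.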
Citations: 0
Shape-aware osteoarthritis network: Bidirectional fusion of MRI and 3D point clouds for knee osteoarthritis diagnosis
IF 15.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-29 | DOI: 10.1016/j.inffus.2026.104198
Dawei Zhang, Chenglin Sang, Tianyi Lyu
Knee osteoarthritis (KOA) is a common degenerative joint disease, and accurate diagnosis and severity grading are crucial for effective treatment. Although deep learning techniques based on X-rays or magnetic resonance imaging (MRI) have greatly improved diagnostic accuracy, two-dimensional images often cannot fully capture the complex three-dimensional morphology and texture changes associated with KOA. To address these challenges, we propose a shape-aware osteoarthritis diagnostic network, a novel bidirectional cross-modal fusion framework that integrates 3D point clouds and MRI sequences. The framework consists of three parts: (1) a local relation-aware dynamic graph convolutional neural network (CNN) that extracts complex geometric features from point clouds representing the surfaces of knee-joint bones and cartilage; (2) for MRI sequences, a sequence aggregation method that combines a 2D CNN for spatial feature extraction with a self-attention mechanism across slice sequences; (3) a bidirectional cross-modal fusion module that performs deep interactive feature learning between the geometric domain of the point clouds and the texture spatio-temporal domain of the MRI, enabling the two modalities to refine and enhance each other's representations. Extensive experiments on a large cohort from the Osteoarthritis Initiative (OAI) show that our model achieves state-of-the-art performance. Its accuracy on the challenging 5-level Kellgren-Lawrence (KL) classification is 0.73, an improvement of approximately 23.7% over the 0.59 achieved by using 3D shape features alone in the ShapeMed-Knee benchmark.
Citations: 0
Grading-inspired complementary enhancing for multimodal sentiment analysis
IF 15.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-23 | DOI: 10.1016/j.inffus.2026.104174
Zhijing Huang, Wen-Jue He, Baotian Hu, Zheng Zhang
Due to its strong capacity for integrating heterogeneous multi-source information, multimodal sentiment analysis (MSA) has achieved remarkable progress in affective computing. However, existing methods typically adopt symmetric fusion strategies that treat all modalities equally, overlooking their inherent performance disparities: some modalities excel at discriminative representation, while others carry underutilized supportive cues. This limitation leaves cross-modal complementary correlations insufficiently explored. To address this issue, we propose a novel Grading-Inspired Complementary Enhancing (GCE) framework for MSA, one of the first attempts to conduct dynamic assessment of knowledge transfer in progressive multimodal fusion and cooperation. Specifically, based on cross-modal interaction, a task-aware grading mechanism categorizes modality-pair associations into dominant (high-performing) and supplementary (low-performing) branches according to their task performance. Accordingly, a relation filtering module selectively identifies trustworthy information from the dominant branch to enhance consistency exploration in supplementary modality pairs with minimal redundancy. Afterwards, a weight adaptation module dynamically adjusts the guiding weight of individual samples for adaptability and generalization. Extensive experiments on three benchmark datasets show that the proposed GCE approach outperforms state-of-the-art MSA methods. Our code is available at https://github.com/hka-7/GCEforMSA.
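The task-aware grading step can be sketched as a simple partition of modality pairs by a validation score. The pair names, scores, and threshold below are illustrative, not from the paper:

```python
def grade_modality_pairs(pair_scores, threshold):
    # Partition modality-pair branches by a validation metric into dominant
    # (high-performing) and supplementary (low-performing) sets.
    dominant = {pair for pair, score in pair_scores.items()
                if score >= threshold}
    supplementary = set(pair_scores) - dominant
    return dominant, supplementary

# Hypothetical validation scores for each modality-pair branch.
scores = {("text", "audio"): 0.81,
          ("text", "video"): 0.78,
          ("audio", "video"): 0.55}
dominant, supplementary = grade_modality_pairs(scores, threshold=0.70)
```

In the full framework this partition would then drive the relation filtering (dominant-to-supplementary knowledge transfer) described above.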
多模态情感分析(MSA)由于具有强大的整合异构多源信息的能力,在情感计算领域取得了显著的进展。然而,现有的方法通常采用对称融合策略,平等地对待所有模式,忽视了它们内在的性能差异,即一些模式擅长于歧视性表征,而另一些模式则带有未充分利用的支持性线索。这种局限性导致了跨模态互补相关性研究的不足。为了解决这一问题,我们提出了一种新的基于评分启发的互补增强(GCE)框架,这是对渐进式多模态融合与合作中的知识转移进行动态评估的首次尝试。具体而言,基于跨模态交互,任务感知分级机制根据其任务绩效将模态对关联分类为主导(高性能)和辅助(低性能)分支。因此,关系过滤模块选择性地从优势分支中识别可信信息,以增强冗余最小化的互补模态对的一致性探索。然后,采用权值自适应模块动态调整单个样本的引导权值,实现自适应性和泛化。在三个基准数据集上进行的大量实验表明,我们提出的GCE方法优于最先进的MSA方法。我们的代码可在https://github.com/hka-7/GCEforMSA上获得。
Zhijing Huang, Wen-Jue He, Baotian Hu, Zheng Zhang, "Grading-inspired complementary enhancing for multimodal sentiment analysis," Information Fusion, Volume 131, Article 104174, DOI: 10.1016/j.inffus.2026.104174.
Citations: 0
Code-driven programming prediction enhanced by LLM with a feature fusion approach
IF 15.5, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-07-01. Epub Date: 2026-01-20. DOI: 10.1016/j.inffus.2026.104165
Shengyingjie Liu, Jianxin Li, Qian Wan, Bo He, Zhijun Huang, Qing Li
Programming education is essential for equipping individuals with digital literacy skills and developing the problem-solving abilities necessary for success in the modern workforce. In online programming tutoring systems, knowledge tracing (KT) techniques are crucial for programming prediction, as they monitor user performance and model user cognition. However, both universal and programming-specific knowledge tracing methods depend on traditional state-driven paradigms that indirectly predict programming outcomes based on users' knowledge states. This paradigm does not align with the core objective of programming prediction, which is to determine whether the submitted code can solve the question. To address this, we present the code-driven feature fusion KT (CFKT) model, which integrates large language models (LLMs) and encoders for both individualized and common code features. It consists of two modules: pass prediction and code prediction. The pass prediction module leverages an LLM to incorporate semantic information from the question and code through embedding, extracting key features that determine code correctness through proxy tasks and effectively narrowing the solution space with vectorization. The code prediction module integrates user historical data and data from other users through feature fusion blocks, allowing for accurate predictions of submitted code and effectively mitigating the cold-start problem. Experiments on multiple real-world public programming datasets demonstrate that CFKT significantly outperforms existing baseline methods.
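As a rough illustration of the kind of fusion a feature fusion block performs, the sketch below combines question, code, and user-history embeddings by weighted concatenation of normalized vectors. The weights, dimensions, and random embeddings are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_features(code_emb, question_emb, history_emb, w=(0.5, 0.3, 0.2)):
    """Late-fusion sketch: concatenate weighted, L2-normalized embeddings
    of the submitted code, the question, and the user's history.
    The weights are hypothetical, not from the paper."""
    parts = []
    for e, wi in zip((code_emb, question_emb, history_emb), w):
        e = e / (np.linalg.norm(e) + 1e-8)  # normalize each modality
        parts.append(wi * e)
    return np.concatenate(parts)

code_emb = rng.normal(size=64)      # e.g. from an LLM code encoder
question_emb = rng.normal(size=64)  # e.g. from an LLM question encoder
history_emb = rng.normal(size=64)   # pooled past-submission features
fused = fuse_features(code_emb, question_emb, history_emb)
```

A downstream classifier over `fused` would then predict whether the submission passes; the real CFKT fusion blocks are learned jointly with that predictor.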
Information Fusion, Volume 131, Article 104165.
Citations: 0
Crowdsourced federated learning with inconsistent label representation
IF 15.5, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-07-01. Epub Date: 2026-01-30. DOI: 10.1016/j.inffus.2026.104194
Yunlong He, Fei Chen, Hanlin Zhang, Jia Yu
When personalized federated learning meets crowdsourced label annotation, it can potentially form a complete ecosystem that spans large-scale data labeling, model training across massive numbers of devices, and flexible service for diverse end users. In practice, however, crowdsourced annotators rarely follow a uniform annotation standard and instead label data in their own way. Even when annotators share a consistent perception of the data, their label annotations can still be expressed in different forms. This situation is especially serious in the federated learning scenario, where the diverse label expressions are kept locally on distributed clients for privacy reasons and can hardly be unified. In this work, we propose CrowdFed, a systematic solution for crowdsourced federated learning systems with an underlying label representation skew issue. Specifically, the global model is trained through federated learning for global categorical alignment, and personalized layers are learned through an auxiliary network in each client for local representation alignment. Furthermore, a category-level similarity matching strategy is presented to align inconsistent label representations between local and global categories. Evaluated on four benchmark datasets, the proposed strategy proves its superiority in terms of system efficiency and cost.
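The category-level matching idea can be sketched as follows, assuming a class prototype (mean feature vector) is available for each local and global category. The greedy cosine-similarity argmax below is a stand-in for whatever matching rule CrowdFed actually uses, and the prototypes are invented.

```python
import numpy as np

def match_categories(local_protos, global_protos):
    """Category-level similarity matching sketch: map each local class
    prototype to its most similar global class by cosine similarity.
    A greedy argmax stands in for the paper's matching strategy."""
    L = local_protos / np.linalg.norm(local_protos, axis=1, keepdims=True)
    G = global_protos / np.linalg.norm(global_protos, axis=1, keepdims=True)
    sim = L @ G.T                 # (n_local, n_global) cosine similarities
    return sim.argmax(axis=1)     # local class index -> global class index

# Hypothetical prototypes: two local labels vs. three global classes
global_protos = np.eye(3)
local_protos = np.array([[0.9, 0.1, 0.0],    # close to global class 0
                         [0.1, 0.0, 0.95]])  # close to global class 2
mapping = match_categories(local_protos, global_protos)
```

Once such a mapping exists, each client can report updates in the global label space even though its own annotators used a different label vocabulary.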
Information Fusion, Volume 131, Article 104194.
Citations: 0
An adaptive regularized topological segmentation network integrating inter-class relations and occlusion information for vehicle component recognition
IF 15.5, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-07-01. Epub Date: 2026-01-17. DOI: 10.1016/j.inffus.2026.104157
Xunqi Zhou, Zhenqi Zhang, Zifeng Wu, Qianming Wang, Jing Teng, Jinlong Liu, Yongjie Zhai
In intelligent vehicle damage assessment, component recognition faces challenges such as significant intra-class variability and minimal inter-class differences, which hinder detection, as well as occlusions and ambiguous boundaries, which complicate segmentation. We generalize these problems into three core aspects: inter-object relational modeling, semantic-detail information balancing, and occlusion-aware decoupling. To this end, we propose the Adaptive Regularized Topological Segmentation (ARTSeg) network, comprising three complementary modules: Inter-Class Graph Constraint (ICGC), Constrained Detail Feature Backtracking (CDFB), and Topological Decoupling Segmentation (TDS). Each module is purposefully designed, integrated in a progressive structure, and synergistically reinforces the others to enhance overall performance. Specifically, ICGC clusters intra-class features and establishes implicit topological constraints among categories during feature extraction, enabling the model to better capture inter-class relationships and improve detection representation. Subsequently, CDFB evaluates the impact of channel-wise feature information within each candidate region on segmentation accuracy and computational cost, dynamically selecting appropriate feature resolutions for individual instances while balancing the demands of detection and segmentation tasks. Finally, TDS introduces topological associations between occluded and occluding regions at the feature level and decouples them at the task level, explicitly modeling generalized occlusion regions and enhancing segmentation performance. We quantitatively and qualitatively evaluate ARTSeg on a 59-category vehicle component dataset constructed for insurance damage assessment, achieving notable improvements in addressing the aforementioned problems. Experiments on two public datasets, DSMLR and Carparts, further validate the generalization capability of the proposed method. Results indicate that ARTSeg provides practical guidance for component recognition in intelligent vehicle damage assessment.
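As a rough illustration of how topological constraints among categories might let related classes exchange information, the sketch below runs one symmetrically normalized graph-propagation step over hypothetical class prototypes. The prototypes and adjacency here are invented, and the real ICGC module learns these relations end-to-end rather than using a fixed graph.

```python
import numpy as np

def propagate_class_graph(protos, adj):
    """One symmetric-normalized graph propagation step over class
    prototypes: D^{-1/2} (A + I) D^{-1/2} @ protos. A minimal stand-in
    for inter-class constraint propagation, not the ICGC module itself."""
    A = adj + np.eye(adj.shape[0])         # add self-loops
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A @ D_inv_sqrt    # normalized adjacency
    return A_hat @ protos                  # mix features of related classes

# Hypothetical prototypes for 3 components; classes 0 and 1 are related
protos = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
adj = np.array([[0, 1, 0],
                [1, 0, 0],
                [0, 0, 0]], dtype=float)
mixed = propagate_class_graph(protos, adj)
```

After one step the two related prototypes are averaged toward each other, while the isolated class is untouched, which is the qualitative behavior such a constraint is meant to produce.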
Information Fusion, Volume 131, Article 104157.
Citations: 0