
Information Fusion: Latest Articles

WDASR: A wavelet-based deformable attention network for cardiac cine MRI super-resolution with spatiotemporal motion modeling
IF 15.5; Zone 1 (Computer Science); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2026-01-06; DOI: 10.1016/j.inffus.2025.104116
Jun Lyu, Xunkang Zhao, Jing Qin, Chengyan Wang
Cardiac cine MRI is the clinical gold standard for dynamic cardiac assessment, but reducing k-space sampling to accelerate acquisition results in low-resolution images that fail to depict fine anatomical details. Existing super-resolution methods struggle to preserve spatial details and temporal coherence due to limitations in handling non-rigid cardiac deformations and lossy feature downsampling. This paper proposes a Wavelet-based Deformable Attention Super-Resolution Network (WDASR) that addresses these limitations through two key innovations. First, a Frequency Subband Adaptive Alignment (FSAA) module applies deformable convolution to wavelet-decomposed frequency subbands, enabling lossless downsampling that prevents offset over-shifting and allows targeted alignment across neighboring and remote frames. Second, a Cross-Resolution Wavelet Attention (CRWA) module uses temporally aggregated frequency subbands as low-resolution keys and values, and the current frame as the high-resolution query, reducing computational complexity by 75% while effectively integrating multi-scale spatiotemporal information for enhanced texture representation. A bidirectional recurrent mechanism further propagates the enhanced features to maintain temporal consistency. Experiments on public and private datasets demonstrate that WDASR achieves 4× super-resolution with state-of-the-art performance and potential for clinical application.
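The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. A one-level 2D Haar transform stands in for the wavelet decomposition (the wavelet actually used is not stated here), and the names `haar_dwt2`, `haar_idwt2`, and `attention_pairs` are hypothetical. The sketch illustrates why wavelet downsampling can be lossless (the four half-resolution subbands reconstruct the input exactly) and one plausible reading of the 75% figure: if keys and values live at half resolution in each spatial dimension, the number of query-key pairs drops to a quarter.

```python
def haar_dwt2(img):
    """One-level 2D Haar transform of an even-sized image (list of rows).

    Returns four half-resolution subbands: one approximation band and
    three detail bands. No samples are discarded.
    """
    h, w = len(img), len(img[0])
    ll, lh, hl, hh = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r, s = i // 2, j // 2
            ll[r][s] = (a + b + c + d) / 2  # approximation (local average)
            lh[r][s] = (a - b + c - d) / 2  # difference between columns
            hl[r][s] = (a + b - c - d) / 2  # difference between rows
            hh[r][s] = (a - b - c + d) / 2  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    h, w = len(ll), len(ll[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for r in range(h):
        for s in range(w):
            p, q, u, v = ll[r][s], lh[r][s], hl[r][s], hh[r][s]
            img[2 * r][2 * s] = (p + q + u + v) / 2
            img[2 * r][2 * s + 1] = (p - q + u - v) / 2
            img[2 * r + 1][2 * s] = (p + q - u - v) / 2
            img[2 * r + 1][2 * s + 1] = (p - q - u + v) / 2
    return img


def attention_pairs(h, w, kv_scale=1):
    """Number of query-key pairs when keys/values are downsampled by kv_scale."""
    queries = h * w
    keys = (h // kv_scale) * (w // kv_scale)
    return queries * keys
```

With half-resolution keys and values, `attention_pairs(64, 64, 2) / attention_pairs(64, 64)` evaluates to 0.25, that is, a 75% reduction in pairwise comparisons. Whether this is exactly how the authors arrive at their figure is an assumption.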
Information Fusion, Volume 130, Article 104116
Citations: 0
Rethink: reveal the impact of semantic distribution transfer from the cross-modal hashing perspective
IF 15.5; Zone 1 (Computer Science); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2026-01-05; DOI: 10.1016/j.inffus.2026.104123
Yinan Li, Zhi Liu, Jiajun Tang, Binghong Chen, Mingjin Kuai, Jun Long, Zhan Yang
Hashing has been extensively applied in cross-modal retrieval by mapping data from diverse modalities into binary codes. Semantic transfer aims to enhance the relevance of heterogeneous representations by migrating valuable information from one modality to another in the unsupervised paradigm. Combining semantic transfer with hash learning replaces dense vector search with Hamming-distance search, significantly reducing storage requirements and increasing retrieval efficiency. However, current unsupervised mechanisms achieve only ordinary retrieval precision, which calls for further improvement through semantic annotation. In particular, a mediocre information fusion strategy directly affects the quality of the learned hash codes. In this paper, we propose a novel Semantic Transfer framework for Semi-supervised Cross-modal Hashing, denoted STSCH. First, we use multiple auto-encoders to learn the high-level semantic representation of each modality. To guarantee the completeness of heterogeneous data, we integrate these representations via semantic transfer and analyse the feature distribution of the different modalities. Furthermore, we construct an asymmetric hash learning framework between individual modality-specific representations and a small number of semantic labels. Finally, an effective optimization algorithm is proposed. Comprehensive experiments on the Wiki, MIRFlickr, and NUS-WIDE datasets demonstrate the superior performance of STSCH over state-of-the-art hashing approaches.
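To make the storage and efficiency argument concrete, here is a small sketch of Hamming-distance retrieval over binary codes. It is not the STSCH method: the sign-based binarization in `to_code` and all function names are illustrative assumptions, and the learned hash functions of the paper are not reproduced.

```python
def to_code(vec):
    """Binarize a real-valued vector by sign and pack the bits into an int.

    Sign thresholding is an assumption made for illustration only.
    """
    code = 0
    for v in vec:
        code = (code << 1) | (1 if v > 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def rank(query_code, database_codes):
    """Indices of database items sorted by Hamming distance to the query."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming(query_code, database_codes[i]))
```

A query from one modality (say, a text embedding) and database items from another (say, image embeddings) are compared purely through XOR and bit counting, which is why each item needs only a few bytes of storage instead of a dense floating-point vector.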
Information Fusion, Volume 130, Article 104123
Citations: 0
GULSTSVM: A fusion of graph information and universum learning in twin SVM
IF 15.5; Zone 1 (Computer Science); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2026-01-04; DOI: 10.1016/j.inffus.2025.104114
Bharat Richhariya, M. Tanveer, Weiping Ding
In several applications, the datasets have an underlying graphical structure, and geometric information about the data is needed in the learning algorithm. Universum data serves as a useful resource for classification problems by providing prior information about the data distribution. However, the graph connectivity information embedded in the universum data has not been utilized in previous algorithms. To address this problem, a novel graph-based algorithm is proposed in this work to infuse the connectivity information of the universum into the optimization problem of the classifier. The proposed algorithm is termed graph-based universum least squares twin support vector machine (GULSTSVM). It applies manifold regularization on the universum graph to provide geometric information to the classifier. Its solution involves a system of linear equations, making it efficient in terms of training time. Moreover, to efficiently capture local and global connectivity information of universum data, a novel multi-hop connectivity method is also proposed in this work. The multi-hop approach provides a fusion of local and global graph connectivity. A minimum spanning tree is used to capture local connectivity, and feature aggregation is performed to obtain global connectivity information. Experimental results on synthetic and real-world benchmark datasets show the advantages and applicability of the proposed algorithm.
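The abstract mentions a minimum spanning tree for local connectivity but does not say how the graph is built. As a hedged illustration only, the sketch below runs Prim's algorithm on a fully connected Euclidean graph over a handful of points; the function name `minimum_spanning_tree` and the choice of Euclidean edge weights are assumptions, not details taken from the paper.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns a list of (parent, child) index pairs, n - 1 edges in total.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight into each node
    parent = [-1] * n
    best[0] = 0.0           # start growing the tree from node 0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree with the cheapest connecting edge.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u))
        # Relax edges from the newly added node.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges
```

The resulting edges connect each point to a near neighbour, which is one way a tree can encode local structure before any multi-hop aggregation is applied.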
Information Fusion, Volume 130, Article 104114
Citations: 0
DAK-Pose: Dual-augmentor knowledge fusion for generalizable video-based 3D human pose estimation
IF 15.5; Zone 1 (Computer Science); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2026-01-03; DOI: 10.1016/j.inffus.2025.104100
Yachuan Wang, Bin Zhang, Hao Yuan
Real-world deployment of video-based 3D human pose estimation remains challenging, as limited annotated data collected in constrained lab settings cannot fully capture the complexity of human motion. While motion synthesis for data augmentation has emerged as a mainstream solution to enhance generalization, existing synthesis methods suffer from inherent trade-offs: kinematics-based motion synthesis approaches preserve anatomical plausibility but sacrifice temporal coherence, while coordinate-based methods ensure motion smoothness but violate biomechanical constraints. This results in persistent domain gaps when synthetic data is directly used in the observation space to train pose estimation models. To overcome this, we propose DAK-Pose, which shifts augmentation to the feature space. We disentangle motion into structural and dynamic features, and design two complementary augmentors: (1) A structure-prioritized module enforces kinematic constraints for anatomical validity, and (2) a dynamic-prioritized module generates diverse temporal patterns. Auxiliary encoders trained on synthetic motions generated by these augmentors transfer domain-invariant knowledge to the pose estimator through adversarial alignment. Experiments on Human3.6M, MPI-INF-3DHP, and 3DPW datasets show that DAK-Pose achieves state-of-the-art cross-dataset performance.
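One common example of a kinematic constraint in pose synthesis is that bone lengths stay fixed over time. The sketch below is not the DAK-Pose structure-prioritized module; it is a simple, hypothetical post-processing step (`enforce_bone_lengths`) that keeps each bone's direction and rescales it to a target length, shown only to make the notion of anatomical validity concrete.

```python
import math


def enforce_bone_lengths(joints, bones, lengths):
    """Rescale each bone to a target length while keeping its direction.

    joints:  list of [x, y, z] positions
    bones:   list of (parent, child) index pairs, ordered from the root outward
    lengths: target length for each bone, in the same order as `bones`
    """
    fixed = [list(j) for j in joints]
    for (p, c), target in zip(bones, lengths):
        # Direction is taken from the original (unconstrained) pose.
        d = [joints[c][k] - joints[p][k] for k in range(3)]
        norm = math.sqrt(sum(v * v for v in d))
        if norm == 0:
            continue  # degenerate bone: leave the child joint where it is
        # Re-attach the child to the already corrected parent position.
        fixed[c] = [fixed[p][k] + d[k] / norm * target for k in range(3)]
    return fixed
```

Applied frame by frame, such a step guarantees constant bone lengths but does nothing for temporal smoothness, which mirrors the trade-off the abstract attributes to kinematics-based synthesis.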
Information Fusion, Volume 130, Article 104100
Citations: 0
A hierarchical information policy fusion framework with multimodal large language models for autonomous guidewire navigation in endovascular procedures
IF 15.5; Zone 1 (Computer Science); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2026-01-03; DOI: 10.1016/j.inffus.2025.104115
Haoyu Wang, Taylor Yiu, Serena Lee, Ka Gao, Hangling Sun, Chenyu Zhou, Anji Li, Qiangqiang Fu, Yu Wang, Bin Chen
Robotic-assisted endovascular interventions promise to transform cardiovascular therapy by improving procedural precision and minimizing cardiologists’ exposure to occupational risks. However, current systems are limited by their reliance on manual control and lack of adaptability to complex vascular anatomies. To address these challenges, we propose a novel Hierarchical Autonomous Guidewire Navigation and Delivery (HAG-ND) framework that leverages the strengths of multimodal large language models (MLLMs) and a novel reinforcement learning module inspired by Deep Q-Networks (DQNs). The high-level MLLM is trained on diverse blood vessel and guidewire scenarios from various angles and positions, enabling it to assess the suitability and timing of substance release at the target location. Within the MLLM, a parliamentary mechanism is introduced, where multiple specialized models, each focusing on a specific aspect of the vascular environment, vote on the optimal course of action. The low-level reinforcement learning module focuses on optimizing autonomous guidewire navigation to the designated target site by learning from the rich semantic understanding provided by the MLLM. Experimental evaluations demonstrate that the HAG-ND framework significantly improves the accuracy and reliability of guidewire positioning and targeted delivery compared to existing methods. By harnessing the complementary capabilities of MLLMs and novel reinforcement learning techniques in a hierarchical architecture, HAG-ND represents a significant step towards fully autonomous and adaptive robotic-assisted endovascular interventions.
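The "parliamentary mechanism" is described only at a high level, so the following is a minimal sketch of plain majority voting among specialized models, assuming each model emits a single discrete action. The function name `parliamentary_vote`, the model names, and the action labels are all hypothetical; the paper's actual voting rule may weight or combine votes differently.

```python
from collections import Counter


def parliamentary_vote(votes):
    """Return the most-voted action and its share of the votes.

    votes: dict mapping a (hypothetical) specialist name to its chosen action.
    """
    tally = Counter(votes.values())
    action, count = tally.most_common(1)[0]
    return action, count / len(votes)
```

For example, with votes `{"vessel_geometry": "advance", "tip_pose": "advance", "branch_risk": "rotate"}` the function returns `"advance"` with a two-thirds share; a downstream controller could refuse to act when the share falls below some threshold.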
Information Fusion, Volume 130, Article 104115
Citations: 0
Few-shot harmful meme detection via self-adaption mixture-of-experts
IF 15.5; Zone 1 (Computer Science); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2026-01-03; DOI: 10.1016/j.inffus.2026.104122
Zou Li, Jinzhi Liao, Jiting Li, Ji Wang, Xiang Zhao
The automatic detection of harmful memes is essential for healthy online ecosystems but remains challenging due to the intricate interaction between visual and textual elements. Recently, the remarkable capabilities of multimodal large language models (MLLMs) have significantly enhanced detection performance, yet scarce labeled data still limits their effectiveness. Although pioneering few-shot studies have explored this regime, they merely leverage surface-level capabilities while ignoring deeper complexities. To approach the core of the problem, we identify its main challenges: (1) heterogeneous multimodal features are complex and may exhibit negative correlations; (2) the semantic patterns underlying a single modality are hard to uncover; and (3) insufficient training samples render models more reliant on commonsense. To address these challenges, we propose a structural self-adaption mixture-of-experts framework (SSMoE) for few-shot harmful meme detection, including universal and specialized experts to foster more effective knowledge sharing, modal synergy, and expert specialization within the MLLM structure. Specifically, SSMoE integrates four novel components: (1) a Semantic Data Clustering module partitions heterogeneous source data and mitigates negative transfer; (2) a Targeted Prompt Injection module employs a teacher model to provide cluster-specific external guidance; (3) an Asymmetric Expert Specialization module introduces shared and specialized experts for efficient parameter adaptation and knowledge specialization; and (4) a Cluster-conditioned Routing module dynamically directs inputs to the most relevant expert pathway based on semantic cluster identity. Extensive experiments on three benchmark datasets (FHM, MAMI, HarM) demonstrate that SSMoE significantly outperforms state-of-the-art baseline methods, particularly in extremely low-data scenarios.
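As a rough illustration of cluster-conditioned routing with shared and specialized experts, the sketch below assigns an input to its nearest cluster centroid and adds the output of that cluster's expert to the output of a shared expert. Everything here is an assumption made for illustration: the names `nearest_cluster` and `route`, the nearest-centroid rule, and the additive combination are not taken from the paper.

```python
import math


def nearest_cluster(x, centroids):
    """Index of the centroid closest to x (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(x, centroids[k]))


def route(x, centroids, shared_expert, specialized_experts):
    """Combine a shared expert with the expert selected by cluster identity.

    shared_expert and each entry of specialized_experts are callables that
    map a feature vector to an output vector of the same length.
    """
    k = nearest_cluster(x, centroids)
    shared = shared_expert(x)            # always active
    special = specialized_experts[k](x)  # active only for cluster k
    return k, [s + p for s, p in zip(shared, special)]
```

Because the expert is chosen by cluster identity rather than by a learned per-token gate, inputs from the same semantic cluster always follow the same pathway, which is one way to keep dissimilar source data from interfering with each other.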
Information Fusion, Volume 130, Article 104122
Citations: 0
EPSO-net: A multi-objective evolutionary neural architecture search with PSO-guided mutation fusion for explainable brain tumor segmentation
IF 15.5; Zone 1 (Computer Science); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2026-01-03; DOI: 10.1016/j.inffus.2025.104119
Farhana Yasmin, Yu Xue, Mahade Hasan, Ghulam Muhammad
Accurate brain tumor segmentation from magnetic resonance imaging (MRI) remains a significant challenge due to early loss of spatial detail, inadequate contextual representation, and ineffective decoder fusion. In this paper, we propose EPSO-Net, a multi-objective evolutionary neural architecture search (NAS) framework that integrates three specialized modules: UTSA for preserving spatial encoding and enhancing low-level feature representation, Astra for capturing semantic abstraction and multi-scale context, and Revo for improving decoder refinement through attention-guided fusion of feature maps. These modules work synergistically within a flexible modular 3D search space, enabling dynamic architecture optimization during the evolutionary process. EPSO-Net uses a particle swarm optimization (PSO)-guided mutation fusion mechanism that enables efficient exploration of the search space by adjusting mutation behavior based on performance feedback. To the best of our knowledge, this is the first multi-objective evolutionary NAS framework employing PSO-guided mutation fusion to adapt mutation strategies, driving the search towards optimal solutions in a resource-efficient manner. Experiments on the BraTS 2021, BraTS 2020, and MSD Brain Tumor datasets demonstrate that EPSO-Net outperforms nine state-of-the-art methods, achieving high Dice similarity coefficients (DSC) of 93.89%, 95.02%, and 91.25%, low 95th-percentile Hausdorff distances (HD95) of 1.14 mm, 1.02 mm, and 1.44 mm, and strong Grad-CAM IoU (GIoU) of 89.32%, 90.12%, and 85.68%, respectively. EPSO-Net also demonstrates reliable generalization to the CHAOS, PROMISE12, and ACDC datasets. Furthermore, it significantly reduces model complexity, lowers FLOPs, accelerates inference, and enhances interpretability. The full code will be publicly available at: https://github.com/Farhana005/EPSO-Net.
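For readers unfamiliar with the headline metric, the Dice similarity coefficient between a predicted mask A and a reference mask B is 2|A∩B| / (|A| + |B|). The sketch below computes it for flattened binary masks; the function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions, and the paper's exact evaluation protocol (for example, per-region averaging) is not specified in the abstract.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient for two flat binary masks (0/1 values)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * intersection / total
```

A value of 0.9389 on this scale corresponds to the 93.89% DSC quoted above for BraTS 2021.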
Information Fusion, vol. 130, Article 104119.
Citations: 0
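The EPSO-Net abstract reports segmentation quality as a Dice similarity coefficient (DSC). As a reminder of what that number measures, here is a minimal sketch that assumes binary masks flattened to 0/1 sequences. The function name `dice_coefficient` and the empty-mask convention are illustrative assumptions and are not taken from the EPSO-Net code.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for binary masks.

    `pred` and `target` are flattened masks whose entries are 0 or 1.
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")

    # Voxels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Foreground voxels in each mask, summed.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)

    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total
```

Multiplying the returned value by 100 gives the percentage form used in the abstract. A per-dataset score would typically average this value over cases and tumor sub-regions, which is not shown here.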
Progressive temporal compensation and semantic enhancement for Exo-to-Ego video generation
IF 15.5 | Zone 1, Computer Science | Q1 Computer Science, Artificial Intelligence | Pub Date: 2026-01-03 | DOI: 10.1016/j.inffus.2025.104117
Xingyue Wang, Weipeng Hu, Jiun Tian Hoe, Jianhui Li, Ping Hu, Yap-Peng Tan
Transforming video perspectives from exocentric (third-person) to egocentric (first-person) is challenging because the two perspectives overlap only to a limited extent. Existing approaches often neglect temporal dynamics, which are critical for capturing motion cues and reappearing objects, and do not fully exploit semantics inferred from the source view. To address these limitations, we propose a Progressive Temporal Compensation and Semantic Enhancement (PCSE) framework for Exocentric-to-Egocentric Video Generation. The Progressive Temporal Compensation (PTC) module focuses on long-term temporal dependencies, progressively aligning exocentric temporal patterns with egocentric representations. By employing a reliance-shifting mechanism with a progression mask, PTC gradually reduces dependence on egocentric supervision, enabling more robust target-view learning. Moreover, to leverage high-level scene context, we introduce a Hierarchical Dual-channel Transformer (HDT), which jointly generates egocentric frames and their corresponding semantic layouts via dual encoder-decoder architectures with hierarchically processed transformer blocks. To further enhance structural coherence and semantic consistency, the generated semantic layouts guide frame refinement through an Uncertainty-aware Semantic Enhancement (USE) module. USE dynamically estimates uncertainty masks to locate and refine ambiguous regions, yielding more coherent and visually accurate results. Extensive experiments demonstrate that PCSE achieves leading performance among cue-free methods.
Information Fusion, vol. 130, Article 104117.
Citations: 0
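The PCSE abstract describes a reliance-shifting mechanism in which a progression mask gradually reduces dependence on egocentric supervision. The abstract does not give the schedule or the mask layout, so the sketch below is only one plausible reading. The linear decay, the "first k frames" mask, and the names `reliance_ratio`, `progression_mask`, and `blend` are all hypothetical.

```python
from typing import List


def reliance_ratio(step: int, total_steps: int) -> float:
    """Assumed linear schedule: 1.0 (full ego reliance) down to 0.0."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return max(0.0, 1.0 - step / total_steps)


def progression_mask(num_frames: int, ratio: float) -> List[int]:
    """Hypothetical mask: 1 = frame still uses egocentric supervision,
    0 = frame must rely on exocentric-derived features."""
    keep = round(ratio * num_frames)
    return [1] * keep + [0] * (num_frames - keep)


def blend(ego: List[float], exo: List[float], mask: List[int]) -> List[float]:
    """Per frame, pick the ego feature where mask is 1, else the exo feature."""
    return [e if m else x for e, x, m in zip(ego, exo, mask)]


if __name__ == "__main__":
    # Halfway through training, half of an 8-frame clip still uses ego features.
    mask = progression_mask(8, reliance_ratio(step=500, total_steps=1000))
    print(mask)
```

Scalars stand in for per-frame feature tensors to keep the example dependency-free. In a real model the same mask would be broadcast over feature maps, and the schedule could just as well be stochastic or non-linear.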
FedEGL: Edge-assisted federated graph learning
IF 15.5 | Zone 1, Computer Science | Q1 Computer Science, Artificial Intelligence | Pub Date: 2026-01-02 | DOI: 10.1016/j.inffus.2025.104118
Haitao Wang, Aojie Luo, Wenchao Xu, Haozhao Wang, Yichen Li, Yining Qi, Rui Zhang, Ruixuan Li
Federated graph learning excels at learning from graph-structured data distributed across multiple clients. However, partitioning the graph leaves each client with only a subgraph without its neighbor nodes, which significantly degrades accuracy. Although exchanging original nodes can address this issue, it requires interaction with a remote server, which not only causes significant communication delays but also compromises data privacy. To tackle this, this paper proposes an edge-server-assisted federated graph learning approach, namely FedEGL, which aggregates and exchanges intermediate features of approximated nodes through a third-party edge server, performing cross-client feature alignment and dynamic weighted aggregation. To preserve node privacy, adaptive differential privacy protects the approximated node features by dynamically allocating privacy budgets. Experimental results show that our method achieves accuracy close to that of centralized settings, improving classification accuracy by up to 8% over the latest baseline. The method thus improves model accuracy while protecting privacy, providing an effective solution to the subgraph partitioning problem in federated graph learning.
Information Fusion, vol. 130, Article 104118.
Citations: 0
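The FedEGL abstract says node features are protected with differential privacy under dynamically allocated privacy budgets, but it does not specify the mechanism. The sketch below therefore swaps in the classic Gaussian mechanism with L2 clipping as a stand-in. The proportional budget split and the names `clip_l2`, `gaussian_sigma`, `split_budget`, and `privatize` are assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence


def clip_l2(vec: Sequence[float], clip_norm: float) -> List[float]:
    """Scale `vec` down so its L2 norm is at most `clip_norm`."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]


def gaussian_sigma(epsilon: float, delta: float, sensitivity: float) -> float:
    """Noise scale of the classic Gaussian mechanism (valid for 0 < epsilon < 1)."""
    if not 0.0 < epsilon < 1.0 or not 0.0 < delta < 1.0:
        raise ValueError("requires 0 < epsilon < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def split_budget(total_epsilon: float, weights: Sequence[float]) -> List[float]:
    """Assumed allocation rule: share a total epsilon in proportion to weights.

    Under basic sequential composition the shares sum to the total budget.
    """
    weight_sum = sum(weights)
    return [total_epsilon * w / weight_sum for w in weights]


def privatize(vec: Sequence[float], clip_norm: float, epsilon: float,
              delta: float, rng: random.Random) -> List[float]:
    """Clip a feature vector, then add Gaussian noise to each coordinate."""
    clipped = clip_l2(vec, clip_norm)
    # Two clipped vectors can differ by at most 2 * clip_norm in L2 norm.
    sigma = gaussian_sigma(epsilon, delta, sensitivity=2.0 * clip_norm)
    return [x + rng.gauss(0.0, sigma) for x in clipped]
```

A caller might split a total budget across communication rounds with `split_budget` and pass each share to `privatize` before a feature leaves the client. How FedEGL actually weights its budgets is not stated in the abstract.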
Regional defeats global: An efficient regional feature fusion via convolutional architecture for multispectral object detection
IF 15.5 | Zone 1, Computer Science | Q1 Computer Science, Artificial Intelligence | Pub Date: 2026-01-02 | DOI: 10.1016/j.inffus.2025.104110
Zhenhao Wang, Tian Tian
Multispectral object detection continues to face significant challenges in balancing accuracy and efficiency. Most existing approaches rely heavily on global modeling, which, although capable of integrating multi-band information, incurs substantial computational overhead and fails to fully exploit the spatial correlations across spectral bands. To address this issue, this paper introduces a convolutional architecture-based region feature computation mechanism that leverages the inherent advantage of convolutional operations in preserving spatial structure, enabling spatial cues to be fully retained during feature representation learning and explicitly incorporated into multispectral feature interaction. Meanwhile, by reconstructing global attention computation into localized regional modeling, the proposed method markedly reduces computational cost while maintaining effective feature fusion, thereby facilitating a lightweight architectural design. Experimental results demonstrate that the proposed module achieves the lowest computational overhead while improving mAP@50 by 1.97% and 1.66% on the DroneVehicle and VEDAI remote-sensing datasets, respectively, compared with state-of-the-art methods. Moreover, it exhibits strong applicability on the pedestrian detection datasets FLIR and LLVIP. The code is available at https://github.com/wzh326/LMFFM_CARFCOM.git.
Information Fusion, vol. 130, Article 104110.
Citations: 0
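The abstract above attributes its efficiency gain to replacing global attention with localized regional modeling. A rough way to see why this helps is to count pairwise feature interactions. The sketch below assumes non-overlapping square regions as a simplified stand-in for the paper's convolutional mechanism, which the abstract does not detail. The function names and the divisibility requirement are illustrative assumptions.

```python
def global_pairs(height: int, width: int) -> int:
    """Query-key pairs when every position attends to every other position."""
    n = height * width
    return n * n


def regional_pairs(height: int, width: int, region: int) -> int:
    """Query-key pairs when each position attends only within its own
    region x region block (assumed non-overlapping partition)."""
    if region <= 0 or height % region or width % region:
        raise ValueError("height and width must be divisible by region")
    return height * width * region * region


if __name__ == "__main__":
    h, w, r = 64, 64, 8
    print(global_pairs(h, w), regional_pairs(h, w, r))
```

For a 64×64 feature map with 8×8 regions, the count falls from 16,777,216 to 262,144 pairs, a 64-fold reduction. The global cost grows quadratically with the number of positions, while the regional cost grows linearly for a fixed region size.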