
Latest Publications in Information Fusion

Arbitrary-scale spatial-spectral fusion using kernel integral and progressive resampling
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-12 | DOI: 10.1016/j.inffus.2026.104143
Wei Li, Honghui Xu, Yueqian Quan, Zhe Chen, Jianwei Zheng
Benefiting from booming deep learning techniques, spatial-spectral fusion (SSF) is considered an ideal alternative to the traditional practice of acquiring hyperspectral images (HSI) with costly devices. Despite this remarkable progress, however, current solutions require training and storing a separate model for each scaling factor. To overcome this dilemma, we propose a spatial-spectral fusion neural operator (SFNO) that performs arbitrary-scale SSF within the operator learning framework. Specifically, SFNO approaches the problem from the perspective of approximation theory by embedding the features of two degraded functions into a high-dimensional latent space through pointwise convolution layers, thereby capturing richer spectral feature information. The mapping between function spaces is then approximated via a Galerkin integral (GI) mechanism, which culminates in a final dimensionality-reduction step that produces a high-resolution HSI. Moreover, we propose a progressive resampling integration (PR) that resamples the integrand's domain in the triple kernel integration to provide non-local multi-scale information. The synergistic action of both integration mechanisms enables SFNO to effortlessly handle magnification factors it never encountered during training. Extensive experiments on the CAVE, Chikusei, Pavia Centre, Harvard, and real-world datasets demonstrate that our SFNO delivers substantial improvements over existing state-of-the-art methods. In particular, under the 8× upsampling setting on the CAVE, Chikusei, and Pavia Centre datasets, SFNO surpasses the second-best model by 0.56 dB, 1.05 dB, and 0.72 dB in PSNR, respectively. Our code is publicly available at https://github.com/weili419/SFNO.
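To make the kernel-integral idea concrete, below is a minimal sketch of a Galerkin-style attention layer in PyTorch, which approximates an integral operator as a softmax-free product of normalized key and value bases. The layer width and normalization choices are illustrative assumptions, not the authors' exact SFNO design.

```python
import torch.nn as nn

class GalerkinIntegral(nn.Module):
    """Softmax-free kernel-integral layer (Galerkin-style linear attention)."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.norm_k = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)

    def forward(self, x):                 # x: (batch, n_points, dim)
        q = self.q(x)
        k = self.norm_k(self.k(x))
        v = self.norm_v(self.v(x))
        # (1/n) * Q (K^T V) acts as a Monte Carlo quadrature of the kernel
        # integral over the n sampled spatial locations.
        return q @ (k.transpose(-2, -1) @ v) / x.shape[1]
```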
Citations: 0
MCIVA: A multi-view pedestrian detection framework with a central inverse nearest neighbor map and a view adaptive module
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-11 | DOI: 10.1016/j.inffus.2026.104142
He Li, Taiyu Liao, Weihang Kong, Xingchen Zhang
Multi-view pedestrian detection is an important task with many applications in areas such as surveillance and smart cities. Despite the significant performance improvements achieved by recent multi-view pedestrian detection methods, three main challenges remain. First, in crowded scenes, neighboring connected components may merge in dense regions, leaving the pixel peak of each pedestrian poorly localized. Second, the loss functions used in previous multi-view pedestrian detection methods respond strongly to background regions. Third, camera parameters have not been fully utilized; they are only used to generate fixed projection matrices. To address these challenges, we propose a novel multi-view pedestrian detection framework (MCIVA) with a central inverse nearest neighbor (CINN) map and a view adaptive module (VAM). The CINN map is introduced to generate the ground-truth probability occupancy map (POM) from annotations, providing more precise location information for each pedestrian. To enhance the model's attention to local structural information, we propose a local structural similarity loss that reduces the influence of false local maxima in background regions. Moreover, the VAM is introduced to exploit camera parameters to generate learnable weights for multi-view feature fusion. We evaluate the proposed method on three benchmark datasets, and the results show that MCIVA improves the quality of prediction maps and achieves state-of-the-art performance.
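As a rough illustration of a distance-adaptive occupancy map, the sketch below assigns each annotated pedestrian a Gaussian whose width shrinks with the distance to its nearest neighboring annotation, keeping nearby peaks separable. This construction is an assumption inspired by the abstract; the paper's actual CINN map may be defined differently.

```python
import numpy as np

def cinn_occupancy_map(centers, shape, base_sigma=4.0):
    """centers: (N, 2) array of (row, col) annotations on the ground plane."""
    h, w = shape
    pom = np.zeros((h, w), dtype=np.float32)
    ys, xs = np.mgrid[0:h, 0:w]
    for i, (cy, cx) in enumerate(centers):
        others = np.delete(centers, i, axis=0)
        if len(others):
            nn_dist = np.min(np.linalg.norm(others - np.array([cy, cx]), axis=1))
            sigma = min(base_sigma, nn_dist / 3.0)  # inverse-NN shrinkage (assumed)
        else:
            sigma = base_sigma
        g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
        pom = np.maximum(pom, g)                    # keep individual peaks separable
    return pom
```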
Citations: 0
A data fusion approach to synthesize microwave imagery of tropical cyclones from infrared data using vision transformers
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-22 | DOI: 10.1016/j.inffus.2026.104167
Fan Meng, Tao Song, Xianxuan Lin, Kunlin Yang
Microwave images with high spatiotemporal resolution are essential for observing and predicting tropical cyclones (TCs), including TC positioning, intensity estimation, and detection of concentric eyewalls. Nevertheless, the temporal resolution of tropical cyclone microwave (TCMW) images is limited by satellite quantity and orbit constraints, presenting a challenging problem for TC disaster forecasting. This research proposes a multi-sensor data fusion approach that uses high-temporal-resolution tropical cyclone infrared (TCIR) images to generate synthetic TCMW images, offering a solution to this data-scarcity problem. In particular, we introduce a deep learning network based on the Vision Transformer (TCA-ViT) to translate TCIR images into TCMW images. This can be viewed as a form of synthetic data generation that enhances the information available for decision-making. We integrate a phase-based physical guidance mechanism into the training process. Furthermore, we have developed a dataset of TC infrared-to-microwave image conversions (TCIR2MW) for training and testing the model. Experimental results demonstrate the method's capability to rapidly and accurately extract key features of TCs. Leveraging techniques such as masking and transfer learning, it addresses the absence of TCMW images by generating MW images from IR images, thereby aiding downstream tasks such as TC intensity estimation and precipitation forecasting. This study introduces a novel approach to TC image research, with the potential to advance deep learning in this direction and provide vital insights for real-time observation and prediction of global TCs. Our source code and data are publicly available online at https://github.com/kleenY/TCIR2MW.
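A minimal sketch of a ViT-style infrared-to-microwave translator is given below: infrared patches are tokenized, passed through a transformer encoder, and folded back into a microwave image. All sizes, the omission of positional embeddings, and the absence of the phase-based guidance are simplifying assumptions rather than the published TCA-ViT architecture.

```python
import torch.nn as nn
import torch.nn.functional as F

class IR2MW(nn.Module):
    """Toy ViT image-to-image translator; assumes H, W divisible by patch."""
    def __init__(self, patch=8, dim=256, depth=6, heads=8):
        super().__init__()
        self.embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.decode = nn.Linear(dim, patch * patch)  # token -> MW pixel patch
        self.patch = patch

    def forward(self, ir):                           # ir: (B, 1, H, W)
        b, _, h, w = ir.shape
        tokens = self.embed(ir).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = self.encoder(tokens)                # positional embeddings omitted
        patches = self.decode(tokens).transpose(1, 2)       # (B, p*p, N)
        return F.fold(patches, output_size=(h, w),
                      kernel_size=self.patch, stride=self.patch)  # (B, 1, H, W)
```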
Citations: 0
IDFL: Incentive-driven federated learning with selfish clients
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-24 | DOI: 10.1016/j.inffus.2026.104185
Jin Xu, Hengrun Zhang, Huiqun Yu, Guisheng Fan
Heterogeneity challenges have long been discussed in Federated Learning (FL). Among these challenges, statistical heterogeneity, where non-independent and identically distributed (non-IID) data across clients severely impacts model convergence and performance, remains particularly problematic. While existing batch-size optimization strategies effectively address system-level heterogeneity and resource constraints, they inadequately tackle statistical heterogeneity, often simply increasing batch sizes without theoretical justification. Such approaches overlook a critical convergence-generalization dilemma well established in traditional machine learning: larger batch sizes accelerate convergence but may degrade generalization performance beyond critical thresholds, a phenomenon usually termed the "generalization gap". To bridge this gap in FL, we propose a comprehensive framework with three key contributions. First, we establish a batch-size optimization mechanism that balances convergence and generalization objectives through a penalty function, providing mathematically derived closed-form solutions for optimal batch sizes. Second, we design a Stackelberg game-based incentive mechanism that coordinates batch-size assignments with resource contributions while ensuring fair reward allocation to maximize individual client utility (defined as the difference between rewards and costs). Third, we develop a two-step verification strategy that detects and mitigates free-riding behaviors while monitoring convergence patterns to terminate ineffective training processes. Extensive experiments on real-world datasets validate our approach, demonstrating significant improvements in both convergence performance and fairness compared to state-of-the-art algorithms. Ablation studies confirm the effectiveness of each component.
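The convergence-generalization trade-off behind the batch-size mechanism can be illustrated numerically. In the toy objective below, a gradient-noise term that decays with batch size is combined with a quadratic penalty past a critical threshold, and a brute-force search recovers the threshold as the optimum. The coefficients and functional forms are assumptions for demonstration, not the paper's derived closed-form solution.

```python
def batch_objective(b, c_conv=1.0, c_gen=1e-4, b_crit=128):
    convergence = c_conv / b                             # gradient-noise term ~ 1/b
    generalization = c_gen * max(0, b - b_crit) ** 2     # penalty past threshold
    return convergence + generalization

best = min(range(1, 1025), key=batch_objective)
print(best, batch_objective(best))   # optimum lands at the critical batch size
```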
Citations: 0
Graph-guided cross-image correlation learning with adaptive global-local feature fusion for fine-grained visual representation
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-02-03 | DOI: 10.1016/j.inffus.2026.104204
Hongxing You, Yangtao Wang, Xiaocui Li, Yanzhao Xie, Da Chen, Xinyu Zhang, Wensheng Zhang
Fine-grained visual classification (FGVC) is challenging due to the difficulty of distinguishing between highly similar local regions. Recent studies leverage graph neural networks (GNNs) to learn local representations, but they focus solely on patch interactions within each image, failing to capture semantic relationships across different samples and leaving fine-grained features semantically disconnected from one another. To address these challenges, we propose Graph-guided Cross-image Correlation Learning with Adaptive Global-local Feature Fusion for Fine-grained Visual Representation (GCCR). We design a Cross-image Correlation Learning (CCL) module in which spatially corresponding patches across images are connected as graph nodes, enabling inter-image interactions that capture semantically rich local features. Within this CCL module, we introduce a Ranking Loss to address the limitation of traditional classification losses, which focus solely on maximizing individual sample confidence without explicitly constraining feature discriminability among visually similar categories. In addition, GCCR constructs a lightweight fusion module that dynamically balances the contributions of global and local features, leading to unbiased image representations. We conduct extensive experiments on four popular FGVC datasets: CUB-200-2011, Stanford Cars, FGVC-Aircraft, and iNaturalist 2017. Experimental results verify that GCCR achieves much higher performance than state-of-the-art (SOTA) FGVC methods while maintaining lower model complexity. On the most challenging benchmark, iNaturalist 2017, GCCR gains at least 7.51% in accuracy while using more than 4.42M fewer parameters and 80M fewer FLOPs than the best competing method. We release the pretrained model and code on GitHub: https://github.com/dislie/GCCR.
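One plausible form of such a Ranking Loss is a margin constraint that pushes each sample's target-class score above its hardest non-target score, sketched below in PyTorch. The margin value and exact formulation are assumptions, not necessarily GCCR's.

```python
import torch
import torch.nn.functional as F

def ranking_loss(logits, labels, margin=0.5):
    """logits: (B, C); labels: (B,). Penalize small target-vs-hardest gaps."""
    target = logits.gather(1, labels.unsqueeze(1)).squeeze(1)         # (B,)
    masked = logits.scatter(1, labels.unsqueeze(1), float("-inf"))    # hide target
    hardest_other = masked.max(dim=1).values                          # (B,)
    return F.relu(margin - (target - hardest_other)).mean()
```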
Citations: 0
Stain-aware domain alignment for imbalance blood cell classification
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-20 | DOI: 10.1016/j.inffus.2026.104166
Yongcheng Li, Lingcong Cai, Ying Lu, Xiao Han, Ma Li, Wenxing Lai, Xiangzhong Zhang, Xiaomao Fan
Blood cell identification is critical for hematological analysis, as it aids physicians in diagnosing various blood-related diseases. In real-world scenarios, blood cell image datasets often exhibit domain shift and data imbalance, posing challenges for accurate blood cell identification. To address these issues, we propose a novel blood cell classification method, termed SADA, based on stain-aware domain alignment. The primary objective of this work is to mine domain-invariant features in the presence of domain shift and data imbalance. To accomplish this objective, we propose a stain-based augmentation approach and a local alignment constraint to learn domain-invariant features. Furthermore, we propose a domain-invariant supervised contrastive learning strategy to capture discriminative features. We decouple the training process into two stages, domain-invariant feature learning and classification training, alleviating the problem of data imbalance. Experimental results on four public blood cell datasets and a private real-world dataset collected from the Third Affiliated Hospital of Sun Yat-sen University demonstrate that SADA establishes a new state-of-the-art baseline, outperforming existing cutting-edge methods. The source code is available at https://github.com/AnoK3111/SADA.
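For intuition, the sketch below perturbs an image channel-wise in optical-density space, a common recipe for simulating stain variation; SADA's actual stain-based augmentation may operate in a learned or stain-matrix space.

```python
import numpy as np

def stain_jitter(img, alpha=0.05, beta=0.01, rng=np.random):
    """img: float32 RGB in [0, 1], shape (H, W, 3)."""
    od = -np.log(np.clip(img, 1e-6, 1.0))            # to optical density
    scale = 1.0 + rng.uniform(-alpha, alpha, size=3)  # per-channel gain
    shift = rng.uniform(-beta, beta, size=3)          # per-channel offset
    od = od * scale + shift                           # jitter each stain channel
    return np.clip(np.exp(-od), 0.0, 1.0)             # back to RGB
```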
Citations: 0
Multimodal fusion of 3D point cloud and intraoperative imaging to enhance surgical robot navigation
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-07-01 | Epub Date: 2026-01-21 | DOI: 10.1016/j.inffus.2026.104171
Yiheng Wang, Tianlun Wang, Tao Liu, Yihao Huang
To address the insufficient navigation accuracy of surgical robots in dynamic, complex, and non-rigid intraoperative environments, this paper proposes an enhanced multimodal fusion framework, EMF-RSN. The framework achieves spatial consistency between point clouds and intraoperative images through a depth-guided Geometry-Vision Alignment Module (GVAN), performs dynamic weighted fusion of geometric and visual features through a cross-modal attention fusion module (CAFM), and constructs a closed-loop optimization mechanism from perception to decision through a Task Feedback Optimization (TFO) module, thereby improving navigation accuracy and stability.
Experiments on the public dataset (Hamlyn) and the self-built simulation dataset (Sim-Surgical Fusion) demonstrate that EMF-RSN significantly outperforms existing methods in geometric accuracy, semantic consistency, and task robustness. Compared with traditional registration algorithms, point cloud errors are reduced by approximately 50% and trajectory errors by over 20%, while real-time performance of 44 FPS is maintained even under complex deformation and occlusion. This research provides a new technical approach and model foundation for realizing intelligent surgical navigation that integrates virtual and real elements, and is of great significance for the perception and autonomous control of surgical robots.
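A minimal sketch of cross-modal attention fusion is shown below: point-cloud (geometry) tokens query image (vision) tokens, and a learned gate performs the dynamic weighting. The dimensions and gating design are illustrative assumptions, not the published CAFM.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Geometry tokens attend to vision tokens, then gate the two streams."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, geom, vis):           # (B, Ng, dim), (B, Nv, dim)
        ctx, _ = self.attn(query=geom, key=vis, value=vis)
        g = self.gate(torch.cat([geom, ctx], dim=-1))   # dynamic weights in [0, 1]
        return g * ctx + (1 - g) * geom
```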
Citations: 0
Generating vision-language navigation instructions incorporated fine-grained alignment annotations
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-06-01 | Epub Date: 2025-12-30 | DOI: 10.1016/j.inffus.2025.104107
Yibo Cui, Liang Xie, Yu Zhao, Jiawei Sun, Erwei Yin
Vision-Language Navigation (VLN) enables intelligent agents to navigate environments by integrating visual perception and natural language instructions, yet it faces significant challenges due to the scarcity of fine-grained cross-modal alignment annotations. Existing datasets primarily focus on global instruction-trajectory matching, neglecting the sub-instruction-level and entity-level alignments critical for accurate navigation action decision-making. To address this limitation, we propose FCA-NIG, a generative framework that automatically constructs navigation instructions with dual-level fine-grained cross-modal annotations. In this framework, an augmented trajectory is first divided into sub-trajectories, which are then processed through GLIP-based landmark detection, crafted instruction construction, OFA-Speaker-based R2R-like instruction generation, and CLIP-powered entity selection, generating sub-instruction-trajectory pairs with entity-landmark annotations. Finally, these sub-pairs are aggregated into a complete instruction-trajectory pair. The framework generates the FCA-R2R dataset, the first large-scale augmentation dataset featuring precise sub-instruction-sub-trajectory and entity-landmark alignments. Extensive experiments demonstrate that training with FCA-R2R significantly improves the performance of multiple state-of-the-art VLN agents, including SF, EnvDrop, RecBERT, HAMT, DUET, and BEVBERT. Incorporating sub-instruction-trajectory alignment enhances agents' state awareness and decision accuracy, while entity-landmark alignment further boosts navigation performance and generalization. These results highlight the effectiveness of FCA-NIG in generating high-quality, scalable training data without manual annotation, advancing fine-grained cross-modal learning in complex navigation tasks.
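The data flow of the framework can be sketched as follows. Every helper here is a hypothetical stand-in for the GLIP, OFA-Speaker, and CLIP components named in the abstract, implemented as a trivial stub so the sketch runs; only the pipeline shape reflects the abstract.

```python
# Hypothetical stubs standing in for the real components (not the paper's code).
split_into_subtrajectories = lambda traj: [traj[i:i + 3] for i in range(0, len(traj), 3)]
detect_landmarks = lambda sub: ["chair"]                          # GLIP stand-in
generate_instruction = lambda sub, lm: f"walk past the {lm[0]}"   # OFA-Speaker stand-in
select_entities = lambda instr, lm: [(instr.split()[-1], lm[0])]  # CLIP stand-in

def build_annotated_pair(trajectory):
    """Aggregate sub-instruction-trajectory pairs with entity-landmark tags."""
    pairs = []
    for sub in split_into_subtrajectories(trajectory):
        landmarks = detect_landmarks(sub)                  # landmark detection
        sub_instr = generate_instruction(sub, landmarks)   # R2R-like sub-instruction
        entities = select_entities(sub_instr, landmarks)   # entity-landmark alignment
        pairs.append((sub_instr, sub, entities))
    return " ".join(p[0] for p in pairs), trajectory, pairs

instruction, traj, annotated = build_annotated_pair(list(range(7)))
```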
Citations: 0
Secure Tobit filtering for multi-rate nonlinear systems under multi-node random access protocol: A Paillier encryption-decryption mechanism
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-06-01 | Epub Date: 2026-01-10 | DOI: 10.1016/j.inffus.2026.104146
Shuo Yang, Raquel Caballero-Águila, Jun Hu, Antonia Oya-Lechuga
In this paper, the secure Tobit filtering (TF) problem is investigated for nonlinear systems subject to measurement censoring under a multi-node random access protocol (MNRAP). A multi-rate sampling framework is considered, which allows the system states and measurement outputs to operate with distinct sampling periods, thereby reflecting practical engineering constraints. Furthermore, to mitigate data collisions and improve resource utilization, the MNRAP is adopted to regulate the transmission order of measurement signals over communication networks. In addition, to safeguard the confidentiality of communication between the sensor node and the filter, the Paillier encryption-decryption mechanism is incorporated, protecting the transmitted information from interception by unauthorized third parties. This paper concentrates on developing an innovative secure TF scheme that guarantees the existence of an upper bound (UB) on the second moment of the filtering error. Subsequently, the obtained UB is minimized in the trace sense by designing a proper filter gain. Additionally, the uniform boundedness of the filtering error is verified in the mean-square sense by establishing a sufficient criterion. Finally, the efficacy and advantages of the proposed secure TF approach are demonstrated through a simulation example.
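For reference, the textbook Paillier scheme underlying the encryption-decryption mechanism can be sketched in a few lines: with g = n + 1, encryption is c = (1 + n)^m r^n mod n^2, decryption recovers m via L(c^lam mod n^2) * lam^(-1) mod n with L(x) = (x - 1)/n, and multiplying ciphertexts adds plaintexts. The demo below uses tiny primes and no hardening; a real deployment would use a vetted library.

```python
import math, random

p, q = 293, 433                        # toy primes; real keys use ~2048-bit n
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)           # Carmichael function of n
mu = pow(lam, -1, n)                   # valid precisely because g = n + 1

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2   # c = g^m * r^n mod n^2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n      # m = L(c^lam) * mu mod n

c1, c2 = encrypt(1234), encrypt(42)
assert decrypt(c1) == 1234
assert decrypt((c1 * c2) % n2) == 1276                # additive homomorphism
```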
Citations: 0
Attention-driven contrastive learning for cross-modal hashing with prototypical separation
IF 15.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-06-01 | Epub Date: 2025-12-25 | DOI: 10.1016/j.inffus.2025.104078
Zhipeng He, Wenzhe Liu, Lian Wu, Jinrong Cui, Jie Wen
Effective retrieval and structuring of heterogeneous data have grown more difficult with the exponential growth of multimedia data. This surge in data volume underscores the importance of efficient cross-modal hashing techniques, which have recently attracted attention for their rapid retrieval speed and minimal storage requirements. However, existing unsupervised cross-modal hashing methods often fail to capture latent semantic structures and meaningful modality interactions, which limits their retrieval performance. To address these challenges, we propose Attention-driven Contrastive Learning for Cross-Modal Hashing via Prototypical Separation (ACoPSe). The method introduces a modality-aware fusion mechanism to enhance cross-modal feature interaction, and a prototype alignment strategy that reduces heterogeneity at the cluster level by leveraging pseudo-labels derived from clustering. Extensive experiments demonstrate that our method achieves performance comparable to state-of-the-art approaches.
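A hedged sketch of cluster-level alignment is given below: image and text features are pulled toward shared prototypes indexed by clustering-derived pseudo-labels via a softmax contrastive objective. The temperature and symmetric averaging are illustrative assumptions, not ACoPSe's exact loss.

```python
import torch
import torch.nn.functional as F

def prototype_alignment_loss(img_f, txt_f, prototypes, pseudo_labels, tau=0.1):
    """img_f, txt_f: (B, d); prototypes: (K, d); pseudo_labels: (B,)"""
    loss = 0.0
    for feats in (img_f, txt_f):                 # align both modalities
        feats = F.normalize(feats, dim=1)
        protos = F.normalize(prototypes, dim=1)
        logits = feats @ protos.t() / tau        # (B, K) cosine similarities
        loss = loss + F.cross_entropy(logits, pseudo_labels)
    return loss / 2
```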
Citations: 0