
Journal of Visual Communication and Image Representation: Latest Publications

CrossGlue: Cross-Modal Image matching via potential message investigation and visual-gradient message integration
IF 3.1 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-03 | DOI: 10.1016/j.jvcir.2025.104620
Chaobo Yu , Zhonghui Pei , Xiaoran Wang , Huabing Zhou
Compared with single-modal image matching, cross-modal image matching provides more comprehensive and detailed information, which is essential for a range of vision-related tasks. However, matching is difficult because visible and infrared images differ in imaging principles, scale, and relative translation and rotation. Moreover, existing detection-based single-modal matching methods suffer from low accuracy, while detection-free methods are time-consuming and struggle with real-world scenarios. Therefore, this paper proposes CrossGlue, a lightweight cross-modal image matching framework. The framework introduces a cross-modal message transfer (CMT) module that integrates more latent information for each keypoint through one-to-one image transfer, and a visual-gradient graph neural network (VG-GNN) that enhances visible–infrared matching in degraded scenarios. Experimental results on public datasets show that CrossGlue performs excellently among detection-based methods and outperforms strong baselines in tasks such as homography estimation and relative pose estimation.
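To make the cross-modal message transfer idea concrete, the sketch below shows a generic cross-attention step in which each keypoint descriptor from one modality aggregates messages from the other modality's descriptors. It is a minimal PyTorch illustration under assumed names and dimensions (CrossModalTransfer, d_model = 128), not the published CMT module or VG-GNN.

```python
import torch
import torch.nn as nn

class CrossModalTransfer(nn.Module):
    """Cross-attention that lets each keypoint descriptor of one modality
    aggregate messages from the other modality's descriptors."""
    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, desc_q: torch.Tensor, desc_kv: torch.Tensor) -> torch.Tensor:
        # desc_q: (B, N, d) query descriptors, desc_kv: (B, M, d) source descriptors
        msg, _ = self.attn(desc_q, desc_kv, desc_kv)
        return self.norm(desc_q + msg)          # residual message integration

# toy usage: 100 visible and 120 infrared keypoint descriptors, same module both ways
vis = torch.randn(1, 100, 128)
ir = torch.randn(1, 120, 128)
cmt = CrossModalTransfer()
vis_enriched = cmt(vis, ir)                     # (1, 100, 128)
ir_enriched = cmt(ir, vis)                      # (1, 120, 128)
```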
Citations: 0
Multiple cross-modal complementation network for lightweight RGB-D salient object detection
IF 3.1 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-01 | DOI: 10.1016/j.jvcir.2025.104622
Changhe Zhang, Fen Chen, Lian Huang, Zongju Peng, Xin Hu
The large model sizes and high computational costs of traditional convolutional neural networks hinder the deployment of RGB-D salient object detection (SOD) models on mobile devices. To balance and improve the efficiency and accuracy of RGB-D SOD, we propose a multiple cross-modal complementation network (MCCNet) that fully exploits complementary information across multiple dimensions. First, based on the information complementarity between depth and RGB features, we propose a multiple cross-modal complementation (MCC) module to strengthen the feature representation and fusion ability of lightweight networks. Second, building on the MCC module, we propose a global and local feature cooperative depth enhancement module to improve the quality of depth maps. Finally, we propose an RGB-assisted extraction and fusion backbone: RGB features are fed into this backbone to assist depth feature extraction and are then efficiently fused with the extracted depth features. Experimental results on five challenging datasets show that MCCNet reaches 1955 fps on a single RTX 4090 GPU with only 5.5M parameters and compares favorably with 12 state-of-the-art RGB-D SOD methods in terms of accuracy.
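The sketch below illustrates one plausible form of cross-modal complementation: each modality produces channel-attention weights that gate the other modality's features before fusion. Module names, channel sizes, and the additive fusion are assumptions for illustration, not the published MCC design.

```python
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """Squeeze-and-excitation style channel weights in (0, 1)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.fc(x)                        # (B, C, 1, 1) channel weights

class CrossComplement(nn.Module):
    """Depth complements RGB and vice versa, then the two streams are fused."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.rgb_gate = ChannelGate(channels)
        self.dep_gate = ChannelGate(channels)

    def forward(self, rgb, dep):
        rgb_c = rgb + dep * self.rgb_gate(rgb)   # depth information gated into RGB
        dep_c = dep + rgb * self.dep_gate(dep)   # RGB information gated into depth
        return rgb_c + dep_c

fused = CrossComplement()(torch.randn(2, 64, 56, 56), torch.randn(2, 64, 56, 56))
print(fused.shape)                               # torch.Size([2, 64, 56, 56])
```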
Citations: 0
MIEI: A KID-based quality assessment metric for grayscale industrial equipment images
IF 3.1 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-01 | DOI: 10.1016/j.jvcir.2025.104626
TianXu Han, Jiani Sun, Haihong Li, Yanjun Zhang
Existing image quality assessment metrics exhibit limited adaptability to complex industrial scenarios, constraining evaluation accuracy for industrial equipment images (IEIs). We focus on grayscale IEIs and propose a new metric named MIEI. First, we design a parametric Sobel operator (Para-Sobel) that dynamically adjusts the central-row weight coefficients to compensate for the edge detection errors induced by the fixed weights of traditional operators. Second, we introduce a geometric constraint module, GOML, that couples pixel-level gradient magnitude with directional features to simultaneously capture edge length and scale variation in key regions. Experiments demonstrate that Para-Sobel improves edge continuity detection accuracy by 24% (FOM) and 9.6% (SSIM) over the traditional Sobel operator, and the GOML-integrated model achieves a 14.38% higher edge preservation ratio than baseline models. Overall, MIEI outperforms KID by 12.24% in KROCC and 21.17% in PLCC across critical metrics, while maintaining real-time inference at 49.5 ms.
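As a toy illustration of a parametric Sobel operator, the snippet below exposes the central-row/column weight as a parameter k (k = 2 recovers the classical Sobel kernel) and returns a gradient magnitude map; the exact Para-Sobel weighting scheme in the paper may differ.

```python
import numpy as np
from scipy.ndimage import convolve

def para_sobel(gray: np.ndarray, k: float = 2.0) -> np.ndarray:
    """Gradient magnitude with a tunable central weight k (k=2 is classical Sobel)."""
    gx = np.array([[-1, 0, 1],
                   [-k, 0, k],
                   [-1, 0, 1]], dtype=float)
    gy = gx.T
    dx = convolve(gray.astype(float), gx, mode="reflect")
    dy = convolve(gray.astype(float), gy, mode="reflect")
    return np.hypot(dx, dy)

img = np.random.rand(64, 64)            # stand-in for a grayscale equipment image
edges_classic = para_sobel(img, k=2.0)
edges_soft = para_sobel(img, k=1.5)     # weaker emphasis on the central row/column
```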
Citations: 0
Incremental pseudo-labeling for black-box unsupervised domain adaptation
IF 3.1 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-10-30 | DOI: 10.1016/j.jvcir.2025.104630
Yawen Zou , Chunzhi Gu , Jun Yu , Shangce Gao , Chao Zhang
Black-box unsupervised domain adaptation (BBUDA) learns only from the source model's predictions on the target data, without access to either the source data or the source model, thereby alleviating concerns about data privacy and security. However, incorrect pseudo-labels are prevalent in the predictions generated by the source model due to the cross-domain discrepancy, which can substantially degrade the performance of the target model. To address this problem, we propose a novel approach that incrementally selects high-confidence pseudo-labels to improve the generalization ability of the target model. Specifically, we first generate pseudo-labels using the source model and train a crude target model with a vanilla BBUDA method. Second, we iteratively select high-confidence data from the low-confidence data pool by thresholding the softmax probabilities, prototype labels, and intra-class similarity. Then, we iteratively train a stronger target network based on the crude target model to correct wrongly labeled samples and improve the accuracy of the pseudo-labels. Experimental results demonstrate that the proposed method achieves state-of-the-art black-box unsupervised domain adaptation performance on three benchmark datasets.
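A minimal sketch of the incremental high-confidence selection step, using only softmax-probability thresholding; the paper's additional criteria (prototype labels and intra-class similarity) and the retraining of the target network are omitted, and the function names and thresholds are illustrative.

```python
import numpy as np

def select_confident(probs: np.ndarray, threshold: float = 0.9):
    """probs: (N, K) softmax outputs of the current target model.
    Returns indices of confident samples and their pseudo-labels."""
    conf = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    keep = np.where(conf >= threshold)[0]
    return keep, labels[keep]

rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

labeled_idx = np.empty(0, dtype=int)
for round_id in range(3):                       # a few incremental rounds
    idx, pseudo = select_confident(probs, threshold=0.9 - 0.1 * round_id)
    labeled_idx = np.union1d(labeled_idx, idx)  # grow the high-confidence pool
    # in the full method the target network would be retrained on (idx, pseudo)
    # here and probs recomputed before the next, looser selection round
    print("round", round_id, "confident pool size:", labeled_idx.size)
```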
Citations: 0
SegGeo-SLAM: A real-time Visual SLAM system for dynamic environments
IF 3.1 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-10-30 | DOI: 10.1016/j.jvcir.2025.104627
Zhaoqian Jia, Yixiao Ma, Nan Zhou, Guangqiang Yin, Zhiguo Wang
Most Visual Simultaneous Localization and Mapping (V-SLAM) algorithms assume a static environment, within which they exhibit robust performance. However, dynamic objects can significantly degrade these systems because they fail to account for object motion. To address this limitation, we propose a novel V-SLAM system, SegGeo-SLAM, tailored to operate effectively in dynamic environments. To ensure real-time performance, a frame keeping mechanism is carefully designed, enabling SegGeo-SLAM to capture objects from multiple viewpoints. Furthermore, a motion check that combines 2D epipolar error with 3D depth error is proposed to ensure accurate identification of dynamic objects. Dynamic objects are eliminated, while rigid constraints within static targets are applied to improve the accuracy of pose estimation. Extensive experiments demonstrate that SegGeo-SLAM achieves performance comparable to existing advanced methods, offering a compelling solution for V-SLAM in dynamic environments.
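The sketch below shows one simple way to combine a 2D epipolar error with a 3D depth residual into a per-correspondence motion check; the thresholds and error definitions here are assumptions rather than the system's actual criteria.

```python
import numpy as np

def epipolar_error(F: np.ndarray, p1: np.ndarray, p2: np.ndarray) -> float:
    """Distance of p2 from the epipolar line F @ p1 (pixel coordinates)."""
    x1 = np.append(p1, 1.0)
    x2 = np.append(p2, 1.0)
    line = F @ x1
    return abs(x2 @ line) / np.hypot(line[0], line[1])

def is_dynamic(F, p1, p2, d_obs, d_pred, epi_thresh=1.0, depth_thresh=0.05):
    """Flag a correspondence as dynamic when either error is too large.
    d_obs: measured depth at p2; d_pred: depth predicted from camera motion."""
    epi = epipolar_error(F, p1, p2)
    depth_res = abs(d_obs - d_pred) / max(d_pred, 1e-6)
    return epi > epi_thresh or depth_res > depth_thresh

F = np.array([[0.0, -1e-4, 0.01], [1e-4, 0.0, -0.02], [-0.01, 0.02, 1.0]])
print(is_dynamic(F, np.array([320.0, 240.0]), np.array([322.0, 241.0]),
                 d_obs=2.10, d_pred=2.00))
```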
Citations: 0
Superpixel segmentation of remote sensing images via edge extension and adaptive region merging
IF 3.1 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-10-29 | DOI: 10.1016/j.jvcir.2025.104628
Kun Wang , Yubo Men , Yongmei Liu , Jin Li , Ruifeng Zhao , Chaoguang Men
Superpixel segmentation is critical for remote sensing image analysis, but balancing boundary adherence and shape regularity under weak edges and spectral heterogeneity remains challenging. Most existing superpixel algorithms focus primarily on color intensity and spatial coordinates, often neglecting local neighborhood factors and regional constraints. This paper proposes an edge-aware framework that fuses Simple Linear Iterative Clustering (SLIC) superpixel edges with pyramid multi-scale Canny edges, closes fragmented contours via gradient-guided endpoint extension, and performs regionally adaptive merging on a region adjacency graph using multi-feature similarity (intensity, GLCM texture, geometry). Evaluated on US3D (qualitative) and Vaihingen (quantitative), the method consistently improves boundary alignment and shape regularity. With about 1000 superpixels, boundary recall improves by about 16% over the second-best method, and undersegmentation error decreases by about 26% relative to SLIC. These results demonstrate that the proposed method achieves competitive boundary adherence and regional consistency on high-resolution remote sensing satellite and aerial image datasets.
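A simplified pipeline in the spirit of this framework, assuming scikit-image is available: SLIC superpixels, a single-scale Canny edge map fused with the superpixel boundaries, and a greedy union-find merge of adjacent superpixels with similar mean intensity. The gradient-guided endpoint extension and the multi-feature (GLCM texture, geometry) similarity are left out, and the sample image and thresholds are placeholders.

```python
import numpy as np
from skimage import data
from skimage.color import rgb2gray
from skimage.segmentation import slic, find_boundaries
from skimage.feature import canny

img = data.astronaut()                           # stand-in for a remote sensing tile
gray = rgb2gray(img)
labels = slic(img, n_segments=1000, compactness=10, start_label=0)
fused_edges = canny(gray, sigma=1.5) | find_boundaries(labels, mode="thick")
# the full method would close gaps in fused_edges via gradient-guided endpoint
# extension; this toy version goes straight to intensity-based region merging

n = int(labels.max()) + 1
means = np.array([gray[labels == i].mean() for i in range(n)])  # mean per superpixel

def adjacent_pairs(a, b):
    m = a != b
    return {tuple(sorted((int(i), int(j)))) for i, j in zip(a[m], b[m])}

pairs = (adjacent_pairs(labels[:, :-1], labels[:, 1:])
         | adjacent_pairs(labels[:-1, :], labels[1:, :]))

# union-find merge of adjacent superpixels whose mean intensities are close
parent = list(range(n))
def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

for i, j in pairs:
    if abs(means[i] - means[j]) < 0.02:
        parent[find(i)] = find(j)

merged = np.vectorize(find)(labels)
print("superpixels:", n, "-> merged regions:", len(np.unique(merged)))
```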
Citations: 0
A multi-modal 3D object detection framework based on enhanced Convolution, mixed Sampling, and Image-Point cloud bidirectional fusion
IF 3.1 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-10-28 | DOI: 10.1016/j.jvcir.2025.104624
Xin Zhou , Xiaolong Xu
As deep learning and computer vision technologies advance, the demand for 3D object detection in autonomous driving is increasing. In this paper, we address the limitations of single-modal approaches by proposing a method that fuses LiDAR point clouds and RGB images. First, we review and discuss existing multi-modal fusion methods in the literature. Second, we design an enhanced convolutional module (SEC-Block) based on the channel attention mechanism to effectively capture and represent key image features. Then, we employ a mixed sampling strategy (M-FPS) to address the challenges of point cloud sampling. We also design an Att-Fusion module to handle the fusion of point clouds and images: it adaptively estimates the important features of images and point clouds, fully exploiting their complementarity for efficient fusion. We integrate SEC-Block, M-FPS, and Att-Fusion into a multi-modal 3D object detection model named PAINet. Experimental results demonstrate that PAINet achieves a 3D detection accuracy of 82.59% for moderate-level cars on the KITTI test set, outperforming other state-of-the-art models and providing an effective solution for environmental perception in autonomous driving systems.
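Since SEC-Block is described as a channel-attention-based convolutional module, the sketch below shows a squeeze-and-excitation style block as a stand-in; the layer sizes, names, and reduction ratio are assumptions, not the published design.

```python
import torch
import torch.nn as nn

class SEConvBlock(nn.Module):
    """Convolution followed by channel attention that re-weights feature channels."""
    def __init__(self, in_ch: int, out_ch: int, reduction: int = 16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                 # squeeze: global spatial context
            nn.Conv2d(out_ch, out_ch // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch // reduction, out_ch, 1),
            nn.Sigmoid(),                            # excitation: per-channel weights
        )

    def forward(self, x):
        y = self.conv(x)
        return y * self.se(y)                        # emphasize informative channels

feat = SEConvBlock(3, 64)(torch.randn(2, 3, 224, 224))
print(feat.shape)                                    # torch.Size([2, 64, 224, 224])
```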
Citations: 0
Dehaze-cGAN: Image dehazing using a multi-head attention-based conditional GAN for traffic video monitoring
IF 3.1 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-10-27 | DOI: 10.1016/j.jvcir.2025.104619
Maheshkumar H. Kolekar, Hemang Dipakbhai Chhatbar, Samprit Bose
Image dehazing removes haze to improve image clarity and is essential for computer vision applications. Traditional dehazing methods depend on manually constructed priors and assumptions, which often fail to generalize in complex real-world scenes with dense or uneven haze, leading to inconsistent and suboptimal results. In this paper, we introduce Dehaze-cGAN, a novel conditional generative adversarial network designed for effective single-image dehazing. The proposed model features a multi-head self-attention UNet generator that captures both local textures and long-range spatial dependencies. Complementing this, a channel attention-guided discriminator selectively emphasizes important feature channels, enhancing its ability to distinguish real haze-free images from generated outputs. Comprehensive experiments on synthetic, real-world, and natural hazy datasets show that Dehaze-cGAN consistently surpasses state-of-the-art methods. The practical effectiveness of the model is further validated by significant improvements in license plate detection accuracy on dehazed traffic images.
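To illustrate how multi-head self-attention can be applied inside a convolutional generator, the sketch below flattens a feature map into a token sequence, applies torch.nn.MultiheadAttention, and reshapes back; it is a generic building block under assumed sizes, not the Dehaze-cGAN generator itself.

```python
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    """Multi-head self-attention over the spatial positions of a feature map."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)       # (B, H*W, C): pixels as tokens
        out, _ = self.attn(seq, seq, seq)        # long-range spatial dependencies
        out = self.norm(seq + out)               # residual + normalization
        return out.transpose(1, 2).reshape(b, c, h, w)

x = torch.randn(1, 64, 32, 32)                   # e.g. a bottleneck feature map
print(SpatialSelfAttention(64)(x).shape)         # torch.Size([1, 64, 32, 32])
```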
Citations: 0
Real-time facial expression recognition via quaternion Gabor convolutional neural network
IF 3.1 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-10-27 | DOI: 10.1016/j.jvcir.2025.104625
Yu Zhou , Liyuan Guo , Beibei Jiang , Bo Wang , Kunlei Jing
Real-time facial expression recognition (FER) has become a hotspot in computer vision. In recent years, convolutional neural networks (CNNs) have been widely employed for FER. However, traditional CNN architectures usually struggle to balance performance and computation, which is crucial for real-time applications. Another challenge is the inadequate consideration of color information: CNNs typically treat a color image as three independent channels or convert it to grayscale, losing important information related to facial expressions. To address these problems, we propose a lightweight quaternion Gabor CNN (LQG-CNN) for real-time FER. LQG-CNN encodes color images as quaternions, allowing it to naturally handle the interrelationships between color channels within a quaternion framework. Additionally, quaternion Gabor convolutional layers are introduced to capture spatial transformations; they require fewer parameters and offer faster inference, making real-time FER feasible. Experiments on three datasets demonstrate that LQG-CNN achieves cost-efficient performance, outperforming other methods. Code will be available at https://github.com/jiangbeibe/LQG-CNN.
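The snippet below illustrates the quaternion encoding idea: an RGB pixel becomes a pure quaternion (0, R, G, B), and the Hamilton product, which quaternion convolutions build on, mixes the three color channels jointly. It is a numpy toy, not the LQG-CNN layers.

```python
import numpy as np

def rgb_to_quaternion(img: np.ndarray) -> np.ndarray:
    """img: (H, W, 3) in [0, 1] -> (H, W, 4) pure quaternions with zero real part."""
    h, w, _ = img.shape
    q = np.zeros((h, w, 4), dtype=float)
    q[..., 1:] = img                      # i, j, k components carry R, G, B jointly
    return q

def hamilton(p: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Element-wise Hamilton product of two (..., 4) quaternion arrays."""
    a1, b1, c1, d1 = np.moveaxis(p, -1, 0)
    a2, b2, c2, d2 = np.moveaxis(q, -1, 0)
    return np.stack([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ], axis=-1)

img = np.random.rand(8, 8, 3)
q_img = rgb_to_quaternion(img)
unit_q = np.array([np.cos(0.1), np.sin(0.1), 0.0, 0.0])   # a unit quaternion "weight"
mixed = hamilton(q_img, unit_q)           # left-multiplication mixes R, G, B jointly
print(mixed.shape)                        # (8, 8, 4)
```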
Citations: 0
Stochastic latent feature distillation: Enhancing dataset distillation via structured uncertainty modeling
IF 3.1 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-10-27 | DOI: 10.1016/j.jvcir.2025.104623
Zhe Li , Sarah Cechnicka , Cheng Ouyang , Katharina Breininger , Peter Schüffler , Bernhard Kainz
As deep learning models continue to scale in complexity and data size, reducing storage and training costs has become increasingly important. Dataset distillation addresses this challenge by synthesizing a small set of synthetic samples that effectively substitute for the original dataset in downstream tasks. Existing approaches typically rely on matching gradients or features either in pixel space or in the latent space of a pretrained generative model. We propose a novel stochastic distillation method that models the joint distribution of latent features using a low-rank multivariate normal distribution, parameterized by a lightweight neural network. This formulation captures spatial correlations in the feature space, which are then projected into class probability space to generate more diverse and informative predictions. The proposed module integrates seamlessly with existing distillation pipelines. Our method achieves state-of-the-art cross-architecture results, improving test accuracy by up to 7.47% in gradient matching and 35.71% in distribution matching over baselines.
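As a sketch of modeling latent features with a low-rank multivariate normal parameterized by a small network, the code below uses torch.distributions.LowRankMultivariateNormal and draws reparameterized samples; the dimensions, head design, and loss are assumptions for illustration, not the paper's exact module.

```python
import torch
import torch.nn as nn
from torch.distributions import LowRankMultivariateNormal

class StochasticLatentHead(nn.Module):
    """Maps latent features to a low-rank Gaussian: cov = W W^T + diag(d)."""
    def __init__(self, feat_dim: int = 256, rank: int = 8):
        super().__init__()
        self.rank = rank
        self.loc = nn.Linear(feat_dim, feat_dim)
        self.cov_factor = nn.Linear(feat_dim, feat_dim * rank)
        self.cov_diag = nn.Linear(feat_dim, feat_dim)

    def forward(self, z: torch.Tensor) -> LowRankMultivariateNormal:
        b, d = z.shape
        loc = self.loc(z)
        factor = self.cov_factor(z).view(b, d, self.rank)          # low-rank term
        diag = nn.functional.softplus(self.cov_diag(z)) + 1e-4     # keep positive
        return LowRankMultivariateNormal(loc, factor, diag)

z = torch.randn(16, 256)                   # latent features of synthetic samples
dist = StochasticLatentHead()(z)
samples = dist.rsample()                   # (16, 256), differentiable samples
loss = -dist.log_prob(z).mean()            # e.g. fit the distribution to features
print(samples.shape, float(loss))
```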
Citations: 0