
Journal of Visual Communication and Image Representation: Latest Publications

Deep low light image enhancement via Multi-Task Learning of Few Shot Exposure Imaging
IF 3.1 CAS Q4 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-01 DOI: 10.1016/j.jvcir.2025.104693
Yi Wang, Haonan Su, Zhaolin Xiao, Haiyan Jin
Deep low-light enhancement methods typically learn from long-exposure ground truths. While effective for dark scenes, this approach often causes overexposure in HDR scenarios and lacks adaptability to varying illumination levels. Therefore, we develop a deep low-light image enhancement method via Multi-Task Learning of Few-Shot Exposure Imaging (MLFSEI), which is formulated as a Bayesian multi-task directed graphical model and predicts the enhanced images by learning few-shot tasks comprising multi-exposure images and their corresponding exposure vectors. The proposed method predicts the enhanced image from the selected exposure vector and the latent variable learned across the few-shot tasks. The exposure vectors are defined as the characteristics of the few-shot exposure datasets, comprising the mean, variance, and contrast of the images. Moreover, multi-order gradients are introduced to constrain the structure and details against the ground truth. Experimental results demonstrate significant improvements, with average gains of 4.64 dB in PSNR and 0.071 in SSIM, along with an average reduction of 1.12 in NIQE across multiple benchmark datasets compared to state-of-the-art methods. Furthermore, the proposed method can be extended to produce multiple outputs with varying exposure levels from a single model.
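As a rough illustration of the exposure vector described above (image mean, variance, and contrast), the sketch below computes one per image; the exact contrast definition used by the authors is not stated, so RMS contrast is assumed here.

```python
import numpy as np

def exposure_vector(image: np.ndarray) -> np.ndarray:
    """Summarize an image's exposure characteristics as (mean, variance, contrast).

    `image` is expected as a float array in [0, 1]; RMS contrast is used as a
    stand-in, since the abstract does not spell out the contrast definition.
    """
    gray = image.mean(axis=-1) if image.ndim == 3 else image  # collapse RGB if present
    mean = gray.mean()
    var = gray.var()
    contrast = gray.std()  # RMS contrast (assumption)
    return np.array([mean, var, contrast])

# Example: a synthetic under-exposed frame vs. a brighter one
dark = np.random.rand(64, 64, 3) * 0.2
bright = np.random.rand(64, 64, 3) * 0.9
print(exposure_vector(dark), exposure_vector(bright))
```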
Citations: 0
Position-rotation graph and elevation partitioning strategy for traffic police gesture recognition
IF 3.1 CAS Q4 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-01 DOI: 10.1016/j.jvcir.2025.104698
Jian He, Rongqi Cao, Cheng Zhang, Suyu Wang
Traffic police gesture recognition is important in autonomous driving. Most existing methods rely on extracting pixel-level features from RGB images, which lack interpretability due to the absence of explicit skeletal gesture features. Current deep learning approaches often fail to effectively model skeletal gesture information because they ignore the inherent connections between joint coordinate data and gesture semantics. Additionally, many methods fail to integrate multi-modal skeletal information (such as joint positions, rotations, and root orientation), limiting their ability to capture cross-modal correlations. Beyond methodological limitations, existing datasets often lack diversity in commanding directions, hindering fine-grained recognition of gestures intended for different traffic flows. To address these limitations, this paper presents the CTPGesture v2 dataset with Chinese traffic police gestures that command vehicles in four directions and proposes a skeleton-based graph convolution method for continuous gesture recognition. Specifically, a position-rotation graph (PR-Graph) is constructed with joint positions, rotations, and root rotations all in the same graph to enrich the graph’s representational power. An elevation partitioning strategy (EPS) is introduced to address the shortcutting issue of the conventional spatial configuration partitioning strategy (SCPS). Experiments demonstrate that our method achieves a 0.842 Jaccard score on CTPGesture v2 at 31.9 FPS, improving over previous works. The proposed PR-Graph and EPS establish a more descriptive graph for GCN and help capture cross-modality correlations during the graph convolution stages. Our code is available at https://github.com/crq0528/RT-VIBT. Our datasets are available at https://github.com/crq0528/traffic-gesture-datasets.
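The reported 0.842 Jaccard score measures overlap between predicted and ground-truth gesture segments. Below is a minimal frame-level sketch of that metric, assuming per-class averaging; the benchmark's exact evaluation protocol may differ.

```python
import numpy as np

def mean_jaccard(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Frame-level Jaccard index averaged over gesture classes.

    `pred` and `gt` are 1-D integer label sequences of equal length
    (class 0 = no gesture). Illustrative only; the CTPGesture protocol
    may score segments differently.
    """
    scores = []
    for c in range(1, num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both sequences
        scores.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(scores)) if scores else 0.0

pred = np.array([0, 1, 1, 1, 0, 2, 2, 0])
gt   = np.array([0, 1, 1, 0, 0, 2, 2, 2])
print(mean_jaccard(pred, gt, num_classes=3))
```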
Citations: 0
Regional decay attention for image shadow removal
IF 3.1 CAS Q4 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-01 DOI: 10.1016/j.jvcir.2025.104694
Xiujin Zhu, Chee-Onn Chow, Joon Huang Chuah
Existing Transformer-based shadow removal methods are limited by fixed window sizes, making it difficult to effectively model global information. In addition, they do not fully utilize the distance prior in shadow images. This study argues that shadow removal should model brightness variations between regions from a global perspective. Non-shadow areas near the shadow boundaries are the most important for restoring brightness in shadow regions, and their importance gradually decreases as the distance increases. To achieve this, a regional decay attention mechanism is proposed, which introduces a positional decay bias into the self-attention computation to enable dynamic modeling of contributions from different spatial positions. A local perception module is introduced to improve the model’s ability to capture local details, and a shadow removal model named FW-Former is developed. This model achieves superior performance across multiple datasets, demonstrates stable generalization capability, and maintains a low parameter count.
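The abstract states that a positional decay bias is added to the self-attention computation so that distant regions contribute less. A minimal single-head sketch is shown below; the linear decay term (gamma times pairwise distance) is an assumption, since the exact form of the bias is not given here.

```python
import torch

def decay_attention(q, k, v, coords, gamma=0.1):
    """Single-head self-attention with a distance-based decay bias.

    q, k, v: (N, d) token features; coords: (N, 2) pixel positions.
    The bias form (-gamma * Euclidean distance) is an assumption; the
    abstract only states that contributions decay with distance.
    """
    d = q.shape[-1]
    logits = q @ k.transpose(-1, -2) / d ** 0.5         # (N, N) similarity
    dist = torch.cdist(coords.float(), coords.float())  # (N, N) pairwise distances
    logits = logits - gamma * dist                      # farther tokens contribute less
    return torch.softmax(logits, dim=-1) @ v

N, d = 16, 32
q = k = v = torch.randn(N, d)
coords = torch.stack(torch.meshgrid(torch.arange(4), torch.arange(4), indexing="ij"), -1).reshape(N, 2)
print(decay_attention(q, k, v, coords).shape)  # torch.Size([16, 32])
```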
Citations: 0
DSFusion: A dual-branch step-by-step fusion network for medical image fusion
IF 3.1 CAS Q4 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-01 DOI: 10.1016/j.jvcir.2025.104700
Nili Tian, YuLong Ling, Qing Pan
To address the insufficient consideration of multi-dimensional modal image features in existing medical image fusion methods, this paper proposes a novel dual-branch step-by-step fusion (DSFusion) network. The dual sequence extraction block (DSE) is used for initial feature extraction, followed by the multi-scale lightweight residual (MLRes) block for enhanced efficiency and generalization. Features are then fused through the global pixel-level multi-dimensional fusion (GPMF) module, comprising a multi-dimensional feature extraction block (MFEB) and a pixel-level global fusion branch (PGFB). Finally, fused features are reconstructed into the final image. Experiments performed on datasets of different modalities demonstrate that DSFusion achieves competitive or superior performance across multiple evaluation metrics, including the QMI, PSNR, and QP indicators.
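Of the reported metrics, PSNR is the simplest to state precisely; a small reference implementation follows. Fusion-specific metrics such as QMI and QP are more involved and are omitted here.

```python
import numpy as np

def psnr(reference: np.ndarray, fused: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between a reference and a fused image."""
    mse = np.mean((reference.astype(np.float64) - fused.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.random.randint(0, 256, (128, 128), dtype=np.uint8)
noisy = np.clip(ref + np.random.normal(0, 5, ref.shape), 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(ref, noisy):.2f} dB")
```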
Citations: 0
PV3M-YOLO: A triple attention-enhanced model for detecting pedestrians and vehicles in UAV-enabled smart transport networks
IF 3.1 CAS Q4 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-01 DOI: 10.1016/j.jvcir.2025.104701
Noor Ul Ain Tahir, Li Kuang, Melikamu Liyih Sinishaw, Muhammad Asim
Pedestrian and vehicle detection in aerial images remains a challenging task due to small object sizes, scale variation, and occlusion, often resulting in missed detections or the need for overly complex models. This study introduces the PV3M-YOLO approach, which incorporates three key focus modules to enhance the detection of small and hard-to-detect objects. The proposed model integrates lightweight Ghost convolution, Convolutional Block Attention Module (CBAM), and Coordination Attention (CA) modules with optimized feature aggregation (C2f) into a unified architecture. These modifications improve the capacity of the model to capture essential spatial details and contextual dependencies without increasing computational complexity. Furthermore, the Wise-IoUv3 loss function is employed to reduce the influence of low-quality examples, enhancing localization and reducing erroneous identifications by suppressing the harmful gradients they produce. Experimental evaluations on the VisDrone2019 dataset demonstrate that PV3M-YOLO achieves an mAP@0.5 of 45.4% and an mAP@0.5:0.95 of 27.9%, surpassing the baseline by 3.9% and 2.7%, respectively. The model maintains efficiency with a compact size of 45.3 MB and a runtime of 8.8 ms. However, the detection of extremely small objects remains a limitation due to the high-altitude nature of aerial imagery, indicating the need for future model enhancements targeting ultra-small object detection.
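The Wise-IoUv3 loss mentioned above builds on the basic bounding-box IoU; the sketch below shows only that plain overlap ratio, without the dynamic gradient-weighting terms that distinguish the Wise-IoU family.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Wise-IoU variants add dynamic gradient weighting on top of this
    basic overlap ratio; only the plain IoU is shown here.
    """
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

print(box_iou((10, 10, 50, 50), (30, 30, 70, 70)))  # ~0.143
```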
Citations: 0
Effective face recognition from video using enhanced social collie optimization-based deep convolutional neural network technique
IF 3.1 CAS Q4 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-01 DOI: 10.1016/j.jvcir.2025.104639
Jitendra Chandrakant Musale, Anuj kumar Singh, Swati Shirke
A key feature of video surveillance systems is face recognition, which allows the identification and verification of people who appear in scenes frequently captured by a distributed network of cameras. The scientific community is interested in recognizing individuals' faces in videos, partly because of the potential applications and partly because of the difficulty of the underlying computer vision algorithms. A deep convolutional neural network is utilized to recognize faces from the provided video samples using the hybrid weighted texture pattern descriptor (HWTP). The deep CNN parameters are tuned by Enhanced Social Collie Optimization (ESCO), which determines a better solution through various strategies; in this way, the face of an individual is identified using the optimal parameters. The attained accuracy, precision, recall, and F-measure of the proposed model are 87.92 %, 88.01 %, 88.01 %, and 88.01 %, respectively, for a retrieval count of 500.
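For reference, the reported accuracy, precision, recall, and F-measure follow directly from confusion-matrix counts, as in the sketch below; the counts in the example are illustrative, not taken from the paper.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F-measure from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}

# Illustrative counts only, not values from the paper
print(classification_metrics(tp=440, fp=60, fn=60, tn=440))
```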
Citations: 0
Knowledge distillation meets video foundation models: A video saliency prediction case study
IF 3.1 CAS Q4 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-01 DOI: 10.1016/j.jvcir.2026.104706
Morteza Moradi, Mohammad Moradi, Concetto Spampinato, Ali Borji, Simone Palazzo
Modeling spatio-temporal dynamics remains a major challenge and critical factor for effective video saliency prediction (VSP). The evolution from LSTM and 3D convolutional networks to vision transformers has sparked numerous innovations for tackling this complex video understanding task. However, current technologies still struggle to capture short- and long-term frame dependencies simultaneously. The emergence of large-scale video models has introduced unprecedented opportunities to overcome these limitations but poses significant practical challenges due to their substantial parameter counts and computational costs. To address this, we propose leveraging knowledge distillation—an approach yet to be fully explored in VSP solutions. Specifically, we employ THTD-Net, a leading transformer-based VSP architecture, as the student network, guided by a newly developed large-scale VSP model serving as the teacher. Evaluations on benchmark datasets confirm the efficacy of this novel approach, demonstrating promising performance and substantially reducing the complexity required for real-world applications.
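The abstract does not spell out the distillation objective, so the following is only a generic sketch of teacher-student saliency distillation: a KL term pulling the student toward the teacher plus a supervised term toward the ground truth.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_map, teacher_map, gt_map, alpha=0.5):
    """Generic saliency-distillation objective (illustrative, not the paper's exact loss).

    Each map is a (B, H, W) tensor of unnormalized saliency scores; the maps
    are flattened and treated as spatial probability distributions.
    """
    s = F.log_softmax(student_map.flatten(1), dim=1)
    t = F.softmax(teacher_map.flatten(1), dim=1)
    g = F.softmax(gt_map.flatten(1), dim=1)
    kd = F.kl_div(s, t, reduction="batchmean")   # match the teacher
    sup = F.kl_div(s, g, reduction="batchmean")  # match the ground truth
    return alpha * kd + (1 - alpha) * sup

student = torch.randn(2, 36, 64)
teacher = torch.randn(2, 36, 64)
gt = torch.rand(2, 36, 64)
print(distillation_loss(student, teacher, gt).item())
```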
Citations: 0
Iterative mutual voting matching for efficient and accurate Structure-from-Motion
IF 3.1 CAS Q4 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-01 DOI: 10.1016/j.jvcir.2025.104697
Suning Ge, Chunlin Ren, Ya’nan He, Linjie Li, Jiaqi Yang, Kun Sun, Yanning Zhang
As a crucial topic in 3D vision, Structure-from-Motion (SfM) aims to recover camera poses and 3D structures from unconstrained images. Performing pairwise image matching is a critical step. Typically, matching relationships are represented as a view graph, but the initial graph often contains redundant or potentially false edges, affecting both efficiency and accuracy. We propose an efficient incremental SfM method that optimizes the critical image matching step. Specifically, given an image similarity graph, an initialized weighted view graph is constructed. Next, the vertices and edges of the graph are treated as candidates and voters, with iterative mutual voting performed to score image pairs until convergence. Then, the optimal subgraph is extracted using the maximum spanning tree (MST). Finally, incremental reconstruction is carried out based on the selected images. We demonstrate the efficiency and accuracy of our method on general datasets and ambiguous datasets.
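The optimal-subgraph step above uses a maximum spanning tree over the weighted view graph. A Kruskal-style sketch with union-find follows; the edge weights here simply stand in for the mutual-voting scores.

```python
def maximum_spanning_tree(num_views, weighted_edges):
    """Kruskal-style maximum spanning tree over a weighted view graph.

    `weighted_edges` is a list of (weight, i, j) tuples scoring image pairs;
    a higher weight means a more reliable match. Returns the retained edges.
    """
    parent = list(range(num_views))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    tree = []
    for w, i, j in sorted(weighted_edges, reverse=True):  # strongest pairs first
        ri, rj = find(i), find(j)
        if ri != rj:                 # keep the edge only if it joins two components
            parent[ri] = rj
            tree.append((i, j, w))
    return tree

edges = [(0.9, 0, 1), (0.2, 0, 2), (0.8, 1, 2), (0.6, 2, 3)]
print(maximum_spanning_tree(4, edges))  # drops the weak (0, 2) edge
```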
Citations: 0
An optical remote sensing ship detection model based on feature diffusion and higher-order relationship modeling
IF 3.1 CAS Q4 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-01 DOI: 10.1016/j.jvcir.2025.104695
Chunman Yan, Ningning Qi
Ship detection plays an increasingly important role in the field of marine monitoring, with Optical Remote Sensing (ORS) technology providing high-resolution spatial and texture information support. However, existing ship detection methods still face significant challenges in accurately detecting small targets, suppressing complex background interference, and modeling cross-scale semantic relationships, limiting their effectiveness in practical applications. Inspired by feature diffusion theory and higher-order spatial interaction mechanisms, this paper proposes a ship detection model for Optical Remote Sensing imagery. Specifically, to address the problem of fine-grained information loss during feature downsampling, the Single-branch and Dual-branch Residual Feature Downsampling (SRFD and DRFD) modules are designed to enhance small target preservation and multi-scale robustness. To capture long-range spatial dependencies and improve robustness against target rotation variations, the Fast Spatial Pyramid Pooling module based on Large Kernel Separable Convolution Attention (SPPF-LSKA) is introduced, enabling efficient large receptive field modeling with rotation-invariant constraints. Furthermore, to dynamically model complex semantic dependencies between different feature scales, the Feature Diffusion Pyramid Network (FDPN) is proposed based on continuous feature diffusion and cross-scale graph reasoning. Experimental results show that the model achieves an AP50 of 86.2 % and an AP50-95 of 58.0 % on multiple remote sensing ship detection datasets, with the number of parameters reduced to 2.6 M and the model size compressed to 5.5 MB, significantly outperforming several state-of-the-art models in terms of both detection accuracy and lightweight deployment. These results demonstrate the detection capability, robustness, and application potential of the proposed model in Optical Remote Sensing ship monitoring tasks.
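The SPPF-LSKA module above builds on large-kernel separable convolution attention. The sketch below shows only the general idea of factoring a large depthwise kernel into 1 x k and k x 1 depthwise passes whose output gates the input; it is a generic illustration under those assumptions, not the paper's exact block.

```python
import torch
import torch.nn as nn

class LargeKernelSeparableAttention(nn.Module):
    """Generic large-kernel separable convolution attention (illustrative sketch).

    A k x k depthwise kernel is factored into 1 x k and k x 1 depthwise
    convolutions; the result gates the input features. The actual
    SPPF-LSKA block in the paper may differ in its details.
    """

    def __init__(self, channels: int, kernel_size: int = 11):
        super().__init__()
        pad = kernel_size // 2
        self.conv_h = nn.Conv2d(channels, channels, (1, kernel_size),
                                padding=(0, pad), groups=channels)
        self.conv_v = nn.Conv2d(channels, channels, (kernel_size, 1),
                                padding=(pad, 0), groups=channels)
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        attn = self.proj(self.conv_v(self.conv_h(x)))  # large receptive field at low cost
        return x * attn                                # gate the input features

x = torch.randn(1, 32, 40, 40)
print(LargeKernelSeparableAttention(32)(x).shape)  # torch.Size([1, 32, 40, 40])
```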
Citations: 0
Corrigendum to “Lightweight macro-pixel quality enhancement network for light field images compressed by versatile video coding” [J. Vis. Commun. Image Represent. 105 (2024) 104329]
IF 3.1 CAS Q4 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-01 DOI: 10.1016/j.jvcir.2025.104673
Hongyue Huang, Chen Cui, Chuanmin Jia, Xinfeng Zhang, Siwei Ma
{"title":"Corrigendum to “Lightweight macro-pixel quality enhancement network for light field images compressed by versatile video coding” [J. Vis. Commun. Image Represent. 105 (2024) 104329]","authors":"Hongyue Huang ,&nbsp;Chen Cui ,&nbsp;Chuanmin Jia ,&nbsp;Xinfeng Zhang ,&nbsp;Siwei Ma","doi":"10.1016/j.jvcir.2025.104673","DOIUrl":"10.1016/j.jvcir.2025.104673","url":null,"abstract":"","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"115 ","pages":"Article 104673"},"PeriodicalIF":3.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0