A key feature of video surveillance systems is face recognition, which allows the identification and verification of people who appear in scenes frequently collected by a distributed network of cameras. The scientific community is interested in recognizing individuals' faces in videos, partly because of the potential applications and partly because of the difficulty this poses for artificial vision algorithms. A deep convolutional neural network is used to recognize faces from the provided video samples via the hybrid weighted texture pattern descriptor (HWTP). The deep CNN parameters are tuned by Enhanced Social Collie Optimization (ESCO), which finds a better solution through various strategies; the face of an individual is then identified using the optimal parameters. The proposed model attains an accuracy, precision, recall, and F-measure of 87.92%, 88.01%, 88.01%, and 88.01%, respectively, for a retrieval count of 500.
{"title":"Effective face recognition from video using enhanced social collie optimization-based deep convolutional neural network technique","authors":"Jitendra Chandrakant Musale , Anuj kumar Singh , Swati Shirke","doi":"10.1016/j.jvcir.2025.104639","DOIUrl":"10.1016/j.jvcir.2025.104639","url":null,"abstract":"<div><div>A key feature of video surveillance systems is face recognition, which allows the identification and verification of people who appear in scenes frequently collected by a distributed network of cameras. The scientific community is interested in recognizing the individuals faces in videos, in part due to the potential applications also due to the difficulty in the artificial vision algorithms. The deep convolutional neural network is utilized to recognize the face from the set of provided video samples by the hybrid weighted texture pattern descriptor (HWTP). The deep CNN parameter is tuned by the Enhanced social collie optimization (ESCO), which determines the better solution by the various strategies, similar to this, the face of an individual is identified using optimum parameters. The attained accuracy, precision, recall, and F-measure of the proposed model is 87.92 %, 88.01 %, 88.01 %, and 88.01 % for the number of retrieval 500, respectively.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"115 ","pages":"Article 104639"},"PeriodicalIF":3.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145884297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-01 | Epub Date: 2025-12-19 | DOI: 10.1016/j.jvcir.2025.104692
Daojie Zhou, Yihan Li, Yi Li
This paper proposes a multi-view fusion-based recognition algorithm to address the imbalanced recognition accuracy of existing 3D model multi-view recognition methods. This imbalance arises from their failure to account for inter-view feature consistency and their use of inefficient fusion strategies. The proposed algorithm employs a dual-branch feature extraction network and a multi-label loss function to enforce the learning of consistent features across different views of the same model. Concurrently, a multi-level Gated Recurrent Unit (GRU) fusion network is constructed to efficiently integrate high-dimensional features from various levels and temporal information across multiple views. Simulation results demonstrate that the proposed algorithm achieves highly competitive recognition accuracy on mainstream benchmark datasets. Furthermore, it exhibits more balanced performance across different categories, thereby showing enhanced stability and robustness.
{"title":"Multi-view 3D model recognition via multi-label and multi-level fusion with bidirectional GRU","authors":"Daojie Zhou, Yihan Li, Yi Li","doi":"10.1016/j.jvcir.2025.104692","DOIUrl":"10.1016/j.jvcir.2025.104692","url":null,"abstract":"<div><div>This paper proposes a multi-view fusion-based recognition algorithm to address the imbalanced recognition accuracy of existing 3D model multi-view recognition methods. This imbalance arises from their failure to account for inter-view feature consistency and their use of inefficient fusion strategies. The proposed algorithm employs a dual-branch feature extraction network and a multi-label loss function to enforce the learning of consistent features across different views of the same model. Concurrently, a multi-level Gated Recurrent Unit (GRU) fusion network is constructed to efficiently integrate high-dimensional features from various levels and temporal information across multiple views. Simulation results demonstrate that the proposed algorithm achieves highly competitive recognition accuracy on mainstream benchmark datasets. Furthermore, it exhibits more balanced performance across different categories, thereby showing enhanced stability and robustness.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"115 ","pages":"Article 104692"},"PeriodicalIF":3.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145840344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-01 | Epub Date: 2025-12-15 | DOI: 10.1016/j.jvcir.2025.104689
Jiawei Wang, Weiwei Shi, Xiaofan Wang, Xinhong Hei
Semi-supervised learning has achieved significant success through various approaches based on pseudo-labeling and consistency regularization. Despite these efforts, effectively utilizing both labeled and unlabeled data remains a significant challenge. In this study, to make more efficient use of limited and valuable labeled data, we propose a self-adaptive weight redistribution strategy within a batch. This operation takes into account the heterogeneity of labeled data, adjusting each sample's contribution to the overall loss based on its individual loss. This enables the model to more accurately identify challenging samples. Our experiments demonstrate that this weight reallocation strategy significantly enhances the model's generalization ability. Additionally, to enhance intra-class compactness and inter-class separation of the learned features, we introduce a cosine similarity-based discriminative feature learning regularization term. This regularization term reinforces feature consistency within the same class and enhances feature distinctiveness across different classes. Through this mechanism, the model is encouraged to prioritize learning discriminative feature representations, ensuring that features with authentic labels and those with high-confidence pseudo-labels are grouped together, while features belonging to different clusters are kept apart. The method can be combined with mainstream semi-supervised learning methods, which we evaluate experimentally. Our experimental findings illustrate the efficacy of our approach in enhancing the performance of semi-supervised learning tasks across widely utilized image classification datasets.
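The batch-level reweighting can be illustrated with a minimal sketch: per-sample losses are turned into softmax weights (so harder samples contribute more) and rescaled to sum to the batch size, and a cosine term measures the feature similarity that the discriminative regularizer acts on. The function names and the softmax form are illustrative assumptions, not the paper's exact formulation:

```python
import math

def redistribute_weights(losses, temperature=1.0):
    """Reallocate per-sample weights within a batch so that harder
    samples (higher loss) contribute more to the overall loss.
    Weights are softmax-normalized and rescaled to sum to the batch
    size, so the average loss magnitude is preserved."""
    exps = [math.exp(l / temperature) for l in losses]
    total = sum(exps)
    return [len(losses) * e / total for e in exps]

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors; a regularizer of
    this kind pushes the value toward 1 for same-class pairs and down
    for pairs from different classes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# The hardest sample (loss 1.5) receives the largest weight.
weights = redistribute_weights([0.2, 1.5, 0.7])
```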
{"title":"Deep semi-supervised learning method based on sample adaptive weights and discriminative feature learning","authors":"Jiawei Wang, Weiwei Shi, Xiaofan Wang, Xinhong Hei","doi":"10.1016/j.jvcir.2025.104689","DOIUrl":"10.1016/j.jvcir.2025.104689","url":null,"abstract":"<div><div>Semi-supervised learning has achieved significant success through various approaches based on pseudo-labeling and consistency regularization. Despite efforts, effectively utilizing both labeled and unlabeled data remains a significant challenge. In this study, to enhance the efficient utilization of limited and valuable labeled data, we propose a self-adaptive weight redistribution strategy within a batch. This operation takes into account the heterogeneity of labeled data, adjusting its contribution to the overall loss based on sample-specific losses. This enables the model to more accurately identify challenging samples. Our experiments demonstrate that this weight reallocation strategy significantly enhances the model’s generalization ability. Additionally, to enhance intra-class compactness and inter-class separation of the learned features, we introduce a cosine similarity-based discriminative feature learning regularization term. This regularization term aims to reinforce feature consistency within the same class and enhance feature distinctiveness across different classes. Through this mechanism, we facilitate the model to prioritize learning discriminative feature representations, ensuring that features with authentic labels and those with high-confidence pseudo-labels are grouped together, while simultaneously separating features belonging to different clusters. The method can be combined with mainstream Semi-supervised learning methods, which we evaluate experimentally. 
Our experimental findings illustrate the efficacy of our approach in enhancing the performance of semi-supervised learning tasks across widely utilized image classification datasets.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"115 ","pages":"Article 104689"},"PeriodicalIF":3.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145797417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-01 | Epub Date: 2026-01-04 | DOI: 10.1016/j.jvcir.2025.104695
Chunman Yan, Ningning Qi
Ship detection plays an increasingly important role in the field of marine monitoring, with Optical Remote Sensing (ORS) technology providing high-resolution spatial and texture information support. However, existing ship detection methods still face significant challenges in accurately detecting small targets, suppressing complex background interference, and modeling cross-scale semantic relationships, limiting their effectiveness in practical applications. Inspired by feature diffusion theory and higher-order spatial interaction mechanisms, this paper proposes a ship detection model for Optical Remote Sensing imagery. Specifically, to address the problem of fine-grained information loss during feature downsampling, the Single-branch and Dual-branch Residual Feature Downsampling (SRFD and DRFD) modules are designed to enhance small target preservation and multi-scale robustness. To capture long-range spatial dependencies and improve robustness against target rotation variations, the Fast Spatial Pyramid Pooling module based on Large Kernel Separable Convolution Attention (SPPF-LSKA) is introduced, enabling efficient large receptive field modeling with rotation-invariant constraints. Furthermore, to dynamically model complex semantic dependencies between different feature scales, the Feature Diffusion Pyramid Network (FDPN) is proposed based on continuous feature diffusion and cross-scale graph reasoning. Experimental results show that the model achieves an AP50 of 86.2% and an AP50-95 of 58.0% on multiple remote sensing ship detection datasets, with the number of parameters reduced to 2.6 M and the model size compressed to 5.5 MB, significantly outperforming several state-of-the-art models in terms of both detection accuracy and lightweight deployment. These results demonstrate the detection capability, robustness, and application potential of the proposed model in Optical Remote Sensing ship monitoring tasks.
{"title":"An optical remote sensing ship detection model based on feature diffusion and higher-order relationship modeling","authors":"Chunman Yan, Ningning Qi","doi":"10.1016/j.jvcir.2025.104695","DOIUrl":"10.1016/j.jvcir.2025.104695","url":null,"abstract":"<div><div>Ship detection plays an increasingly important role in the field of marine monitoring, with Optical Remote Sensing (ORS) technology providing high-resolution spatial and texture information support. However, existing ship detection methods still face significant challenges in accurately detecting small targets, suppressing complex background interference, and modeling cross-scale semantic relationships, limiting their effectiveness in practical applications. Inspired by feature diffusion theory and higher-order spatial interaction mechanisms, this paper proposes a ship detection model for Optical Remote Sensing imagery. Specifically, to address the problem of fine-grained information loss during feature downsampling, the Single-branch and Dual-branch Residual Feature Downsampling (SRFD and DRFD) modules are designed to enhance small target preservation and multi-scale robustness. To capture long-range spatial dependencies and improve robustness against target rotation variations, the Fast Spatial Pyramid Pooling module based on Large Kernel Separable Convolution Attention (SPPF-LSKA) is introduced, enabling efficient large receptive field modeling with rotation-invariant constraints. Furthermore, to dynamically model complex semantic dependencies between different feature scales, the Feature Diffusion Pyramid Network (FDPN) is proposed based on continuous feature diffusion and cross-scale graph reasoning. 
Experimental results show that the model achieves an <em>AP<sub>50</sub></em> of 86.2 % and an <em>AP<sub>50-95</sub></em> of 58.0 % on multiple remote sensing ship detection datasets, with the number of parameters reduced to 2.6 M and the model size compressed to 5.5 MB, significantly outperforming several state-of-the-art models in terms of both detection accuracy and lightweight deployment. These results demonstrate the detection capability, robustness, and application potential of the proposed model in Optical Remote Sensing ship monitoring tasks.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"115 ","pages":"Article 104695"},"PeriodicalIF":3.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145925010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-01 | Epub Date: 2025-12-11 | DOI: 10.1016/j.jvcir.2025.104673
Hongyue Huang , Chen Cui , Chuanmin Jia , Xinfeng Zhang , Siwei Ma
{"title":"Corrigendum to “Lightweight macro-pixel quality enhancement network for light field images compressed by versatile video coding” [J. Vis. Commun. Image Represent. 105 (2024) 104329]","authors":"Hongyue Huang , Chen Cui , Chuanmin Jia , Xinfeng Zhang , Siwei Ma","doi":"10.1016/j.jvcir.2025.104673","DOIUrl":"10.1016/j.jvcir.2025.104673","url":null,"abstract":"","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"115 ","pages":"Article 104673"},"PeriodicalIF":3.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-01 | Epub Date: 2025-11-15 | DOI: 10.1016/j.jvcir.2025.104644
Shuo Hu , Tongtong Liu , Liyang Han , Run Xing
Most existing visual tracking methods typically employ image patches as target references and endeavor to enhance tracking performance by maximizing the utilization of visual information through various deep networks. However, due to the intrinsic limitations of visual information, the performance of the trackers significantly deteriorates when confronted with drastic target variations or complex background environments. To address these issues, we propose a vision-language multimodal fusion tracker for object tracking. Firstly, we use semantic information from language descriptions to compensate for the instability of visual information, and establish multimodal cross-relations through the fusion of visual and language features. Secondly, we propose an attention-based token screening mechanism that utilizes semantic-guided attention and masking operations to eliminate irrelevant search tokens devoid of target information, thereby enhancing both accuracy and efficiency. Furthermore, we optimize the localization head by introducing channel attention, which effectively improves the accuracy of target positioning. Extensive experiments conducted on the OTB99, LaSOT, and TNL2K datasets demonstrate the effectiveness of our proposed tracking method, achieving success rates of 71.2%, 69.5%, and 58.9%, respectively.
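The token screening idea can be sketched as a simple top-k filter over semantic-guided attention scores: tokens scoring below the cut are masked out as background. The function name, the fixed keep ratio, and the greedy ranking are illustrative assumptions rather than the paper's exact mechanism:

```python
def screen_tokens(tokens, attention_scores, keep_ratio=0.5):
    """Keep only the search tokens with the highest semantic-guided
    attention scores; the rest are discarded as irrelevant background.
    Original token order is preserved among the survivors."""
    k = max(1, int(len(tokens) * keep_ratio))
    ranked = sorted(range(len(tokens)),
                    key=lambda i: attention_scores[i], reverse=True)
    keep = sorted(ranked[:k])  # restore original ordering
    return [tokens[i] for i in keep]

# Tokens t0 and t2 carry the most target-relevant attention mass.
survivors = screen_tokens(["t0", "t1", "t2", "t3"],
                          [0.9, 0.1, 0.8, 0.2], keep_ratio=0.5)
```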
{"title":"Vision-language tracking with attention-based optimization","authors":"Shuo Hu , Tongtong Liu , Liyang Han , Run Xing","doi":"10.1016/j.jvcir.2025.104644","DOIUrl":"10.1016/j.jvcir.2025.104644","url":null,"abstract":"<div><div>Most existing visual tracking methods typically employ image patches as target references and endeavor to enhance tracking performance by maximizing the utilization of visual information through various deep networks. However, due to the intrinsic limitations of visual information, the performance of the trackers significantly deteriorates when confronted with drastic target variations or complex background environments. To address these issues, we propose a vision-language multimodal fusion tracker for object tracking. Firstly, we use semantic information from language descriptions to compensate for the instability of visual information, and establish multimodal cross-relations through the fusion of visual and language features. Secondly, we propose an attention-based token screening mechanism that utilizes semantic-guided attention and masking operations to eliminate irrelevant search tokens devoid of target information, thereby enhancing both accuracy and efficiency. Furthermore, we optimize the localization head by introducing channel attention, which effectively improves the accuracy of target positioning. 
Extensive experiments conducted on the OTB99, LaSOT, and TNL2K datasets demonstrate the effectiveness of our proposed tracking method, achieving success rates of 71.2%, 69.5%, and 58.9%, respectively.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"114 ","pages":"Article 104644"},"PeriodicalIF":3.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145571651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-01 | Epub Date: 2025-11-15 | DOI: 10.1016/j.jvcir.2025.104641
Junyong You , Yuan Lin , Bin Hu
Generative models, e.g., stable diffusion, excel at producing compelling images but remain highly dependent on carefully crafted prompts. Refining prompts for specific objectives, especially aesthetic quality, is time-consuming and inconsistent. We propose a novel approach that leverages LLMs to enhance the prompt refinement process for stable diffusion. First, we propose a model to predict aesthetic image quality, examining various aesthetic elements in the spatial, channel, and color domains. Reinforcement learning is employed to refine the prompt, starting from a rudimentary version and iteratively improving it with the LLM's assistance. This iterative process is guided by a policy network that updates prompts based on interactions with the generated images, with a reward function measuring aesthetic improvement and adherence to the prompt. Our experimental results demonstrate that this method significantly boosts the visual quality of images generated with these refined prompts. Beyond image synthesis, this approach provides a broader framework for improving prompts across diverse applications with the support of LLMs.
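The iterative refinement loop can be sketched as a greedy search: a candidate generator stands in for the LLM, and a scalar reward stands in for the combined aesthetic-plus-adherence signal. Both are illustrative stand-ins, and the paper uses a learned policy network rather than the greedy selection shown here:

```python
def refine_prompt(prompt, candidates_fn, reward_fn, steps=3):
    """Greedy sketch of reward-guided prompt refinement: at each step,
    propose candidate rewrites of the current best prompt (in the
    paper, via an LLM) and keep whichever one scores highest under the
    reward function."""
    best, best_reward = prompt, reward_fn(prompt)
    for _ in range(steps):
        for candidate in candidates_fn(best):
            r = reward_fn(candidate)
            if r > best_reward:
                best, best_reward = candidate, r
    return best

# Toy setup: the "LLM" appends detail, the "reward" favors longer prompts.
refined = refine_prompt("a boat", lambda p: [p + ", golden hour"], len, steps=2)
```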
{"title":"Enhancing aesthetic image generation with reinforcement learning guided prompt optimization in stable diffusion","authors":"Junyong You , Yuan Lin , Bin Hu","doi":"10.1016/j.jvcir.2025.104641","DOIUrl":"10.1016/j.jvcir.2025.104641","url":null,"abstract":"<div><div>Generative models, e.g., stable diffusion, excel at producing compelling images but remain highly dependent on crafted prompts. Refining prompts for specific objectives, especially aesthetic quality, is time-consuming and inconsistent. We propose a novel approach that leverages LLMs to enhance prompt refinement process for stable diffusion. First, we propose a model to predict aesthetic image quality, examining various aesthetic elements in spatial, channel, and color domains. Reinforcement learning is employed to refine the prompt, starting from a rudimentary version and iteratively improving them with LLM’s assistance. This iterative process is guided by a policy network updating prompts based on interactions with the generated images, with a reward function measuring aesthetic improvement and adherence to the prompt. Our experimental results demonstrate that this method significantly boosts the visual quality of generated images when using these refined prompts. 
Beyond image synthesis, this approach provides a broader framework for improving prompts across diverse applications with the support of LLMs.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"114 ","pages":"Article 104641"},"PeriodicalIF":3.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145571650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-01 | Epub Date: 2025-11-25 | DOI: 10.1016/j.jvcir.2025.104656
Xingfa Wang , Chengjun Chen , Chenggang Dai , Kunhua Liu , Mingxing Lin
Due to inherent absorption and scattering effects, underwater images often exhibit low visibility and significant color deviation. These issues hinder the extraction of discriminative features and adversely impact instance-level segmentation accuracy. To address these challenges, this study proposes a novel Hybrid SAM and Mask R-CNN framework for underwater instance segmentation, integrating the strong generalization capability of SAM with the structural decoding strength of Mask R-CNN. The powerful global modeling ability of SAM effectively mitigates the impact of underwater image degradation, thereby enabling more robust feature representation. Moreover, a novel underwater feature weighted enhancer is introduced in the framework to enhance multi-scale feature fusion and improve the detection of small and scale-varying objects in underwater environments. To provide benchmark data, a large-scale underwater instance segmentation dataset, UW10K, is also constructed, comprising 13,551 images and 22,968 annotated instances across 15 categories. Comprehensive experiments validate the superiority of the proposed model across various instance segmentation tasks. Specifically, it achieves precisions of 74.2 %, 40.5 %, and 70.6 % on UW10K, USIS10K, and WHU Building datasets, respectively. This study is expected to advance ocean exploration and fisheries, while providing valuable training samples for instance segmentation tasks. Datasets and codes are available at https://github.com/xfwang-qut/HySaM.
{"title":"HySaM: An improved hybrid SAM and Mask R-CNN for underwater instance segmentation","authors":"Xingfa Wang , Chengjun Chen , Chenggang Dai , Kunhua Liu , Mingxing Lin","doi":"10.1016/j.jvcir.2025.104656","DOIUrl":"10.1016/j.jvcir.2025.104656","url":null,"abstract":"<div><div>Due to inherent absorption and scattering effects, underwater images often exhibit low visibility and significant color deviation. These issues hinder the extraction of discriminative features and adversely impact instance-level segmentation accuracy. To address these challenges, this study proposes a novel Hybrid SAM and Mask R-CNN framework for underwater instance segmentation, integrating the strong generalization capability of SAM with the structural decoding strength of Mask R-CNN. The powerful global modeling ability of SAM effectively mitigates the impact of underwater image degradation, thereby enabling more robust feature representation. Moreover, a novel underwater feature weighted enhancer is introduced in the framework to enhance multi-scale feature fusion and improve the detection of small and scale-varying objects in underwater environments. To provide benchmark data, a large-scale underwater instance segmentation dataset, UW10K, is also constructed, comprising 13,551 images and 22,968 annotated instances across 15 categories. Comprehensive experiments validate the superiority of the proposed model across various instance segmentation tasks. Specifically, it achieves precisions of 74.2 %, 40.5 %, and 70.6 % on UW10K, USIS10K, and WHU Building datasets, respectively. This study is expected to advance ocean exploration and fisheries, while providing valuable training samples for instance segmentation tasks. 
Datasets and codes are available at <span><span>https://github.com/xfwang-qut/HySaM</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"115 ","pages":"Article 104656"},"PeriodicalIF":3.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145694273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-01 | Epub Date: 2025-12-17 | DOI: 10.1016/j.jvcir.2025.104691
Da Ai , Yunqiao Wang , Kai Jia , Zhike Ji , Ying Liu
In real-world video surveillance scenarios, the imaging difference between the near-infrared (NIR) and visible (VIS) spectra and the shooting distance are two important factors that restrict the accuracy of near-infrared face recognition. In this paper, we first use a fixed-focus near-infrared camera to capture NIR face images at different distances, constructing a large Cross-Spectral and Cross-Distance Face dataset (CSCD-F); to improve recognition accuracy, we employ image enhancement techniques to preprocess low-quality face images. Furthermore, we adjust the sampling depth of the generator in the CycleGAN network and introduce an additional edge loss, proposing a general framework that combines generative models and transfer learning to achieve spectral feature translation between NIR and VIS images. The proposed method can effectively convert NIR face images into VIS images while retaining sufficient identity information. Extensive experimental results demonstrate that the proposed method achieves significant performance improvements on the self-built CSCD-F dataset, and experiments on public datasets such as HFB and Oulu-CASIA NIR-VIS further validate its generalization capability and effectiveness.
{"title":"Cross-distance near-infrared face recognition","authors":"Da Ai , Yunqiao Wang , Kai Jia , Zhike Ji , Ying Liu","doi":"10.1016/j.jvcir.2025.104691","DOIUrl":"10.1016/j.jvcir.2025.104691","url":null,"abstract":"<div><div>In the actual video surveillance application scenarios, the imaging difference between near infrared (NIR) and visible light (VIS) spectrum and the photo distance are two important factors that restrict the accuracy of near infrared face recognition. In this paper, we first use a fixed focus near-infrared camera to capture NIR face images at different distances, constructing a large Cross-Spectral and Cross-Distance Face dataset (CSCD-F), and in order to improve recognition accuracy, we employ image enhancement techniques to preprocess low-quality face images. Furthermore, we adjusted the sampling depth of the generator in the CycleGAN network and introduced additional edge loss, proposing a general framework that combines generative models and transfer learning to achieve spectral feature translation between NIR and VIS images. The proposed method can effectively convert NIR face images into VIS images while retaining sufficient identity information. Various experimental results demonstrate that the proposed method achieves significant performance improvements on the self-built CSCD-F dataset. 
Additionally, it validates the generalization capability and effectiveness of the proposed method on public datasets such as HFB and Oulu-CASIA NIR-VIS.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"115 ","pages":"Article 104691"},"PeriodicalIF":3.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145840341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-01 | Epub Date: 2025-11-11 | DOI: 10.1016/j.jvcir.2025.104642
Jiliang Wang , Jia Liu , Siwang Zhou
Existing facial image super-resolution methods have demonstrated the capacity to transform low-resolution facial images into high-resolution ones. However, clearer high-resolution facial images increase the possibility of accurately extracting soft biometric features, such as gender, posing a significant risk of privacy leakage. To address this issue, we propose a gender-protected face super-resolution network, which conceals gender-related private information by introducing fine image distortions during the super-resolution process. It progressively transforms low-resolution images into high-resolution ones while partially perturbing the face images. This procedure ensures that the generated super-resolution facial images can still be used by face matchers for matching purposes, but are less reliable for attribute classifiers that attempt to extract gender features. Furthermore, we introduce leaping adversarial learning to help the super-resolution network generate gender-protected facial images that remain effective against arbitrary gender classifiers. Extensive experiments have been conducted using multiple face matchers and gender classifiers to evaluate the effectiveness of the proposed network. The results also demonstrate that our proposed image super-resolution network is adaptable to arbitrary attribute classifiers for protecting gender privacy, while preserving facial image quality.
{"title":"Facial image super-resolution network for confusing arbitrary gender classifiers","authors":"Jiliang Wang , Jia Liu , Siwang Zhou","doi":"10.1016/j.jvcir.2025.104642","DOIUrl":"10.1016/j.jvcir.2025.104642","url":null,"abstract":"<div><div>Existing facial image super-resolution methods have identified the capacity to transform low-resolution facial images into high-resolution ones. However, clearer high-resolution facial images increase the possibility of accurately extracting soft biometric features, such as gender, posing a significant risk of privacy leakage. To address this issue, we propose a gender-protected face super-resolution network, which can incorporate gender-identified privacy information by introducing fine image distortion during the super-resolution process. It progressively transforms low-resolution images into high-resolution ones while partially disturbing the face images. This procedure ensures that the generated super-resolution facial images can still be utilized by face matchers for matching purposes, but are less reliable for attribute classifiers that attempt to extract gender features. Furthermore, we introduce leaping adversarial learning to help the super-resolution network to generate gender-protected facial images and work on arbitrary gender classifiers. Extensive experiments have been conducted using multiple face matchers and gender classifiers to evaluate the effectiveness of the proposed network. 
The results also demonstrate that our proposed image super-resolution network is adaptable to arbitrary attribute classifiers for protecting gender privacy, while preserving facial image quality.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"114 ","pages":"Article 104642"},"PeriodicalIF":3.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145520918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}