
Journal of Visual Communication and Image Representation: Latest Publications

Effective face recognition from video using enhanced social collie optimization-based deep convolutional neural network technique
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-01 | Epub Date: 2025-11-06 | DOI: 10.1016/j.jvcir.2025.104639
Jitendra Chandrakant Musale, Anuj kumar Singh, Swati Shirke
A key feature of video surveillance systems is face recognition, which allows the identification and verification of people who frequently appear in scenes captured by a distributed network of cameras. The scientific community is interested in recognizing individuals' faces in videos, partly because of the potential applications and partly because of the difficulty this poses for artificial vision algorithms. A deep convolutional neural network is used to recognize faces from the provided video samples via a hybrid weighted texture pattern descriptor (HWTP). The deep CNN parameters are tuned by Enhanced Social Collie Optimization (ESCO), which searches for better solutions through various strategies; the face of an individual is then identified using the optimal parameters. For 500 retrievals, the proposed model attains an accuracy, precision, recall, and F-measure of 87.92%, 88.01%, 88.01%, and 88.01%, respectively.
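The abstract does not specify ESCO's update rules, so the sketch below shows only the general pattern it belongs to: a population-based optimizer tuning deep-CNN hyperparameters, with candidates drifting toward the current best solution plus random exploration. The search space and the `train_and_score` fitness function are hypothetical placeholders for training the network and returning validation accuracy.

```python
# Generic population-based hyperparameter search in the spirit of ESCO.
# The real ESCO update rules are not given in the abstract; everything
# below is an illustrative assumption.
import random

SEARCH_SPACE = {            # hypothetical deep-CNN hyperparameters
    "lr":      (1e-5, 1e-2),
    "dropout": (0.0, 0.6),
    "width":   (32.0, 256.0),
}

def sample():
    return {k: random.uniform(*v) for k, v in SEARCH_SPACE.items()}

def train_and_score(params):            # placeholder fitness function
    return -abs(params["lr"] - 1e-3)    # pretend lr = 1e-3 is optimal

def optimize(pop_size=10, generations=20, step=0.1):
    pop = [sample() for _ in range(pop_size)]
    best = max(pop, key=train_and_score)
    for _ in range(generations):
        new_pop = []
        for cand in pop:
            # move each candidate toward the current best (social behavior)
            # plus a small random perturbation (exploration)
            child = {}
            for k in cand:
                lo, hi = SEARCH_SPACE[k]
                v = cand[k] + step * (best[k] - cand[k]) \
                    + random.gauss(0, step) * (hi - lo) * 0.05
                child[k] = min(max(v, lo), hi)   # clamp to the search space
            new_pop.append(child)
        pop = new_pop
        gen_best = max(pop, key=train_and_score)
        if train_and_score(gen_best) > train_and_score(best):
            best = gen_best
    return best

print(optimize())
```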
Citations: 0
Multi-view 3D model recognition via multi-label and multi-level fusion with bidirectional GRU
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-01 | Epub Date: 2025-12-19 | DOI: 10.1016/j.jvcir.2025.104692
Daojie Zhou, Yihan Li, Yi Li
This paper proposes a multi-view fusion-based recognition algorithm to address the imbalanced recognition accuracy of existing 3D model multi-view recognition methods. This imbalance arises from their failure to account for inter-view feature consistency and their use of inefficient fusion strategies. The proposed algorithm employs a dual-branch feature extraction network and a multi-label loss function to enforce the learning of consistent features across different views of the same model. Concurrently, a multi-level Gated Recurrent Unit (GRU) fusion network is constructed to efficiently integrate high-dimensional features from various levels and temporal information across multiple views. Simulation results demonstrate that the proposed algorithm achieves highly competitive recognition accuracy on mainstream benchmark datasets. Furthermore, it exhibits more balanced performance across different categories, thereby showing enhanced stability and robustness.
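As a concrete illustration of the fusion stage, here is a minimal PyTorch sketch of bidirectional-GRU fusion over per-view features, assuming each view has already been encoded into a feature vector by a shared CNN backbone; the layer sizes, mean pooling, and class count are illustrative assumptions, not the paper's exact architecture.

```python
# Bidirectional GRU over the sequence of view features, then mean-pooled
# into a single descriptor for classification.
import torch
import torch.nn as nn

class BiGRUViewFusion(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, num_classes=40):
        super().__init__()
        self.gru = nn.GRU(input_size=feat_dim, hidden_size=hidden,
                          batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, view_feats):        # view_feats: (B, V, feat_dim)
        seq, _ = self.gru(view_feats)     # (B, V, 2 * hidden)
        fused = seq.mean(dim=1)           # average over the view sequence
        return self.classifier(fused)

model = BiGRUViewFusion()
views = torch.randn(8, 12, 512)          # 8 models, 12 rendered views each
logits = model(views)                    # (8, 40)
print(logits.shape)
```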
Citations: 0
Deep semi-supervised learning method based on sample adaptive weights and discriminative feature learning
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-01 | Epub Date: 2025-12-15 | DOI: 10.1016/j.jvcir.2025.104689
Jiawei Wang, Weiwei Shi, Xiaofan Wang, Xinhong Hei
Semi-supervised learning has achieved significant success through various approaches based on pseudo-labeling and consistency regularization. Despite these efforts, effectively utilizing both labeled and unlabeled data remains a significant challenge. In this study, to make better use of the limited and valuable labeled data, we propose a self-adaptive weight redistribution strategy within each batch. This operation takes into account the heterogeneity of labeled data, adjusting each sample's contribution to the overall loss based on its individual loss, which enables the model to more accurately identify challenging samples. Our experiments demonstrate that this weight reallocation strategy significantly enhances the model's generalization ability. Additionally, to enhance intra-class compactness and inter-class separation of the learned features, we introduce a cosine similarity-based discriminative feature learning regularization term. This term reinforces feature consistency within the same class and enhances feature distinctiveness across different classes. Through this mechanism, the model is encouraged to prioritize learning discriminative feature representations, ensuring that features with authentic labels and those with high-confidence pseudo-labels are grouped together, while features belonging to different clusters are kept apart. The method can be combined with mainstream semi-supervised learning methods, which we evaluate experimentally. Our experimental findings illustrate the efficacy of our approach in enhancing the performance of semi-supervised learning tasks across widely used image classification datasets.
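Both ingredients can be sketched directly as loss terms. Below is a minimal PyTorch version, assuming per-sample cross-entropy losses are re-weighted by a softmax over the batch (so harder samples receive more weight) and a cosine-similarity term rewards intra-class similarity while penalizing inter-class similarity; the temperature and the 0.1 trade-off weight are illustrative assumptions.

```python
# Adaptive within-batch weight redistribution + cosine discriminative
# regularizer, as a sketch of the two proposed loss components.
import torch
import torch.nn.functional as F

def adaptive_weighted_ce(logits, labels, tau=1.0):
    per_sample = F.cross_entropy(logits, labels, reduction="none")   # (B,)
    # redistribute weight toward high-loss (hard) samples; weights sum to B
    weights = torch.softmax(per_sample / tau, dim=0) * len(per_sample)
    return (weights.detach() * per_sample).mean()

def cosine_discriminative_reg(feats, labels):
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.t()                                   # (B, B)
    same = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    eye = torch.eye(len(labels), device=feats.device)
    pos = (sim * (same - eye)).sum() / (same - eye).sum().clamp(min=1)
    neg = (sim * (1 - same)).sum() / (1 - same).sum().clamp(min=1)
    return neg - pos   # minimized by high intra-class, low inter-class sim

logits = torch.randn(16, 10, requires_grad=True)
feats = torch.randn(16, 128, requires_grad=True)
labels = torch.randint(0, 10, (16,))
loss = adaptive_weighted_ce(logits, labels) \
       + 0.1 * cosine_discriminative_reg(feats, labels)
loss.backward()
```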
Citations: 0
An optical remote sensing ship detection model based on feature diffusion and higher-order relationship modeling
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-01 | Epub Date: 2026-01-04 | DOI: 10.1016/j.jvcir.2025.104695
Chunman Yan, Ningning Qi
Ship detection plays an increasingly important role in marine monitoring, with Optical Remote Sensing (ORS) technology providing high-resolution spatial and texture information. However, existing ship detection methods still face significant challenges in accurately detecting small targets, suppressing complex background interference, and modeling cross-scale semantic relationships, which limits their effectiveness in practical applications. Inspired by feature diffusion theory and higher-order spatial interaction mechanisms, this paper proposes a ship detection model for Optical Remote Sensing imagery. Specifically, to address the loss of fine-grained information during feature downsampling, Single-branch and Dual-branch Residual Feature Downsampling (SRFD and DRFD) modules are designed to improve small-target preservation and multi-scale robustness. To capture long-range spatial dependencies and improve robustness against target rotation, the Fast Spatial Pyramid Pooling module based on Large Kernel Separable Convolution Attention (SPPF-LSKA) is introduced, enabling efficient large-receptive-field modeling with rotation-invariant constraints. Furthermore, to dynamically model complex semantic dependencies between different feature scales, the Feature Diffusion Pyramid Network (FDPN) is proposed, based on continuous feature diffusion and cross-scale graph reasoning. Experimental results show that the model achieves an AP50 of 86.2% and an AP50-95 of 58.0% on multiple remote sensing ship detection datasets, with the number of parameters reduced to 2.6 M and the model size compressed to 5.5 MB, significantly outperforming several state-of-the-art models in both detection accuracy and lightweight deployment. These results demonstrate the detection capability, robustness, and application potential of the proposed model in Optical Remote Sensing ship monitoring tasks.
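The abstract does not give the internals of SRFD/DRFD, so the block below is one plausible reading of a dual-branch residual downsampling module in PyTorch: a strided-convolution branch and a pooling branch run in parallel and are fused, so detail lost on one path can survive on the other. All layer choices are assumptions for illustration.

```python
# Illustrative dual-branch residual downsampling: learned strided conv
# plus a pooling shortcut, fused by a 1x1 convolution.
import torch
import torch.nn as nn

class DualBranchResidualDownsample(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv_branch = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch), nn.SiLU(),
        )
        self.pool_branch = nn.Sequential(      # residual pooling path
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
        )
        self.fuse = nn.Conv2d(2 * out_ch, out_ch, kernel_size=1)

    def forward(self, x):
        a = self.conv_branch(x)                # learned, detail-preserving
        b = self.pool_branch(x)                # cheap shortcut
        return self.fuse(torch.cat([a, b], dim=1))

x = torch.randn(1, 64, 80, 80)
y = DualBranchResidualDownsample(64, 128)(x)   # (1, 128, 40, 40)
print(y.shape)
```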
Citations: 0
Corrigendum to “Lightweight macro-pixel quality enhancement network for light field images compressed by versatile video coding” [J. Vis. Commun. Image Represent. 105 (2024) 104329]
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-01 | Epub Date: 2025-12-11 | DOI: 10.1016/j.jvcir.2025.104673
Hongyue Huang, Chen Cui, Chuanmin Jia, Xinfeng Zhang, Siwei Ma
{"title":"Corrigendum to “Lightweight macro-pixel quality enhancement network for light field images compressed by versatile video coding” [J. Vis. Commun. Image Represent. 105 (2024) 104329]","authors":"Hongyue Huang ,&nbsp;Chen Cui ,&nbsp;Chuanmin Jia ,&nbsp;Xinfeng Zhang ,&nbsp;Siwei Ma","doi":"10.1016/j.jvcir.2025.104673","DOIUrl":"10.1016/j.jvcir.2025.104673","url":null,"abstract":"","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"115 ","pages":"Article 104673"},"PeriodicalIF":3.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Vision-language tracking with attention-based optimization
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-01 | Epub Date: 2025-11-15 | DOI: 10.1016/j.jvcir.2025.104644
Shuo Hu, Tongtong Liu, Liyang Han, Run Xing
Most existing visual tracking methods employ image patches as target references and try to enhance tracking performance by maximizing the use of visual information through various deep networks. However, due to the intrinsic limitations of visual information, tracker performance deteriorates significantly under drastic target variations or complex backgrounds. To address these issues, we propose a vision-language multimodal fusion tracker for object tracking. First, we use semantic information from language descriptions to compensate for the instability of visual information, and establish multimodal cross-relations through the fusion of visual and language features. Second, we propose an attention-based token screening mechanism that uses semantic-guided attention and masking operations to eliminate irrelevant search tokens devoid of target information, thereby improving both accuracy and efficiency. Furthermore, we optimize the localization head by introducing channel attention, which effectively improves the accuracy of target positioning. Extensive experiments on the OTB99, LaSOT, and TNL2K datasets demonstrate the effectiveness of the proposed tracking method, which achieves success rates of 71.2%, 69.5%, and 58.9%, respectively.
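The token screening step can be summarized in a few lines: score each search-region token against the sentence embedding of the language description and keep only the top-scoring fraction. The sketch below assumes cosine similarity as the relevance score and a fixed keep ratio; both are illustrative choices, not the paper's exact mechanism.

```python
# Semantic-guided token screening: drop the search tokens least relevant
# to the language embedding before further matching.
import torch
import torch.nn.functional as F

def screen_tokens(search_tokens, text_embed, keep_ratio=0.7):
    """search_tokens: (B, N, D); text_embed: (B, D). Returns kept tokens."""
    scores = torch.einsum("bnd,bd->bn",
                          F.normalize(search_tokens, dim=-1),
                          F.normalize(text_embed, dim=-1))        # (B, N)
    k = max(1, int(search_tokens.size(1) * keep_ratio))
    idx = scores.topk(k, dim=1).indices                           # (B, k)
    idx = idx.unsqueeze(-1).expand(-1, -1, search_tokens.size(-1))
    return search_tokens.gather(1, idx)                           # (B, k, D)

tokens = torch.randn(2, 256, 768)   # search-region tokens
text = torch.randn(2, 768)          # sentence embedding of the description
print(screen_tokens(tokens, text).shape)   # (2, 179, 768)
```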
Citations: 0
Enhancing aesthetic image generation with reinforcement learning guided prompt optimization in stable diffusion
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-01 | Epub Date: 2025-11-15 | DOI: 10.1016/j.jvcir.2025.104641
Junyong You, Yuan Lin, Bin Hu
Generative models such as stable diffusion excel at producing compelling images but remain highly dependent on carefully crafted prompts. Refining prompts for specific objectives, especially aesthetic quality, is time-consuming and inconsistent. We propose a novel approach that leverages LLMs to enhance the prompt refinement process for stable diffusion. First, we propose a model to predict aesthetic image quality, examining various aesthetic elements in the spatial, channel, and color domains. Reinforcement learning is then employed to refine the prompt, starting from a rudimentary version and iteratively improving it with the LLM's assistance. This iterative process is guided by a policy network that updates prompts based on interactions with the generated images, with a reward function measuring aesthetic improvement and adherence to the prompt. Our experimental results demonstrate that this method significantly boosts the visual quality of images generated from the refined prompts. Beyond image synthesis, the approach provides a broader framework for improving prompts across diverse applications with the support of LLMs.
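At its core the refinement loop is: pick an edit, regenerate, score, update the policy, keep improvements. A minimal bandit-style sketch follows, in which `apply_edit`, `generate_image`, and `aesthetic_score` are hypothetical stand-ins for the LLM rewrite, the stable diffusion call, and the aesthetic quality model; the paper uses a learned policy network, whereas this sketch substitutes a simple epsilon-greedy value estimate per edit type.

```python
# Bandit-style prompt refinement: reward = aesthetic score of the image
# generated from the candidate prompt. All callables are placeholders.
import random

EDITS = ["add lighting detail", "add composition cue",
         "add color palette", "add camera/lens terms"]

def apply_edit(prompt, edit):        # hypothetical LLM rewrite
    return f"{prompt}, {edit}"

def generate_image(prompt):          # hypothetical stable diffusion call
    return prompt                    # placeholder "image"

def aesthetic_score(image):          # hypothetical aesthetic quality model
    return random.random()

def refine(prompt, steps=20, eps=0.2, lr=0.1):
    value = {e: 0.0 for e in EDITS}  # per-action value estimates
    best_prompt = prompt
    best_score = aesthetic_score(generate_image(prompt))
    for _ in range(steps):
        edit = (random.choice(EDITS) if random.random() < eps
                else max(value, key=value.get))     # epsilon-greedy policy
        cand = apply_edit(best_prompt, edit)
        reward = aesthetic_score(generate_image(cand))
        value[edit] += lr * (reward - value[edit])  # incremental update
        if reward > best_score:                     # keep improvements only
            best_prompt, best_score = cand, reward
    return best_prompt

print(refine("a lighthouse at dusk"))
```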
Citations: 0
HySaM: An improved hybrid SAM and Mask R-CNN for underwater instance segmentation
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-01 | Epub Date: 2025-11-25 | DOI: 10.1016/j.jvcir.2025.104656
Xingfa Wang, Chengjun Chen, Chenggang Dai, Kunhua Liu, Mingxing Lin
Due to inherent absorption and scattering effects, underwater images often exhibit low visibility and significant color deviation. These issues hinder the extraction of discriminative features and adversely impact instance-level segmentation accuracy. To address these challenges, this study proposes a novel Hybrid SAM and Mask R-CNN framework for underwater instance segmentation, integrating the strong generalization capability of SAM with the structural decoding strength of Mask R-CNN. SAM's powerful global modeling ability effectively mitigates the impact of underwater image degradation, enabling more robust feature representation. Moreover, a novel underwater feature weighted enhancer is introduced to strengthen multi-scale feature fusion and improve the detection of small and scale-varying objects in underwater environments. To provide benchmark data, a large-scale underwater instance segmentation dataset, UW10K, is also constructed, comprising 13,551 images and 22,968 annotated instances across 15 categories. Comprehensive experiments validate the superiority of the proposed model across various instance segmentation tasks; specifically, it achieves precisions of 74.2%, 40.5%, and 70.6% on the UW10K, USIS10K, and WHU Building datasets, respectively. This study is expected to advance ocean exploration and fisheries, while providing valuable training samples for instance segmentation tasks. Datasets and code are available at https://github.com/xfwang-qut/HySaM.
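One generic way to hybridize a detector with SAM (not necessarily HySaM's exact design) is to feed Mask R-CNN's detected boxes to SAM as box prompts, so SAM's class-agnostic masks inherit the detector's instance hypotheses. The sketch below uses the public torchvision and segment-anything APIs and assumes a locally downloaded SAM checkpoint; the zero image is a stand-in for a real underwater frame.

```python
# Detector boxes as SAM prompts: a generic detector+SAM hybrid pattern.
import numpy as np
import torch
import torchvision
from segment_anything import sam_model_registry, SamPredictor

detector = torchvision.models.detection.maskrcnn_resnet50_fpn(
    weights="DEFAULT").eval()
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # local file
predictor = SamPredictor(sam)

image = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in underwater frame
with torch.no_grad():
    inp = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0
    det = detector([inp])[0]                      # boxes, labels, scores

predictor.set_image(image)
for box, score in zip(det["boxes"], det["scores"]):
    if score < 0.5:                               # confidence gate
        continue
    masks, qualities, _ = predictor.predict(box=box.numpy(),
                                            multimask_output=False)
    # masks[0] is an HxW boolean instance mask refined by SAM
```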
Citations: 0
Cross-distance near-infrared face recognition
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-01 | Epub Date: 2025-12-17 | DOI: 10.1016/j.jvcir.2025.104691
Da Ai, Yunqiao Wang, Kai Jia, Zhike Ji, Ying Liu
In real video surveillance scenarios, the imaging difference between the near-infrared (NIR) and visible-light (VIS) spectra and the capture distance are two important factors that limit the accuracy of near-infrared face recognition. In this paper, we first use a fixed-focus near-infrared camera to capture NIR face images at different distances, constructing a large Cross-Spectral and Cross-Distance Face dataset (CSCD-F), and, to improve recognition accuracy, we employ image enhancement techniques to preprocess low-quality face images. Furthermore, we adjust the sampling depth of the generator in the CycleGAN network and introduce an additional edge loss, proposing a general framework that combines generative models and transfer learning to achieve spectral feature translation between NIR and VIS images. The proposed method can effectively convert NIR face images into VIS images while retaining sufficient identity information. Extensive experimental results demonstrate that the proposed method achieves significant performance improvements on the self-built CSCD-F dataset, and validate its generalization capability and effectiveness on public datasets such as HFB and Oulu-CASIA NIR-VIS.
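The extra edge loss can take a simple Sobel form: match the gradient magnitudes of the translated VIS image to those of a reference so facial contours survive translation. Since CycleGAN training is unpaired, a natural reference is the input NIR image itself; the sketch below, with an illustrative grayscale conversion and loss weight, is one plausible formulation rather than the paper's exact definition.

```python
# Sobel-based edge loss encouraging the NIR-to-VIS generator to preserve
# facial contours from its input.
import torch
import torch.nn.functional as F

def sobel_edges(img):                        # img: (B, 1, H, W) grayscale
    kx = torch.tensor([[[[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]]])
    ky = kx.transpose(2, 3)
    gx = F.conv2d(img, kx.to(img), padding=1)
    gy = F.conv2d(img, ky.to(img), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def edge_loss(generated_vis, input_nir):
    gray = lambda x: x.mean(dim=1, keepdim=True)   # cheap grayscale proxy
    return F.l1_loss(sobel_edges(gray(generated_vis)),
                     sobel_edges(gray(input_nir)))

fake_vis = torch.rand(2, 3, 128, 128, requires_grad=True)
nir = torch.rand(2, 3, 128, 128)
loss = edge_loss(fake_vis, nir)   # added to the CycleGAN objective, e.g. + 10 * loss
loss.backward()
```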
Citations: 0
Facial image super-resolution network for confusing arbitrary gender classifiers
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-01 | Epub Date: 2025-11-11 | DOI: 10.1016/j.jvcir.2025.104642
Jiliang Wang, Jia Liu, Siwang Zhou
Existing facial image super-resolution methods have demonstrated the capacity to transform low-resolution facial images into high-resolution ones. However, clearer high-resolution facial images increase the possibility of accurately extracting soft biometric attributes, such as gender, posing a significant risk of privacy leakage. To address this issue, we propose a gender-protected face super-resolution network, which embeds gender-related privacy protection by introducing fine image distortions during the super-resolution process. It progressively transforms low-resolution images into high-resolution ones while partially perturbing the face images. This ensures that the generated super-resolution facial images can still be used by face matchers for matching purposes, but are less reliable for attribute classifiers that attempt to extract gender features. Furthermore, we introduce leaping adversarial learning to help the super-resolution network generate gender-protected facial images that remain effective against arbitrary gender classifiers. Extensive experiments with multiple face matchers and gender classifiers evaluate the effectiveness of the proposed network. The results demonstrate that the proposed image super-resolution network adapts to arbitrary attribute classifiers to protect gender privacy while preserving facial image quality.
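A minimal sketch of the protection objective, under the assumption that it balances pixel fidelity against confusing a frozen gender classifier: the classifier's prediction on the super-resolved face is pushed toward the uniform distribution via a KL term. The L1 fidelity term and the trade-off weight are illustrative assumptions; the paper's actual loss and its leaping adversarial schedule may differ.

```python
# Fidelity + gender-confusion objective for privacy-preserving face SR.
import torch
import torch.nn.functional as F

def confusion_loss(gender_logits):
    # KL divergence between the classifier's prediction and the uniform
    # distribution; minimized when the classifier cannot decide
    log_probs = F.log_softmax(gender_logits, dim=1)
    uniform = torch.full_like(log_probs, 1.0 / log_probs.size(1))
    return F.kl_div(log_probs, uniform, reduction="batchmean")

def total_loss(sr_img, hr_img, gender_logits, lam=0.05):
    fidelity = F.l1_loss(sr_img, hr_img)   # keep the face usable for matching
    return fidelity + lam * confusion_loss(gender_logits)

sr = torch.rand(4, 3, 128, 128, requires_grad=True)
hr = torch.rand(4, 3, 128, 128)
logits = torch.randn(4, 2, requires_grad=True)   # frozen classifier output
print(total_loss(sr, hr, logits))
```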
Citations: 0