
Journal of Visual Communication and Image Representation: Latest Articles

3D human mesh recovery: Comparative review, models, and prospects
IF 3.1 | Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-01 | DOI: 10.1016/j.jvcir.2025.104699
Wonjun Kim
As demand for immersive services grows across many fields, the ability to represent objects and scenes in 3D has become essential. In particular, 3D human modeling has attracted considerable attention because of its many possible uses in daily life and in industrial applications. The first step of 3D human modeling is to recover a mesh, commonly defined as a set of connected vertices in 3D space, from images and videos. This task is known as human mesh recovery (HMR). HMR was originally studied with complicated optimization techniques; however, owing to the great success of deep learning in recent years, it has been reformulated as a simple regression problem, and numerous studies are now being actively conducted. This paper provides a comprehensive review with a special focus on deep learning-based methods for HMR. Specifically, it covers a systematic taxonomy along with the questions at the heart of each research period, diverse methodologies, and extensive qualitative and quantitative performance evaluations on benchmark datasets, and it offers constructive discussion on realizing HMR-based commercial services. This review is expected to serve as a concise handbook on HMR rather than a vast collection of existing studies.
Volume 115, Article 104699.
Citations: 0
Theft model-based black-box adversarial attack in embedding space
IF 3.1 | Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-01 | DOI: 10.1016/j.jvcir.2025.104702
Rui Zhang , Shuliang Jiang , Zi Kang , Shuo Xu , Yuanlong Lv , Hui Xia
Existing transfer-based adversarial attacks suffer from poor transferability due to limitations of the proxy dataset or inaccurate imitation of the target model by the substitute model. We therefore propose a theft model-based black-box adversarial attack in embedding space. The substitute model acts as the discriminator of a generative adversarial network, and we introduce a diversity loss to train the generator without relying on a proxy dataset, enabling the substitute model to imitate the target model better. Furthermore, we design a combined attack method that integrates a gradient-based attack with a natural evolution strategy to construct adversarial examples by searching the embedding space. This ensures that the adversarial examples are effective on both the target and the substitute models. Experimental results demonstrate that our method has good imitation ability and transferability. With VGG16, our method outperforms TREMBA by 14.71% in untargeted attack success rate and by 13.49% in targeted attacks.
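The natural evolution strategy component of such an attack can be illustrated with a small gradient estimator that only queries a loss value, as a black-box setting requires. This is a minimal sketch, not the authors' implementation: `toy_loss` is a hypothetical stand-in for a query to the target model, and `sigma`, `n_pairs`, and the step size are illustrative values chosen here.

```python
import random

def nes_gradient(loss, z, rng, sigma=0.1, n_pairs=50):
    """Estimate the gradient of `loss` at embedding `z` from loss queries only,
    using antithetic Gaussian perturbations (a common NES estimator)."""
    grad = [0.0] * len(z)
    for _ in range(n_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in z]
        plus = loss([zi + sigma * e for zi, e in zip(z, eps)])
        minus = loss([zi - sigma * e for zi, e in zip(z, eps)])
        scale = (plus - minus) / (2.0 * sigma * n_pairs)
        for i, e in enumerate(eps):
            grad[i] += scale * e
    return grad

def toy_loss(z):
    # Hypothetical black-box objective; a real attack would query the target model.
    return sum((zi - 1.0) ** 2 for zi in z)

rng = random.Random(0)
z = [0.0, 0.0, 0.0]  # point in a (toy) 3-dimensional embedding space
for _ in range(200):
    g = nes_gradient(toy_loss, z, rng)
    z = [zi - 0.05 * gi for zi, gi in zip(z, g)]
```

In a combined attack of the kind the abstract describes, an estimate like this would presumably be mixed with an exact gradient taken from the substitute model; how the two are weighted is not stated in the abstract.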
Volume 115, Article 104702.
Citations: 0
CQR-UC: A color QR code-based underwater wireless communication method with GAN-based image enhancement
IF 3.1 | Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-01 | DOI: 10.1016/j.jvcir.2026.104708
Zheng Zhao , Yufan Feng , Shangxin Li , Qian Mao , Tingwei Chen , Qi Zhao , Xiaoya Fan
Underwater wireless communication is critical for ocean exploration, but traditional technologies often suffer from high cost, high power consumption, and bulky equipment. This paper presents CQR-UC, a short-range underwater wireless communication method based on color QR (CQR) codes. To improve the recognition of CQR codes in underwater environments, a CQR-GAN model is proposed to enhance CQR code images. In addition, a dedicated underwater communication protocol, CUP, is designed to support continuous and reliable bidirectional data transmission. Experimental results demonstrate the efficacy and performance of CQR-UC in various underwater environments. On resource-constrained devices, our CQR-UC system can achieve a cost below $40, power consumption under 15 W, and excellent portability. The code is publicly available at https://github.com/XploreAI-Lab/CQR-UC.
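One common way to build a color QR code is to place three independent binary code layers in the red, green, and blue channels, which roughly triples the payload per symbol. The abstract does not say whether CQR-UC uses this layout, so the sketch below is an assumption for illustration only; `multiplex`, `demultiplex`, and the threshold value are hypothetical names and choices.

```python
def multiplex(layers):
    """Combine three equal-sized binary matrices (1 = dark module) into one
    matrix of RGB tuples; a dark module sets its channel to 0, a light one to 255."""
    r, g, b = layers
    return [
        [(255 * (1 - r[y][x]), 255 * (1 - g[y][x]), 255 * (1 - b[y][x]))
         for x in range(len(r[0]))]
        for y in range(len(r))
    ]

def demultiplex(rgb, threshold=128):
    """Recover the three binary layers by thresholding each channel."""
    layers = [[], [], []]
    for row in rgb:
        for c in range(3):
            layers[c].append([1 if px[c] < threshold else 0 for px in row])
    return layers

layers = [
    [[1, 0], [0, 1]],  # layer carried by the red channel
    [[0, 0], [1, 1]],  # layer carried by the green channel
    [[1, 1], [0, 0]],  # layer carried by the blue channel
]
rgb = multiplex(layers)
recovered = demultiplex(rgb)
```

Underwater, the channels attenuate unevenly, so a fixed threshold like this one would likely fail on raw captures; that is the kind of degradation an enhancement model such as CQR-GAN is meant to correct before decoding.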
Volume 115, Article 104708.
Citations: 0
USRNet: A simple yet effective Underwater Scene Restoration Network
IF 3.1 | Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-01 | DOI: 10.1016/j.jvcir.2026.104710
Shabnam Thakur, Jhilik Bhattacharya, Shailendra Tiwari
Ensuring clear underwater visibility is crucial for disciplines such as marine robotics, oceanography, and marine biology, which require fast, high-quality image processing to support real-time analysis. This paper introduces the Underwater Scene Restoration Network (USRNet), whose efficient and lightweight architecture overcomes critical limitations of prior work. Unlike current approaches that estimate complex underwater optical parameters separately, USRNet employs an end-to-end approach to jointly estimate the Transmission Map (TM) and Background Light (BL) within the network using a reformulated version of the Atmospheric Scattering Model (ASM). USRNet consists of two distinct modules. The first is a dedicated Color Cast Removal (CCR) module, which neutralizes color casts by learning the inherent color shifts in underwater scenes. The second is the Scene Radiance Estimation (SRE) module, which focuses on reconstructing a high-quality approximation of the final restored image. Comprehensive evaluations across multiple datasets validate our approach both quantitatively and qualitatively.
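For context, the standard form of the atmospheric scattering model writes an observed pixel as I = J·t + B·(1 − t), where J is the scene radiance, t the transmission, and B the background light. The sketch below inverts that standard form per pixel; the abstract only says USRNet uses a reformulated version, so this is not the network's actual formulation, and `t_min` is an assumed safeguard against division by a near-zero transmission.

```python
def degrade_pixel(j, t, b):
    """Forward model: observed colour I = J*t + B*(1 - t), per channel."""
    return tuple(jc * t + bc * (1.0 - t) for jc, bc in zip(j, b))

def restore_pixel(i, t, b, t_min=0.1):
    """Inverse model: J = (I - B) / max(t, t_min) + B, clipped to [0, 1]."""
    t = max(t, t_min)
    return tuple(min(1.0, max(0.0, (ic - bc) / t + bc)) for ic, bc in zip(i, b))

scene = (0.8, 0.4, 0.2)        # true scene radiance (RGB, values in [0, 1])
background = (0.1, 0.5, 0.7)   # bluish background light
observed = degrade_pixel(scene, 0.5, background)
restored = restore_pixel(observed, 0.5, background)
```

With a known transmission and background light the inversion is exact; the hard part, which the network addresses, is estimating those two quantities from the degraded image alone.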
Volume 115, Article 104710.
Citations: 0
3D human pose estimation based on a Hybrid approach of Transformer and GCN-Former
IF 3.1 | Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-01 | DOI: 10.1016/j.jvcir.2025.104696
Xiaojian Pan , Guang Li , Ningfei Zhang , Jianjun Li
Recently, self-supervised pretraining paradigms have been extensively investigated for skeleton-based 3D human pose estimation. In particular, methods based on masked prediction have raised pretraining performance to new heights. The proposed two-stage model aims to capture richer and more significant information. Specifically, the pretraining module is designed to extract enhanced representations, and the Hybrid Dual-Stream Spatio-Temporal Network (HDSTN) processes these representations to recover detailed 3D pose information. In the pretraining phase, an improved teacher model uses the original input data to generate prediction targets for the student model. HDSTN integrates Transformer-GCNFormer (TGFormer) blocks, which employ two parallel processing streams. The Transformer stream captures long-range dependencies, while the GCNFormer stream focuses on learning local spatial–temporal relationships between joints. By combining the strengths of both approaches, TGFormer reduces dimensionality efficiently and provides a more comprehensive representation of the 3D human pose structure. The GCNFormer module leverages the local relationships between adjacent joints to generate a new representation that complements the Transformer’s output. By adaptively fusing these two representations, TGFormer demonstrates enhanced capability in learning the underlying 3D structure. This manuscript extends our earlier conference paper [Zhang et al., 2024 (AIHCIR)], which introduced a two-stage transformer-based pipeline for 3D pose lifting.
Volume 115, Article 104696.
Citations: 0
DU-Net: A Dual U-Net for semantic text-guided style transfer
IF 3.1 | Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-01 | DOI: 10.1016/j.jvcir.2025.104680
Yang Zhao, Yongsheng Dong
CLIPstyler is a typical text-guided style transfer method. However, it often causes content distortion and inconsistent stylization in the generated images. To alleviate these issues, we propose a Dual U-Net (DU-Net) architecture for semantic text-guided style transfer. Specifically, we first construct a Channel–Spatial Fusion Attention (CSFA) module and add it after each upsampling step in our DU-Net. It simultaneously considers channel and spatial information to enhance the DU-Net’s ability to interpret and utilize input features. In addition, we design a novel loss function, Context-Aware Intersection over Union (CAIoU), which combines context aggregation with traditional IoU to optimize style transfer by balancing stylization and content preservation. Extensive qualitative and quantitative experiments on various images and texts show that the stylized images generated by our DU-Net outperform several representative methods in terms of style fidelity and content completeness. Code can be found at https://github.com/ZhaoMyang/DU-Net.
Volume 115, Article 104680.
Citations: 0
Gradient degradation-aware rate control for VVC using Nash equilibrium
IF 3.1 | Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-01 | DOI: 10.1016/j.jvcir.2025.104705
Chenpeng Lu , Huanqiang Zeng , Chao Jiao , Jing Chen , Qi Lin , Huijie Zheng
The λ-domain rate control in Versatile Video Coding (VVC) achieves remarkable performance, enabling higher visual quality under stringent bit constraints. However, for high-resolution video, a significant portion of Coding Tree Units (CTUs) are skip blocks, and using parameters updated from skip CTUs may lead to unreasonable bit allocation. This paper presents a novel rate control method for VVC that allocates bits to skip and non-skip CTUs separately. The bits for skip CTUs can be calculated at once from the frame’s target bits per pixel (bpp), without separate calculations. For non-skip CTUs, we first formulate the bit allocation problem as a Nash equilibrium problem inspired by game theory. Subsequently, we design a utility function based on Gradient Magnitude Similarity Deviation (GMSD) to quantify the degradation in gradient information caused by encoding. The λ parameter for bit allocation of non-skip CTUs is calculated accordingly. The proposed method is implemented in VTM 13.0, and experimental results confirm its effectiveness in enhancing visual quality and achieving significant reductions in bit rate.
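GMSD, the quality measure behind the utility function, is the standard deviation of a per-pixel gradient magnitude similarity map between a reference block and its encoded version. The sketch below follows the usual definition with Prewitt gradients, as a simplified illustration: it skips the downsampling step of the published metric, the constant `c=170.0` assumes an 8-bit intensity range, and how the paper turns GMSD into a utility function is not shown in the abstract.

```python
import math

def gradient_magnitude(img):
    """Prewitt gradient magnitude on the interior pixels of a 2D list of intensities."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            gx = sum(img[y + d][x + 1] - img[y + d][x - 1] for d in (-1, 0, 1)) / 3.0
            gy = sum(img[y + 1][x + d] - img[y - 1][x + d] for d in (-1, 0, 1)) / 3.0
            row.append(math.hypot(gx, gy))
        out.append(row)
    return out

def gmsd(ref, dist, c=170.0):
    """Standard deviation of the gradient magnitude similarity map (0 = identical gradients)."""
    mr, md = gradient_magnitude(ref), gradient_magnitude(dist)
    gms = [(2 * a * b + c) / (a * a + b * b + c)
           for row_r, row_d in zip(mr, md) for a, b in zip(row_r, row_d)]
    mean = sum(gms) / len(gms)
    return math.sqrt(sum((v - mean) ** 2 for v in gms) / len(gms))

# Toy 6x6 block and a "coded" version whose right half lost its structure.
ref = [[10 * x * y for x in range(6)] for y in range(6)]
dist = [[v if x < 3 else 100 for x, v in enumerate(row)] for row in ref]
score_same = gmsd(ref, ref)
score_degraded = gmsd(ref, dist)
```

A larger GMSD means the encoder damaged gradient structure unevenly across the block, which is the kind of degradation a gradient-aware bit allocation would try to limit.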
Volume 115, Article 104705.
Citations: 0
Mitigating bias in Few Shot Class Incremental Learning with Feature Augmentation and Logits Mix-up
IF 3.1 | Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-24 | DOI: 10.1016/j.jvcir.2025.104687
Krishna Kumar Singh, K. Hima Bindu
Deep neural networks often excel at specific tasks but face significant challenges with catastrophic forgetting when learning new classes incrementally. Recent state-of-the-art approaches in Few-Shot Class-Incremental Learning (FSCIL) predominantly use a decoupled classifier that functions as a prototype-only head after pre-training. The prototypes for new classes, derived from limited instances, tend to exhibit higher bias, resulting in poor performance on incremental classes. Recent methods have approached this issue through various prototype calibrations or classifier fine-tuning. However, they yield limited improvements during incremental sessions.
This paper presents a novel approach to mitigate bias through Feature Augmentation and an entropy-weighted Logits Mix-up (FALM) method. The embedding derived from the final layer of the feature extractor is task-specific and may overlook certain features of unseen classes. This work incorporates a missing-pass filter applied to the features of an intermediate layer to augment the embedding of the final layer. Additionally, a logits mix-up is employed to reduce bias further by applying an entropy-weighted combination of logits resulting from three separate heads. Experiments on the miniImageNet, CIFAR100, and CUB200 datasets show that FALM achieves better performance compared to state-of-the-art (SOTA) models.
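The entropy-weighted logits mix-up can be pictured as follows: each of the three heads produces logits, and heads whose predictions are more confident (lower entropy) receive a larger weight in the combination. The abstract does not give the weighting formula, so the softmax over negative entropies used below is an assumption made for illustration, and `mix_logits` is a hypothetical name.

```python
import math

def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mix_logits(heads):
    """Combine per-head logits with weights that decrease as head entropy increases."""
    entropies = [entropy(softmax(h)) for h in heads]
    weights = softmax([-e for e in entropies])  # assumed weighting scheme
    n_classes = len(heads[0])
    mixed = [sum(w * h[i] for w, h in zip(weights, heads)) for i in range(n_classes)]
    return mixed, weights

heads = [
    [4.0, 0.0, 0.0],  # confident head
    [1.0, 1.0, 1.0],  # uninformative head (maximum entropy)
    [0.0, 2.0, 0.0],  # moderately confident head
]
mixed, weights = mix_logits(heads)
```

Here the uninformative head gets the smallest weight, so it dilutes the combined prediction less than a plain average of the three heads would.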
Volume 115, Article 104687.
Citations: 0
FFTDiff: Tuning-free image texture transfer based on diffusion model
IF 3.1 | Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-22 | DOI: 10.1016/j.jvcir.2025.104681
Shilin Li, Hao Wang, Anna Zhu
Image texture transfer is pivotal in computer vision, holding extensive application potential. Existing methods typically transfer color alongside texture, lacking inherent color preservation and thus requiring a cumbersome two-stage process: color alignment followed by style transfer. The recent emergence of diffusion models has significantly advanced this field; however, current diffusion-based approaches usually necessitate additional training. To address this, we propose FFTDiff, a novel texture transfer model leveraging pre-trained diffusion models and the Fast Fourier Transform (FFT), eliminating extra training requirements. FFTDiff disentangles texture from content and color within the frequency domain, independently extracting texture from reference images while preserving original colors and semantics. This extracted texture is then seamlessly integrated into the content image within the diffusion model’s latent space during denoising. Comprehensive experimental results demonstrate FFTDiff’s effectiveness, highlighting its capability for realistic, aesthetically pleasing texture transfer without compromising the original semantic content or color integrity.
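To make the idea of separating texture from content and color in the frequency domain concrete, here is a minimal sketch. It is not FFTDiff's actual procedure: it works on plain grayscale pixel arrays rather than in a diffusion model's latent space, and the circular mask, the `radius` parameter, and the function names `split_frequencies` and `transfer_texture` are all assumptions made for illustration.

```python
import numpy as np


def split_frequencies(img, radius):
    """Split a 2-D array into low- and high-frequency parts using a circular FFT mask."""
    h, w = img.shape
    # Move the zero-frequency (DC) term to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)
    low_mask = dist <= radius  # hypothetical cut-off between "content" and "texture"
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high


def transfer_texture(content, reference, radius=8):
    """Keep the content image's low frequencies and add the reference's high frequencies."""
    content_low, _ = split_frequencies(content, radius)
    _, reference_high = split_frequencies(reference, radius)
    return content_low + reference_high


rng = np.random.default_rng(0)
content = rng.random((32, 32))
reference = rng.random((32, 32))
result = transfer_texture(content, reference)
```

Because the DC term stays in the low band, the result keeps the content image's mean intensity, which is a crude stand-in for the color preservation the abstract describes.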
Citations: 0
Infrared remote sensing small ship target detection method based on spatial-semantic enhancement and feature reconstruction neck
IF 3.1 Zone 4 Computer Science Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-19 DOI: 10.1016/j.jvcir.2025.104683
Fenglin Man, Chaofeng Li, Tuxin Guan
This paper introduces an innovative infrared remote sensing small ship target detection network that integrates spatial-semantic enhancement with a feature reconstruction neck. The spatial-semantic enhancement module is designed to selectively amplify diverse information within the backbone, effectively addressing the challenge of false alarms. It employs a parameter-free attention mechanism to improve the positional information of shallow feature maps and uses dynamic dilated convolution to capture contextual information from deep feature maps. Additionally, the feature reconstruction neck consolidates multi-scale feature maps across the spatial and channel dimensions separately, incorporating attention adjacent-layer concatenation to minimize missed detections. To further enhance detection capabilities, we have developed an infrared ship detection head that leverages the advantages of the DINO decoder while accounting for the size characteristics of infrared small targets. Experimental results on the public infrared ship detection dataset (ISDD) indicate that our approach surpasses several other state-of-the-art methods, demonstrating superior detection performance both qualitatively and quantitatively.
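The abstract does not say which parameter-free attention mechanism is used. As an illustration of the general idea, the sketch below implements SimAM, one well-known attention module with no learnable weights; treating it as the mechanism in this paper is an assumption, and the function name `simam` and the default `lam` value are chosen here for illustration.

```python
import numpy as np


def simam(x, lam=1e-4):
    """Apply SimAM-style parameter-free attention to a feature map of shape (C, H, W).

    Assumption: this is an illustrative stand-in, not the paper's confirmed module.
    """
    n = x.shape[1] * x.shape[2] - 1  # spatial positions per channel, minus one
    # Squared deviation of every position from its channel mean.
    d = (x - x.mean(axis=(1, 2), keepdims=True)) ** 2
    # Per-channel variance estimate.
    v = d.sum(axis=(1, 2), keepdims=True) / n
    # Inverse energy: positions that stand out from their channel score higher.
    e_inv = d / (4.0 * (v + lam)) + 0.5
    # Gate the input with a sigmoid of the inverse energy.
    return x / (1.0 + np.exp(-e_inv))


features = np.random.default_rng(0).random((4, 16, 16))
weighted = simam(features)
```

The output has the same shape as the input, so a module like this can be dropped in after a shallow backbone stage without changing the rest of the network.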
Citations: 0