
Journal of Visual Communication and Image Representation: Latest Articles

MixViT: Single image dehazing using Mixed Attention based Vision Transformer
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-10-24 | DOI: 10.1016/j.jvcir.2025.104596
Banala Revanth, Manoj Kumar, Sanjay K. Dwivedi
In image dehazing, various vision transformer-based approaches have been applied, resulting in favorable outcomes. Nevertheless, these techniques require a significant amount of data for training. We have created a vision transformer that uses mixed attention to extract features at different levels for a given image. The proposed method, called MixViT, is a U-Net-based vision transformer that utilizes mixed attention for image dehazing. The MixViT architecture comprises an encoder and a decoder with integrated skip connections. The MixViT model is trained on the I-Haze, O-Haze, NH-Haze and Dense-Haze datasets, using the mean square error as the loss function. The proposed MixViT model achieves exceptional performance on the I-Haze and O-Haze datasets, but shows moderate performance on the NH-Haze and Dense-Haze datasets. On average, the proposed method yields more favorable results in terms of complexity, as well as quantitative and visual outcomes, compared to the current state-of-the-art methods for image dehazing.
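The abstract does not include the authors' code; the following is a minimal, hypothetical PyTorch sketch of a mixed-attention block trained with an MSE loss, assuming "mixed attention" combines a channel-attention gate with patch-wise self-attention (the paper's exact formulation may differ).

```python
# Hypothetical sketch only: a mixed-attention block (channel gate + patch self-attention)
# and a single MSE training step. This is not the authors' MixViT implementation.
import torch
import torch.nn as nn

class MixedAttentionBlock(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Channel-attention branch (squeeze-and-excitation style gate).
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid(),
        )
        # Spatial self-attention branch over flattened pixel tokens.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        x = x * self.channel_gate(x)                         # re-weight channels
        tokens = self.norm(x.flatten(2).transpose(1, 2))     # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        return x + attn_out.transpose(1, 2).reshape(b, c, h, w)

# One MSE training step on a hazy/clear pair (random tensors stand in for real data).
model = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1),
                      MixedAttentionBlock(32),
                      nn.Conv2d(32, 3, 3, padding=1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
hazy, clear = torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32)
loss = nn.functional.mse_loss(model(hazy), clear)
loss.backward()
optimizer.step()
print(float(loss))
```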
Citations: 0
Exploring efficient appearance prompts for light-weight object tracking
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-10-24 | DOI: 10.1016/j.jvcir.2025.104608
Jinhui Wu, Wenkang Zhang, Wankou Yang
Recent advances in Transformer-based light-weight trackers have set new standards on various benchmarks due to their efficiency and effectiveness. However, despite these achievements, current light-weight trackers discard temporal modeling because of its complexity and inefficiency, which significantly limits their tracking performance in practical applications. To address this limitation, we propose EAPTrack, an efficient tracking model with appearance prompts. The key to EAPTrack lies in generating real-time appearance prompts to guide tracking while maintaining efficient inference, thus overcoming the limitation of static templates in adapting to appearance changes. Unlike existing trackers with complex temporal modeling processes, EAPTrack employs a simple appearance prompt modulation module to generate appearance prompts with minimal computational overhead. Additionally, we design an efficient object encoder equipped with various acceleration mechanisms, which enhance efficiency by reducing the sequence length during feature extraction. Extensive experiments demonstrate the efficiency and effectiveness of our model. For example, on the GOT-10k benchmark, EAPTrack achieves 5.9% higher accuracy than leading real-time trackers while maintaining a comparable speed of 156 FPS on GPU.
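As an illustration only (not the authors' EAPTrack code), the sketch below shows one plausible form of an appearance prompt modulation module: a prompt vector pooled from the most recent tracked-region features gates the static template features. All module names, shapes, and the gating scheme are assumptions.

```python
# Hypothetical sketch: an appearance prompt pooled from the current region
# modulates the static template tokens via a learned channel-wise gate.
import torch
import torch.nn as nn

class AppearancePromptModulation(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.to_prompt = nn.Linear(dim, dim)   # summarizes the current appearance
        self.to_gate = nn.Linear(dim, dim)     # channel-wise modulation weights

    def forward(self, template_tokens: torch.Tensor, region_tokens: torch.Tensor) -> torch.Tensor:
        # template_tokens: (B, N_t, dim), region_tokens: (B, N_r, dim)
        prompt = self.to_prompt(region_tokens.mean(dim=1))       # (B, dim)
        gate = torch.sigmoid(self.to_gate(prompt)).unsqueeze(1)  # (B, 1, dim)
        return template_tokens + template_tokens * gate          # residual modulation

mod = AppearancePromptModulation(dim=256)
template = torch.rand(2, 64, 256)     # static template features
region = torch.rand(2, 256, 256)      # features from the latest tracked region
print(mod(template, region).shape)    # torch.Size([2, 64, 256])
```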
Citations: 0
LoVCS: A local voxel center based descriptor for 3D object recognition
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-10-23 | DOI: 10.1016/j.jvcir.2025.104621
Wuyong Tao, Xianghong Hua, Bufan Zhao, Dong Chen, Chong Wu, Danhua Min
3D object recognition remains an active research area in computer vision and graphics. Recognizing objects in cluttered scenes is challenging due to clutter and occlusion. Local feature descriptors (LFDs), known for their robustness to clutter and occlusion, are widely used for 3D object recognition. However, existing LFDs are often affected by noise and varying point density, leading to poor descriptor matching performance. To address this, we propose a new LFD in this paper. First, a novel weighting strategy is introduced, utilizing projection distances to calculate weights for neighboring points, thereby constructing a robust local reference frame (LRF). Next, a new feature attribute (i.e., the local voxel center) is proposed to compute voxel values. These voxel values are concatenated to form the final feature descriptor. This feature attribute is resistant to noise and varying point density, enhancing the overall robustness of the LFD. Additionally, we design a 3D transformation estimation method to generate transformation hypotheses. This method ranks correspondences by distance ratio and traverses the top-ranked ones to compute transformations, reducing iterations and eliminating randomness while allowing predetermined iteration counts. Experiments demonstrate that the proposed LRF achieves high repeatability and the LFD exhibits excellent matching performance. The transformation estimation method is more accurate and computationally efficient. Overall, our 3D object recognition method achieves a high recognition rate: on three experimental datasets, it achieves recognition rates of 99.07%, 98.31% and 81.13%, respectively, surpassing the comparative methods. The code is available at: https://github.com/taowuyong?tab=repositories.
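For intuition only, here is a hypothetical NumPy sketch of a voxel-based descriptor in the spirit described: the neighborhood is assumed to already be expressed in its LRF, and each voxel value is taken as the offset of the voxel's point centroid from the voxel's geometric center. The paper's actual attribute, grid size, and weighting may differ; consult the released code for the real implementation.

```python
# Hypothetical voxel-centre style descriptor over an LRF-aligned neighbourhood.
import numpy as np

def voxel_center_descriptor(points: np.ndarray, radius: float, grid: int = 5) -> np.ndarray:
    """points: (N, 3) neighbourhood already expressed in the LRF."""
    edges = np.linspace(-radius, radius, grid + 1)
    centers = (edges[:-1] + edges[1:]) / 2.0
    idx = np.clip(np.digitize(points, edges) - 1, 0, grid - 1)   # voxel index per point
    desc = np.zeros((grid, grid, grid))
    for i in range(grid):
        for j in range(grid):
            for k in range(grid):
                mask = np.all(idx == (i, j, k), axis=1)
                if mask.any():
                    # Offset of the points' centroid from this voxel's geometric centre.
                    offset = points[mask].mean(axis=0) - np.array([centers[i], centers[j], centers[k]])
                    desc[i, j, k] = np.linalg.norm(offset)
    return desc.ravel()   # concatenated voxel values form the descriptor

rng = np.random.default_rng(0)
print(voxel_center_descriptor(rng.uniform(-1, 1, (500, 3)), radius=1.0).shape)  # (125,)
```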
Citations: 0
Texture-aware fast mode decision and complexity allocation for VVC based point cloud compression
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-10-18 | DOI: 10.1016/j.jvcir.2025.104610
Lewen Fan, Yun Zhang
Video-Based Point Cloud Compression (V-PCC) leverages Versatile Video Coding (VVC) to compress point clouds efficiently, yet suffers from high computational complexity that challenges real-time applications. To address this critical problem, we propose a texture-aware fast Coding Unit (CU) mode decision algorithm and a complexity allocation strategy for VVC-based V-PCC. By analyzing CU distributions and complexity characteristics, we introduce adaptive early termination thresholds that incorporate spatial, parent-child, and intra/inter CU correlations. Furthermore, we establish a complexity allocation method by formulating and solving an optimization problem to determine relaxation factors for optimal complexity-efficiency trade-offs. Experimental results demonstrate that the proposed fast mode decision achieves average complexity reductions of 33.89% and 44.59% compared to the anchor VVC-based V-PCC, which are better than those of the state-of-the-art fast mode decision schemes. Meanwhile, the average Bjøntegaard Delta Bit Rate (BDBR) losses are 1.04% and 1.64%, which are negligible.
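A toy sketch of the early-termination idea follows, assuming the adaptive threshold is derived from neighboring and parent CU rate-distortion costs and scaled by a relaxation factor; the actual thresholds, features, and allocation model in the paper are more elaborate.

```python
# Hypothetical early termination for CU mode decision: skip the remaining split
# modes when the current cost already beats an adaptive, relaxed threshold
# built from spatial (neighbour) and parent-child correlations.
def early_terminate(cu_cost: float,
                    neighbor_costs: list,
                    parent_cost: float,
                    relaxation: float = 1.0) -> bool:
    """Return True if the remaining split modes can be skipped."""
    if not neighbor_costs:
        return False
    spatial_ref = sum(neighbor_costs) / len(neighbor_costs)   # spatial correlation
    threshold = relaxation * min(spatial_ref, parent_cost)    # parent-child correlation
    return cu_cost <= threshold

# A larger relaxation factor skips more modes (faster, at a small BDBR cost).
print(early_terminate(cu_cost=120.0,
                      neighbor_costs=[150.0, 140.0],
                      parent_cost=160.0,
                      relaxation=1.1))   # True -> skip remaining partitions
```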
Citations: 0
SemMatcher: Semantic-aware feature matching with neighborhood consensus
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-10-17 | DOI: 10.1016/j.jvcir.2025.104611
Qimin Jiang, Xiaoyong Lu, Dong Liang, Songlin Du
Local feature matching is the core of many computer vision tasks. Current methods only consider the extracted keypoints individually, disregarding the connections between keypoints and the scene information, which makes feature matching challenging in scenarios with large changes in viewpoint and illumination. To address this problem, this paper proposes SemMatcher, a novel semantic-aware feature matching framework that incorporates the scene information overlooked by keypoints. Specifically, co-visual regions, i.e., regions of the same semantic category present in both images, are filtered out through semantic segmentation for more focused learning by the subsequent attention mechanisms. In SemMatcher, we design a semantic-aware attention mechanism that, unlike conventional global learning, pays more attention to co-visual regions, achieving a win-win in terms of efficiency and performance. In addition, to build connections between keypoints, we introduce a semantic-aware neighborhood consensus that incorporates neighborhood consensus into attentional aggregation and constructs contextualized neighborhood information. Extensive experiments on homography estimation, pose estimation and image matching demonstrate that the model is superior to other methods and yields outstanding performance improvements.
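A minimal, hypothetical PyTorch sketch of semantic-aware attention: keypoint tokens are only allowed to attend to tokens whose semantic class appears in both images (the co-visual classes). The masking scheme, token shapes, and class labels here are assumptions, not the authors' implementation.

```python
# Hypothetical semantic-aware attention: restrict cross-attention to co-visual classes.
import torch
import torch.nn as nn

def co_visible_mask(labels_a: torch.Tensor, labels_b: torch.Tensor) -> torch.Tensor:
    """labels_*: (N,) semantic class id per keypoint. Returns (N_a, N_b) bool mask of allowed pairs."""
    shared = torch.tensor(sorted(set(labels_a.tolist()) & set(labels_b.tolist())))
    keep_a = torch.isin(labels_a, shared)
    keep_b = torch.isin(labels_b, shared)
    return keep_a[:, None] & keep_b[None, :]

attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)
feat_a, feat_b = torch.rand(1, 6, 128), torch.rand(1, 8, 128)
labels_a = torch.tensor([0, 0, 1, 2, 2, 1])
labels_b = torch.tensor([1, 1, 2, 2, 5, 5, 0, 0])
blocked = ~co_visible_mask(labels_a, labels_b)        # True = attention position blocked
out, _ = attn(feat_a, feat_b, feat_b, attn_mask=blocked)
print(out.shape)  # torch.Size([1, 6, 128])
```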
Citations: 0
Edge detection-driven LightGBM for fast intra partition of H.266/VVC
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-10-16 | DOI: 10.1016/j.jvcir.2025.104606
Wenyu Wang, Jie Yao, Guosheng Yu, Xianlu Bian, Dandan Ding
To accelerate the quad-tree with nested multi-type tree (QTMT) partition process in H.266/VVC, we propose an edge detection-driven LightGBM method for fast coding unit (CU) partitioning. Unlike previous approaches that use multiple binary classifiers, one for each partition candidate, our method reformulates CU partitioning as a multi-class classification problem and employs LightGBM to solve it. By extracting edge and gradient features that reflect texture complexity and direction, our method organizes a CU's edge and gradient information as a feature vector, which is then fed into the LightGBM model to directly predict the probabilities of all candidate partitions. In this way, low-probability partitions can be efficiently skipped to reduce encoding time. Extensive experiments under the common test conditions of H.266/VVC demonstrate that our method reduces encoding time by approximately 44% to 57% on VTM-15.0, with only 0.96% to 1.77% Bjøntegaard Delta Bitrate (BDBR) loss, significantly outperforming existing methods.
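A small illustrative sketch of the multi-class formulation, assuming simple gradient-magnitude statistics as the feature vector and six partition classes with random toy data; the paper's actual feature set, labels, and training corpus differ.

```python
# Hypothetical sketch: QTMT partition selection framed as multi-class classification
# with LightGBM, using gradient statistics of the luma block as features.
import numpy as np
import lightgbm as lgb

def cu_features(cu: np.ndarray) -> np.ndarray:
    """cu: (H, W) luma block. Returns gradient/edge statistics as a feature vector."""
    gy, gx = np.gradient(cu.astype(np.float64))
    mag = np.hypot(gx, gy)
    return np.array([mag.mean(), mag.std(), np.abs(gx).mean(), np.abs(gy).mean()])

rng = np.random.default_rng(0)
# Toy training set: 6 partition classes (e.g. no-split, QT, BT-H, BT-V, TT-H, TT-V).
X = np.stack([cu_features(rng.integers(0, 256, (32, 32))) for _ in range(200)])
y = rng.integers(0, 6, 200)

clf = lgb.LGBMClassifier(n_estimators=50)
clf.fit(X, y)

proba = clf.predict_proba(X[:1])[0]
skip = [mode for mode, p in enumerate(proba) if p < 0.05]   # low-probability partitions to skip
print(proba.round(3), "skip modes:", skip)
```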
Citations: 0
NeRF gets personal: Mask-NeRF for targeted scene elements reconstruction
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-10-16 | DOI: 10.1016/j.jvcir.2025.104609
Zhijie Li, Hong Long, Yunpeng Li, Changhua Li
Scene segmentation remains a challenge in neural view synthesis, particularly when reconstructing specific entities in complex environments. We propose Mask-NeRF, a framework that integrates the Segment Anything Model (SAM) with Neural Radiance Fields (NeRF) for instance-aware reconstruction. By generating entity-specific masks and pruning rays in non-target regions, Mask-NeRF improves rendering quality and efficiency. To address viewpoint and scale inconsistencies among masks, we introduce a Geometric Correction Module combining SIFT-based keypoint detection with FLANN-based feature matching for accurate alignment. A redesigned multi-scale positional encoding (L=10, N=4) further enhances spatial representation. Experiments show Mask-NeRF achieves an 8.87% improvement in rendering accuracy over NeRF while reducing computational cost, demonstrating strong potential for efficient reconstruction on resource-constrained platforms; nevertheless, real-time applicability requires further validation under broader conditions.
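For illustration, a hypothetical NumPy sketch of two ingredients the abstract names: pruning rays whose pixels fall outside the entity mask, and a sinusoidal positional encoding with L = 10 frequency bands. The mask here is synthetic and the encoding layout is an assumption, not the authors' implementation.

```python
# Hypothetical sketch: mask-based ray pruning and multi-frequency positional encoding.
import numpy as np

def positional_encoding(x: np.ndarray, num_freqs: int = 10) -> np.ndarray:
    """x: (N, 3) points. Returns sin/cos features at 2^k frequencies (plus the raw input)."""
    feats = [x]
    for k in range(num_freqs):
        feats += [np.sin((2.0 ** k) * np.pi * x), np.cos((2.0 ** k) * np.pi * x)]
    return np.concatenate(feats, axis=-1)          # (N, 3 + 3*2*num_freqs)

def prune_rays(pixel_coords: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep only rays whose pixel lands inside the entity mask (e.g. from SAM)."""
    keep = mask[pixel_coords[:, 1], pixel_coords[:, 0]] > 0
    return pixel_coords[keep]

mask = np.zeros((64, 64), dtype=np.uint8)
mask[16:48, 16:48] = 1                                             # stand-in for a SAM mask
pixels = np.stack(np.meshgrid(np.arange(64), np.arange(64)), -1).reshape(-1, 2)
kept = prune_rays(pixels, mask)
print(kept.shape, positional_encoding(np.random.rand(4, 3)).shape)  # (1024, 2) (4, 63)
```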
Citations: 0
A screen-shooting resilient watermarking based on Dual-Mode Convolution Block and dynamic learning strategy
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-10-15 | DOI: 10.1016/j.jvcir.2025.104597
Fei Peng, Shenghui Zhu, Min Long
To address the risk of image piracy caused by screen photography, this paper proposes an end-to-end robust watermarking scheme designed to resist screen-shooting attacks. It comprises an encoder, a noise layer, and a decoder. Specifically, the encoder and decoder are equipped with the Dual-Mode Convolution Block (DMCB) structure, which combines dilated and standard convolutions to effectively enlarge the receptive field and enable the extraction of richer image features. Moreover, the scheme selects optimal watermark embedding regions to ensure high imperceptibility while maintaining reliable extractability after screen shooting. To further optimize the training process, a dynamic learning rate adjustment strategy is introduced to adaptively modify the learning rate according to a predefined schedule. This accelerates convergence, avoids local minima, and improves both the stability and accuracy of watermark extraction. Experimental results demonstrate strong robustness under various shooting distances and angles, while the visual quality of the watermarked images is preserved.
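A minimal PyTorch sketch of a dual-mode convolution block in the sense described (a standard 3x3 branch fused with a dilated 3x3 branch) together with a predefined learning-rate schedule; the fusion layout and schedule milestones are assumptions rather than the authors' exact design.

```python
# Hypothetical sketch: standard + dilated convolution branches fused with a residual,
# plus a milestone-based learning-rate schedule of the kind the abstract describes.
import torch
import torch.nn as nn

class DualModeConvBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.standard = nn.Conv2d(channels, channels, 3, padding=1)
        self.dilated = nn.Conv2d(channels, channels, 3, padding=2, dilation=2)  # larger receptive field
        self.fuse = nn.Conv2d(2 * channels, channels, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mixed = torch.cat([self.standard(x), self.dilated(x)], dim=1)
        return self.act(self.fuse(mixed)) + x      # residual fusion of both modes

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), DualModeConvBlock(16))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Predefined schedule: drop the learning rate by 10x at fixed epoch milestones.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 80], gamma=0.1)

out = model(torch.rand(1, 3, 128, 128))
print(out.shape)  # torch.Size([1, 16, 128, 128])
```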
Citations: 0
ALSA-UAD: Unsupervised anomaly detection on histopathology images using adversarial learning and simulated anomaly
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-10-15 | DOI: 10.1016/j.jvcir.2025.104601
Yu-Chen Lai, Wei-Ta Chu
Automatically analyzing computational histopathology images has shown significant progress in aiding pathologists. However, developing a robust model based on supervised learning is challenging because of the scarcity of tumor-marked samples and the presence of unknown diseases. Unsupervised anomaly detection (UAD) methods, which have mostly been used in industrial inspection, are thus proposed for histopathology images. UAD requires only normal samples for training and largely reduces the burden of labeling. This paper introduces a reconstruction-based UAD approach that improves representation learning through adversarial learning and simulated anomalies. On the one hand, we mix up features extracted from normal images to build a smoother feature distribution and employ adversarial learning to enhance an autoencoder for image reconstruction. On the other hand, we simulate anomalous images by deformation and guide the autoencoder to capture the global characteristics of images well. We demonstrate its effectiveness on histopathology images and other medical image benchmarks and show state-of-the-art performance.
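As a hedged illustration of two training-time ingredients the abstract mentions, the sketch below mixes up features of normal samples and simulates an anomalous image by random grid deformation; the deformation model, mixing coefficient, and shapes are assumptions, not the authors' pipeline.

```python
# Hypothetical sketch: feature mixup of normal samples and anomaly simulation by warping.
import torch

def feature_mixup(feat_a: torch.Tensor, feat_b: torch.Tensor, alpha: float = 0.4) -> torch.Tensor:
    """Convex combination of two normal-image feature maps (Beta-distributed weight)."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    return lam * feat_a + (1.0 - lam) * feat_b

def simulate_anomaly(image: torch.Tensor, grid_noise: float = 0.1) -> torch.Tensor:
    """Deform a normal image with a randomly perturbed sampling grid to fake an anomaly."""
    b, c, h, w = image.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).unsqueeze(0).repeat(b, 1, 1, 1)   # identity grid
    warped_grid = base + grid_noise * torch.randn_like(base)               # random deformation
    return torch.nn.functional.grid_sample(image, warped_grid, align_corners=True)

normal = torch.rand(2, 3, 64, 64)
mixed = feature_mixup(normal, normal.flip(0))
anomalous = simulate_anomaly(normal)
print(mixed.shape, anomalous.shape)  # both torch.Size([2, 3, 64, 64])
```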
Citations: 0
Dual-branch interactive guided network based on gradient prior for image super-resolution
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-10-14 | DOI: 10.1016/j.jvcir.2025.104607
Ping Cao, Shuran Lin, Yanwu Yang, Chunjie Zhang
Deep learning-based image super-resolution methods have made significant progress. However, most of these methods attempt to improve super-resolution performance by using deeper and wider single-branch networks, while ignoring the gradient prior knowledge in images. Besides, it is extremely challenging to recover both global information and fine local details simultaneously. To address these two issues, in this paper we propose a Dual-branch Interactive Guided Network (DIGN) based on a gradient prior, which not only focuses on restoring global information such as overall brightness and color distribution but also on preserving fine local details. It consists of two parallel branches: an image reconstruction branch responsible for restoring the HR image, and a gradient reconstruction branch that predicts the gradient map of the HR image. More importantly, we incorporate multiple bidirectional cross-attention modules between the two branches so that they guide each other. Experiments on five benchmark datasets demonstrate DIGN's effectiveness.
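A minimal, hypothetical PyTorch sketch of bidirectional cross-attention between an image branch and a gradient branch; the token shapes, head count, and residual connections are assumptions, not the authors' DIGN module.

```python
# Hypothetical sketch: the image branch queries the gradient branch and vice versa,
# so that each branch's features are guided by the other's.
import torch
import torch.nn as nn

class BidirectionalCrossAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.img_from_grad = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.grad_from_img = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img_tokens: torch.Tensor, grad_tokens: torch.Tensor):
        img_out, _ = self.img_from_grad(img_tokens, grad_tokens, grad_tokens)
        grad_out, _ = self.grad_from_img(grad_tokens, img_tokens, img_tokens)
        return img_tokens + img_out, grad_tokens + grad_out   # residual updates for both branches

block = BidirectionalCrossAttention(dim=64)
img_feat, grad_feat = torch.rand(1, 256, 64), torch.rand(1, 256, 64)
img_feat, grad_feat = block(img_feat, grad_feat)
print(img_feat.shape, grad_feat.shape)  # torch.Size([1, 256, 64]) torch.Size([1, 256, 64])
```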
Citations: 0