
Latest Publications from the Journal of Visual Communication and Image Representation

A robust watermarking approach for medical image authentication using dual image and quorum function
IF 2.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-10-01 · DOI: 10.1016/j.jvcir.2024.104299
To safeguard the identity and copyright of a patient's medical documents, watermarking strategies are widely used. This work presents a new dual-image watermarking approach using the quorum function (QF) and AD interpolation. AD interpolation is used to create the dual images, which increases the embedding capacity. Moreover, the rules for applying the QF are designed so that the original bits are minimally affected by embedding, which improves the visual quality of the stego images. A shared secret key protects the information hidden in the medical image and maintains privacy and confidentiality. Experimental results measured by PSNR, SSIM, NCC, and EC show that the proposed technique achieves an average PSNR of 68.44 dB and an SSIM close to 0.99 after embedding 786,432 watermark bits, demonstrating its superiority over other state-of-the-art schemes.
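The abstract does not reproduce the paper's QF embedding rules, so the sketch below only illustrates the general idea of quorum-based embedding under stated assumptions: the quorum is a 3-input majority function over the least significant bits (LSBs) of a pixel triplet, and embedding flips as few LSBs as possible until the quorum equals the watermark bit. The triplet grouping and minimal-flip rule are illustrative assumptions, not the authors' scheme.

```python
import numpy as np

def quorum(a: int, b: int, c: int) -> int:
    """3-input majority (quorum) function on bits."""
    return 1 if a + b + c >= 2 else 0

def embed_bit(triplet: np.ndarray, bit: int) -> np.ndarray:
    """Flip the fewest LSBs so the quorum of the triplet's LSBs equals `bit`."""
    out = triplet.copy()
    lsbs = out & 1
    while quorum(*lsbs) != bit:
        idx = int(np.argmax(lsbs != bit))  # first pixel whose LSB disagrees
        out[idx] ^= 1
        lsbs = out & 1
    return out

def extract_bit(triplet: np.ndarray) -> int:
    return quorum(*(triplet & 1))

pixels = np.array([154, 201, 77], dtype=np.uint8)  # cover pixel triplet
stego = embed_bit(pixels, 0)
assert extract_bit(stego) == 0
print(pixels, "->", stego)  # at most LSB-level changes per pixel
```

Because only LSBs ever change, per-pixel distortion is bounded by 1, which is consistent with the very high PSNR the abstract reports.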
Citations: 0
Robust text watermarking based on average skeleton mass of characters against cross-media attacks
IF 2.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-10-01 · DOI: 10.1016/j.jvcir.2024.104300
The widespread circulation of digital documents makes protecting intellectual property and information security essential. As a key method of digital copyright protection, robust document watermarking has attracted much attention in this context. With the rapid development of electronic devices, document theft is no longer limited to copying and transmission: because cameras can quickly and conveniently capture paper or screens, text watermarking methods must be robust to cross-media transmission. To this end, this paper proposes a text watermarking scheme based on the average skeleton mass of characters, where the average skeleton mass of adjacent characters represents the watermark information. The scheme modifies character pixels, altering glyphs without loss of transparency while providing high embedding capacity. Compared with existing manually designed font-based text watermarking schemes, it requires neither accurate character segmentation nor stretching characters to a uniform size for matching. Experimental results show that the proposed scheme is robust to information transmission modes including print-scan, print-camera, and screen-camera.
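As a rough illustration of the statistic the scheme builds on (not the authors' extraction pipeline), the sketch below computes an average skeleton mass with scikit-image, under two assumptions: characters are the connected components of a binarized text image, and "skeleton mass" means the number of skeleton pixels inside a component.

```python
import numpy as np
from skimage.measure import label, regionprops
from skimage.morphology import skeletonize

def average_skeleton_mass(binary_text: np.ndarray) -> float:
    """Mean skeleton-pixel count over character components (True = ink)."""
    skeleton = skeletonize(binary_text)
    components = label(binary_text)  # connected components approximate characters
    masses = [int(skeleton[components == r.label].sum())
              for r in regionprops(components)]
    return float(np.mean(masses)) if masses else 0.0
```

Per the abstract, embedding then perturbs glyph pixels so that the average skeleton mass of adjacent characters carries the watermark bits, which is why neither precise segmentation nor size normalization is needed at extraction time.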
Citations: 0
Effective image compression using hybrid DCT and hybrid capsule auto encoder for brain MR images
IF 2.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-10-01 · DOI: 10.1016/j.jvcir.2024.104296
Image compression is gaining popularity in various fields because it eases storage and transmission. This work introduces a medical image (MI) compression model for brain magnetic resonance images (MRI) to mitigate bandwidth and storage constraints. First, inputs are pre-processed to suppress noise using the Adaptive Linear Smoothing and Histogram Equalization (ALSHE) method. Then, the Region of Interest (ROI) and Non-ROI parts are segmented separately by the Optimized Fuzzy C-Means (OFCM) approach to reduce complexity. Finally, a novel Hybrid Discrete Cosine Transform-Improved Zero Wavelet (DCT-IZW) is proposed for lossless compression and a Hybrid Equilibrium Optimization-Capsule Auto Encoder (EO-CAE) for lossy compression. The compressed ROI and Non-ROI images are then combined, and the inverse of the compression process is applied to obtain the reconstructed image. The study used the BRATS (2015, 2018) datasets for simulation and attained better performance than existing methods.
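For context on the transform stage, here is a minimal blockwise DCT truncation in Python. It sketches only the generic lossy-DCT idea; the paper's DCT-IZW lossless coder and EO-CAE network are not reproduced, and the 8x8 block size and k x k coefficient mask are assumptions.

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(block):
    return idct(idct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def compress_blockwise(image: np.ndarray, k: int = 3) -> np.ndarray:
    """Keep the k x k low-frequency DCT coefficients of each 8x8 block.

    Assumes the grayscale image's height and width are multiples of 8.
    """
    out = np.zeros(image.shape, dtype=float)
    mask = np.zeros((8, 8))
    mask[:k, :k] = 1.0  # retain only the low-frequency corner
    for i in range(0, image.shape[0], 8):
        for j in range(0, image.shape[1], 8):
            coeffs = dct2(image[i:i + 8, j:j + 8].astype(float))
            out[i:i + 8, j:j + 8] = idct2(coeffs * mask)
    return out
```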
Citations: 0
Diving deep into human action recognition in aerial videos: A survey
IF 2.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-09-23 · DOI: 10.1016/j.jvcir.2024.104298
Human Action Recognition from Unmanned Aerial Vehicles is a dynamic research domain with significant benefits in scale, mobility, deployment, and covert observation. This paper offers a comprehensive review of state-of-the-art algorithms for human action recognition and provides a novel taxonomy that divides the reviewed methods into two broad categories: Localization-based and Globalization-based. These categories are defined by how actions are segmented from visual data and how their spatial and temporal structures are modeled. We examine these techniques, highlighting their strengths and limitations, and provide essential background on human action recognition, including fundamental concepts and challenges in aerial videos. Additionally, we discuss existing datasets, enabling a comparative analysis. This survey identifies gaps and suggests future research directions, serving as a catalyst for advancing human action recognition in aerial videos. To our knowledge, this is the first detailed review of its kind.
Citations: 0
Zero-CSC: Low-light image enhancement with zero-reference color self-calibration
IF 2.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-09-20 · DOI: 10.1016/j.jvcir.2024.104293

Zero-Reference Low-Light Image Enhancement (LLIE) techniques mainly focus on grey-scale inhomogeneities, and few methods consider how to explicitly recover a dark scene to achieve enhancements in color and overall illumination. In this paper, we introduce a novel Zero-Reference Color Self-Calibration framework for enhancing low-light images, termed Zero-CSC. It effectively emphasizes channel-wise representations that contain fine-grained color information, achieving a natural result in a progressive manner. Furthermore, we propose a Light Up (LU) module with large-kernel convolutional blocks to improve overall illumination, implemented with a simple U-Net and further simplified with a lightweight structure. Experiments on representative datasets show that our model consistently achieves state-of-the-art performance in image signal-to-noise ratio, structural similarity, and color accuracy, setting new records on the challenging SICE dataset with improvements of 23.7% in image signal-to-noise ratio and 5.3% in structural similarity compared with the most advanced methods.
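The abstract names large-kernel convolutional blocks but not their exact layout, so the following PyTorch sketch shows only a generic large-kernel block (a depthwise 7x7 convolution, a pointwise projection, and a residual connection); the kernel size, channel width, and activation are assumptions rather than the paper's LU design.

```python
import torch
import torch.nn as nn

class LargeKernelBlock(nn.Module):
    """Generic large-kernel block: depthwise 7x7 conv + pointwise 1x1 conv."""

    def __init__(self, channels: int = 32, kernel_size: int = 7):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.act(self.pointwise(self.depthwise(x)))  # residual

x = torch.randn(1, 32, 64, 64)
print(LargeKernelBlock()(x).shape)  # torch.Size([1, 32, 64, 64])
```

The large receptive field is what lets such a block reason about overall illumination rather than purely local contrast.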

Citations: 0
M-YOLOv8s: An improved small target detection algorithm for UAV aerial photography
IF 2.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-09-18 · DOI: 10.1016/j.jvcir.2024.104289
UAV target detection typically involves small targets against complicated backgrounds. This paper proposes M-YOLOv8s, an object detection model for UAV aerial photography scenes. Firstly, to address the YOLOv8s model's poor adaptation to small-target detection, a small target detection head (STDH) module is introduced to fuse the location and appearance features of the shallow layers of the backbone network. Secondly, Inner-Wise intersection over union (Inner-WIoU) is designed as the bounding-box regression loss, and auxiliary boundary-box calculation is used to accelerate the model's regression. Thirdly, a multi-scale feature pyramid network (MS-FPN) effectively combines shallow and deep network information, improving the detection model's performance. Furthermore, a multi-scale cross-spatial attention (MCSA) module expands the feature space through multi-scale branches and aggregates target features through cross-spatial interaction, strengthening the model's ability to extract target features. Finally, experimental results show that our model not only has fewer parameters but also improves mAP0.5 by 6.6% and 5.4% over the baseline on the VisDrone2019 validation and test sets, respectively. In conclusion, M-YOLOv8s achieves better detection performance than existing models, indicating that the proposed method is well suited to small-target detection.
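As a hedged sketch of the "inner" auxiliary-box idea behind Inner-IoU-style losses (the Wise-IoU focusing term of the paper's Inner-WIoU is omitted here), the code below computes IoU on boxes shrunk about their centers by an assumed ratio, the auxiliary calculation that the Inner-IoU literature uses to accelerate box regression.

```python
def shrink(box, ratio=0.8):
    """box = (x1, y1, x2, y2); same-center box with sides scaled by `ratio`."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = (x2 - x1) * ratio, (y2 - y1) * ratio
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def inner_iou_loss(pred, target, ratio=0.8):
    """1 - IoU of the shrunken (inner) auxiliary boxes."""
    return 1.0 - iou(shrink(pred, ratio), shrink(target, ratio))

print(inner_iou_loss((10, 10, 50, 50), (12, 12, 52, 52)))
```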
Citations: 0
Low-complexity content-aware encoding optimization of batch video
IF 2.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-09-17 · DOI: 10.1016/j.jvcir.2024.104295

With the proliferation of short-form video traffic, video service providers face the challenge of balancing video quality and bandwidth consumption while processing massive volumes of videos. The most straightforward approach is to apply uniform encoding parameters to all videos. However, this ignores differences in video content, and alternative encoding parameter configurations may improve global coding efficiency. Finding the optimal combination of encoding parameter configurations for a batch of videos requires a large amount of redundant encoding, introducing significant computational cost. To address this issue, we propose a low-complexity encoding parameter prediction model that adaptively adjusts encoding parameter values based on video content. Experiments show that when only the encoding parameter CRF is changed, our prediction model achieves bit savings of 27.04%, 6.11%, and 15.92% in terms of PSNR, SSIM, and VMAF, respectively, while maintaining acceptable complexity compared with using a single CRF value for all videos.
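The paper's predictor and feature set are not specified in the abstract; the sketch below only shows the general shape of such a model, regressing a per-video CRF from cheap spatial and temporal complexity statistics with scikit-learn. The feature definitions, model choice, and training labels are all stand-in assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def complexity_features(frames: np.ndarray) -> np.ndarray:
    """frames: (T, H, W) grayscale. Spatial = mean gradient magnitude;
    temporal = mean absolute frame difference."""
    f = frames.astype(float)
    gy, gx = np.gradient(f, axis=(1, 2))
    spatial = float(np.sqrt(gx ** 2 + gy ** 2).mean())
    temporal = float(np.abs(np.diff(f, axis=0)).mean())
    return np.array([spatial, temporal])

# Train on (features -> CRF that met a target quality) pairs gathered offline.
rng = np.random.default_rng(0)
X = rng.uniform(0, 50, size=(200, 2))           # stand-in features
y = 18 + 0.2 * X[:, 0] + 0.1 * X[:, 1]          # stand-in CRF labels
model = GradientBoostingRegressor().fit(X, y)

video = rng.integers(0, 255, size=(30, 64, 64))  # stand-in clip
print(round(float(model.predict([complexity_features(video)])[0]), 1))
```

The payoff of this shape of model is that one cheap feature pass replaces the many trial encodes otherwise needed to pick a per-title CRF.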

Citations: 0
Leveraging occupancy map to accelerate video-based point cloud compression
IF 2.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-09-16 · DOI: 10.1016/j.jvcir.2024.104292

Video-based Point Cloud Compression enables point cloud streaming over the internet by converting dynamic 3D point clouds to 2D geometry and attribute videos, which are then compressed using 2D video codecs such as H.266/VVC. However, the complex encoding process of H.266/VVC, such as the quadtree with nested multi-type tree (QTMT) partition, greatly hinders the practical application of V-PCC. To address this issue, we propose a fast CU partition method dedicated to V-PCC to accelerate the coding process. Specifically, we classify coding units (CUs) of projected images into three categories based on the occupancy map of a point cloud: unoccupied, partially occupied, and fully occupied. Subsequently, we employ either statistics-based rules or machine-learning models to manage the partition of each category. For unoccupied CUs, we terminate the partition directly; for partially occupied CUs with explicit directions, we selectively skip certain partition candidates; for the remaining CUs (partially occupied CUs with complex directions and fully occupied CUs), we train an edge-driven LightGBM model to predict the partition probability of each partition candidate automatically. Only partitions with high probabilities are retained for further Rate–Distortion (R–D) decisions. Comprehensive experiments demonstrate the superior performance of the proposed method: under the V-PCC common test conditions, it reduces encoding time by 52% and 44% in geometry and attribute, respectively, while incurring only 0.68% (0.66%) BD-Rate loss in D1 (D2) measurements and 0.79% (luma) BD-Rate loss in attribute, significantly surpassing state-of-the-art works.
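A minimal sketch of the gating step described above: a LightGBM classifier scores each partition candidate and only candidates above a probability threshold proceed to full R-D evaluation. The features, labels, and threshold here are stand-ins, not the paper's edge-driven design or training data.

```python
import numpy as np
from lightgbm import LGBMClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))                   # stand-in CU features (e.g., edge stats)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # stand-in "split chosen" labels
clf = LGBMClassifier(n_estimators=50).fit(X, y)

def candidates_to_test(cu_features: np.ndarray, threshold: float = 0.2):
    """Keep a partition candidate for R-D evaluation if P(chosen) > threshold."""
    probs = clf.predict_proba(cu_features)[:, 1]
    return np.flatnonzero(probs > threshold)

print(candidates_to_test(rng.normal(size=(6, 4))))
```

Pruning low-probability candidates before R-D search is where the reported encoding-time savings come from; the threshold trades speed against BD-Rate loss.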

Citations: 0
SR4KVQA: Video quality assessment database and metric for 4K super-resolution
IF 2.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-09-14 · DOI: 10.1016/j.jvcir.2024.104290

Quality assessment for 4K super-resolution (SR) videos can guide the optimization of video SR algorithms. To improve the subjective and objective consistency of SR quality assessment, this paper proposes a 4K video database and a blind metric. The SR4KVQA database contains 30 pristine 4K videos, from which 600 distorted 4K SR videos with mean opinion score (MOS) labels are generated by three classic interpolation methods, six SR algorithms based on deep neural networks (DNN), and two SR algorithms based on generative adversarial networks (GAN). The benchmark experiment on the proposed database indicates that video quality assessment (VQA) of 4K SR videos is challenging for existing metrics. Among them, the Video-Swin-Transformer backbone demonstrates tremendous potential for the VQA task. Accordingly, a blind VQA metric based on the Video-Swin-Transformer backbone is established, applying a normalized loss function and an optimized spatio-temporal sampling strategy. Experimental results show that the Pearson linear correlation coefficient (PLCC) and Spearman rank-order correlation coefficient (SROCC) of the proposed metric reach 0.8011 and 0.8275, respectively, on the SR4KVQA database, outperforming or competing with state-of-the-art VQA metrics. The database and code proposed in this paper are available in the GitHub repository: https://github.com/AlexReadyNico/SR4KVQA.
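The two reported correlation measures are standard and easy to reproduce: given a metric's predicted scores and the MOS labels, PLCC and SROCC can be computed with SciPy as below (the score arrays are stand-ins, not SR4KVQA data).

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

mos = np.array([3.1, 4.2, 2.5, 3.8, 4.9])   # subjective MOS labels (stand-ins)
pred = np.array([3.0, 4.0, 2.9, 3.5, 4.7])  # metric predictions (stand-ins)

plcc, _ = pearsonr(pred, mos)    # linear agreement with subjective scores
srocc, _ = spearmanr(pred, mos)  # monotonic (rank-order) agreement
print(f"PLCC={plcc:.4f}, SROCC={srocc:.4f}")
```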

Citations: 0
Data compensation and feature fusion for sketch based person retrieval
IF 2.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-09-12 · DOI: 10.1016/j.jvcir.2024.104287
Sketch re-identification (Re-ID) aims to retrieve a pedestrian's photo from a gallery dataset using a query sketch drawn by professionals. The sketch Re-ID task has not been adequately studied because collecting such sketches is difficult and expensive. In addition, the significant modality difference between sketches and images makes extracting discriminative features difficult. To address these issues, we introduce a novel sketch-style pedestrian dataset named the Pseudo-Sketch dataset. It maximizes the utilization of existing person dataset resources and is freely available, effectively reducing the cost of the training and deployment phases. Furthermore, to mitigate the modality gap between sketches and visible images, we propose a cross-modal feature fusion network that incorporates information from each modality. Experimental results show that the proposed Pseudo-Sketch dataset effectively complements real sketch datasets, and the proposed network obtains competitive results compared with SOTA methods. The dataset will be released later.
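The abstract does not detail the fusion network, so the snippet below is only a generic cross-modal fusion head in PyTorch, concatenating sketch and photo embeddings and projecting back to a shared, L2-normalized space; the dimensionality and the single linear projection are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Concatenate two modality embeddings and project to a shared space."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, sketch_feat: torch.Tensor,
                photo_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([sketch_feat, photo_feat], dim=-1)
        return nn.functional.normalize(self.proj(fused), dim=-1)

head = FusionHead()
out = head(torch.randn(4, 512), torch.randn(4, 512))
print(out.shape)  # torch.Size([4, 512])
```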
Citations: 0