Signal Processing-Image Communication: Latest Articles, Page 6

Learning content-aware feature fusion for guided depth map super-resolution

IF 3.5, Zone 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

Signal Processing-Image Communication

Pub Date : 2024-05-09 DOI: 10.1016/j.image.2024.117140

Yifan Zuo , Hao Wang , Yaping Xu , Huimin Huang , Xiaoshui Huang , Xue Xia , Yuming Fang

RGB-D data, consisting of paired RGB color images and depth maps, is widely used in downstream computer vision tasks. However, whereas high-resolution color images are easy to acquire, the depth maps captured by consumer-level sensors are typically low resolution. Despite decades of research, most state-of-the-art (SOTA) depth map super-resolution methods fuse the color guidance through channel-wise feature concatenation followed by spatially shared convolutional kernels, and therefore cannot adaptively tune the guidance fusion at each feature position. To resolve this issue, this paper proposes JTFNet, which simulates the traditional Joint Trilateral Filter (JTF). Specifically, a novel JTF block is introduced to adaptively tune the fusion pattern between the color features and the depth features at every feature position. Moreover, a variant of the JTF block, whose target features and guidance features are at different scales, allows the depth features to be fused bi-directionally. As a result, error accumulation across scales is effectively mitigated through iterative HR feature guidance. Extensive experiments against SOTA methods on mainstream synthetic and real datasets, i.e., Middlebury, NYU and ToF-Mark, show that JTFNet achieves remarkable improvements.

Volume 126, Article 117140

Citations: 0
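JTFNet is described as simulating the traditional Joint Trilateral Filter. The sketch below shows one common pixel-domain form of such a filter. This form is an assumption for illustration and is not taken from the paper. Each output depth value is a normalized weighted average, and each weight is the product of three Gaussians: a spatial term, a color-guidance range term and a depth range term. Because the weights are recomputed at every pixel, the fusion adapts per position, which is exactly what the abstract says spatially shared convolution kernels cannot do. The function name and the sigma defaults are hypothetical.

```python
import numpy as np


def joint_trilateral_filter(depth, guide, radius=2, sigma_s=1.5, sigma_c=0.1, sigma_d=0.1):
    """Hypothetical joint trilateral filter (illustrative sketch, not JTFNet).

    depth: (H, W) float depth map, e.g. a naively upsampled LR depth map.
    guide: (H, W, 3) float color image aligned with `depth`, values in [0, 1].
    """
    h, w = depth.shape
    out = np.empty((h, w), dtype=float)
    for y in range(h):
        for x in range(w):
            # Clip the (2*radius+1)^2 window to the image borders.
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            yy, xx = np.mgrid[y0:y1, x0:x1]
            d_win = depth[y0:y1, x0:x1]
            g_win = guide[y0:y1, x0:x1]
            # Spatial term: closer neighbours count more.
            w_s = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2))
            # Color-guidance term: neighbours with a similar color count more.
            w_c = np.exp(-np.sum((g_win - guide[y, x]) ** 2, axis=-1) / (2 * sigma_c ** 2))
            # Depth term: neighbours with a similar depth count more.
            w_d = np.exp(-((d_win - depth[y, x]) ** 2) / (2 * sigma_d ** 2))
            weights = w_s * w_c * w_d
            # The center pixel always has weight 1, so the sum is positive.
            out[y, x] = np.sum(weights * d_win) / np.sum(weights)
    return out
```

When the color guide contains the same step edge as the depth map, cross-edge weights become negligible and the edge stays sharp. On a flat region, the filter behaves like a Gaussian smoother.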

DCID: A divide and conquer approach to solving the trade-off problem between artifacts caused by enhancement procedure in image downscaling

IF 3.5, Zone 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

Signal Processing-Image Communication

Pub Date : 2024-05-08 DOI: 10.1016/j.image.2024.117133

Eun Su Kang, Yeon Jeong Chae, Jae Hyeon Park, Sung In Cho

Conventional research on image downscaling aims to improve the visual quality of the downscaled image. However, the enhancement procedure in image downscaling introduces an intractable trade-off between artifacts such as aliasing and ringing. To solve this problem, we propose a novel divide-and-conquer approach to image downscaling (DCID). Specifically, DCID includes a Weight-Net that divides the image into enhancement-first and artifact-least-first regions. It also includes two generators, each optimized for one type of region, which together conquer the trade-off in the downscaling task. The proposed method generates downscaled images without artifacts while preserving the perceptual quality of the input image. In both objective and subjective evaluations, our experimental results show that the downscaled images generated by DCID are of significantly better quality than those produced by benchmark methods.

Citations: 0
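DCID's divide-and-conquer idea can be illustrated with a toy per-pixel blend. One branch produces a sharper, enhancement-first downscaled image. Another produces a smoother, artifact-least-first one. A weight map in [0, 1] then selects between them. Every part of this sketch is an assumption made for illustration. In the paper, the Weight-Net and both generators are learned networks. Here they are replaced by box-average downscaling, a simple unsharp mask, and a weight map supplied by the caller. All function names are hypothetical.

```python
import numpy as np


def box_downscale(img, s):
    """Average non-overlapping s x s blocks (crops any remainder rows/cols)."""
    h, w = img.shape
    img = img[: h - h % s, : w - w % s]
    return img.reshape(h // s, s, w // s, s).mean(axis=(1, 3))


def blur3x3(img):
    """3x3 mean filter with edge replication."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0


def divide_and_conquer_downscale(img, s, weight, amount=1.0):
    """Blend two stand-in branches with a per-pixel weight map.

    weight: (H//s, W//s) array in [0, 1]; 1 selects the enhancement-first branch,
    0 selects the artifact-least-first branch.
    """
    smooth = box_downscale(img, s)                        # artifact-least-first stand-in
    sharp = smooth + amount * (smooth - blur3x3(smooth))  # enhancement-first stand-in (may ring)
    return weight * sharp + (1.0 - weight) * smooth
```

The blend is what lets each branch be tuned for its own goal. The enhancement branch can be aggressive because its output is only used where the weight map allows it.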

Deep neural network based distortion parameter estimation for blind quality measurement of stereoscopic images

IF 3.5, Zone 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

Signal Processing-Image Communication

Pub Date : 2024-05-04 DOI: 10.1016/j.image.2024.117138

Yi Zhang , Damon M. Chandler , Xuanqin Mou

Stereoscopic/3D image quality measurement (SIQM) has emerged as an active and important research branch in the image processing/computer vision field. Existing blind/no-reference SIQM methods often train machine-learning models on degraded stereoscopic images for which human subjective quality ratings have been obtained. They are therefore constrained by the limited number of 3D image quality datasets currently available. Some methods overcome this restriction by predicting distortion parameters rather than quality scores. However, such approaches still rely on time-consuming, hand-crafted features to train the corresponding classification/regression models. They also rely on rather complicated binocular fusion/rivalry models to predict the cyclopean view. In this paper, we explore the use of deep learning to predict distortion parameters, giving rise to a more efficient opinion-unaware SIQM technique. Specifically, we propose a deep fusion-and-excitation network that accounts for interactions among multiple distortions to perform distortion parameter estimation. Its convolution layers remove the need for hand-crafted features, and running it on a GPU accelerates the algorithm. Moreover, we measure the distortion parameter values of the cyclopean view using support vector regression models trained on data obtained from a newly designed subjective test. In this way, potential errors in computing the disparity map and cyclopean view are avoided, leading to faster and more precise 3D-vision distortion parameter estimation. Experimental results on various 3D image quality datasets demonstrate that, in most cases, our proposed method offers better predictive performance than existing state-of-the-art methods.

Volume 126, Article 117138

Citations: 0

CVEGAN: A perceptually-inspired GAN for Compressed Video Enhancement

IF 3.5, Zone 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

Signal Processing-Image Communication

Pub Date : 2024-05-03 DOI: 10.1016/j.image.2024.117127

Di Ma, Fan Zhang, David R. Bull

We propose a new Generative Adversarial Network for Compressed Video frame quality Enhancement (CVEGAN). The CVEGAN generator benefits from three components: a novel Mul²Res block with multiple levels of residual learning branches, an enhanced residual non-local block (ERNB), and an enhanced convolutional block attention module (ECBAM). The ERNB is also employed in the discriminator to improve its representational capability. The training strategy has been redesigned specifically for video compression applications. It employs a relativistic sphere GAN (ReSphereGAN) training methodology together with new perceptual loss functions. The proposed network has been fully evaluated in the context of two typical video compression enhancement tools: post-processing (PP) and spatial resolution adaptation (SRA). CVEGAN has been fully integrated into the MPEG HEVC and VVC video coding test models (HM 16.20 and VTM 7.0). Using HM 16.20, experimental results across multiple datasets show significant coding gains for both coding tools over existing state-of-the-art architectures: up to 28% for PP and 38% for SRA, compared to the anchor. The corresponding gains for VTM 7.0 are up to 8.0% for PP and up to 20.3% for SRA.

Volume 127, Article 117127
Open access PDF: https://www.sciencedirect.com/science/article/pii/S0923596524000286/pdfft?md5=3b459f9525f84784af198f2f1adf008e&pid=1-s2.0-S0923596524000286-main.pdf

Citations: 0
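The CVEGAN abstract reports coding gains relative to an anchor codec. Gains of this kind are commonly reported as Bjøntegaard delta rate (BD-rate). The abstract does not name its metric, however, so reading the percentages as BD-rate is an assumption. The sketch below computes BD-rate in the classic way:

1. Fit a cubic polynomial of log-bitrate against PSNR for each codec.
2. Integrate both fits over their overlapping PSNR interval.
3. Convert the average log-rate difference into a percentage.

Negative values mean the test codec needs fewer bits than the anchor for the same quality.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate in percent (negative = bitrate saving).

    Each argument is a sequence of (typically four) rate or PSNR points.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Cubic fits of log-rate as a function of PSNR.
    fit_a = np.polyfit(pa, log_ra, 3)
    fit_t = np.polyfit(pt, log_rt, 3)

    # Integrate both fits over the overlapping PSNR range.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())
    int_a = np.polyval(np.polyint(fit_a), hi) - np.polyval(np.polyint(fit_a), lo)
    int_t = np.polyval(np.polyint(fit_t), hi) - np.polyval(np.polyint(fit_t), lo)

    # Average log-rate difference, converted back to a relative rate change.
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0
```

For example, a test codec that reaches every anchor PSNR point with 90% of the anchor's bitrate yields a BD-rate of about -10%.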

STCC-Filter: A space-time-content correlation-based noise filter with self-adjusting threshold for event camera

IF 3.5, Zone 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

Signal Processing-Image Communication

Pub Date : 2024-04-29 DOI: 10.1016/j.image.2024.117136

Mengjie Li , Yujie Huang , Mingyu Wang , Wenhong Li , Xiaoyang Zeng

Bio-inspired event cameras have become a new paradigm of image sensors that detect illumination changes asynchronously and independently at each pixel. However, their sensitivity to noise degrades output quality. Most existing denoising methods based on spatiotemporal correlation perform poorly in low-light conditions because of frequent noise bursts. To tackle this challenge and remove noise from neuromorphic cameras, this paper proposes space–time-content correlation (STCC) and a novel noise filter with a self-adjusting threshold, STCC-Filter. In the proposed denoising algorithm, content correlation is modeled on the brightness change patterns caused by moving objects. Furthermore, the filter fully exploits space–time and content support from the sequence of events within a range specified by the threshold, which can be programmed for real application scenarios. This improves the robustness and performance of denoising. STCC-Filter is evaluated on widely used datasets and on our labeled synthesized datasets. The experimental results demonstrate that the proposed method outperforms traditional spatiotemporal-correlation-based methods, removing more noise while preserving more signal.

Volume 126, Article 117136

Citations: 0
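STCC-Filter is compared against traditional spatiotemporal-correlation-based methods. The sketch below shows the classic background-activity idea behind those methods. An event is kept only if some other pixel in its 3x3 neighbourhood fired within the last `dt` time units, measured in the same units as the timestamps. This is not STCC-Filter: it has no content term, and its threshold is fixed rather than self-adjusting. The event tuple layout (t, x, y, polarity) and all names are assumptions.

```python
from typing import List, Optional, Tuple

Event = Tuple[int, int, int, int]  # (timestamp, x, y, polarity)


def background_activity_filter(events: List[Event], width: int, height: int,
                               dt: int = 1000) -> List[Event]:
    """Keep events supported by a recent event at a neighbouring pixel.

    Events must be sorted by timestamp. Returns the events judged to be signal.
    """
    # Last timestamp seen at each pixel; None means the pixel has not fired yet.
    last: List[List[Optional[int]]] = [[None] * width for _ in range(height)]
    kept: List[Event] = []
    for t, x, y, p in events:
        supported = False
        for ny in range(max(0, y - 1), min(height, y + 2)):
            for nx in range(max(0, x - 1), min(width, x + 2)):
                if (nx, ny) == (x, y):
                    continue  # a pixel cannot support itself
                ts = last[ny][nx]
                if ts is not None and t - ts <= dt:
                    supported = True
        if supported:
            kept.append((t, x, y, p))
        # Record every event, kept or not, so it can support later neighbours.
        last[y][x] = t
    return kept
```

Under low light, bursts of noise events can support each other, so a fixed window like `dt` lets noise through. This is the weakness the abstract attributes to such methods.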

Universal deep demosaicking for sparse color filter arrays

IF 3.5, Zone 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

Signal Processing-Image Communication

Pub Date : 2024-04-29 DOI: 10.1016/j.image.2024.117135

Chenyan Bai , Wenxing Qiao , Jia Li

A sparse color filter array (CFA) is a potential alternative to the commonly used Bayer CFA, which uses only red (R), green (G), and blue (B) pixels. In sparse CFAs, most pixels are panchromatic (white), and only a small percentage are RGB pixels. Sparse CFAs are motivated by the human visual system and offer superior low-light photography performance. However, most associated demosaicking methods depend heavily on synthetic images and are limited to a few specific CFAs. In this paper, we propose a universal demosaicking method for sparse CFAs. Our method has two sequential steps: W-channel recovery and RGB-channel reconstruction. More specifically, it first uses a W-channel inpainting network (WCI-Net) to recover the W channel. The first layer of WCI-Net performs scatter-weighted interpolation, which enables the network to work with various CFAs. The method then employs a differentiable guided filter to reconstruct the RGB channels, using the recovered W channel as a reference. The differentiable guided filter introduces a binary mask to specify the positions of the RGB pixels, so it can handle arbitrary sparse CFAs. It can also be trained end-to-end, and can therefore achieve superior performance without overfitting to synthetic images. Experiments on clean and noisy images confirm the advantage of the proposed demosaicking method.

Volume 126, Article 117135

Citations: 0
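The demosaicking abstract reconstructs RGB channels with a differentiable guided filter. That filter uses a binary mask to locate the RGB pixels and takes the recovered W channel as its guide. The sketch below gives one plausible non-learned reading of that step. It is an assumption, not the paper's formulation: a guided filter whose local statistics are computed only over masked (sampled) pixels, in the style of normalized convolution. It reconstructs one color channel at a time. The defaults for `r` and `eps` are illustrative, and the function names are hypothetical. Every step is a box filter or elementwise arithmetic, which is why a filter of this kind is differentiable and can sit inside an end-to-end network.

```python
import numpy as np


def box_sum(x, r):
    """Sum of x over a (2r+1) x (2r+1) window clipped at the borders (integral image)."""
    h, w = x.shape
    c = np.pad(x, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    i, j = np.arange(h), np.arange(w)
    i0, i1 = np.clip(i - r, 0, h), np.clip(i + r + 1, 0, h)
    j0, j1 = np.clip(j - r, 0, w), np.clip(j + r + 1, 0, w)
    return c[i1][:, j1] - c[i0][:, j1] - c[i1][:, j0] + c[i0][:, j0]


def masked_guided_filter(guide, sparse, mask, r=2, eps=1e-4):
    """Fill a sparsely sampled channel using a dense guide (e.g. the W channel).

    guide:  (H, W) dense guide image.
    sparse: (H, W) channel values, valid only where `mask` is True.
    mask:   (H, W) boolean sampling pattern of the CFA for this channel.
    """
    m = mask.astype(float)
    n = np.maximum(box_sum(m, r), 1.0)  # sample count per window; guard empty windows
    mean_i = box_sum(m * guide, r) / n
    mean_p = box_sum(m * sparse, r) / n
    var_i = box_sum(m * guide * guide, r) / n - mean_i ** 2
    cov_ip = box_sum(m * guide * sparse, r) / n - mean_i * mean_p
    # Local linear model p ~= a * I + b, fitted on sampled pixels only.
    a = cov_ip / (var_i + eps)
    b = mean_p - a * mean_i
    # Average the coefficients of all windows covering each pixel, then apply them.
    k = box_sum(np.ones_like(guide, dtype=float), r)
    return (box_sum(a, r) / k) * guide + box_sum(b, r) / k
```

If a channel is locally a linear function of the guide, the filter recovers it at the unsampled pixels too. This is why a well-recovered W channel matters for the second step.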

Contrastive learning for deep tone mapping operator

IF 3.5, Zone 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

Signal Processing-Image Communication

Pub Date : 2024-04-29 DOI: 10.1016/j.image.2024.117130

Di Li , Mou Wang , Susanto Rahardja

Most existing tone mapping operators (TMOs) are built on prior assumptions about the human visual system and are known to be sensitive to hyperparameters. In this paper, we propose a straightforward yet efficient framework that automatically learns these priors and performs tone mapping in an end-to-end manner. The proposed algorithm uses a contrastive learning framework to enforce content consistency between high dynamic range (HDR) inputs and low dynamic range (LDR) outputs. Because contrastive learning aims to maximize the mutual information across different domains, our algorithm requires no paired images or labels. It is equipped with an attention-based U-Net that alleviates aliasing and halo artifacts. As a result, it produces sharp and visually appealing images across various complex real-world scenes, suggesting that it can serve as a strong baseline for future HDR image tone mapping tasks. Extensive experiments and subjective evaluations demonstrate that the proposed algorithm outperforms existing state-of-the-art algorithms both qualitatively and quantitatively. The code is available at https://github.com/xslidi/CATMO.

Volume 126, Article 117130

Citations: 0
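The tone mapping paper enforces HDR-to-LDR content consistency through contrastive learning. The abstract does not give the exact loss. A common choice in this family is an InfoNCE-style loss, sketched below under that assumption. Row i of the two embedding matrices forms a positive pair, for example features of the same patch in the HDR input and in the LDR output. All other rows act as negatives. No ground-truth LDR image is needed. The value of `temperature` is an illustrative hyperparameter, and the function name is hypothetical.

```python
import numpy as np


def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss for N positive pairs (row i of z_a matches row i of z_b).

    z_a, z_b: (N, D) embeddings. Returns the mean cross-entropy over rows.
    """
    # L2-normalise so the dot product is cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # (N, N) similarity matrix
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; the loss is their mean negative log-probability.
    return -np.mean(np.diag(log_prob))
```

The loss is low when each HDR feature is most similar to its own LDR counterpart. Minimizing it therefore pushes the tone-mapped output to keep the input's content.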

U-ATSS: A lightweight and accurate one-stage underwater object detection network

IF 3.5, Zone 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

Signal Processing-Image Communication

Pub Date : 2024-04-28 DOI: 10.1016/j.image.2024.117137

Junjun Wu, Jinpeng Chen, Qinghua Lu, Jiaxi Li, Ningwei Qin, Kaixuan Chen, Xilin Liu

Because the marine environment is harsh and largely unknown, and human diving ability is limited, underwater robots play an important role in ocean exploration and development. However, their performance is limited by blurred images, low contrast and color deviation resulting from complex underwater imaging environments. Existing mainstream object detection networks perform poorly when applied directly to underwater tasks. Cascaded detector networks can achieve high accuracy, but their inference speed is too slow for practical tasks. To address these problems, this paper proposes a lightweight and accurate one-stage underwater object detection network called U-ATSS. First, we compress the backbone of ATSS to significantly reduce the number of network parameters and improve inference speed without losing detection accuracy, making the network lightweight and real-time. Then, we propose a plug-and-play receptive field module, F-ASPP, which obtains larger receptive fields and richer spatial information. We also optimize the learning rate strategy and the classification loss function to significantly improve detection accuracy and convergence speed. We evaluated U-ATSS against other methods on the Kesci Underwater Object Detection Algorithm Competition dataset, which contains a variety of marine organisms. The experimental results show that U-ATSS is clearly lightweight while also delivering excellent, competitive detection accuracy.

Signal Processing-Image Communication, Vol. 126, Article 117137
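U-ATSS builds on ATSS (Adaptive Training Sample Selection), which chooses positive anchors for each ground-truth box using a statistical IoU threshold rather than a fixed one. The sketch below follows the rule from the original ATSS paper: take the k anchors nearest the box center on each pyramid level, set the threshold to the mean plus the standard deviation of their IoUs, and keep candidates above it whose centers fall inside the box. It is a simplification under stated assumptions. It handles one ground-truth box at a time, skips the tie-break for anchors claimed by several boxes, and uses the population standard deviation. It does not model U-ATSS's compressed backbone, F-ASPP module, or modified loss, none of which the abstract details. The function names are hypothetical.

```python
import numpy as np


def iou(boxes: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """IoU between N boxes (x1, y1, x2, y2) and a single ground-truth box."""
    x1 = np.maximum(boxes[:, 0], gt[0])
    y1 = np.maximum(boxes[:, 1], gt[1])
    x2 = np.minimum(boxes[:, 2], gt[2])
    y2 = np.minimum(boxes[:, 3], gt[3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_b + area_g - inter)


def atss_positives(anchors_per_level, gt, k=9):
    """Hypothetical helper: ATSS-style positive selection for one ground-truth box.

    Returns (indices into the concatenated anchors, adaptive IoU threshold).
    """
    gt_center = np.array([(gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2])
    candidates, offset = [], 0
    for anchors in anchors_per_level:
        centers = np.stack([(anchors[:, 0] + anchors[:, 2]) / 2,
                            (anchors[:, 1] + anchors[:, 3]) / 2], axis=1)
        # Step 1: the k anchors whose centers are closest to the GT center on this level.
        dist = np.linalg.norm(centers - gt_center, axis=1)
        candidates.append(np.argsort(dist)[:k] + offset)
        offset += len(anchors)
    candidates = np.concatenate(candidates)
    cand = np.concatenate(anchors_per_level, axis=0)[candidates]

    # Step 2: adaptive threshold = mean + std of the candidates' IoUs.
    ious = iou(cand, gt)
    threshold = ious.mean() + ious.std()

    # Step 3: keep candidates above the threshold whose centers lie inside the GT box.
    cx = (cand[:, 0] + cand[:, 2]) / 2
    cy = (cand[:, 1] + cand[:, 3]) / 2
    inside = (cx > gt[0]) & (cx < gt[2]) & (cy > gt[1]) & (cy < gt[3])
    return candidates[(ious >= threshold) & inside], threshold
```

Because the threshold adapts to each object's candidate statistics, a box whose candidates have uniformly high IoU gets a high bar, and a box with poor anchor coverage gets a lower one. This is the property that lets ATSS avoid hand-tuned IoU cut-offs.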

Citations: 0

Image splicing detection using low-dimensional feature vector of texture features and Haralick features based on Gray Level Co-occurrence Matrix

IF 3.5 | Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

Signal Processing-Image Communication

Pub Date : 2024-04-27 DOI: 10.1016/j.image.2024.117134

Debjit Das, Ruchira Naskar

Digital image forgery has become widespread now that numerous easy-to-use, low-cost image manipulation tools are available to the general public. Forged images can serve various malicious purposes, such as damaging the reputations of well-known public figures, committing identity fraud that causes financial harm, and many other illegitimate activities. Image splicing is a form of image forgery in which an adversary skillfully combines portions of multiple source images to produce a natural-looking composite. Detecting image splicing attacks remains an open challenge in the forensic domain, and several significant findings on image splicing detection have been reported in recent literature. However, the feature sets used in these works are very large. Our aim in this work is to optimize the feature set while modeling image splicing detection as a classification problem, preserving the detection performance reported in the state of the art. This paper proposes an image splicing detection scheme based on texture features and Haralick features computed from the input image's Gray Level Co-occurrence Matrix (GLCM); the scheme also localizes the spliced regions in an image detected as spliced. We evaluate the scheme on the well-known Columbia Image Splicing Detection Evaluation Dataset and on the DSO-1 dataset, which is more challenging because it consists of post-processed color images. Experimental results show that the proposed model achieves 95% accuracy in image splicing detection, with an AUC of 0.99, using an optimized feature set of only 15 dimensions.
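To make the GLCM-based features concrete, here is a minimal NumPy sketch of a normalized gray-level co-occurrence matrix and a few classic Haralick-style statistics computed from it: contrast, energy (angular second moment), homogeneity, entropy, and correlation. It assumes 8-bit grayscale input, quantization to 8 gray levels, and a single non-negative pixel offset. The abstract does not say which 15 features, offsets, or quantization the authors use, so this is an illustration of the technique, not their feature set. The helper names are hypothetical.

```python
import numpy as np


def glcm(img, dx=1, dy=0, levels=8, symmetric=True):
    """Normalized GLCM for a non-negative pixel offset (dx, dy) on an 8-bit image."""
    q = (np.asarray(img).astype(np.int64) * levels) // 256  # quantize 0..255 into `levels` bins
    h, w = q.shape
    src = q[: h - dy, : w - dx]  # reference pixels
    dst = q[dy:, dx:]            # neighbors at offset (dy, dx)
    m = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(m, (src.ravel(), dst.ravel()), 1)  # count co-occurring gray-level pairs
    if symmetric:
        m = m + m.T  # count each pair in both directions
    return m / m.sum()


def haralick_features(p):
    """A few Haralick-style statistics of a normalized GLCM `p`."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    energy = np.sum(p ** 2)  # angular second moment
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log2(nz))
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    if sd_i > 0 and sd_j > 0:
        correlation = np.sum((i - mu_i) * (j - mu_j) * p) / (sd_i * sd_j)
    else:
        correlation = 0.0  # undefined for a constant image; report 0 by convention here
    return {
        "contrast": float(contrast),
        "energy": float(energy),
        "homogeneity": float(homogeneity),
        "entropy": float(entropy),
        "correlation": float(correlation),
    }
```

A flat patch gives zero contrast and unit energy, while a pixel-level checkerboard gives maximal contrast and correlation of -1. Splicing often disturbs such local texture statistics. A compact feature vector can be formed by concatenating these statistics over a few offsets and feeding it to a standard classifier.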


Signal Processing-Image Communication, Vol. 125, Article 117134

Citations: 0

A flow-based multi-scale learning network for single image stochastic super-resolution

IF 3.5 | Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

Signal Processing-Image Communication

Pub Date : 2024-04-24 DOI: 10.1016/j.image.2024.117132

Qianyu Wu , Zhongqian Hu , Aichun Zhu , Hui Tang , Jiaxin Zou , Yan Xi , Yang Chen

Single image super-resolution (SISR) remains an important yet challenging task. Existing methods usually ignore the diversity of the generated super-resolution (SR) images, even though the fine details of the corresponding high-resolution (HR) images cannot be recovered with certainty because detail is degraded in low-resolution (LR) images. To address this issue, this paper presents a flow-based multi-scale learning network (FMLnet) that explores the diverse mapping spaces for SR. First, we propose a multi-scale learning block (MLB) to extract the underlying features of the LR image. Second, pixel-wise multi-head attention allows our model to map multiple representation subspaces simultaneously. Third, by employing a normalizing flow module, our approach generates various stochastic SR outputs with high visual quality for a given LR input, and the trade-off between fidelity and perceptual quality can be controlled. Finally, experimental results on five datasets demonstrate that the proposed network outperforms existing methods in terms of diversity and achieves competitive PSNR/SSIM results. Code is available at https://github.com/qianyuwu/FMLnet.
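The key property of a normalizing flow is that it is exactly invertible with a tractable log-determinant. Because of this, one LR input can be paired with many latent draws, each decoding to a different plausible output. The NumPy sketch below shows a single RealNVP-style conditional affine coupling layer. It is not FMLnet's architecture. The tiny random linear maps stand in for learned networks, `cond` stands in for LR-derived features, and all names are hypothetical. Flow-based SR models such as SRFlow expose a sampling temperature that scales the latent spread. This sketch assumes a similar knob as one way the fidelity/perceptual trade-off could be controlled; the abstract does not specify FMLnet's mechanism.

```python
import numpy as np


class ConditionalAffineCoupling:
    """Affine coupling: split x into (xa, xb), scale and shift xb using (xa, cond).

    xa passes through unchanged, so the inverse can recompute the same scale and
    shift; the Jacobian is triangular and log|det| is the sum of the log-scales.
    """

    def __init__(self, dim, cond_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.half = dim // 2
        in_dim, out_dim = self.half + cond_dim, dim - self.half
        # Hypothetical stand-ins for learned sub-networks: small random linear maps.
        self.w_s = 0.1 * rng.standard_normal((in_dim, out_dim))
        self.w_t = 0.1 * rng.standard_normal((in_dim, out_dim))

    def _scale_shift(self, xa, cond):
        h = np.concatenate([xa, cond], axis=-1)
        log_s = np.tanh(h @ self.w_s)  # bounded log-scale for numerical stability
        return log_s, h @ self.w_t

    def forward(self, x, cond):
        """Data -> latent; returns (z, log|det dz/dx|) per sample."""
        xa, xb = x[..., : self.half], x[..., self.half :]
        log_s, t = self._scale_shift(xa, cond)
        zb = xb * np.exp(log_s) + t
        return np.concatenate([xa, zb], axis=-1), log_s.sum(axis=-1)

    def inverse(self, z, cond):
        """Latent -> data; exact inverse of forward for the same cond."""
        za, zb = z[..., : self.half], z[..., self.half :]
        log_s, t = self._scale_shift(za, cond)
        xb = (zb - t) * np.exp(-log_s)
        return np.concatenate([za, xb], axis=-1)


def sample(flow, cond, n, temperature=0.8, seed=1):
    """Draw n stochastic outputs for one conditioning vector.

    temperature scales the latent standard deviation: 0 gives a single
    deterministic output, larger values give more diverse samples.
    """
    rng = np.random.default_rng(seed)
    dim = flow.half + flow.w_s.shape[1]
    z = temperature * rng.standard_normal((n, dim))
    return flow.inverse(z, np.broadcast_to(cond, (n, cond.shape[-1])))
```

Training would maximize the exact log-likelihood of HR data: the Gaussian log-density of `z` plus the returned log-determinant. At inference, repeated calls to `sample` with the same `cond` produce the diverse outputs the abstract describes.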


Signal Processing-Image Communication, Vol. 125, Article 117132

Citations: 0