MA-MNN: Multi-flow attentive memristive neural network for multi-task image restoration
Pub Date: 2025-10-01 | Epub Date: 2025-04-28 | DOI: 10.1016/j.image.2025.117336
Peng He, Lin Zhang, Yu Yang, Yue Zhou, Shukai Duan, Xiaofang Hu
Images taken in rainy, hazy, and low-light environments severely hinder the performance of outdoor computer vision systems. Most data-driven image restoration methods are task-specific and computationally intensive, whereas the capture and processing of degraded images occur largely on end-side devices with limited computing resources. To address these issues, this paper proposes a novel software and hardware co-designed image restoration method, the multi-flow attentive memristive neural network (MA-MNN), which combines a deep learning algorithm with the memristor, a nanoscale device. A multi-flow aggregation block exploits multi-level complementary spatial contextual information. A dense connection design provides smooth feature propagation across units and alleviates the vanishing-gradient problem. A supervised calibration block realizes a dual-attention mechanism that helps the model identify and re-calibrate the transformed features. In addition, a memristor-based hardware implementation scheme is designed to provide a low-energy solution for embedded applications. Extensive experiments on image deraining, image dehazing, and low-light image enhancement show that the proposed method is highly competitive against over 20 state-of-the-art methods.
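The abstract does not specify the internals of the supervised calibration block, but the dual-attention re-calibration it describes can be sketched generically. Below is a minimal illustrative PyTorch block; the channel-then-spatial design, the reduction ratio, and the kernel size are assumptions, not the paper's design.

    import torch
    import torch.nn as nn

    class DualAttention(nn.Module):
        # Generic dual attention: channel re-weighting, then spatial re-weighting.
        # Illustrative only; not the paper's supervised calibration block.
        def __init__(self, channels, reduction=8):
            super().__init__()
            self.channel = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),                      # squeeze spatial dims
                nn.Conv2d(channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid(),                                 # per-channel weights
            )
            self.spatial = nn.Sequential(
                nn.Conv2d(channels, 1, kernel_size=7, padding=3),
                nn.Sigmoid(),                                 # per-pixel weights
            )

        def forward(self, x):
            x = x * self.channel(x)   # re-calibrate feature channels
            x = x * self.spatial(x)   # re-calibrate spatial locations
            return x

    feats = torch.randn(1, 64, 32, 32)
    print(DualAttention(64)(feats).shape)  # torch.Size([1, 64, 32, 32])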
{"title":"MA-MNN: Multi-flow attentive memristive neural network for multi-task image restoration","authors":"Peng He , Lin Zhang , Yu Yang , Yue Zhou , Shukai Duan , Xiaofang Hu","doi":"10.1016/j.image.2025.117336","DOIUrl":"10.1016/j.image.2025.117336","url":null,"abstract":"<div><div>Images taken in rainy, hazy, and low-light environments severely hinder the performance of outdoor computer vision systems. Most data-driven image restoration methods are task-specific and computationally intensive, whereas the capture and processing of degraded images occur largely in end-side devices with limited computing resources. Motivated by addressing the above issues, a novel software and hardware co-designed image restoration method named multi-flow attentive memristive neural network (MA-MNN) is proposed in this paper, which combines a deep learning algorithm and the nanoscale device memristor. The multi-level complementary spatial contextual information is exploited by the multi-flow aggregation block. The dense connection design is adopted to provide smooth transportation across units and alleviate the vanishing-gradient. The supervised calibration block is designed to facilitate achieving the dual-attention mechanism that helps the model identify and re-calibrate the transformed features. Besides, a hardware implementation scheme based on memristors is designed to provide low energy consumption solutions for embedded applications. Extensive experiments in image deraining, image dehazing and low-light image enhancement have shown that the proposed method is highly competitive with over 20 state-of-the-art methods.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117336"},"PeriodicalIF":3.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143899899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mining the Salient Spatio-Temporal Feature with S2TF-Net for action recognition
Pub Date: 2025-10-01 | Epub Date: 2025-07-15 | DOI: 10.1016/j.image.2025.117381
Xiaoxi Liu, Ju Liu, Lingchen Gu, Yafeng Li, Xiaojun Chang, Feiping Nie
Recently, 3D Convolutional Neural Networks (3D ConvNets) have been widely exploited for action recognition and have achieved satisfactory performance. However, salient action features are often drowned in irrelevant information, which greatly increases the difficulty of video representation. To find a generic, cost-efficient approach that balances parameters and performance, we present a novel network that mines the Salient Spatio-Temporal Feature on a 3D ConvNets backbone for action recognition, termed S2TF-Net. First, we extract the salient features of each 3D residual block by constructing a multi-scale module for Salient Semantic Feature mining (SSF-Module). Then, to preserve salient features through pooling operations, we establish a Two-branch Salient Feature Preserving Module (TSFP-Module). With a proper loss function, these two modules can be combined in an "easy-to-concat" fashion with most 3D ResNet backbones to classify more accurately, albeit with a shallower network. Finally, we conduct experiments on three popular action recognition datasets, where our S2TF-Net is competitive with deeper 3D backbones and current state-of-the-art results. Taking P3D, 3D ResNet, Non-local I3D, and X3D as baselines, the proposed method improves them to varying degrees. In particular, for Non-local I3D ResNet, S2TF-Net improves accuracy by 4.1%, 3.0%, and 4.6% on the Kinetics-400, UCF101, and HMDB51 datasets, reaching 74.8%, 95.1%, and 80.9%, respectively. We hope this study provides useful inspiration and experience for future research on more cost-effective methods. Code is released at: https://github.com/xiaoxiAries/S2TFNet.
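The exact SSF/TSFP designs are not given in the abstract, but the "multi-scale responses concatenated onto a 3D backbone block" pattern it describes can be sketched as follows. The class name MultiScale3D, the kernel sizes, and the 1x1x1 fusion are assumptions for illustration.

    import torch
    import torch.nn as nn

    class MultiScale3D(nn.Module):
        # Hedged sketch: parallel 3D convs at several receptive fields,
        # concatenated and fused back to the block's channel width.
        def __init__(self, channels):
            super().__init__()
            self.branches = nn.ModuleList([
                nn.Conv3d(channels, channels, k, padding=k // 2)
                for k in (1, 3, 5)
            ])
            self.fuse = nn.Conv3d(3 * channels, channels, 1)

        def forward(self, x):
            # "Easy-to-concat": stack multi-scale responses, then fuse.
            return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

    clip = torch.randn(1, 32, 8, 56, 56)  # (batch, C, T, H, W)
    print(MultiScale3D(32)(clip).shape)   # torch.Size([1, 32, 8, 56, 56])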
{"title":"Mining the Salient Spatio-Temporal Feature with S2TF-Net for action recognition","authors":"Xiaoxi Liu , Ju Liu , Lingchen Gu , Yafeng Li , Xiaojun Chang , Feiping Nie","doi":"10.1016/j.image.2025.117381","DOIUrl":"10.1016/j.image.2025.117381","url":null,"abstract":"<div><div>Recently, 3D Convolutional Neural Networks (3D ConvNets) have been widely exploited for action recognition and achieved satisfying performance. However, the superior action features are often drowned in numerous irrelevant information, which immensely enhances the difficulty of video representation. To find a generic cost-efficient approach to balance the parameters and performance, we present a novel network to mine the <strong>S</strong>alient <strong>S</strong>patio-<strong>T</strong>emporal <strong>F</strong>eature based on 3D ConvNets backbone for action recognition, termed as S<sup>2</sup>TF-Net. Firstly, we extract the salient features of each 3D residual block by constructing a multi-scale module for <strong>S</strong>alient <strong>S</strong>emantic <strong>F</strong>eature mining (SSF-Module). Then, with the aim of preserving the salient features in pooling operations, we establish a <strong>T</strong>wo-branch <strong>S</strong>alient <strong>F</strong>eature <strong>P</strong>reserving Module (TSFP-Module). Besides, these above two modules with proper loss function can collaborate in an “easy-to-concat” fashion for most 3D ResNet backbones to classify more accurately albeit in the shallower network. Finally, we conduct experiments over three popular action recognition datasets, where our S<sup>2</sup>TF-Net is competitive compared with the deeper 3D backbones or current state-of-the-art results. Treating the P3D, 3D ResNet, Non-local I3D and X3D as baseline, the proposed method improves them to varying degrees. Particularly, for Non-local I3D ResNet, the proposed S<sup>2</sup>TF-Net enhances 4.1%, 3.0% and 4.6% in Kinetics-400, UCF101 and HMDB51 datasets, achieving the accuracy of 74.8%, 95.1% and 80.9%. We hope this study will provide useful inspiration and experience for future research about more cost-effective methods. Code is released here: <span><span>https://github.com/xiaoxiAries/S2TFNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117381"},"PeriodicalIF":3.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144633876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multiple-image encryption algorithm based on S-boxes and DNA sequences
Pub Date: 2025-10-01 | Epub Date: 2025-05-25 | DOI: 10.1016/j.image.2025.117353
Muhammad Umair Safdar, Tariq Shah, Asif Ali
Image encryption is crucial for safeguarding sensitive visual data; however, traditional methods often encounter challenges regarding efficiency and adaptability to the unique characteristics of images. This research is motivated by the potential of ring-based algebraic structures to develop lightweight, secure, and efficient encryption schemes specifically designed for image data. The article presents a novel approach to image encryption using a local ring algebraic structure. The proposed method encrypts multiple images by constructing substitution boxes from subsets that are not subgroups but satisfy the identity and invertibility axioms. The challenge of using such subsets for encryption is addressed by taking the unit elements of the ring, picking a subgroup, and splitting it into two subsets. One subset generates a substitution box used for the substitution process, while the other is mapped to the Galois field to construct a second substitution box used for diffusion. A DNA sequence is applied to the red, green, and blue channels of the image, a key is generated by hashing the image and using a subset of the subgroup of units of the ring, and finally all channels are XORed with the key. The performance of the proposed scheme is evaluated through various analyses and is found to outperform existing approaches, presenting a promising solution for image encryption.
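As a minimal sketch of the final step the abstract describes (hash the image to derive a key, then XOR all channels with it), the snippet below omits the ring/subgroup S-box construction and the DNA coding entirely; the string "subset-derived-secret" is a placeholder for the ring-subset key material, not the paper's actual construction.

    import hashlib
    import numpy as np

    def keystream(data: bytes, n: int) -> np.ndarray:
        # Expand a SHA-256 digest into an n-byte key stream by repetition.
        digest = hashlib.sha256(data).digest()
        reps = -(-n // len(digest))  # ceiling division
        return np.frombuffer(digest * reps, dtype=np.uint8)[:n]

    rgb = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
    # Key depends on the image hash plus a secret (placeholder for the
    # subset of the subgroup of units described in the abstract).
    key = keystream(rgb.tobytes() + b"subset-derived-secret", rgb.size)
    cipher = np.bitwise_xor(rgb.ravel(), key).reshape(rgb.shape)
    # XOR is an involution, so the same key stream recovers the image.
    assert np.array_equal(np.bitwise_xor(cipher.ravel(), key).reshape(rgb.shape), rgb)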
{"title":"Multiple-image encryption algorithm based on S-boxes and DNA sequences","authors":"Muhammad Umair Safdar , Tariq Shah , Asif Ali","doi":"10.1016/j.image.2025.117353","DOIUrl":"10.1016/j.image.2025.117353","url":null,"abstract":"<div><div>Image encryption is crucial for safeguarding sensitive visual data; however, traditional methods often encounter challenges regarding efficiency and adaptability to the unique characteristics of images. This research is motivated by the potential of ring-based algebraic structures to develop lightweight, secure, and efficient encryption schemes specifically designed for image data. The article presents a novel approach to image encryption in cryptography using a local ring algebraic structure. The proposed method involves encrypting multiple images by constructing substitution boxes from subsets, which are not subgroups but have identity and invertibility axioms. The challenge of using subsets for encryption purposes is addressed by taking unit elements of the ring, picking a subgroup, and splitting it into two subsets. The substitution box is generated by one of the subsets and used for the substitution process, while the other subset is mapped to the Galois field. It constructs the substitution box and is used for diffusion. A DNA sequence is applied to the red, green, and blue channels of the image, and a key is generated by hashing the image and using a subset of the subgroup of units of the ring. Finally, all channels are XORed with the key. The performance of the proposed scheme is evaluated using different analyses, and it is found that the scheme outperforms existing approaches. This approach presents a promising solution for image encryption in cryptography.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117353"},"PeriodicalIF":3.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144177675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Camera calibration using property of asymptotes with application to sports scenes
Pub Date: 2025-10-01 | Epub Date: 2025-04-12 | DOI: 10.1016/j.image.2025.117331
Fengli Yang, Xuechun Wang, Yue Zhao
Inspired by Ying's work on calibration, this study proposes a new planar pattern (hereinafter the phi-type model), consisting of a circle and a diameter, as the calibration scene. In sports scenarios, such as a soccer match or a basketball court, most existing methods require information about scene points in three-dimensional space. However, an interesting observation is that in the midfield the centre circle and the halfway line form a phi-type template. A new automatic method using the properties of asymptotes is proposed based on images of the midfield. All intrinsic parameters of the camera can be determined without assumptions such as zero skew or unit aspect ratio. The main advantages of our technique are that it involves neither point nor line matching and requires no metric information about the model plane. The feasibility and validity of the proposed algorithm were verified by testing noise sensitivity and performing image metric rectification.
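For context, the standard projective-geometry fact behind asymptote-based calibration (textbook material, not the paper's specific derivation) is that a circle meets the line at infinity at the circular points, so its asymptotes pass through them; the images of the circular points therefore lie on the image of the absolute conic:

    \mathbf{m}_I^{\top}\,\omega\,\mathbf{m}_I = 0, \qquad
    \mathbf{m}_J^{\top}\,\omega\,\mathbf{m}_J = 0, \qquad
    \omega = K^{-\top}K^{-1},

where K is the intrinsic matrix. Each view contributes two independent real constraints on omega, so the five intrinsics (no zero-skew or unit-aspect-ratio assumption) can be recovered from at least three views.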
{"title":"Camera calibration using property of asymptotes with application to sports scenes","authors":"Fengli Yang, Xuechun Wang, Yue Zhao","doi":"10.1016/j.image.2025.117331","DOIUrl":"10.1016/j.image.2025.117331","url":null,"abstract":"<div><div>Inspired by Ying's work on the calibration technique, this study proposes a new planar pattern (referred to as the phi-type model hereinafter), which includes a circle and diameter, as the calibration scene. In sports scenarios, such as a soccer match or basketball court, most existing methods require information of the scene points in a three-dimensional space. However, an interesting observation in the midfield is that the centre circle and the halfway line form a phi-type template. A new automatic method using the properties of asymptotes is proposed based on the images of the midfield. All intrinsic parameters of the camera can be determined without any assumptions such as zero skew or unitary aspect ratio. The main advantages of our technique are that it neither involves point or line matching nor does it require the metric information of the model plane. The feasibility and validity of the proposed algorithm were verified by testing the noise sensitivity and performing image metric rectification.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117331"},"PeriodicalIF":3.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143859340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Driver distraction detection based on adaptive tiny targets and lightweight networks
Pub Date: 2025-10-01 | Epub Date: 2025-05-15 | DOI: 10.1016/j.image.2025.117342
Shuangshuang Gu, Bin Wen, Shiyao Chen, Yuanyuan Li, Guanqiu Qi, Linhong Shuai, Zhiqin Zhu
Driver distraction detection is critical to reducing road traffic accidents and increasing the efficiency of advanced driver assistance systems. Real-time lightweight models are especially important for in-vehicle devices with limited computing resources. However, most existing methods focus on designing lighter network architectures and ignore the performance loss when detecting tiny targets. To jointly optimize tiny-target detection accuracy and network lightness, a driver distraction detection method named ATD²Net, based on adaptive tiny-target detection and lightweight networks, is proposed. The method aims to reduce model complexity while fully capturing target features for accurate detection. ATD²Net consists of three core modules: the Channel Reconstruction Perception Module (CRPM), the Dynamic Spatial Self-locking Module (DSSM), and the Structural Feedback Optimization Module (SFOM). CRPM reconfigures channels and reconstructs them into the batch dimension, using parallel strategies to perceive interactive features between channels and significantly enhancing feature extraction. DSSM adopts dynamic locking and adaptive spatial selection mechanisms to capture multi-scale features while injecting adaptive spatial information; it effectively aggregates instance features and reduces interference from conflicting and background information, improving the detection of tiny targets. SFOM uses dependency trees to model inter-layer relationships, integrates coupled parameters into groups, and applies a sparse strategy to remove unimportant parameters, achieving a lightweight model that balances accuracy and speed. Experimental results show that ATD²Net outperforms the latest driver distraction detection methods, demonstrating excellent performance and good application prospects.
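The abstract's "reconstruct channels into the batch dimension" idea can be illustrated in isolation: groups of channels are folded into the batch axis so a shared operator processes them in parallel, then folded back. The group count and function names below are illustrative assumptions, not CRPM's actual design.

    import torch

    def channels_to_batch(x: torch.Tensor, groups: int) -> torch.Tensor:
        # Fold channel groups into the batch axis: (B, C, H, W) -> (B*g, C/g, H, W).
        b, c, h, w = x.shape
        assert c % groups == 0
        return x.reshape(b * groups, c // groups, h, w)

    def batch_to_channels(x: torch.Tensor, groups: int) -> torch.Tensor:
        # Inverse fold: (B*g, C/g, H, W) -> (B, C, H, W).
        bg, cg, h, w = x.shape
        return x.reshape(bg // groups, cg * groups, h, w)

    x = torch.randn(2, 16, 8, 8)
    y = channels_to_batch(x, groups=4)           # shape (8, 4, 8, 8)
    assert torch.equal(batch_to_channels(y, 4), x)  # lossless round trip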
{"title":"Driver distraction detection based on adaptive tiny targets and lightweight networks","authors":"Shuangshuang Gu , Bin Wen , Shiyao Chen , Yuanyuan Li , Guanqiu Qi , Linhong Shuai , Zhiqin Zhu","doi":"10.1016/j.image.2025.117342","DOIUrl":"10.1016/j.image.2025.117342","url":null,"abstract":"<div><div>Driver distraction detection is critical to reducing road traffic accidents and increasing the efficiency of advanced driver assistance systems. Real-time lightweight models are especially important for in-vehicle devices with limited computing resources. However, most existing methods focus on designing lighter network architectures and ignore the performance loss when detecting tiny targets. In order to realize the collaborative optimization of tiny target detection accuracy and network lightweight, a driver distraction detection method ATD<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>Net based on adaptive tiny target detection and lightweight networks is proposed. This method aims to reduce model complexity while fully capturing target features for accurate detection. ATD<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>Net consists of three core modules, Channel Reconstruction Perception Module (CRPM), Dynamic Spatial Self-locking Module (DSSM) and Structural Feedback Optimization Module (SFOM). CRPM reconfigures channels and reconstructs them into batch dimensions, uses parallel strategies to perceive interactive features between channels, and significantly enhances feature extraction capabilities. DSSM adopts dynamic locking and adaptive spatial selection mechanisms to capture multi-scale features while injecting adaptive spatial information. It effectively aggregates instance features and reduces the interference of conflicting information and background information, thereby improving the detection ability of tiny targets. SFOM uses dependency trees to model inter-layer relationships and integrate coupling parameters into groupings. It uses a sparse strategy to remove unimportant parameters, achieving lightweight modeling while balancing accuracy and speed. Experimental results show that ATD<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>Net is superior to the latest methods in driver distraction detection, showing excellent performance and good application prospects.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117342"},"PeriodicalIF":3.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144090347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-fish tracking with underwater image enhancement by deep network in marine ecosystems
Pub Date: 2025-10-01 | DOI: 10.1016/j.image.2025.117321
Prerana Mukherjee, Srimanta Mandal, Koteswar Rao Jerripothula, Vrishabhdhwaj Maharshi, Kashish Katara
Tracking marine life plays a crucial role in understanding migration patterns, movements, and population growth of underwater species. Deep learning-based fish-tracking networks have been actively researched and developed, yielding promising results. In this work, we propose an end-to-end deep learning framework for tracking fish in unconstrained marine environments. The core innovation of our approach is a Siamese-based architecture integrated with an image enhancement module, designed to measure appearance similarity effectively. The enhancement module consists of convolutional layers and a squeeze-and-excitation block, pre-trained on degraded and clean image pairs to address underwater distortions. This enhanced feature representation is leveraged within the Siamese framework to compute an appearance similarity score, which is further refined using prediction scores based on fish movement patterns. To ensure robust tracking, we combine the appearance similarity score, prediction score, and IoU-based similarity score to generate fish trajectories using the Hungarian algorithm. Our framework significantly reduces ID switches by 35.6% on the Fish4Knowledge dataset and 3.8% on the GMOT-40 fish category, all while maintaining high tracking accuracy. The source code of this work is available here: https://github.com/srimanta-mandal/Multi-Fish-Tracking-with-Underwater-Image-Enhancement.
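A minimal sketch of the association step the abstract describes: fuse the appearance, motion-prediction, and IoU similarities into one score matrix and solve the assignment with the Hungarian algorithm (scipy's linear_sum_assignment). The fusion weights here are illustrative, not the paper's.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def associate(app_sim, pred_sim, iou_sim, w=(0.5, 0.3, 0.2)):
        # Inputs: (num_tracks, num_detections) similarity matrices in [0, 1].
        score = w[0] * app_sim + w[1] * pred_sim + w[2] * iou_sim
        # Hungarian algorithm minimises cost, so negate to maximise similarity.
        rows, cols = linear_sum_assignment(-score)
        return [(int(r), int(c)) for r, c in zip(rows, cols)]

    app = np.array([[0.9, 0.1], [0.2, 0.8]])
    pred = np.array([[0.8, 0.3], [0.1, 0.9]])
    iou = np.array([[0.7, 0.0], [0.2, 0.6]])
    print(associate(app, pred, iou))  # [(0, 0), (1, 1)]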
{"title":"Multi-fish tracking with underwater image enhancement by deep network in marine ecosystems","authors":"Prerana Mukherjee , Srimanta Mandal , Koteswar Rao Jerripothula , Vrishabhdhwaj Maharshi , Kashish Katara","doi":"10.1016/j.image.2025.117321","DOIUrl":"10.1016/j.image.2025.117321","url":null,"abstract":"<div><div>Tracking marine life plays a crucial role in understanding migration patterns, movements, and population growth of underwater species. Deep learning-based fish-tracking networks have been actively researched and developed, yielding promising results. In this work, we propose an end-to-end deep learning framework for tracking fish in unconstrained marine environments. The core innovation of our approach is a Siamese-based architecture integrated with an image enhancement module, designed to measure appearance similarity effectively. The enhancement module consists of convolutional layers and a squeeze-and-excitation block, pre-trained on degraded and clean image pairs to address underwater distortions. This enhanced feature representation is leveraged within the Siamese framework to compute an appearance similarity score, which is further refined using prediction scores based on fish movement patterns. To ensure robust tracking, we combine the appearance similarity score, prediction score, and IoU-based similarity score to generate fish trajectories using the Hungarian algorithm. Our framework significantly reduces ID switches by 35.6% on the Fish4Knowledge dataset and 3.8% on the GMOT-40 fish category, all while maintaining high tracking accuracy. The source code of this work is available here: <span><span>https://github.com/srimanta-mandal/Multi-Fish-Tracking-with-Underwater-Image-Enhancement</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117321"},"PeriodicalIF":3.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143881235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Object detection-based deep autoencoder hashing image retrieval
Pub Date: 2025-10-01 | Epub Date: 2025-07-18 | DOI: 10.1016/j.image.2025.117384
Uğur Erkan, Ahmet Yilmaz, Abdurrahim Toktas, Qiang Lai, Suo Gao
Image Retrieval (IR), which returns similar images from a large image database, has become an important task as multimedia data grows. Existing studies use hash codes representing features generated from the whole image, including redundant semantics from the background. In this study, a novel Object Detection-based Hashing IR (ODH-IR) scheme using You Only Look Once (YOLO) and an autoencoder is presented to ignore clutter in images. Integrating YOLO with the autoencoder yields the most representative hash code, based on the meaningful objects in an image. The autoencoder compresses the detected object vector to the desired bit length of the hash code. The ODH-IR scheme is validated against the state of the art on three well-known datasets using precise metrics: it achieves the best result in 35 of 36 metric measurements and the best average mean rank of 1.03. Moreover, three illustrative IR examples show that it retrieves the most relevant semantics. The results demonstrate that ODH-IR is an effective scheme thanks to its object detection-based hashing with YOLO and the autoencoder.
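One way to read "the autoencoder compresses the detected object vector to the desired bit length" is a bottleneck that is binarised by sign into the hash code. The sketch below shows that pattern; the layer sizes, the 512-dimensional stand-in for a YOLO object vector, and the tanh bottleneck are assumptions, not the paper's architecture.

    import torch
    import torch.nn as nn

    class HashAutoencoder(nn.Module):
        # Hedged sketch: bottleneck of width `bits`, binarised for retrieval.
        def __init__(self, in_dim=512, bits=64):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                         nn.Linear(256, bits), nn.Tanh())
            self.decoder = nn.Sequential(nn.Linear(bits, 256), nn.ReLU(),
                                         nn.Linear(256, in_dim))

        def forward(self, x):
            z = self.encoder(x)             # continuous code in (-1, 1)
            return self.decoder(z), z       # reconstruction supervises training

        def hash(self, x):
            # Sign-binarise the bottleneck to the desired bit length.
            return (self.encoder(x) > 0).to(torch.uint8)

    obj_vec = torch.randn(1, 512)           # placeholder YOLO object vector
    print(HashAutoencoder().hash(obj_vec).shape)  # torch.Size([1, 64])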
{"title":"Object detection-based deep autoencoder hashing image retrieval","authors":"Uğur Erkan , Ahmet Yilmaz , Abdurrahim Toktas , Qiang Lai , Suo Gao","doi":"10.1016/j.image.2025.117384","DOIUrl":"10.1016/j.image.2025.117384","url":null,"abstract":"<div><div>Image Retrieval (IR), which returns similar images from a large image database, has become an important task as multimedia data grows. Existing studies utilize hash code representing the image features generated from the whole image, including redundant semantics from the background. In this study, a novel Object Detection-based Hashing IR (ODH-IR) scheme using You Only Look Once (YOLO) and an autoencoder is presented to ignore clutter in the images. Integration of YOLO and the autoencoder provides the most representative hash code depending on meaningful objects in the images. The autoencoder is exploited to compress the detected object vector to the desired bit length of the hash code. The ODH-IR scheme is validated by comparison with the state of the art through three well-known datasets in terms of precise metrics. The ODH-IR totally has the best 35 metric results over 36 measurements and the best avg. mean rank of 1.03. Moreover, it is observed from the three illustrative IR examples that it retrieves the most relevant semantics. The results demonstrate that the ODH-IR is an impactful scheme thanks to the effective hashing method through object detection using YOLO and the autoencoder.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117384"},"PeriodicalIF":3.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144694958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Higher-order motion calibration and sparsity based outlier correction for video FRUC
Pub Date: 2025-10-01 | Epub Date: 2025-04-17 | DOI: 10.1016/j.image.2025.117327
Jiale He, Qunbing Xia, Gaobo Yang, Xiangling Ding
For frame rate up-conversion (FRUC), one key challenge is dealing with the irregular and large motions that widely exist in video scenes. Most existing FRUC works assume constant brightness and linear motion, easily leading to undesirable artifacts such as motion blur and frame flickering. In this work, we propose an advanced FRUC method that uses a high-order model for motion calibration and a sparse sampling strategy for outlier correction. Unidirectional motion estimation accurately locates objects from the previous frame to the following frame in a coarse-to-fine pyramid structure. Object motion trajectories are then fine-tuned to approximate real motion, and possible outlier regions are located and recorded. Moreover, image sparsity is exploited as prior knowledge for outlier correction, and the outlier index map is used to design the measurement matrix. Based on the theory of sparse sampling, the outlier regions are reconstructed to eliminate side effects such as overlapping, holes, and blurring. Extensive experimental results demonstrate that the proposed approach outperforms state-of-the-art FRUC works in both objective and subjective quality of the interpolated frames.
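As a concrete example of what a high-order motion model adds over the linear assumption (this is the generic second-order form; the abstract does not state the paper's exact model or order): under linear motion, the interpolated position at intermediate time t in (0, 1) is p(t) = p_0 + v t, while a quadratic model also calibrates for acceleration,

    \mathbf{p}(t) = \mathbf{p}_0 + \mathbf{v}\,t + \tfrac{1}{2}\,\mathbf{a}\,t^{2},

with the velocity v and acceleration a estimated from motion vectors over several consecutive frames rather than a single frame pair.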
{"title":"Higher-order motion calibration and sparsity based outlier correction for video FRUC","authors":"Jiale He , Qunbing Xia , Gaobo Yang , Xiangling Ding","doi":"10.1016/j.image.2025.117327","DOIUrl":"10.1016/j.image.2025.117327","url":null,"abstract":"<div><div>For frame rate up-conversion (FRUC), one of the key challenges is to deal with irregular and large motions that are widely existed in video scenes. However, most existing FRUC works make constant brightness and linear motion assumptions, easily leading to undesirable artifacts such as motion blurriness and frame flickering. In this work, we propose an advanced FRUC work by using a high-order model for motion calibration and a sparse sampling strategy for outlier correction. Unidirectional motion estimation is used to accurately locate object from the previous frame to the following frame in a coarse-to-fine pyramid structure. Then, object motion trajectory is fine-tuned to approximate real motion, and possible outlier regions are located and recorded. Moreover, image sparsity is exploited as the prior knowledge for outlier correction, and the outlier index map is used to design the measurement matrix. Based on the theory of sparse sampling, the outlier regions are reconstructed to eliminate the side effects such as overlapping, holes and blurring. Extensive experimental results demonstrate that the proposed approach outperforms the state-of-the-art FRUC works in terms of both objective and subjective qualities of interpolated frames.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117327"},"PeriodicalIF":3.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Facial expression transformation for anime-style image based on decoder control and attention mask
Pub Date: 2025-10-01 | Epub Date: 2025-05-06 | DOI: 10.1016/j.image.2025.117343
Xinhao Rao, Weidong Min, Ziyang Deng, Mengxue Liu
Human facial expression transformation has recently been studied extensively using Generative Adversarial Networks (GANs), which have also been applied successfully to anime-style images. However, current methods for anime images fail to refine expression control effectively, yielding weaker control than expected, and it remains challenging to preserve the original anime face's identity during transformation. To address these issues, we propose an expression transformation method for anime-style images. To enhance the control effect of discrete emoticon tags, a mapping network maps them to high-dimensional control information, which is injected into the network multiple times during transformation. Additionally, to better preserve face identity, an integrated attention mask mechanism lets the network's expression control focus on expression-related features while leaving unrelated features unaffected. Finally, we conduct extensive experiments, with both quantitative and qualitative evaluations, to verify the validity of the proposed method. The results demonstrate its superiority over existing methods based on multi-domain image-to-image translation.
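The abstract's "mapping network" can be sketched as an MLP that lifts a discrete tag to a high-dimensional control vector, repeatedly injected into the decoder. The dimensions, the embedding-plus-MLP design, and the channel-wise modulation shown as one possible injection are all illustrative assumptions, not the paper's architecture.

    import torch
    import torch.nn as nn

    class MappingNetwork(nn.Module):
        # Hedged sketch: discrete emoticon tag -> high-dimensional control code.
        def __init__(self, num_tags=8, dim=256):
            super().__init__()
            self.embed = nn.Embedding(num_tags, dim)
            self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                     nn.Linear(dim, dim))

        def forward(self, tag):
            return self.mlp(self.embed(tag))

    mapper = MappingNetwork()
    control = mapper(torch.tensor([3]))       # tag id 3 -> (1, 256) control vector
    feat = torch.randn(1, 256, 16, 16)        # decoder feature map
    # One possible injection: channel-wise modulation of decoder features.
    modulated = feat * control.unsqueeze(-1).unsqueeze(-1)
    print(modulated.shape)                    # torch.Size([1, 256, 16, 16])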
{"title":"Facial expression transformation for anime-style image based on decoder control and attention mask","authors":"Xinhao Rao , Weidong Min , Ziyang Deng , Mengxue Liu","doi":"10.1016/j.image.2025.117343","DOIUrl":"10.1016/j.image.2025.117343","url":null,"abstract":"<div><div>Human facial expression transformation has been extensively studied using Generative Adversarial Networks (GANs) recently. GANs have also shown successful attempts in transforming anime-style images. However, current methods for anime pictures fail to refine the expression control efficiently, leading to control effects weaker than expected. Moreover, it remains challenging to maintain the original anime face identity information while transforming. To address these issues, we propose an expression transformation method for anime-style images. In order to enhance the control effect of discrete emoticon tags, a mapping network is proposed to map them to high-dimensional control information, which is then injected into the network multiple times during transformation. Additionally, for better maintaining the anime face identity information while transforming, an integrated attention mask mechanism is introduced to enable the network's expression control to focus on the expression-related features, while avoiding affecting the unrelated features. Finally, we conduct a large number of experiments to verify the validity of the proposed method, and both quantitative and qualitative evaluations are carried out. The results demonstrate the superiority of our proposed method compared to existing methods based on multi-domain image-to-image translation.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117343"},"PeriodicalIF":3.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144070741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sparse modeling for image inpainting: A multi-scale morphological patch-based k-SVD and group-based PCA
Pub Date: 2025-10-01 | Epub Date: 2025-05-15 | DOI: 10.1016/j.image.2025.117341
Amit Soni Arya, Susanta Mukhopadhyay
Image inpainting, a crucial task in image restoration, aims to reconstruct highly degraded images with missing pixels while preserving structural and textural integrity. Traditional patch-based and group-based sparse representation methods often struggle with visual artifacts and over-smoothing, limiting their effectiveness. To address these challenges, we propose a novel multi-scale morphological patch-based and group-based sparse representation learning approach for image inpainting. Our method integrates morphological patch-based sparse representation (M-PSR) learning using k-singular value decomposition (k-SVD) with group-based sparse representation using principal component analysis (PCA) to construct adaptive dictionaries for improved reconstruction accuracy. Additionally, we employ the alternating direction method of multipliers (ADMM) to optimize the integration of patch- and group-based sparse representations, enhancing restoration quality. Extensive experiments on various degraded images demonstrate that our approach outperforms state-of-the-art methods in peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM). The proposed method effectively reconstructs images corrupted by missing pixels, scratches, and text overlays, achieving superior structural coherence and perceptual quality. This work contributes a robust and efficient solution for image inpainting, advancing sparse modeling and morphological image processing.
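For reference, the standard ADMM splitting for an l1 sparse-coding subproblem of the kind being optimized (generic textbook form; the paper's full objective couples the patch-based and group-based terms) is

    \min_{\boldsymbol{\alpha},\,\mathbf{z}} \;
    \tfrac{1}{2}\|\mathbf{y}-\mathbf{D}\boldsymbol{\alpha}\|_2^2
    + \lambda\|\mathbf{z}\|_1
    \quad \text{s.t.} \quad \boldsymbol{\alpha}=\mathbf{z},

with scaled-dual iterates

    \boldsymbol{\alpha}^{k+1} = (\mathbf{D}^{\top}\mathbf{D}+\rho\mathbf{I})^{-1}
        \big(\mathbf{D}^{\top}\mathbf{y}+\rho(\mathbf{z}^{k}-\mathbf{u}^{k})\big),
    \qquad
    \mathbf{z}^{k+1} = \mathcal{S}_{\lambda/\rho}(\boldsymbol{\alpha}^{k+1}+\mathbf{u}^{k}),
    \qquad
    \mathbf{u}^{k+1} = \mathbf{u}^{k} + \boldsymbol{\alpha}^{k+1} - \mathbf{z}^{k+1},

where D is the learned dictionary and S_tau(x) = sign(x) max(|x| - tau, 0) is element-wise soft-thresholding.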
{"title":"Sparse modeling for image inpainting: A multi-scale morphological patch-based k-SVD and group-based PCA","authors":"Amit Soni Arya, Susanta Mukhopadhyay","doi":"10.1016/j.image.2025.117341","DOIUrl":"10.1016/j.image.2025.117341","url":null,"abstract":"<div><div>Image inpainting, a crucial task in image restoration, aims to reconstruct highly degraded images with missing pixels while preserving structural and textural integrity. Traditional patch-based and group-based sparse representation methods often struggle with visual artifacts and over-smoothing, limiting their effectiveness. To address these challenges, we propose a novel multi-scale morphological patch-based and group-based sparse representation learning approach for image inpainting. Our method enhances image inpainting by integrating morphological patch-based sparse representation (M-PSR) learning using k-singular value decomposition (k-SVD) and group-based sparse representation using principal component analysis (PCA) to construct adaptive dictionaries for improved reconstruction accuracy. Additionally, we employ the alternating direction method of multipliers (ADMM) to optimize the integration of morphological patch and group based sparse representations, enhancing restoration quality. Extensive experiments on various degraded images demonstrate that our approach outperforms state-of-the-art methods in terms of the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM). The proposed method effectively reconstructs images corrupted by missing pixels, scratches, and text inlays, achieving superior structural coherence and perceptual quality. This work contributes a robust and efficient solution for image inpainting, offering significant advances in sparse modeling and morphological image processing.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117341"},"PeriodicalIF":3.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144070742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}