
Signal Processing-Image Communication: Latest Publications

UIQA-MSST: Multi-Scale Staircase-Transformer Fusion for Underwater Image Quality Assessment
IF 2.7 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-03-01 | Epub Date: 2026-01-06 | DOI: 10.1016/j.image.2026.117479
Tianhai Chen, Xichen Yang, Tianshu Wang, Shun Zhu, Yan Zhang, Zhongyuan Mao, Nengxin Li
Underwater images play a crucial role in underwater exploration and resource development, but their quality often degrades in complex underwater scenarios. Existing assessment methods mainly target specific scenarios and exhibit limited generalization when confronted with complex underwater scenes, so improving their applicability is essential for accurate quality assessment across diverse scenarios. This paper proposes an Underwater Image Quality Assessment (UIQA) method that combines the advantages of a staircase network and a Transformer, focusing on efficiently capturing and integrating image features at different scales. First, multi-scale feature extraction obtains information from images at various levels. A Staircase Feature (SF) module then progressively integrates features from shallow to deep layers, fusing cross-scale information, while a Cross-Scale Transformer (CST) module merges information from multiple scales using self-attention. By concatenating the output features of both modules, the model captures image content at both global and local ranges. A regression module then generates quality scores. Finally, meta-learning optimizes the model's learning process, enabling adaptation to new data for accurate quality prediction across diverse scenarios. Experiments show superior accuracy and stability on underwater datasets, and additional tests on natural scenes demonstrate broader applicability. Cross-dataset experiments validate the generalization capability of the proposed method. The source code will be made available at https://github.com/dart-into/UIQA-MSST.
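The CST step, flattening feature maps from several scales into one token sequence and letting self-attention mix them, can be sketched compactly. The following is an illustrative reading of the abstract rather than the authors' released code; the dimensions, 1x1 projections, and mean pooling are assumptions:

import torch
import torch.nn as nn

class CrossScaleFusion(nn.Module):
    def __init__(self, dims=(64, 128, 256), d_model=128, heads=4):
        super().__init__()
        # Project every scale to a shared width so their tokens can be mixed.
        self.proj = nn.ModuleList(nn.Conv2d(d, d_model, 1) for d in dims)
        self.attn = nn.MultiheadAttention(d_model, heads, batch_first=True)

    def forward(self, feats):  # feats: list of (B, C_i, H_i, W_i) maps
        tokens = []
        for proj, f in zip(self.proj, feats):
            f = proj(f)                                  # (B, d_model, H, W)
            tokens.append(f.flatten(2).transpose(1, 2))  # (B, H*W, d_model)
        x = torch.cat(tokens, dim=1)   # one token sequence spanning all scales
        y, _ = self.attn(x, x, x)      # self-attention mixes the scales
        return y.mean(dim=1)           # (B, d_model) pooled descriptor

feats = [torch.randn(2, 64, 28, 28), torch.randn(2, 128, 14, 14),
         torch.randn(2, 256, 7, 7)]
print(CrossScaleFusion()(feats).shape)  # torch.Size([2, 128])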
Citations: 0
CS-YOLO: A small object detection model based on YOLO for UAV aerial photography
IF 2.7 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-03-01 | Epub Date: 2025-12-24 | DOI: 10.1016/j.image.2025.117460
Rui Fan, Renhao Jiao, Weigui Nan, Haitao Meng, Abin Jiang, Xiaojia Yang, Zhiqiang Zhao, Jin Dang, Zhixue Wang, Yanshan Tian, Baiying Dong, Xiaowei He, Xiaoli Luo
With the rapid development of the UAV industry, object detection based on UAV aerial images is finding ever wider application. However, targets in UAV aerial images are small, dense, and disturbed by complex environments, which makes object detection challenging. To address dense small targets and strong background interference in UAV aerial images, we propose a YOLO-based detection model, Content-Conscious and Scale-Sensitive YOLO (CS-YOLO). Unlike existing YOLO-based approaches, our contribution lies in the joint design of a Bottleneck Attention Module cross-stage partial block (BAM-CSP), a Multi-Scale Pooling Attention Fusion Module (MPAFM), and a Feature Difference Fusion Module (FDFM). The BAM-CSP module significantly enhances small-target feature responses by integrating a channel attention mechanism at the bottleneck layer of the cross-stage partial network; the MPAFM module adopts a multi-scale pooling attention fusion architecture that suppresses complex background interference through parallel pooling and strengthens background perception for small targets; the FDFM module captures information changes during sampling through a feature-difference fusion mechanism. A Gradient Adaptive-Efficient IoU (GA-EIoU) loss function is introduced to optimize bounding-box regression by incorporating an EIoU gradient-constraint weighting mechanism. In comparative experiments on the VisDrone2019 dataset, CS-YOLO achieves 22.6% mAP@50:95, 2.7% higher than YOLO11n; on the HazyDet dataset it reaches 53.8% mAP@50:95, an increase of 2.8%. CS-YOLO also comprehensively surpasses existing advanced methods in recall rate and robustness. Ablation experiments verify each module's contribution to detection performance. The model effectively addresses dense small targets and strong environmental interference in UAV aerial images, providing a high-precision, real-time, and reliable detection scheme for complex tasks such as UAV inspection. The source code will be available at https://github.com/unscfr/CS-YOLO.
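GA-EIoU builds on the EIoU loss, which augments 1 - IoU with an enclosing-box-normalized center-distance term plus decoupled width and height penalties. A minimal sketch of plain EIoU follows; it is the standard formulation from the literature, and the paper's gradient-adaptive weighting is deliberately not reproduced:

import torch

def eiou_loss(pred, target, eps=1e-7):
    # pred, target: (N, 4) boxes as (x1, y1, x2, y2).
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # Smallest enclosing box and its squared diagonal.
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Center distance plus decoupled width/height penalties.
    dx = (pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]) / 2
    dy = (pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]) / 2
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wt, ht = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    return (1 - iou + (dx ** 2 + dy ** 2) / c2
            + (wp - wt) ** 2 / (cw ** 2 + eps) + (hp - ht) ** 2 / (ch ** 2 + eps))

p = torch.tensor([[0., 0., 4., 4.]]); t = torch.tensor([[1., 1., 5., 5.]])
print(eiou_loss(p, t))  # approximately tensor([0.6487])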
Citations: 0
Single object tracking based on Spatio-Temporal information
IF 2.7 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-03-01 | Epub Date: 2025-12-24 | DOI: 10.1016/j.image.2025.117463
Lixin Wei, Yun Luo, Rongzhe Zhu, Xin Li
To address tracking failures caused by the absence of temporal dynamic information and by background clutter arising from similar backgrounds, similar objects, target occlusion, and illumination changes, this paper proposes a single object tracking algorithm based on spatio-temporal information (SST). The algorithm integrates a Temporal Adaptive Module (TAM) into the backbone network to generate a temporal kernel from feature maps. This endows the network with the capability to model temporal dynamics, effectively exploiting inter-frame temporal relationships to handle complex dynamics such as changes in target motion states and environmental conditions. To mitigate background clutter, the algorithm further employs a Mixed Local Channel Attention (MLCA) mechanism that captures channel and spatial information, focusing the network on the target and reducing the impact of interfering information. The proposed algorithm was evaluated on the OTB100, LaSOT, and NFS datasets. It achieved an AUC score of 70.7% on OTB100, a 1.3% improvement over the baseline tracker, and AUC scores of 65.1% and 65.9% on LaSOT and NFS, respectively, each 0.2% above the baseline tracker. The tracking speed exceeds 80 fps, and the performance of the SST algorithm has been verified on self-made videos. The code is available at https://github.com/xuexiaodemenggubao/sst.
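A TAM-style temporal kernel can be illustrated with a toy module that predicts a per-channel 1-D kernel from the pooled clip and slides it along the time axis. This is a hedged sketch of the general idea only; the kernel size, generator widths, and pooling are assumptions rather than the paper's exact design:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalAdaptive(nn.Module):
    def __init__(self, t_frames=8, t_kernel=3):
        super().__init__()
        self.k = t_kernel
        # Tiny generator: per-channel temporal profile -> per-channel kernel.
        self.gen = nn.Sequential(
            nn.Linear(t_frames, t_frames), nn.ReLU(),
            nn.Linear(t_frames, t_kernel), nn.Softmax(dim=-1))

    def forward(self, x):                      # x: (B, C, T, H, W)
        b, c, t, h, w = x.shape
        kernel = self.gen(x.mean(dim=(3, 4)))  # (B, C, k), one kernel per channel
        pad = self.k // 2
        xp = F.pad(x, (0, 0, 0, 0, pad, pad))  # pad only the time axis
        win = xp.unfold(2, self.k, 1)          # (B, C, T, H, W, k) time windows
        return (win * kernel.view(b, c, 1, 1, 1, self.k)).sum(-1)

x = torch.randn(2, 16, 8, 14, 14)
print(TemporalAdaptive(t_frames=8)(x).shape)  # torch.Size([2, 16, 8, 14, 14])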
Citations: 0
A new baseline for edge detection: Make encoder–decoder great again
IF 2.7 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-03-01 | Epub Date: 2026-01-14 | DOI: 10.1016/j.image.2026.117485
Yachuan Li, Xavier Soria Poma, Yongke Xi, Guanlin Li, Chaozhi Yang, Qian Xiao, Yun Bai, Zongmin Li
The performance of deep learning based edge detectors has surpassed that of humans, but huge computational costs and complex training strategies hinder their further development and application. In this paper, we alleviate these complexities with a vanilla encoder–decoder based detector. First, we design a bilateral encoder that decouples the extraction of spatial and semantic features. Because the spatial branch no longer guides the semantic branch, feature richness can be reduced, enabling a more compact model design. We then propose a cascaded feature fusion decoder in which spatial features are progressively refined by semantic features; the refined spatial features are the sole basis for generating the edge map. The coarse original spatial features and the semantic features never contact the final result directly, so noise in the spatial features and localization errors in the semantic features are suppressed in the generated edge map. The proposed New Baseline for Edge Detection (NBED) achieves consistently superior performance across multiple edge detection benchmarks, even compared with methods that incur huge computational costs and complex training strategies. The ODS of NBED on BSDS500 is 0.838, state-of-the-art performance. Our study highlights that high-quality features are key to modern edge detection, and that encoder–decoder based detectors can achieve excellent performance without complex training or heavy computation. Furthermore, we take retinal vessel segmentation as an example to explore the application of NBED to downstream tasks. The code is available at https://github.com/Li-yachuan/NBED.
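The decoder's central idea, semantic features steering spatial features while only the refined spatial path reaches the edge head, can be sketched as below. Channel widths, sigmoid gating, and the number of stages are illustrative assumptions, not NBED's actual blocks:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadedRefine(nn.Module):
    def __init__(self, spatial_c=32, semantic_c=(64, 128)):
        super().__init__()
        self.gates = nn.ModuleList(nn.Conv2d(c, spatial_c, 1) for c in semantic_c)
        self.head = nn.Conv2d(spatial_c, 1, 1)  # edge map from refined spatial only

    def forward(self, spatial, semantics):       # semantics ordered shallow to deep
        for gate, sem in zip(self.gates, semantics):
            sem = F.interpolate(sem, size=spatial.shape[-2:],
                                mode="bilinear", align_corners=False)
            spatial = spatial + spatial * torch.sigmoid(gate(sem))  # guided refine
        return torch.sigmoid(self.head(spatial))

sp = torch.randn(1, 32, 128, 128)
sems = [torch.randn(1, 64, 32, 32), torch.randn(1, 128, 16, 16)]
print(CascadedRefine()(sp, sems).shape)  # torch.Size([1, 1, 128, 128])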
Citations: 0
Optimization model for sign language recognition using hybrid convolution networks
IF 2.7 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-02-01 | Epub Date: 2025-11-25 | DOI: 10.1016/j.image.2025.117444
S. Venkatesh, Pravin R. Kshirsagar, R. Thiagarajan, Tan Kuan Tak, B. Sivaneasan
The sign language gesture recognition model seeks to provide effective communication by converting sign language motions into spoken or written language, allowing signers to connect with non-signers. Deep features are extracted using Vision Transformer-YOLOv5 (ViT-YOLOv5), which extracts Regions of Interest (ROI) from the images to generate the first feature set, F1. Concurrently, the Scale-Invariant Feature Transform (SIFT) extracts a second feature set, F2, from the same images. The two feature sets are fed into a Hybrid Convolution-based Adaptive EfficientB7 Network (HCA-EfB7N), in which F1 is processed with 1D convolution and F2 with 2D convolution to obtain the recognition result. By utilizing both 1D and 2D convolutions, the proposed model accurately identifies the class of hand gestures, leading to more accurate recognition. The parameters of the HCA-EfB7N are optimized using the Fitness-based Archimedes Optimization Algorithm (FAOA). This hybrid approach recognizes complex hand gestures, particularly in sign language translation systems. The approach's effectiveness is validated by comparing its performance against several baseline systems, confirming its superiority and robustness in recognizing sign language gestures.
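The F2 branch uses standard SIFT; a minimal OpenCV sketch shows one plausible way to turn an image's keypoint descriptors into a fixed-length vector. The mean pooling and the random stand-in image are assumptions, not the paper's exact pipeline:

import cv2
import numpy as np

img = np.random.randint(0, 255, (224, 224), dtype=np.uint8)  # stand-in frame
sift = cv2.SIFT_create()
keypoints, desc = sift.detectAndCompute(img, None)  # desc: (N, 128) or None
# Pool the per-keypoint descriptors into one fixed-length F2 vector.
f2 = desc.mean(axis=0) if desc is not None else np.zeros(128, dtype=np.float32)
print(f2.shape)  # (128,)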
Citations: 0
Tri-modal fusion for dynamic hand gesture recognition: Integrating RGB, depth, and skeleton data
IF 2.7 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-02-01 | Epub Date: 2025-11-25 | DOI: 10.1016/j.image.2025.117440
Reena Tripathi, Bindu Verma
Computer vision research continues to show strong interest in dynamic hand gesture recognition because of its wide range of applications in automation, human–computer interaction, and other fields. Dynamic hand gesture recognition faces several challenges, such as occlusion and background clutter, which make gesture tracking and classification difficult. To address them, our proposed work fuses multiple modalities, each with its own advantages. The first modality uses RGB data, which provides spatial information for interpreting the gesturing hand's shape, texture, and color. The second modality employs depth data, which records activity motion. The third modality incorporates skeleton data, which resolves the challenges of complex backgrounds and occlusion. Features are extracted in parallel from all modalities using a pre-trained ResCLIP model, and in sequence-to-sequence learning an LSTM unit processes the generated feature vectors. At the feature level, the outputs of all three LSTM networks are concatenated before the fully connected (FC) layer, and a SoftMax function classifies the gestures. The proposed model was evaluated on two benchmark datasets, the First-Person Hand Action dataset (FPHA) and the Sheffield Kinect Gesture dataset (SKIG), demonstrating its effectiveness. It outperformed state-of-the-art techniques on the FPHA and SKIG datasets, exhibiting competitive performance.
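The fusion step described above, three per-modality LSTM streams concatenated at the feature level ahead of the FC classifier, maps directly to a short module. Feature dimensions, hidden size, and class count below are illustrative assumptions:

import torch
import torch.nn as nn

class TriModalFusion(nn.Module):
    def __init__(self, feat_dims=(512, 512, 256), hidden=128, classes=10):
        super().__init__()
        self.lstms = nn.ModuleList(
            nn.LSTM(d, hidden, batch_first=True) for d in feat_dims)
        self.fc = nn.Linear(hidden * 3, classes)

    def forward(self, rgb, depth, skel):   # each stream: (B, T, feat_dim)
        outs = []
        for lstm, seq in zip(self.lstms, (rgb, depth, skel)):
            _, (h, _) = lstm(seq)          # final hidden state per stream
            outs.append(h[-1])             # (B, hidden)
        return self.fc(torch.cat(outs, dim=1))  # logits; SoftMax at inference

b, t = 4, 16
logits = TriModalFusion()(torch.randn(b, t, 512), torch.randn(b, t, 512),
                          torch.randn(b, t, 256))
print(logits.shape)  # torch.Size([4, 10])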
Citations: 0
Analysis of image aesthetics assessment as a positive-unlabelled problem
IF 2.7 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-02-01 | Epub Date: 2025-12-01 | DOI: 10.1016/j.image.2025.117441
Luis Gonzalez-Naharro, M. Julia Flores, Jesus Martínez-Gómez, Jose M. Puerta
Image aesthetics assessment (IAA) has traditionally been addressed as a supervised learning problem, where the goal is to accurately predict information related to user opinions, such as the mean opinion score, image ratings, or a binary quality label, usually crafted by thresholding the mean score to label images as highly or lowly aesthetic.
Supervised approaches fail to account for the subjectivity of this problem: the idea of aesthetic pleasantness varies across people and cultures, making the labels extremely noisy. However, the existence of worldwide photographic contests, exhibitions, and masters implies that, to a reasonable degree, there is broad consensus about the quality of very high-quality images and photographs. Furthermore, labelling image data for IAA is a difficult process, as a large number of non-trivial aesthetic judgements are required to obtain a large-scale IAA dataset.
Therefore, in this work we analyse the potential of positive-unlabelled (PU) techniques for solving IAA. We propose techniques for building PU datasets from traditional IAA datasets and from available reference datasets of high-quality images, and test several well-known PU algorithms on them. Our results highlight the potential of PU approaches for IAA, as we obtain results close to the state of the art with much smaller sets of labelled data: in experiments with only 5% of AVA labelled, we reach accuracy levels only 0.03 points below NIMA, and we reach competitive balanced-accuracy levels in settings with a very limited amount of labelled data and very simple models.
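The dataset-construction idea, keeping only confidently high-scored images as labelled positives and leaving everything else unlabelled, is easy to sketch. The threshold, sizes, and synthetic scores below are illustrative assumptions, not the paper's exact protocol:

import numpy as np

rng = np.random.default_rng(0)
scores = rng.uniform(1, 10, size=1000)     # stand-in for AVA-style mean scores
positive = np.flatnonzero(scores >= 7.5)   # confidently high-quality images
unlabelled = np.flatnonzero(scores < 7.5)  # mixture of positives and negatives
# Keep only a small labelled fraction, e.g. the 5%-of-AVA regime cited above.
n_lab = min(int(0.05 * len(scores)), len(positive))
labelled = rng.choice(positive, size=n_lab, replace=False)
print(len(labelled), "labelled positives,", len(unlabelled), "unlabelled")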
Citations: 0
A survey on video emotion recognition: Segmentation, classification, and explainable AI techniques
IF 2.7 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-02-01 | Epub Date: 2025-11-25 | DOI: 10.1016/j.image.2025.117442
Sudhakar Hallur, Anil Gavade, Priyanka Gavade
Emotion recognition from videos has become a pivotal domain in computer vision and affective computing, contributing to advancements in human–computer interaction, healthcare, security, and multimedia analysis. This survey systematically reviews 137 research papers spanning segmentation, classification, and explainable artificial intelligence (XAI) techniques for video-based emotion recognition. The study categorizes works into probabilistic, clustering, deep learning, affective computing, fuzzy logic, genetic algorithm, hybrid, multimodal, and XAI-based approaches. Through a structured evaluation of datasets such as FER2013, CK+, RAVDESS, AffectNet, and EMOTIC, the review highlights how convolutional, recurrent, and transformer architectures, combined with multimodal fusion and attention mechanisms, have pushed emotion detection accuracy to its highest levels in certain contexts. It also identifies key challenges, including dataset bias, multimodal synchronization, interpretability, and computational complexity. The paper emphasizes the rising importance of XAI in bridging the gap between model transparency and human cognition, proposing that future research focus on explainable, context-aware, and ethically grounded frameworks for robust emotion understanding. By consolidating diverse research trajectories, this survey offers a unified perspective on current advancements, limitations, and future directions in video emotion analysis.
Citations: 0
Rank-based transformation algorithm for image contrast adjustment
IF 2.7 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-02-01 | Epub Date: 2025-11-19 | DOI: 10.1016/j.image.2025.117432
Cheng-Hui Chen, Torbjörn E.M. Nordling
Performing proper image contrast adjustment without information loss is an art, and many adjustment methods are in use. Default settings are often inappropriate for the image in question, leaving contrast adjustment dependent on trial and error. We propose a simple method, rank-based transformation (RBT), for image contrast adjustment that requires no prior knowledge, which makes RBT an ideal first tool to apply to underexposed images. The RBT algorithm normalizes and equalizes all intensity differences of the image over the full intensity range of the image data type, thus assigning equal weight to all gradients. Even the state-of-the-art AI tool Cellpose visually benefits from RBT preprocessing. Our comparison of histogram normalization methods demonstrates the ability of RBT to bring out image features.
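The core operation can be sketched as a rank transform: each pixel is replaced by its intensity rank, and the ranks are stretched over the full 8-bit range, which equalizes all intensity differences. This is a hedged reconstruction from the abstract; tie handling and normalization details of the published RBT may differ:

import numpy as np
from scipy.stats import rankdata

def rank_transform(img):
    # Tied intensities share a rank, so flat regions stay flat.
    ranks = rankdata(img, method="average").reshape(img.shape)
    ranks -= ranks.min()
    return (255 * ranks / max(ranks.max(), 1)).astype(np.uint8)

img = np.random.randint(10, 40, (64, 64), dtype=np.uint8)  # underexposed example
out = rank_transform(img)
print(img.min(), img.max(), "->", out.min(), out.max())  # spans 0..255 after RBT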
Citations: 0
MSTSGM: A multi-scale temporal–spatial guided model for image deblurring
IF 2.7 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-02-01 | Epub Date: 2025-11-25 | DOI: 10.1016/j.image.2025.117443
Boyu Pei, Kejun Long, Zhibo Gao, Jian Gu, Shaofei Wang, Xinhu Lu
Image deblurring is a critical task in computer vision, essential for recovering sharp images from blurry ones caused by motion blur or camera shake. Recent advances in deep learning have introduced convolutional neural networks (CNNs) as a powerful alternative, enabling the learning of intricate mappings between blurry and sharp images. However, existing deep learning approaches still struggle to capture low-frequency information effectively and to maintain robustness across diverse blur conditions, while high-frequency details are often inadequately restored owing to their susceptibility to motion blur. This paper presents the Multi-Scale Temporal–Spatial Guided Model (MSTSGM), which integrates multi-scale feature decoupling (MSFD), temporal convolution networks (TCN), and edge attention guided reconstruction (EAGR) to enhance deblurring performance. The MSFD captures a wide range of details by decomposing images into multi-scale representations, the TCN refines these features by modeling temporal dependencies in blur formation, and the EAGR focuses on key edge features, effectively improving image clarity. Evaluated on benchmark datasets including GoPro, HIDE, and RealBlur, MSTSGM demonstrates competitive performance, achieving higher PSNR and SSIM than state-of-the-art methods. Ablation studies validate the contribution of each component, highlighting the synergistic effects of multi-scale processing, temporal feature integration, and edge attention. Furthermore, MSTSGM's application as a preprocessing step for object detection illustrates its practical utility in enhancing the accuracy of downstream computer vision applications. MSTSGM provides a robust solution for advancing image deblurring and related tasks. Source code is available for research purposes at https://github.com/priplex/MSTSGM.
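The EAGR concept of weighting reconstruction toward edge regions can be illustrated with a fixed Sobel-based gate. The actual module presumably learns its attention, so the hand-crafted edge map below is an assumption for illustration only:

import torch
import torch.nn.functional as F

def edge_attention(feat, gray):              # feat: (B, C, H, W), gray: (B, 1, H, W)
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)                  # Sobel y is the transpose of Sobel x
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    edges = torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)
    attn = edges / (edges.amax(dim=(2, 3), keepdim=True) + 1e-8)
    return feat * (1 + attn)                 # boost features near edges

feat, gray = torch.randn(1, 16, 64, 64), torch.rand(1, 1, 64, 64)
print(edge_attention(feat, gray).shape)  # torch.Size([1, 16, 64, 64])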
Citations: 0