
Signal Processing-Image Communication — Latest Publications

Line segment detectors with deformable attention
IF 3.4, CAS Tier 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC. Pub Date: 2024-06-07. DOI: 10.1016/j.image.2024.117155
Shoufeng Tang, Shuo Zhou, Xiamin Tong, Jingyuan Gao, Yuhao Wang, Xiaojun Ma

Object detectors based on Transformers are advancing rapidly. In contrast, the development of line segment detectors is relatively slow. It is noteworthy that objects and line segments are both 2D targets. In this work, we design a line segment detection algorithm based on deformable attention. Leveraging this algorithm and the line segment loss function, we transform the object detectors Deformable DETR and ViDT into end-to-end line segment detectors named Deformable LETR and ViDTLE, respectively. In order to adapt the idea of sparse modeling for line segment detection, we propose a new attention mechanism named line segment deformable attention (LSDA). This mechanism focuses on valuable positions under the guidance of a reference line to refine line segments. We design an auxiliary algorithm named line segment iterative refinement for LSDA. With as few modifications as possible, we transform two object detectors, namely SMCA DETR and PnP DETR, into competitive line segment detectors named SMCA LETR and PnP LETR, respectively. The experimental results show that the proposed methods are both effective and efficient.
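
To make the idea of line-guided deformable sampling concrete, here is a minimal PyTorch sketch of attention that samples features at learned offsets around points taken along a reference line. The module name, offset scale, and number of sampling points are illustrative assumptions, not the paper's exact LSDA formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LineDeformableSampling(nn.Module):
    """Illustrative sketch: sample features at learned offsets around points
    placed along a reference line, then aggregate with attention weights."""
    def __init__(self, dim, n_points=4):
        super().__init__()
        self.n_points = n_points
        self.offsets = nn.Linear(dim, n_points * 2)   # per-query 2D offsets
        self.weights = nn.Linear(dim, n_points)       # per-sample attention logits

    def forward(self, feat, query, ref_line):
        # feat: (B, C, H, W); query: (B, N, C); ref_line: (B, N, 4) endpoints (x1, y1, x2, y2) in [0, 1]
        B, N, _ = query.shape
        t = torch.linspace(0, 1, self.n_points, device=feat.device).view(1, 1, -1, 1)
        p0, p1 = ref_line[..., :2].unsqueeze(2), ref_line[..., 2:].unsqueeze(2)
        base = p0 + t * (p1 - p0)                                  # (B, N, K, 2) points on the line
        off = self.offsets(query).view(B, N, self.n_points, 2)
        grid = (base + 0.05 * off.tanh()) * 2 - 1                  # normalised to [-1, 1]
        sampled = F.grid_sample(feat, grid, align_corners=False)   # (B, C, N, K)
        attn = self.weights(query).softmax(-1)                     # (B, N, K)
        return torch.einsum("bcnk,bnk->bnc", sampled, attn)        # (B, N, C)

# toy usage
feat = torch.randn(2, 64, 32, 32)
query = torch.randn(2, 10, 64)
ref_line = torch.rand(2, 10, 4)
out = LineDeformableSampling(64)(feat, query, ref_line)            # (2, 10, 64)
```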

Citations: 0
Quality evaluation of point cloud compression techniques
IF 3.5, CAS Tier 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC. Pub Date: 2024-06-07. DOI: 10.1016/j.image.2024.117156
Joao Prazeres, Manuela Pereira, Antonio M.G. Pinheiro

A study on the quality evaluation of point clouds in the presence of coding distortions is presented. For that, four different point cloud coding solutions, notably the standardized MPEG codecs G-PCC and V-PCC, a deep learning-based coding solution RS-DLPCC, and Draco, are compared using a subjective evaluation methodology. Furthermore, several full-reference, reduced-reference and no-reference point cloud quality metrics are evaluated. Two different point cloud normal computation methods were tested for the metrics that rely on them, notably the Cloud Compare quadric fitting method with radii of five, ten, and twenty, and Meshlab KNN with K of six, ten, and eighteen. To generalize the results, the objective quality metrics were also benchmarked on a public database with mean opinion scores available. To evaluate the statistical differences between the metrics, the Krasula method was employed. The Point Cloud Quality Metric shows the best performance and a very good representation of the subjective results, and it is also the metric with the most statistically significant results. It was also revealed that the Cloud Compare quadric fitting method with radii of 10 and 20 produced the most reliable normals for the metrics that depend on them. Finally, the study revealed that the most commonly used metrics fail to accurately predict the compression quality when artifacts generated by deep learning methods are present.
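
As a practical illustration of how the two families of normal-estimation settings differ, the following Open3D-based sketch estimates normals with a fixed-radius neighbourhood and with a k-nearest-neighbour one. Open3D is an assumed stand-in here, since the study itself used CloudCompare quadric fitting and Meshlab KNN, and the file name is hypothetical.

```python
import open3d as o3d

# Hypothetical input file; the study used its own reference/distorted point clouds.
pcd = o3d.io.read_point_cloud("cloud.ply")

# Radius-based neighbourhood (analogous to a fixed-radius surface fit).
pcd.estimate_normals(o3d.geometry.KDTreeSearchParamRadius(radius=10.0))

# k-NN neighbourhood (analogous to Meshlab's KNN with K = 10).
pcd.estimate_normals(o3d.geometry.KDTreeSearchParamKNN(knn=10))

# Make normal orientations consistent before feeding normal-dependent metrics.
pcd.orient_normals_consistent_tangent_plane(k=10)
```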

Citations: 0
“Sparse + Low-Rank” tensor completion approach for recovering images and videos
IF 3.5, CAS Tier 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC. Pub Date: 2024-05-24. DOI: 10.1016/j.image.2024.117152
Chenjian Pan, Chen Ling, Hongjin He, Liqun Qi, Yanwei Xu

Recovering color images and videos from highly undersampled data is a fundamental and challenging task in face recognition and computer vision. Motivated by the multi-dimensional nature of color images and videos, in this paper we propose a novel tensor completion approach, which is able to efficiently exploit the sparsity of tensor data under the discrete cosine transform (DCT). Specifically, we introduce two “sparse + low-rank” tensor completion models as well as two implementable algorithms for finding their solutions. The first one is a DCT-based sparse plus weighted nuclear norm induced low-rank minimization model. The second one is a DCT-based sparse plus p-shrinking mapping induced low-rank optimization model. Moreover, we accordingly propose two implementable augmented Lagrangian-based algorithms for solving the underlying optimization models. A series of numerical experiments, including color image inpainting and video data recovery, demonstrate that our proposed approach performs better than many existing state-of-the-art tensor completion methods, especially when the ratio of missing data is high.
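
To illustrate the two building blocks such "sparse + low-rank" models combine, here is a small NumPy/SciPy sketch (an assumption about the general form, not the authors' algorithm) of the proximal steps typically used for a DCT-domain sparsity prior and a nuclear-norm low-rank prior on a matrix unfolding.

```python
import numpy as np
from scipy.fft import dctn, idctn

def prox_dct_sparse(X, tau):
    # Soft-threshold the DCT coefficients: the "sparse" prior of the model.
    C = dctn(X, norm="ortho")
    C = np.sign(C) * np.maximum(np.abs(C) - tau, 0.0)
    return idctn(C, norm="ortho")

def prox_nuclear(M, tau):
    # Singular-value thresholding on a matrix unfolding: the "low-rank" prior.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# toy usage on a random 2D slice
X = np.random.rand(64, 64)
X_sparse_step = prox_dct_sparse(X, 0.05)
X_low_rank_step = prox_nuclear(X, 1.0)
```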

Citations: 0
Transformer based Douglas-Rachford unrolling network for compressed sensing
IF 3.5, CAS Tier 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC. Pub Date: 2024-05-24. DOI: 10.1016/j.image.2024.117153
Yueming Su, Qiusheng Lian, Dan Zhang, Baoshun Shi

Compressed sensing (CS) with the binary sampling matrix is hardware-friendly and memory-saving in the signal processing field. Existing Convolutional Neural Network (CNN)-based CS methods show potential restrictions in exploiting non-local similarity and lack interpretability. In parallel, the emerging Transformer architecture performs well in modelling long-range correlations. To further improve the CS reconstruction quality from highly under-sampled CS measurements, a Transformer based deep unrolling reconstruction network abbreviated as DR-TransNet is proposed, whose design is inspired by the traditional iterative Douglas-Rachford algorithm. It combines the merits of the structural insights of optimization-based methods and the speed of network-based ones. Therein, a U-type Transformer based proximal sub-network is elaborated to reconstruct images in the wavelet domain and the spatial domain as an auxiliary mode, which aims to explore local informative details and global long-term interactions of the images. Specifically, a flexible single model is trained to address CS reconstruction with different binary CS sampling ratios. Compared with the state-of-the-art CS reconstruction methods with the binary sampling matrix, the proposed method achieves appealing improvements in terms of Peak Signal to Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM) and visual metrics. Codes are available at https://github.com/svyueming/DR-TransNet.
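
For readers unfamiliar with the classical iteration that the unrolling is inspired by, the sketch below shows a plain (non-learned) Douglas-Rachford loop for a linear CS measurement model in NumPy. The step size, iteration count, and the soft-thresholding regulariser are illustrative assumptions rather than the DR-TransNet design.

```python
import numpy as np

def douglas_rachford_cs(y, A, prox_reg, n_iter=100, gamma=1.0):
    """Classical Douglas-Rachford iteration for y = A x:
    alternate the prox of the least-squares data term with a regularisation prox."""
    m, n = A.shape
    z = A.T @ y                                        # initial point
    G = np.linalg.inv(np.eye(n) + gamma * A.T @ A)     # pre-factorised data-fidelity prox
    for _ in range(n_iter):
        x = G @ (z + gamma * A.T @ y)                  # prox of 0.5 * ||Ax - y||^2
        z = z + prox_reg(2 * x - z) - x                # reflection + regularisation prox
    return x

# example regulariser: soft-thresholding (sparsity prior in the pixel basis)
soft = lambda v, t=0.01: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# tiny synthetic example
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 128))
x_true = np.zeros(128)
x_true[rng.choice(128, 8, replace=False)] = 1.0
x_hat = douglas_rachford_cs(A @ x_true, A, soft)
```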

Citations: 0
Reinforced Res-Unet transformer for underwater image enhancement
IF 3.5, CAS Tier 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC. Pub Date: 2024-05-22. DOI: 10.1016/j.image.2024.117154
Peitong Li, Jiaying Chen, Chengtao Cai

Light propagation through water is subject to varying degrees of energy loss, causing captured images to display color distortion, reduced contrast, and indistinct details and textures. The data-driven approach offers significant advantages over traditional algorithms, such as improved accuracy and reduced computational costs. However, challenges such as optimizing network architecture, refining coding techniques, and expanding database resources must be addressed to ensure the generation of high-quality reconstructed images across diverse tasks. In this paper, an underwater image enhancement network named RUTUIE is proposed, which integrates feature fusion techniques. It leverages the strengths of both ResNet and the U-shaped architecture, primarily structured around a streamlined up-and-down sampling mechanism. Specifically, the U-shaped structure serves as the backbone of the ResNet, equipped with two feature transformers at the encoding and decoding ends, which are linked by a single-stage up-and-down sampling structure. This architecture is designed to minimize the omission of minor features during feature scale transformations. Furthermore, the improved Transformer encoder leverages a feature-level attention mechanism and the advantages of CNNs, endowing the network with both local and global perceptual capabilities. Then, we propose and demonstrate that embedding an adaptive feature selection module at appropriate locations can retain more learned feature representations. Moreover, a previously proposed color transfer method is applied to synthesize underwater images and augment network training. Extensive experiments demonstrate that our work effectively corrects color casts, reconstructs the rich texture information in natural scenes, and improves contrast.
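
Since the training-data synthesis relies on color transfer, the following NumPy sketch shows a generic per-channel statistics-matching transfer (Reinhard-style). It is only an illustration of the idea and not the previously proposed method the paper actually uses.

```python
import numpy as np

def color_transfer(source, target):
    """Match the per-channel mean and standard deviation of `source` to `target`.
    source, target: float arrays in [0, 1] of shape (H, W, 3)."""
    out = np.empty_like(source)
    for c in range(3):
        s_mu, s_sigma = source[..., c].mean(), source[..., c].std() + 1e-8
        t_mu, t_sigma = target[..., c].mean(), target[..., c].std() + 1e-8
        out[..., c] = (source[..., c] - s_mu) / s_sigma * t_sigma + t_mu
    return np.clip(out, 0.0, 1.0)
```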

Citations: 0
Self-supervised 3D vehicle detection based on monocular images
IF 3.5, CAS Tier 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC. Pub Date: 2024-05-18. DOI: 10.1016/j.image.2024.117149
He Liu, Yi Sun

The deep learning-based 3D object detection literature on monocular images has been dominated by methods that require supervision in the form of 3D bounding box annotations for training. However, obtaining sufficient 3D annotations is expensive, laborious and prone to introducing errors. To address this problem, we propose a monocular self-supervised approach towards 3D object detection relying solely on observed RGB data rather than 3D bounding boxes for training. We leverage differentiable rendering to apply visual alignment to depth maps, instance masks and point clouds for self-supervision. Furthermore, considering the complexity of autonomous driving scenes, we introduce a point cloud filter to reduce noise impact and design an automatic training set pruning strategy suitable for the self-supervised framework to further improve network performance. We provide detailed experiments on the KITTI benchmark and achieve competitive performance with existing self-supervised methods as well as some fully supervised methods.
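
As an example of the kind of point cloud filtering step the pipeline mentions for reducing noise, here is a short Open3D sketch using statistical outlier removal. The function choice, parameters, and file name are assumptions for illustration, not the paper's filter.

```python
import open3d as o3d

# Hypothetical point cloud (e.g. back-projected from a predicted depth map).
pcd = o3d.io.read_point_cloud("pseudo_lidar.ply")

# Drop points whose mean distance to their 20 nearest neighbours deviates by
# more than 2 standard deviations from the global average distance.
filtered, kept_indices = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
```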

Citations: 0
HorSR: High-order spatial interactions and residual global filter for efficient image super-resolution
IF 3.5, CAS Tier 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC. Pub Date: 2024-05-18. DOI: 10.1016/j.image.2024.117148
Fengsui Wang, Xi Chu

Recent advances in efficient image super-resolution (EISR) include convolutional neural networks, which exploit distillation and aggregation strategies with copious channel split and concatenation operations to make full use of limited hierarchical features. In contrast, the Transformer network presents a challenge for EISR because multiheaded self-attention is a computationally demanding process. To respond to this challenge, this paper proposes replacing multiheaded self-attention in the Transformer network with global filtering and recursive gated convolution. This strategy allows us to design a high-order spatial interaction and residual global filter network for efficient image super-resolution (HorSR), which comprises three components: a shallow feature extraction module, a deep feature extraction module, and a high-quality image-reconstruction module. In particular, the deep feature extraction module comprises residual global filtering and recursive gated convolution blocks. The experimental results show that the HorSR network provides state-of-the-art performance with the lowest FLOPs of existing EISR methods.
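
A minimal PyTorch sketch of a GFNet-style global filter layer, i.e. a learnable element-wise filter applied in the 2D Fourier domain, is given below. The exact layer used inside HorSR may differ, so treat the shapes and initialisation as assumptions.

```python
import torch
import torch.nn as nn

class GlobalFilter(nn.Module):
    """Learnable frequency-domain filter applied via a 2D real FFT."""
    def __init__(self, channels, height, width):
        super().__init__()
        # complex-valued filter stored as two real parts (real, imag)
        self.weight = nn.Parameter(torch.randn(channels, height, width // 2 + 1, 2) * 0.02)

    def forward(self, x):                                  # x: (B, C, H, W)
        X = torch.fft.rfft2(x, norm="ortho")               # (B, C, H, W//2 + 1), complex
        W = torch.view_as_complex(self.weight)             # (C, H, W//2 + 1)
        return torch.fft.irfft2(X * W, s=x.shape[-2:], norm="ortho")

# toy usage
x = torch.randn(2, 32, 48, 48)
y = GlobalFilter(32, 48, 48)(x)                            # same shape as x
```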

Citations: 0
Benchmarking deep models on retinal fundus disease diagnosis and a large-scale dataset
IF 3.5, CAS Tier 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC. Pub Date: 2024-05-14. DOI: 10.1016/j.image.2024.117151
Xue Xia, Ying Li, Guobei Xiao, Kun Zhan, Jinhua Yan, Chao Cai, Yuming Fang, Guofu Huang

Retinal fundus imaging contributes to monitoring the vision of patients by providing views of the interior surface of the eyes. Machine learning models greatly aid ophthalmologists in detecting retinal disorders from color fundus images. Hence, the quality of the data is pivotal for enhancing diagnosis algorithms, which ultimately benefits vision care and maintenance. To facilitate further research in this domain, we introduce the Eye Disease Diagnosis and Fundus Synthesis (EDDFS) dataset, comprising 28,877 fundus images. These include 15,000 healthy samples and a diverse range of images depicting various disorders such as diabetic retinopathy, age-related macular degeneration, glaucoma, pathological myopia, hypertension retinopathy, retinal vein occlusion, and laser photocoagulation. In addition to providing the dataset, we propose a Transformer-joint convolution network for automated eye disease screening. Firstly, a co-attention structure is integrated to capture long-range attention information along with local features. Secondly, a cross-stage feature fusion module is designed to extract multi-level and disease-related information. By leveraging the dataset and our proposed network, we establish benchmarks for disease screening and grading tasks. Our experimental results underscore the network's proficiency in both multi-label and single-label disease diagnosis, while also showcasing the dataset's capability in supporting fundus synthesis. (The dataset and code will be available on https://github.com/xia-xx-cv/EDDFS_dataset.)
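
To make the multi-label vs. single-label distinction in the benchmarks concrete, here is a small PyTorch sketch of the two kinds of classification heads and losses a screening network of this type typically pairs. The feature size, class count, and random tensors are placeholders, not details from the paper.

```python
import torch
import torch.nn as nn

num_classes = 8                        # assumed number of disease categories
features = torch.randn(4, 512)         # pooled backbone features for a batch of 4

# Multi-label screening: a patient may present several disorders at once.
multi_label_head = nn.Linear(512, num_classes)
multi_label_loss = nn.BCEWithLogitsLoss()(
    multi_label_head(features),
    torch.randint(0, 2, (4, num_classes)).float())

# Single-label grading: exactly one class per image.
single_label_head = nn.Linear(512, num_classes)
single_label_loss = nn.CrossEntropyLoss()(
    single_label_head(features),
    torch.randint(0, num_classes, (4,)))
```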

Citations: 0
Deep multi-scale feature mixture model for image super-resolution with multiple-focal-length degradation
IF 3.5, CAS Tier 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC. Pub Date: 2024-05-10. DOI: 10.1016/j.image.2024.117139
Jun Xiao, Qian Ye, Rui Zhao, Kin-Man Lam, Kao Wan

Single image super-resolution is a classic problem in computer vision. In recent years, deep learning-based models have achieved unprecedented success on this problem. However, most existing deep super-resolution models unavoidably produce degraded results when applied to real-world images captured by cameras with different focal lengths. The degradation in these images is called multiple-focal-length degradation, which is spatially variant and more complicated than bicubic downsampling degradation. To address such a challenging issue, we propose a multi-scale feature mixture model in this paper. The proposed model can intensively exploit local patterns from different scales for image super-resolution. To improve the performance, we further propose a novel loss function based on the Laplacian pyramid, which guides the model to recover the information of different frequency subbands separately. Comprehensive experiments show that our proposed model better preserves the structure of objects and generates high-quality images, leading to the best performance compared with other state-of-the-art deep single image super-resolution methods.
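
The following PyTorch sketch shows one simple way to realise a Laplacian-pyramid loss that supervises frequency sub-bands separately. The pooling-based low-pass filter, the number of levels, and the L1 distance are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def laplacian_pyramid(x, levels=3):
    # Band-pass residuals from a simple average-pooling pyramid, plus the coarsest level.
    pyramid, current = [], x
    for _ in range(levels - 1):
        down = F.avg_pool2d(current, 2)
        up = F.interpolate(down, size=current.shape[-2:], mode="bilinear", align_corners=False)
        pyramid.append(current - up)     # band-pass residual at this scale
        current = down
    pyramid.append(current)              # coarsest (low-frequency) level
    return pyramid

def laplacian_loss(pred, target, levels=3):
    # L1 distance summed over pyramid levels: each frequency band is supervised separately.
    return sum(F.l1_loss(p, t) for p, t in
               zip(laplacian_pyramid(pred, levels), laplacian_pyramid(target, levels)))

# toy usage
pred, target = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
loss = laplacian_loss(pred, target)
```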

Citations: 0
Weakly supervised instance segmentation via class double-activation maps and boundary localization
IF 3.5, CAS Tier 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC. Pub Date: 2024-05-10. DOI: 10.1016/j.image.2024.117150
Jin Peng, Yongxiong Wang, Zhiqun Pan

Weakly supervised instance segmentation based on image-level class labels has recently gained much attention, in which the primary key step is to generate pseudo labels based on class activation maps (CAMs). Most methods adopt binary cross-entropy (BCE) loss to train the classification model. However, since BCE loss is not class mutually exclusive, activations among classes occur independently. Thus, not only are foreground classes wrongly activated as background, but incorrect activations among confusing classes also occur in the foreground. To solve this problem, we propose the Class Double-Activation Map, called Double-CAM. Firstly, the vanilla CAM is extracted from the multi-label classifier and then fused with the output feature map of the backbone. The enhanced feature map of each class is fed into the single-label classification branch with softmax cross-entropy (SCE) loss and an entropy minimization module, from which the more accurate Double-CAM is extracted. It refines the vanilla CAM to improve the quality of pseudo labels. Secondly, to mine object edge cues from the Double-CAM, we propose the Boundary Localization (BL) module to synthesize boundary annotations, so as to provide constraints for label propagation more explicitly without adding additional supervision. The quality of pseudo masks is also improved substantially with the addition of the BL module. Finally, the generated pseudo labels are used to train fully supervised instance segmentation networks. The evaluations on the VOC and COCO datasets show that our method achieves excellent performance, outperforming mainstream weakly supervised segmentation methods at the same supervision level, even those that depend on stronger supervision.
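
For reference, a vanilla CAM of the kind the Double-CAM pipeline starts from can be computed as below (a minimal PyTorch sketch, assuming a global-average-pooling classifier whose final linear weights are available); the normalisation details are an assumption.

```python
import torch

def class_activation_map(features, fc_weight, class_idx):
    """features : (B, C, H, W) feature map from the backbone's last conv stage.
    fc_weight: (num_classes, C) weight of the classifier's final linear layer.
    Returns a (B, H, W) activation map for class_idx, min-max normalised."""
    cam = torch.einsum("c,bchw->bhw", fc_weight[class_idx], features)
    cam = torch.relu(cam)
    flat = cam.flatten(1)
    lo = flat.min(1)[0][:, None, None]
    hi = flat.max(1)[0][:, None, None]
    return (cam - lo) / (hi - lo + 1e-8)

# toy usage
feats = torch.randn(1, 512, 7, 7)
w = torch.randn(20, 512)
cam = class_activation_map(feats, w, class_idx=3)   # (1, 7, 7)
```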

Citations: 0