IEEE Transactions on Image Processing最新文献

英文中文

Robust Low-Rank Tensor Minimization via a New Tensor Spectral k-Support Norm. 通过新的张量光谱 k 支持规范实现稳健的低张量最小化

IF 10.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-10-15 DOI: 10.1109/TIP.2019.2946445

Jian Lou, Yiu-Ming Cheung

Recently, based on a new tensor algebraic framework for third-order tensors, the tensor singular value decomposition (t-SVD) and its associated tubal rank definition have shed new light on low-rank tensor modeling. Its applications to robust image/video recovery and background modeling show promising performance due to its superior capability in modeling cross-channel/frame information. Under the t-SVD framework, we propose a new tensor norm called tensor spectral k-support norm (TSP-k) by an alternative convex relaxation. As an interpolation between the existing tensor nuclear norm (TNN) and tensor Frobenius norm (TFN), it is able to simultaneously drive minor singular values to zero to induce low-rankness, and to capture more global information for better preserving intrinsic structure. We provide the proximal operator and the polar operator for the TSP-k norm as key optimization blocks, along with two showcase optimization algorithms for medium-and large-size tensors. Experiments on synthetic, image and video datasets in medium and large sizes, all verify the superiority of the TSP-k norm and the effectiveness of both optimization methods in comparison with the existing counterparts.

最近，基于一种新的三阶张量代数框架，张量奇异值分解（t-SVD）及其相关的管阶定义为低阶张量建模带来了新的启示。由于其在跨信道/帧信息建模方面的卓越能力，它在鲁棒图像/视频恢复和背景建模方面的应用显示出良好的性能。在 t-SVD 框架下，我们通过另一种凸松弛方法提出了一种新的张量规范，称为张量谱 k 支持规范（TSP-k）。作为现有的张量核规范（TNN）和张量弗罗贝尼斯规范（TFN）之间的一种插值，它能同时将次要奇异值归零以诱导低rankness，并捕获更多全局信息以更好地保留内在结构。我们提供了 TSP-k 规范的近算子和极算子作为关键的优化模块，以及两种针对中型和大型张量的展示优化算法。在中型和大型合成、图像和视频数据集上进行的实验都验证了 TSP-k 准则的优越性，以及这两种优化方法与现有同类方法相比的有效性。

{"title":"Robust Low-Rank Tensor Minimization via a New Tensor Spectral k-Support Norm.","authors":"Jian Lou, Yiu-Ming Cheung","doi":"10.1109/TIP.2019.2946445","DOIUrl":"10.1109/TIP.2019.2946445","url":null,"abstract":"Recently, based on a new tensor algebraic framework for third-order tensors, the tensor singular value decomposition (t-SVD) and its associated tubal rank definition have shed new light on low-rank tensor modeling. Its applications to robust image/video recovery and background modeling show promising performance due to its superior capability in modeling cross-channel/frame information. Under the t-SVD framework, we propose a new tensor norm called tensor spectral k-support norm (TSP-k) by an alternative convex relaxation. As an interpolation between the existing tensor nuclear norm (TNN) and tensor Frobenius norm (TFN), it is able to simultaneously drive minor singular values to zero to induce low-rankness, and to capture more global information for better preserving intrinsic structure. We provide the proximal operator and the polar operator for the TSP-k norm as key optimization blocks, along with two showcase optimization algorithms for medium-and large-size tensors. Experiments on synthetic, image and video datasets in medium and large sizes, all verify the superiority of the TSP-k norm and the effectiveness of both optimization methods in comparison with the existing counterparts.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62590174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Graph Sequence Recurrent Neural Network for Vision-based Freezing of Gait Detection. 基于视觉的冻结步态检测的图序列循环神经网络

IF 10.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-10-15 DOI: 10.1109/TIP.2019.2946469

Kun Hu, Zhiyong Wang, Wei Wang, Kaylena A Ehgoetz Martens, Liang Wang, Tieniu Tan, Simon J G Lewis, David Dagan Feng

Freezing of gait (FoG) is one of the most common symptoms of Parkinson's disease (PD), a neurodegenerative disorder which impacts millions of people around the world. Accurate assessment of FoG is critical for the management of PD and to evaluate the efficacy of treatments. Currently, the assessment of FoG requires well-trained experts to perform time-consuming annotations via vision-based observations. Thus, automatic FoG detection algorithms are needed. In this study, we formulate vision-based FoG detection, as a fine-grained graph sequence modelling task, by representing the anatomic joints in each temporal segment with a directed graph, since FoG events can be observed through the motion patterns of joints. A novel deep learning method is proposed, namely graph sequence recurrent neural network (GS-RNN), to characterize the FoG patterns by devising graph recurrent cells, which take graph sequences of dynamic structures as inputs. For the cases of which prior edge annotations are not available, a data-driven based adjacency estimation method is further proposed. To the best of our knowledge, this is one of the first studies on vision-based FoG detection using deep neural networks designed for graph sequences of dynamic structures. Experimental results on more than 150 videos collected from 45 patients demonstrated promising performance of the proposed GS-RNN for FoG detection with an AUC value of 0.90.

步态冻结（FoG）是帕金森病（PD）最常见的症状之一，帕金森病是一种影响全球数百万人的神经退行性疾病。准确评估 FoG 对于帕金森病的管理和评估治疗效果至关重要。目前，FoG 评估需要训练有素的专家通过视觉观察进行耗时的注释。因此，我们需要自动 FoG 检测算法。在本研究中，我们将基于视觉的 FoG 检测表述为细粒度图序列建模任务，通过有向图表示每个时间片段中的解剖关节，因为 FoG 事件可以通过关节的运动模式观察到。本文提出了一种新颖的深度学习方法，即图序列递归神经网络（GS-RNN），通过设计以动态结构图序列为输入的图递归单元来表征 FoG 模式。针对没有先验边注释的情况，进一步提出了基于数据驱动的邻接估计方法。据我们所知，这是利用专为动态结构图序列设计的深度神经网络进行基于视觉的 FoG 检测的首批研究之一。对从 45 名患者身上收集的 150 多段视频进行的实验结果表明，所提出的 GS-RNN 在 FoG 检测方面表现出色，AUC 值达到 0.90。

{"title":"Graph Sequence Recurrent Neural Network for Vision-based Freezing of Gait Detection.","authors":"Kun Hu, Zhiyong Wang, Wei Wang, Kaylena A Ehgoetz Martens, Liang Wang, Tieniu Tan, Simon J G Lewis, David Dagan Feng","doi":"10.1109/TIP.2019.2946469","DOIUrl":"10.1109/TIP.2019.2946469","url":null,"abstract":"Freezing of gait (FoG) is one of the most common symptoms of Parkinson's disease (PD), a neurodegenerative disorder which impacts millions of people around the world. Accurate assessment of FoG is critical for the management of PD and to evaluate the efficacy of treatments. Currently, the assessment of FoG requires well-trained experts to perform time-consuming annotations via vision-based observations. Thus, automatic FoG detection algorithms are needed. In this study, we formulate vision-based FoG detection, as a fine-grained graph sequence modelling task, by representing the anatomic joints in each temporal segment with a directed graph, since FoG events can be observed through the motion patterns of joints. A novel deep learning method is proposed, namely graph sequence recurrent neural network (GS-RNN), to characterize the FoG patterns by devising graph recurrent cells, which take graph sequences of dynamic structures as inputs. For the cases of which prior edge annotations are not available, a data-driven based adjacency estimation method is further proposed. To the best of our knowledge, this is one of the first studies on vision-based FoG detection using deep neural networks designed for graph sequences of dynamic structures. Experimental results on more than 150 videos collected from 45 patients demonstrated promising performance of the proposed GS-RNN for FoG detection with an AUC value of 0.90.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62590329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Blind Quality Metric of DIBR-Synthesized Images in the Discrete Wavelet Transform Domain. 离散小波变换域中 DIBR 合成图像的盲质量度量。

IF 10.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-10-10 DOI: 10.1109/TIP.2019.2945675

Guangcheng Wang, Zhongyuan Wang, Ke Gu, Leida Li, Zhifang Xia, Lifang Wu

Free viewpoint video (FVV) has received considerable attention owing to its widespread applications in several areas such as immersive entertainment, remote surveillance and distanced education. Since FVV images are synthesized via a depth image-based rendering (DIBR) procedure in the "blind" environment (without reference images), a real-time and reliable blind quality assessment metric is urgently required. However, the existing image quality assessment metrics are insensitive to the geometric distortions engendered by DIBR. In this research, a novel blind method of DIBR-synthesized images is proposed based on measuring geometric distortion, global sharpness and image complexity. First, a DIBR-synthesized image is decomposed into wavelet subbands by using discrete wavelet transform. Then, the Canny operator is employed to detect the edges of the binarized low-frequency subband and high-frequency subbands. The edge similarities between the binarized low-frequency subband and high-frequency subbands are further computed to quantify geometric distortions in DIBR-synthesized images. Second, the log-energies of wavelet subbands are calculated to evaluate global sharpness in DIBR-synthesized images. Third, a hybrid filter combining the autoregressive and bilateral filters is adopted to compute image complexity. Finally, the overall quality score is derived to normalize geometric distortion and global sharpness by the image complexity. Experiments show that our proposed quality method is superior to the competing reference-free state-of-the-art DIBR-synthesized image quality models.

自由视角视频（FVV）因其在沉浸式娱乐、远程监控和远程教育等多个领域的广泛应用而备受关注。由于 FVV 图像是在 "盲 "环境（无参考图像）下通过基于深度图像的渲染（DIBR）程序合成的，因此迫切需要一种实时可靠的盲点质量评估指标。然而，现有的图像质量评估指标对 DIBR 产生的几何失真不敏感。本研究提出了一种基于测量几何失真、全局清晰度和图像复杂度的 DIBR 合成图像盲法。首先，利用离散小波变换将 DIBR 合成图像分解成小波子带。然后，利用 Canny 算子检测二值化的低频子带和高频子带的边缘。进一步计算二值化低频子带和高频子带之间的边缘相似度，以量化 DIBR 合成图像中的几何失真。第二，计算小波子带的对数能量，以评估 DIBR 合成图像的全局清晰度。第三，采用自回归滤波器和双边滤波器相结合的混合滤波器来计算图像复杂度。最后，通过图像复杂度对几何失真和全局清晰度进行归一化处理，得出总体质量得分。实验表明，我们提出的质量方法优于同类无参考的最先进 DIBR 合成图像质量模型。

{"title":"Blind Quality Metric of DIBR-Synthesized Images in the Discrete Wavelet Transform Domain.","authors":"Guangcheng Wang, Zhongyuan Wang, Ke Gu, Leida Li, Zhifang Xia, Lifang Wu","doi":"10.1109/TIP.2019.2945675","DOIUrl":"10.1109/TIP.2019.2945675","url":null,"abstract":"Free viewpoint video (FVV) has received considerable attention owing to its widespread applications in several areas such as immersive entertainment, remote surveillance and distanced education. Since FVV images are synthesized via a depth image-based rendering (DIBR) procedure in the \"blind\" environment (without reference images), a real-time and reliable blind quality assessment metric is urgently required. However, the existing image quality assessment metrics are insensitive to the geometric distortions engendered by DIBR. In this research, a novel blind method of DIBR-synthesized images is proposed based on measuring geometric distortion, global sharpness and image complexity. First, a DIBR-synthesized image is decomposed into wavelet subbands by using discrete wavelet transform. Then, the Canny operator is employed to detect the edges of the binarized low-frequency subband and high-frequency subbands. The edge similarities between the binarized low-frequency subband and high-frequency subbands are further computed to quantify geometric distortions in DIBR-synthesized images. Second, the log-energies of wavelet subbands are calculated to evaluate global sharpness in DIBR-synthesized images. Third, a hybrid filter combining the autoregressive and bilateral filters is adopted to compute image complexity. Finally, the overall quality score is derived to normalize geometric distortion and global sharpness by the image complexity. Experiments show that our proposed quality method is superior to the competing reference-free state-of-the-art DIBR-synthesized image quality models.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62590363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Parallax Tolerant Light Field Stitching for Hand-held Plenoptic Cameras. 手持式全光学相机的视差容限光场拼接。

IF 10.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-10-10 DOI: 10.1109/TIP.2019.2945687

Xin Jin, Pei Wang, Qionghai Dai

Light field (LF) stitching is a potential solution to improve the field of view (FOV) for hand-held plenoptic cameras. Existing LF stitching methods cannot provide accurate registration for scenes with large depth variation. In this paper, a novel LF stitching method is proposed to handle parallax in the LFs more flexibly and accurately. First, a depth layer map (DLM) is proposed to guarantee adequate feature points on each depth layer. For the regions of nondeterministic depth, superpixel layer map (SLM) is proposed based on LF spatial correlation analysis to refine the depth layer assignments. Then, DLM-SLM-based LF registration is proposed to derive the location dependent homography transforms accurately and to warp LFs to its corresponding position without parallax interference. 4D graph-cut is further applied to fuse the registration results for higher LF spatial continuity and angular continuity. Horizontal, vertical and multi-LF stitching are tested for different scenes, which demonstrates the superior performance provided by the proposed method in terms of subjective quality of the stitched LFs, epipolar plane image consistency in the stitched LF, and perspective-averaged correlation between the stitched LF and the input LFs.

光场（LF）拼接是改善手持式全光学摄像机视场（FOV）的一种潜在解决方案。现有的光场拼接方法无法为深度变化较大的场景提供精确的配准。本文提出了一种新颖的 LF 拼接方法，可以更灵活、更准确地处理 LF 中的视差。首先，本文提出了一个深度图层图（DLM），以保证每个深度图层上都有足够的特征点。对于深度不确定的区域，提出了基于 LF 空间相关性分析的超像素层图（SLM），以细化深度层的分配。然后，提出了基于 DLM-SLM 的 LF 配准方法，以准确推导出与位置相关的同构变换，并在没有视差干扰的情况下将 LF warp 到相应的位置。进一步应用 4D 图形切割来融合配准结果，以获得更高的 LF 空间连续性和角度连续性。对不同场景进行了水平、垂直和多 LF 拼接测试，结果表明，在拼接 LF 的主观质量、拼接 LF 的外极面图像一致性以及拼接 LF 与输入 LF 之间的透视平均相关性等方面，所提出的方法都具有卓越的性能。

{"title":"Parallax Tolerant Light Field Stitching for Hand-held Plenoptic Cameras.","authors":"Xin Jin, Pei Wang, Qionghai Dai","doi":"10.1109/TIP.2019.2945687","DOIUrl":"10.1109/TIP.2019.2945687","url":null,"abstract":"Light field (LF) stitching is a potential solution to improve the field of view (FOV) for hand-held plenoptic cameras. Existing LF stitching methods cannot provide accurate registration for scenes with large depth variation. In this paper, a novel LF stitching method is proposed to handle parallax in the LFs more flexibly and accurately. First, a depth layer map (DLM) is proposed to guarantee adequate feature points on each depth layer. For the regions of nondeterministic depth, superpixel layer map (SLM) is proposed based on LF spatial correlation analysis to refine the depth layer assignments. Then, DLM-SLM-based LF registration is proposed to derive the location dependent homography transforms accurately and to warp LFs to its corresponding position without parallax interference. 4D graph-cut is further applied to fuse the registration results for higher LF spatial continuity and angular continuity. Horizontal, vertical and multi-LF stitching are tested for different scenes, which demonstrates the superior performance provided by the proposed method in terms of subjective quality of the stitched LFs, epipolar plane image consistency in the stitched LF, and perspective-averaged correlation between the stitched LF and the input LFs.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62590450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Efficient Evaluation of Image Quality via Deep-Learning Approximation of Perceptual Metrics. 通过感知指标的深度学习近似法高效评估图像质量。

IF 10.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-10-07 DOI: 10.1109/TIP.2019.2944079

Alessandro Artusi, Francesco Banterle, Fabio Carrara, Alejandro Moreo

Image metrics based on Human Visual System (HVS) play a remarkable role in the evaluation of complex image processing algorithms. However, mimicking the HVS is known to be complex and computationally expensive (both in terms of time and memory), and its usage is thus limited to a few applications and to small input data. All of this makes such metrics not fully attractive in real-world scenarios. To address these issues, we propose Deep Image Quality Metric (DIQM), a deep-learning approach to learn the global image quality feature (mean-opinion-score). DIQM can emulate existing visual metrics efficiently, reducing the computational costs by more than an order of magnitude with respect to existing implementations.

基于人类视觉系统（HVS）的图像度量在复杂图像处理算法的评估中发挥着重要作用。然而，众所周知，模仿人类视觉系统既复杂又耗费计算资源（包括时间和内存），因此其应用仅限于少数应用和较小的输入数据。所有这些都使得这类指标在现实世界中并不完全具有吸引力。为了解决这些问题，我们提出了深度图像质量度量（DIQM），这是一种学习全局图像质量特征（平均意见分数）的深度学习方法。DIQM 可以高效地模拟现有的视觉度量，与现有的实现方法相比，计算成本降低了一个数量级以上。

引用次数: 0

Exploiting Block-sparsity for Hyperspectral Kronecker Compressive Sensing: a Tensor-based Bayesian Method. 利用块稀疏性实现高光谱克朗克尔压缩传感：基于张量的贝叶斯方法

IF 10.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-10-07 DOI: 10.1109/TIP.2019.2944722

Rongqiang Zhao, Qiang Wang, Jun Fu, Luquan Ren

Bayesian methods are attracting increasing attention in the field of compressive sensing (CS), as they are applicable to recover signals from random measurements. However, these methods have limited use in many tensor-based cases such as hyperspectral Kronecker compressive sensing (HKCS), because they exploit the sparsity in only one dimension. In this paper, we propose a novel Bayesian model for HKCS in an attempt to overcome the above limitation. The model exploits multi-dimensional block-sparsity such that the information redundancies in all dimensions are eliminated. Laplace prior distributions are employed for sparse coefficients in each dimension, and their coupling is consistent with the multi-dimensional block-sparsity model. Based on the proposed model, we develop a tensor-based Bayesian reconstruction algorithm, which decouples the hyperparameters for each dimension via a low-complexity technique. Experimental results demonstrate that the proposed method is able to provide more accurate reconstruction than existing Bayesian methods at a satisfactory speed. Additionally, the proposed method can not only be used for HKCS, it also has the potential to be extended to other multi-dimensional CS applications and to multi-dimensional block-sparse-based data recovery.

贝叶斯方法适用于从随机测量中恢复信号，因此在压缩传感（CS）领域受到越来越多的关注。然而，这些方法在许多基于张量的情况下（如高光谱 Kronecker 压缩传感（HKCS））使用有限，因为它们只利用了一个维度的稀疏性。在本文中，我们提出了一种用于 HKCS 的新型贝叶斯模型，试图克服上述局限性。该模型利用了多维块稀疏性，从而消除了所有维度的信息冗余。每个维度的稀疏系数都采用了拉普拉斯先验分布，它们之间的耦合与多维块稀疏性模型是一致的。基于所提出的模型，我们开发了一种基于张量的贝叶斯重建算法，该算法通过一种低复杂度技术解耦了每个维度的超参数。实验结果表明，与现有的贝叶斯方法相比，所提出的方法能以令人满意的速度提供更精确的重建。此外，所提出的方法不仅可用于香港计算机辅助分析，还具有扩展到其他多维计算机辅助分析应用和基于块稀疏的多维数据恢复的潜力。

{"title":"Exploiting Block-sparsity for Hyperspectral Kronecker Compressive Sensing: a Tensor-based Bayesian Method.","authors":"Rongqiang Zhao, Qiang Wang, Jun Fu, Luquan Ren","doi":"10.1109/TIP.2019.2944722","DOIUrl":"10.1109/TIP.2019.2944722","url":null,"abstract":"Bayesian methods are attracting increasing attention in the field of compressive sensing (CS), as they are applicable to recover signals from random measurements. However, these methods have limited use in many tensor-based cases such as hyperspectral Kronecker compressive sensing (HKCS), because they exploit the sparsity in only one dimension. In this paper, we propose a novel Bayesian model for HKCS in an attempt to overcome the above limitation. The model exploits multi-dimensional block-sparsity such that the information redundancies in all dimensions are eliminated. Laplace prior distributions are employed for sparse coefficients in each dimension, and their coupling is consistent with the multi-dimensional block-sparsity model. Based on the proposed model, we develop a tensor-based Bayesian reconstruction algorithm, which decouples the hyperparameters for each dimension via a low-complexity technique. Experimental results demonstrate that the proposed method is able to provide more accurate reconstruction than existing Bayesian methods at a satisfactory speed. Additionally, the proposed method can not only be used for HKCS, it also has the potential to be extended to other multi-dimensional CS applications and to multi-dimensional block-sparse-based data recovery.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62590166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Skeleton Filter: A Self-Symmetric Filter for Skeletonization in Noisy Text Images. 骨架滤波器：用于噪声文本图像骨架化的自对称滤波器

IF 10.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-10-07 DOI: 10.1109/TIP.2019.2944560

Xiuxiu Bai, Lele Ye, Jihua Zhu, Li Zhu, Taku Komura

Robustly computing the skeletons of objects in natural images is difficult due to the large variations in shape boundaries and the large amount of noise in the images. Inspired by recent findings in neuroscience, we propose the Skeleton Filter, which is a novel model for skeleton extraction from natural images. The Skeleton Filter consists of a pair of oppositely oriented Gabor-like filters; by applying the Skeleton Filter in various orientations to an image at multiple resolutions and fusing the results, our system can robustly extract the skeleton even under highly noisy conditions. We evaluate the performance of our approach using challenging noisy text datasets and demonstrate that our pipeline realizes state-of-the-art performance for extracting the text skeleton. Moreover, the presence of Gabor filters in the human visual system and the simple architecture of the Skeleton Filter can help explain the strong capabilities of humans in perceiving skeletons of objects, even under dramatically noisy conditions.

由于形状边界的巨大变化和图像中的大量噪声，很难在自然图像中可靠地计算出物体的骨架。受神经科学最新研究成果的启发，我们提出了骨架过滤器，这是一种从自然图像中提取骨架的新型模型。骨架滤波器由一对方向相反的 Gabor 类滤波器组成；通过将不同方向的骨架滤波器应用于多分辨率的图像并融合结果，我们的系统即使在高噪声条件下也能稳健地提取骨架。我们使用具有挑战性的高噪声文本数据集评估了我们方法的性能，并证明我们的管道在提取文本骨架方面实现了最先进的性能。此外，Gabor 滤波器在人类视觉系统中的存在以及骨架滤波器的简单架构有助于解释人类即使在高噪声条件下也能感知物体骨架的强大能力。

引用次数: 0

Deep Coupled ISTA Network for Multi-modal Image Super-Resolution. 用于多模态图像超分辨率的深度耦合 ISTA 网络

IF 10.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-10-03 DOI: 10.1109/TIP.2019.2944270

Xin Deng, Pier Luigi Dragotti

Given a low-resolution (LR) image, multi-modal image super-resolution (MISR) aims to find the high-resolution (HR) version of this image with the guidance of an HR image from another modality. In this paper, we use a model-based approach to design a new deep network architecture for MISR. We first introduce a novel joint multi-modal dictionary learning (JMDL) algorithm to model cross-modality dependency. In JMDL, we simultaneously learn three dictionaries and two transform matrices to combine the modalities. Then, by unfolding the iterative shrinkage and thresholding algorithm (ISTA), we turn the JMDL model into a deep neural network, called deep coupled ISTA network. Since the network initialization plays an important role in deep network training, we further propose a layer-wise optimization algorithm (LOA) to initialize the parameters of the network before running back-propagation strategy. Specifically, we model the network initialization as a multi-layer dictionary learning problem, and solve it through convex optimization. The proposed LOA is demonstrated to effectively decrease the training loss and increase the reconstruction accuracy. Finally, we compare our method with other state-of-the-art methods in the MISR task. The numerical results show that our method consistently outperforms others both quantitatively and qualitatively at different upscaling factors for various multi-modal scenarios.

给定一幅低分辨率（LR）图像，多模态图像超分辨率（MISR）的目的是在另一种模态的高分辨率图像的引导下找到该图像的高分辨率（HR）版本。在本文中，我们采用基于模型的方法为 MISR 设计了一种新的深度网络架构。我们首先引入了一种新颖的联合多模态字典学习（JMDL）算法，对跨模态依赖性进行建模。在 JMDL 中，我们同时学习三个字典和两个变换矩阵，以结合模态。然后，通过展开迭代收缩和阈值算法（ISTA），我们将 JMDL 模型转化为深度神经网络，即深度耦合 ISTA 网络。由于网络初始化在深度网络训练中起着重要作用，我们进一步提出了一种层优化算法（LOA），用于在运行反向传播策略之前初始化网络参数。具体来说，我们将网络初始化建模为多层字典学习问题，并通过凸优化来解决。实验证明，所提出的 LOA 能有效减少训练损失，提高重建精度。最后，我们将我们的方法与 MISR 任务中的其他先进方法进行了比较。数值结果表明，对于各种多模态场景，在不同的放大系数下，我们的方法在定量和定性上都始终优于其他方法。

{"title":"Deep Coupled ISTA Network for Multi-modal Image Super-Resolution.","authors":"Xin Deng, Pier Luigi Dragotti","doi":"10.1109/TIP.2019.2944270","DOIUrl":"10.1109/TIP.2019.2944270","url":null,"abstract":"Given a low-resolution (LR) image, multi-modal image super-resolution (MISR) aims to find the high-resolution (HR) version of this image with the guidance of an HR image from another modality. In this paper, we use a model-based approach to design a new deep network architecture for MISR. We first introduce a novel joint multi-modal dictionary learning (JMDL) algorithm to model cross-modality dependency. In JMDL, we simultaneously learn three dictionaries and two transform matrices to combine the modalities. Then, by unfolding the iterative shrinkage and thresholding algorithm (ISTA), we turn the JMDL model into a deep neural network, called deep coupled ISTA network. Since the network initialization plays an important role in deep network training, we further propose a layer-wise optimization algorithm (LOA) to initialize the parameters of the network before running back-propagation strategy. Specifically, we model the network initialization as a multi-layer dictionary learning problem, and solve it through convex optimization. The proposed LOA is demonstrated to effectively decrease the training loss and increase the reconstruction accuracy. Finally, we compare our method with other state-of-the-art methods in the MISR task. The numerical results show that our method consistently outperforms others both quantitatively and qualitatively at different upscaling factors for various multi-modal scenarios.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62589478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Semi-Supervised Human Detection via Region Proposal Networks Aided by Verification. 通过验证辅助区域建议网络进行半监督式人体检测

IF 10.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-10-03 DOI: 10.1109/TIP.2019.2944306

Si Wu, Wenhao Wu, Shiyao Lei, Sihao Lin, Rui Li, Zhiwen Yu, Hau-San Wong

In this paper, we explore how to leverage readily available unlabeled data to improve semi-supervised human detection performance. For this purpose, we specifically modify the region proposal network (RPN) for learning on a partially labeled dataset. Based on commonly observed false positive types, a verification module is developed to assess foreground human objects in the candidate regions to provide an important cue for filtering the RPN's proposals. The remaining proposals with high confidence scores are then used as pseudo annotations for re-training our detection model. To reduce the risk of error propagation in the training process, we adopt a self-paced training strategy to progressively include more pseudo annotations generated by the previous model over multiple training rounds. The resulting detector re-trained on the augmented data can be expected to have better detection performance. The effectiveness of the main components of this framework is verified through extensive experiments, and the proposed approach achieves state-of-the-art detection results on multiple scene-specific human detection benchmarks in the semi-supervised setting.

在本文中，我们探讨了如何利用随时可用的非标记数据来提高半监督式人体检测性能。为此，我们专门修改了区域建议网络（RPN），以便在部分标记的数据集上进行学习。根据通常观察到的误报类型，我们开发了一个验证模块，用于评估候选区域中的前景人类对象，为过滤 RPN 的建议提供重要线索。然后，剩余的高置信度建议将作为伪注释，用于重新训练我们的检测模型。为了降低训练过程中错误传播的风险，我们采用了自定步调的训练策略，在多轮训练中逐步加入更多由上一模型生成的伪注释。在增强数据上重新训练的检测器有望获得更好的检测性能。通过大量实验验证了这一框架主要组成部分的有效性，在半监督环境下，所提出的方法在多个特定场景的人类检测基准上取得了最先进的检测结果。

{"title":"Semi-Supervised Human Detection via Region Proposal Networks Aided by Verification.","authors":"Si Wu, Wenhao Wu, Shiyao Lei, Sihao Lin, Rui Li, Zhiwen Yu, Hau-San Wong","doi":"10.1109/TIP.2019.2944306","DOIUrl":"10.1109/TIP.2019.2944306","url":null,"abstract":"In this paper, we explore how to leverage readily available unlabeled data to improve semi-supervised human detection performance. For this purpose, we specifically modify the region proposal network (RPN) for learning on a partially labeled dataset. Based on commonly observed false positive types, a verification module is developed to assess foreground human objects in the candidate regions to provide an important cue for filtering the RPN's proposals. The remaining proposals with high confidence scores are then used as pseudo annotations for re-training our detection model. To reduce the risk of error propagation in the training process, we adopt a self-paced training strategy to progressively include more pseudo annotations generated by the previous model over multiple training rounds. The resulting detector re-trained on the augmented data can be expected to have better detection performance. The effectiveness of the main components of this framework is verified through extensive experiments, and the proposed approach achieves state-of-the-art detection results on multiple scene-specific human detection benchmarks in the semi-supervised setting.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62589764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Learning Interleaved Cascade of Shrinkage Fields for Joint Image Dehazing and Denoising. 为联合图像去污和去噪学习交错级联收缩场

IF 10.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-09-30 DOI: 10.1109/TIP.2019.2942504

Qingbo Wu, Wenqi Ren, Xiaochun Cao

Most existing image dehazing methods deteriorate to different extents when processing hazy inputs with noise. The main reason is that the commonly adopted two-step strategy tends to amplify noise in the inverse operation of division by the transmission. To address this problem, we learn an interleaved Cascade of Shrinkage Fields (CSF) to reduce noise in jointly recovering the transmission map and the scene radiance from a single hazy image. Specifically, an auxiliary shrinkage field (SF) model is integrated into each cascade of the proposed scheme to reduce undesirable artifacts during the transmission estimation. Different from conventional CSF, our learned SF models have special visual patterns, which facilitate the specific task of noise reduction in haze removal. Furthermore, a numerical algorithm is proposed to efficiently update the scene radiance and the transmission map in each cascade. Extensive experiments on synthetic and real-world data demonstrate that the proposed algorithm performs favorably against state-of-the-art dehazing methods on hazy and noisy images.

现有的大多数图像去噪方法在处理带有噪声的朦胧输入时都会出现不同程度的恶化。主要原因是，通常采用的两步策略往往会在除法传输的逆操作中放大噪声。为了解决这个问题，我们学习了一种交错级联收缩场（CSF），以减少从单幅朦胧图像中联合恢复透射图和场景辐射率时的噪声。具体地说，在拟议方案的每个级联中都集成了一个辅助收缩场（SF）模型，以减少传输估计过程中的不良伪影。与传统的 CSF 不同，我们学习的 SF 模型具有特殊的视觉模式，这有助于在去除雾霾的过程中完成降噪这一特定任务。此外，我们还提出了一种数值算法，用于在每个级联中有效地更新场景辐照度和传输图。在合成数据和真实世界数据上进行的大量实验表明，与最先进的去雾霾和噪声图像处理方法相比，所提出的算法性能更佳。

引用次数: 0

首页上一页

下一页尾页

类型

全部化学•材料生命科学医学物理工程技术环境•农林材料科学地球科学法学管理学化学环境科学与生态学计算机科学教育学经济学农林科学人文科学生物学数学物理与天体物理心理学综合性期刊其他工业工程理学历史学农学文学信息工程

数据库

全部 ACS Publications Elsevier ieeexplore Springer The Royal Society of Chemistry Wiley

期刊

IEEE Transactions on Image Processing

全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.

﹀