Parallax Tolerant Light Field Stitching for Hand-held Plenoptic Cameras
Pub Date: 2019-10-10 | DOI: 10.1109/TIP.2019.2945687
Xin Jin, Pei Wang, Qionghai Dai
Light field (LF) stitching is a potential solution for improving the field of view (FOV) of hand-held plenoptic cameras. Existing LF stitching methods cannot provide accurate registration for scenes with large depth variation. In this paper, a novel LF stitching method is proposed to handle parallax in the LFs more flexibly and accurately. First, a depth layer map (DLM) is proposed to guarantee adequate feature points on each depth layer. For regions of nondeterministic depth, a superpixel layer map (SLM) is proposed, based on LF spatial correlation analysis, to refine the depth layer assignments. Then, DLM-SLM-based LF registration is proposed to accurately derive the location-dependent homography transforms and to warp each LF to its corresponding position without parallax interference. A 4D graph cut is further applied to fuse the registration results for higher LF spatial and angular continuity. Horizontal, vertical and multi-LF stitching are tested on different scenes, demonstrating the superior performance of the proposed method in terms of the subjective quality of the stitched LFs, the epipolar plane image consistency in the stitched LF, and the perspective-averaged correlation between the stitched LF and the input LFs.
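To make the layered-registration idea concrete, below is a minimal sketch of per-depth-layer homography estimation in the spirit of the DLM: feature matches are binned by disparity as a crude depth proxy so that each layer receives its own transform. The function name, the quantile binning rule, and all parameters are illustrative assumptions, not the paper's actual DLM/SLM pipeline.

```python
# A minimal sketch: group feature matches into depth layers by disparity
# and fit one homography per layer (stand-in for the DLM-based registration).
import cv2
import numpy as np

def layered_homographies(img_a, img_b, n_layers=3):
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des_a, des_b)

    pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches])
    disparity = np.linalg.norm(pts_a - pts_b, axis=1)  # crude depth proxy

    # Bin matches into depth layers by disparity quantiles (illustrative rule).
    edges = np.quantile(disparity, np.linspace(0, 1, n_layers + 1))
    homographies = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (disparity >= lo) & (disparity <= hi)
        if sel.sum() >= 4:  # findHomography needs at least 4 point pairs
            H, _ = cv2.findHomography(pts_a[sel], pts_b[sel], cv2.RANSAC, 3.0)
            homographies.append(H)
    return homographies
```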
{"title":"Parallax Tolerant Light Field Stitching for Hand-held Plenoptic Cameras.","authors":"Xin Jin, Pei Wang, Qionghai Dai","doi":"10.1109/TIP.2019.2945687","DOIUrl":"10.1109/TIP.2019.2945687","url":null,"abstract":"<p><p>Light field (LF) stitching is a potential solution to improve the field of view (FOV) for hand-held plenoptic cameras. Existing LF stitching methods cannot provide accurate registration for scenes with large depth variation. In this paper, a novel LF stitching method is proposed to handle parallax in the LFs more flexibly and accurately. First, a depth layer map (DLM) is proposed to guarantee adequate feature points on each depth layer. For the regions of nondeterministic depth, superpixel layer map (SLM) is proposed based on LF spatial correlation analysis to refine the depth layer assignments. Then, DLM-SLM-based LF registration is proposed to derive the location dependent homography transforms accurately and to warp LFs to its corresponding position without parallax interference. 4D graph-cut is further applied to fuse the registration results for higher LF spatial continuity and angular continuity. Horizontal, vertical and multi-LF stitching are tested for different scenes, which demonstrates the superior performance provided by the proposed method in terms of subjective quality of the stitched LFs, epipolar plane image consistency in the stitched LF, and perspective-averaged correlation between the stitched LF and the input LFs.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62590450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient Evaluation of Image Quality via Deep-Learning Approximation of Perceptual Metrics
Pub Date: 2019-10-07 | DOI: 10.1109/TIP.2019.2944079
Alessandro Artusi, Francesco Banterle, Fabio Carrara, Alejandro Moreo
Image metrics based on the Human Visual System (HVS) play a remarkable role in the evaluation of complex image processing algorithms. However, mimicking the HVS is known to be complex and computationally expensive (in both time and memory), so its usage is limited to a few applications and to small input data. All of this makes such metrics less attractive in real-world scenarios. To address these issues, we propose the Deep Image Quality Metric (DIQM), a deep-learning approach to learning a global image quality feature (the mean opinion score). DIQM can emulate existing visual metrics efficiently, reducing the computational cost by more than an order of magnitude with respect to existing implementations.
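As a concrete illustration of the general approach, here is a minimal sketch of training a small CNN to regress the score that a slow perceptual metric assigns to each image. The architecture, names, and hyperparameters are assumptions for illustration, not the actual DIQM network.

```python
# A minimal sketch: a CNN learns to mimic a slow reference metric by
# regressing its per-image score (random tensors stand in for real data).
import torch
import torch.nn as nn

class QualityRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 1)  # single global quality score

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = QualityRegressor()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
images = torch.rand(8, 3, 128, 128)   # stand-in batch
target = torch.rand(8, 1)             # scores produced by the slow metric
loss = nn.functional.mse_loss(model(images), target)
loss.backward()
opt.step()
```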
{"title":"Efficient Evaluation of Image Quality via Deep-Learning Approximation of Perceptual Metrics.","authors":"Alessandro Artusi, Francesco Banterle, Fabio Carrara, Alejandro Moreo","doi":"10.1109/TIP.2019.2944079","DOIUrl":"10.1109/TIP.2019.2944079","url":null,"abstract":"<p><p>Image metrics based on Human Visual System (HVS) play a remarkable role in the evaluation of complex image processing algorithms. However, mimicking the HVS is known to be complex and computationally expensive (both in terms of time and memory), and its usage is thus limited to a few applications and to small input data. All of this makes such metrics not fully attractive in real-world scenarios. To address these issues, we propose Deep Image Quality Metric (DIQM), a deep-learning approach to learn the global image quality feature (mean-opinion-score). DIQM can emulate existing visual metrics efficiently, reducing the computational costs by more than an order of magnitude with respect to existing implementations.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62589666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploiting Block-sparsity for Hyperspectral Kronecker Compressive Sensing: a Tensor-based Bayesian Method
Pub Date: 2019-10-07 | DOI: 10.1109/TIP.2019.2944722
Rongqiang Zhao, Qiang Wang, Jun Fu, Luquan Ren
Bayesian methods are attracting increasing attention in the field of compressive sensing (CS), as they can recover signals from random measurements. However, these methods have limited use in many tensor-based cases, such as hyperspectral Kronecker compressive sensing (HKCS), because they exploit sparsity in only one dimension. In this paper, we propose a novel Bayesian model for HKCS that overcomes this limitation. The model exploits multi-dimensional block-sparsity so that the information redundancies in all dimensions are eliminated. Laplace prior distributions are employed for the sparse coefficients in each dimension, and their coupling is consistent with the multi-dimensional block-sparsity model. Based on the proposed model, we develop a tensor-based Bayesian reconstruction algorithm, which decouples the hyperparameters for each dimension via a low-complexity technique. Experimental results demonstrate that the proposed method provides more accurate reconstruction than existing Bayesian methods at a satisfactory speed. Moreover, the proposed method is not limited to HKCS: it has the potential to be extended to other multi-dimensional CS applications and to multi-dimensional block-sparsity-based data recovery.
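To illustrate the Kronecker (separable) measurement model that underlies HKCS, the sketch below senses a toy 3-D cube with one measurement matrix per dimension and verifies that the sequence of mode-n products matches the explicit Kronecker operator applied to the vectorized cube. All sizes and names are illustrative.

```python
# A minimal sketch of separable Kronecker sensing on a toy 3-D tensor.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8, 4))     # toy hyperspectral cube
Phi1 = rng.standard_normal((4, 8))     # spatial (rows) measurement matrix
Phi2 = rng.standard_normal((4, 8))     # spatial (columns) measurement matrix
Phi3 = rng.standard_normal((2, 4))     # spectral measurement matrix

# Separable sensing as mode-n products: Y = X x1 Phi1 x2 Phi2 x3 Phi3.
Y = np.einsum('ia,jb,kc,abc->ijk', Phi1, Phi2, Phi3, X)

# The same measurement via the explicit Kronecker operator on vec(X).
big = np.kron(Phi1, np.kron(Phi2, Phi3))
assert np.allclose(big @ X.reshape(-1), Y.reshape(-1))
```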
{"title":"Exploiting Block-sparsity for Hyperspectral Kronecker Compressive Sensing: a Tensor-based Bayesian Method.","authors":"Rongqiang Zhao, Qiang Wang, Jun Fu, Luquan Ren","doi":"10.1109/TIP.2019.2944722","DOIUrl":"10.1109/TIP.2019.2944722","url":null,"abstract":"<p><p>Bayesian methods are attracting increasing attention in the field of compressive sensing (CS), as they are applicable to recover signals from random measurements. However, these methods have limited use in many tensor-based cases such as hyperspectral Kronecker compressive sensing (HKCS), because they exploit the sparsity in only one dimension. In this paper, we propose a novel Bayesian model for HKCS in an attempt to overcome the above limitation. The model exploits multi-dimensional block-sparsity such that the information redundancies in all dimensions are eliminated. Laplace prior distributions are employed for sparse coefficients in each dimension, and their coupling is consistent with the multi-dimensional block-sparsity model. Based on the proposed model, we develop a tensor-based Bayesian reconstruction algorithm, which decouples the hyperparameters for each dimension via a low-complexity technique. Experimental results demonstrate that the proposed method is able to provide more accurate reconstruction than existing Bayesian methods at a satisfactory speed. Additionally, the proposed method can not only be used for HKCS, it also has the potential to be extended to other multi-dimensional CS applications and to multi-dimensional block-sparse-based data recovery.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62590166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Skeleton Filter: A Self-Symmetric Filter for Skeletonization in Noisy Text Images
Pub Date: 2019-10-07 | DOI: 10.1109/TIP.2019.2944560
Xiuxiu Bai, Lele Ye, Jihua Zhu, Li Zhu, Taku Komura
Robustly computing the skeletons of objects in natural images is difficult due to large variations in shape boundaries and large amounts of noise in the images. Inspired by recent findings in neuroscience, we propose the Skeleton Filter, a novel model for skeleton extraction from natural images. The Skeleton Filter consists of a pair of oppositely oriented Gabor-like filters; by applying the Skeleton Filter at various orientations and multiple resolutions and fusing the results, our system can robustly extract the skeleton even under highly noisy conditions. We evaluate our approach on challenging noisy text datasets and demonstrate that our pipeline achieves state-of-the-art performance in extracting text skeletons. Moreover, the presence of Gabor filters in the human visual system and the simple architecture of the Skeleton Filter may help explain the strong capability of humans to perceive the skeletons of objects, even under dramatically noisy conditions.
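For intuition about the multi-orientation, multi-scale filtering step, the sketch below runs a bank of even-symmetric Gabor kernels over several orientations and scales and fuses the responses by a maximum. It is only a crude stand-in: the paper's paired, oppositely oriented construction is not reproduced here, and all parameter values are assumptions.

```python
# A crude sketch of multi-orientation, multi-scale Gabor filtering with
# max-fusion, in the spirit of (but not identical to) the Skeleton Filter.
import cv2
import numpy as np

def ridge_response(gray, n_orient=8, scales=(7, 11, 15)):
    img = gray.astype(np.float32) / 255.0
    fused = np.zeros_like(img)
    for ksize in scales:
        for t in range(n_orient):
            theta = t * np.pi / n_orient
            kern = cv2.getGaborKernel((ksize, ksize), sigma=ksize / 6.0,
                                      theta=theta, lambd=ksize / 2.0,
                                      gamma=0.5, psi=0.0)
            kern -= kern.mean()                 # zero-DC so flat areas vanish
            resp = cv2.filter2D(img, -1, kern)
            fused = np.maximum(fused, resp)     # fuse over orientation/scale
    return fused
```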
{"title":"Skeleton Filter: A Self-Symmetric Filter for Skeletonization in Noisy Text Images.","authors":"Xiuxiu Bai, Lele Ye, Jihua Zhu, Li Zhu, Taku Komura","doi":"10.1109/TIP.2019.2944560","DOIUrl":"10.1109/TIP.2019.2944560","url":null,"abstract":"<p><p>Robustly computing the skeletons of objects in natural images is difficult due to the large variations in shape boundaries and the large amount of noise in the images. Inspired by recent findings in neuroscience, we propose the Skeleton Filter, which is a novel model for skeleton extraction from natural images. The Skeleton Filter consists of a pair of oppositely oriented Gabor-like filters; by applying the Skeleton Filter in various orientations to an image at multiple resolutions and fusing the results, our system can robustly extract the skeleton even under highly noisy conditions. We evaluate the performance of our approach using challenging noisy text datasets and demonstrate that our pipeline realizes state-of-the-art performance for extracting the text skeleton. Moreover, the presence of Gabor filters in the human visual system and the simple architecture of the Skeleton Filter can help explain the strong capabilities of humans in perceiving skeletons of objects, even under dramatically noisy conditions.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62589989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Coupled ISTA Network for Multi-modal Image Super-Resolution
Pub Date: 2019-10-03 | DOI: 10.1109/TIP.2019.2944270
Xin Deng, Pier Luigi Dragotti
Given a low-resolution (LR) image, multi-modal image super-resolution (MISR) aims to find the high-resolution (HR) version of this image with the guidance of an HR image from another modality. In this paper, we use a model-based approach to design a new deep network architecture for MISR. We first introduce a novel joint multi-modal dictionary learning (JMDL) algorithm to model cross-modality dependency. In JMDL, we simultaneously learn three dictionaries and two transform matrices to combine the modalities. Then, by unfolding the iterative shrinkage and thresholding algorithm (ISTA), we turn the JMDL model into a deep neural network, called the deep coupled ISTA network. Since network initialization plays an important role in deep network training, we further propose a layer-wise optimization algorithm (LOA) to initialize the parameters of the network before back-propagation. Specifically, we model the network initialization as a multi-layer dictionary learning problem and solve it through convex optimization. The proposed LOA is shown to effectively decrease the training loss and increase the reconstruction accuracy. Finally, we compare our method with other state-of-the-art methods on the MISR task. The numerical results show that our method consistently outperforms the others, both quantitatively and qualitatively, at different upscaling factors and for various multi-modal scenarios.
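The unrolling idea can be sketched independently of the coupled multi-modal dictionaries: each ISTA iteration (a gradient step followed by soft-thresholding) becomes one network layer with a learnable step size and threshold. The sketch below shows single-dictionary unrolling only; the JMDL coupling and the LOA initialization are omitted, and all names are assumptions.

```python
# A minimal sketch of unfolding ISTA iterations into trainable layers.
import torch
import torch.nn as nn

def soft(x, tau):
    return torch.sign(x) * torch.relu(torch.abs(x) - tau)

class UnrolledISTA(nn.Module):
    def __init__(self, m, n, n_layers=5):
        super().__init__()
        self.W = nn.Parameter(0.1 * torch.randn(m, n))   # learned dictionary
        self.step = nn.Parameter(torch.tensor(0.5))      # learned step size
        self.tau = nn.Parameter(torch.tensor(0.05))      # learned threshold
        self.n_layers = n_layers

    def forward(self, y):                 # y: (batch, m)
        z = torch.zeros(y.shape[0], self.W.shape[1], device=y.device)
        for _ in range(self.n_layers):    # each iteration becomes a "layer"
            grad = (y - z @ self.W.t()) @ self.W
            z = soft(z + self.step * grad, self.tau)
        return z

model = UnrolledISTA(m=64, n=128)
codes = model(torch.randn(10, 64))        # sparse codes for a toy batch
```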
{"title":"Deep Coupled ISTA Network for Multi-modal Image Super-Resolution.","authors":"Xin Deng, Pier Luigi Dragotti","doi":"10.1109/TIP.2019.2944270","DOIUrl":"10.1109/TIP.2019.2944270","url":null,"abstract":"<p><p>Given a low-resolution (LR) image, multi-modal image super-resolution (MISR) aims to find the high-resolution (HR) version of this image with the guidance of an HR image from another modality. In this paper, we use a model-based approach to design a new deep network architecture for MISR. We first introduce a novel joint multi-modal dictionary learning (JMDL) algorithm to model cross-modality dependency. In JMDL, we simultaneously learn three dictionaries and two transform matrices to combine the modalities. Then, by unfolding the iterative shrinkage and thresholding algorithm (ISTA), we turn the JMDL model into a deep neural network, called deep coupled ISTA network. Since the network initialization plays an important role in deep network training, we further propose a layer-wise optimization algorithm (LOA) to initialize the parameters of the network before running back-propagation strategy. Specifically, we model the network initialization as a multi-layer dictionary learning problem, and solve it through convex optimization. The proposed LOA is demonstrated to effectively decrease the training loss and increase the reconstruction accuracy. Finally, we compare our method with other state-of-the-art methods in the MISR task. The numerical results show that our method consistently outperforms others both quantitatively and qualitatively at different upscaling factors for various multi-modal scenarios.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62589478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semi-Supervised Human Detection via Region Proposal Networks Aided by Verification
Pub Date: 2019-10-03 | DOI: 10.1109/TIP.2019.2944306
Si Wu, Wenhao Wu, Shiyao Lei, Sihao Lin, Rui Li, Zhiwen Yu, Hau-San Wong
In this paper, we explore how to leverage readily available unlabeled data to improve semi-supervised human detection performance. For this purpose, we specifically modify the region proposal network (RPN) for learning on a partially labeled dataset. Based on commonly observed false-positive types, a verification module is developed to assess foreground human objects in the candidate regions, providing an important cue for filtering the RPN's proposals. The remaining proposals with high confidence scores are then used as pseudo annotations for re-training our detection model. To reduce the risk of error propagation during training, we adopt a self-paced training strategy that progressively includes more pseudo annotations generated by the previous model over multiple training rounds. The resulting detector, re-trained on the augmented data, can be expected to have better detection performance. The effectiveness of the main components of this framework is verified through extensive experiments, and the proposed approach achieves state-of-the-art detection results on multiple scene-specific human detection benchmarks in the semi-supervised setting.
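The self-paced pseudo-labeling loop can be summarized as below, where `detector`, `verifier`, and `retrain` are hypothetical stand-ins: proposals that pass verification with high confidence become pseudo annotations, and the acceptance threshold is relaxed over rounds so that more pseudo labels are included progressively.

```python
# A minimal sketch of self-paced training with verification-filtered
# pseudo annotations; the callables are hypothetical stand-ins.
def self_paced_training(detector, verifier, retrain,
                        labeled, unlabeled, rounds=3, start_thresh=0.95):
    annotations = list(labeled)
    for r in range(rounds):
        thresh = start_thresh - 0.05 * r          # progressively include more
        pseudo = []
        for image in unlabeled:
            for box, score in detector(image):
                # Verification filters typical false positives before accepting.
                if score >= thresh and verifier(image, box) >= thresh:
                    pseudo.append((image, box))
        detector = retrain(annotations + pseudo)  # re-train on augmented data
    return detector
```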
{"title":"Semi-Supervised Human Detection via Region Proposal Networks Aided by Verification.","authors":"Si Wu, Wenhao Wu, Shiyao Lei, Sihao Lin, Rui Li, Zhiwen Yu, Hau-San Wong","doi":"10.1109/TIP.2019.2944306","DOIUrl":"10.1109/TIP.2019.2944306","url":null,"abstract":"<p><p>In this paper, we explore how to leverage readily available unlabeled data to improve semi-supervised human detection performance. For this purpose, we specifically modify the region proposal network (RPN) for learning on a partially labeled dataset. Based on commonly observed false positive types, a verification module is developed to assess foreground human objects in the candidate regions to provide an important cue for filtering the RPN's proposals. The remaining proposals with high confidence scores are then used as pseudo annotations for re-training our detection model. To reduce the risk of error propagation in the training process, we adopt a self-paced training strategy to progressively include more pseudo annotations generated by the previous model over multiple training rounds. The resulting detector re-trained on the augmented data can be expected to have better detection performance. The effectiveness of the main components of this framework is verified through extensive experiments, and the proposed approach achieves state-of-the-art detection results on multiple scene-specific human detection benchmarks in the semi-supervised setting.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62589764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning Interleaved Cascade of Shrinkage Fields for Joint Image Dehazing and Denoising
Pub Date: 2019-09-30 | DOI: 10.1109/TIP.2019.2942504
Qingbo Wu, Wenqi Ren, Xiaochun Cao
Most existing image dehazing methods deteriorate to varying extents when processing noisy hazy inputs. The main reason is that the commonly adopted two-step strategy tends to amplify noise in the inverse operation of dividing by the transmission. To address this problem, we learn an interleaved Cascade of Shrinkage Fields (CSF) to reduce noise while jointly recovering the transmission map and the scene radiance from a single hazy image. Specifically, an auxiliary shrinkage field (SF) model is integrated into each cascade of the proposed scheme to reduce undesirable artifacts during transmission estimation. Unlike a conventional CSF, our learned SF models have special visual patterns, which facilitate the specific task of noise reduction in haze removal. Furthermore, a numerical algorithm is proposed to efficiently update the scene radiance and the transmission map in each cascade. Extensive experiments on synthetic and real-world data demonstrate that the proposed algorithm performs favorably against state-of-the-art dehazing methods on hazy and noisy images.
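A small numerical illustration of the noise-amplification problem: under the standard haze model I = J·t + A·(1 − t), the naive two-step recovery J = (I − A)/t + A scales any sensor noise by 1/t, which blows up in dense haze where t is small. The values below are toy numbers.

```python
# A toy demonstration that dividing by a small transmission amplifies noise.
import numpy as np

rng = np.random.default_rng(0)
J = np.full((64, 64), 0.6)            # true scene radiance
A = 0.9                               # atmospheric light
t = 0.15                              # dense haze -> small transmission
noise = 0.01 * rng.standard_normal(J.shape)

I = J * t + A * (1 - t) + noise       # noisy hazy observation
J_hat = (I - A) / t + A               # naive two-step inversion

# Residual noise grows by a factor of 1/t (about 6.7x here).
print(noise.std(), (J_hat - J).std())
```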
{"title":"Learning Interleaved Cascade of Shrinkage Fields for Joint Image Dehazing and Denoising.","authors":"Qingbo Wu, Wenqi Ren, Xiaochun Cao","doi":"10.1109/TIP.2019.2942504","DOIUrl":"10.1109/TIP.2019.2942504","url":null,"abstract":"<p><p>Most existing image dehazing methods deteriorate to different extents when processing hazy inputs with noise. The main reason is that the commonly adopted two-step strategy tends to amplify noise in the inverse operation of division by the transmission. To address this problem, we learn an interleaved Cascade of Shrinkage Fields (CSF) to reduce noise in jointly recovering the transmission map and the scene radiance from a single hazy image. Specifically, an auxiliary shrinkage field (SF) model is integrated into each cascade of the proposed scheme to reduce undesirable artifacts during the transmission estimation. Different from conventional CSF, our learned SF models have special visual patterns, which facilitate the specific task of noise reduction in haze removal. Furthermore, a numerical algorithm is proposed to efficiently update the scene radiance and the transmission map in each cascade. Extensive experiments on synthetic and real-world data demonstrate that the proposed algorithm performs favorably against state-of-the-art dehazing methods on hazy and noisy images.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62588486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Biological Vision Inspired Framework for Image Enhancement in Poor Visibility Conditions
Pub Date: 2019-09-25 | DOI: 10.1109/TIP.2019.2938310
Kai-Fu Yang, Xian-Shi Zhang, Yong-Jie Li
Image enhancement is an important pre-processing step for many computer vision applications, especially for scenes captured in poor visibility conditions. In this work, we develop a unified two-pathway model inspired by biological vision, especially the early visual mechanisms, which serves image enhancement tasks including low dynamic range (LDR) image enhancement and high dynamic range (HDR) image tone mapping. First, the input image is separated and sent into two visual pathways, a structure pathway and a detail pathway, corresponding to the M- and P-pathways in the early visual system, which encode the low- and high-frequency visual information, respectively. In the structure pathway, an extended biological normalization model is used to integrate global and local luminance adaptation, which can handle visual scenes with varying illumination. In the detail pathway, detail enhancement and local noise suppression are achieved based on local energy weighting. Finally, the outputs of the structure and detail pathways are integrated to achieve low-light image enhancement. In addition, the proposed model can also be used for tone mapping of HDR images with some fine-tuning steps. Extensive experiments on three datasets (two LDR image datasets and one HDR scene dataset) show that the proposed model handles the above visual enhancement tasks efficiently and outperforms related state-of-the-art methods.
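A minimal sketch of the two-pathway decomposition follows: a low-pass structure pathway and a residual detail pathway that is re-weighted by local energy before recombination. The paper's biological normalization model is replaced here by a simple gamma-style luminance adaptation, and all parameter values are illustrative assumptions.

```python
# A minimal sketch of structure/detail pathway separation and fusion.
import cv2
import numpy as np

def two_pathway_enhance(gray, sigma=5.0, detail_gain=1.5, eps=1e-3):
    img = gray.astype(np.float32) / 255.0
    structure = cv2.GaussianBlur(img, (0, 0), sigma)   # low-frequency pathway
    detail = img - structure                           # high-frequency pathway

    # Structure pathway: global + local luminance adaptation (crude stand-in
    # for the paper's extended biological normalization model).
    adapted = structure ** (0.6 + 0.4 * structure.mean())

    # Detail pathway: local-energy weighting suppresses flat, noisy regions.
    energy = cv2.GaussianBlur(detail * detail, (0, 0), sigma)
    weight = energy / (energy + eps)
    return np.clip(adapted + detail_gain * weight * detail, 0, 1)
```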
{"title":"A Biological Vision Inspired Framework for Image Enhancement in Poor Visibility Conditions.","authors":"Kai-Fu Yang, Xian-Shi Zhang, Yong-Jie Li","doi":"10.1109/TIP.2019.2938310","DOIUrl":"10.1109/TIP.2019.2938310","url":null,"abstract":"<p><p>Image enhancement is an important pre-processing step for many computer vision applications especially regarding the scenes in poor visibility conditions. In this work, we develop a unified two-pathway model inspired by the biological vision, especially the early visual mechanisms, which contributes to image enhancement tasks including low dynamic range (LDR) image enhancement and high dynamic range (HDR) image tone mapping. Firstly, the input image is separated and sent into two visual pathways: structure-pathway and detail-pathway, corresponding to the M-and P-pathway in the early visual system, which code the low-and high-frequency visual information, respectively. In the structure-pathway, an extended biological normalization model is used to integrate the global and local luminance adaptation, which can handle the visual scenes with varying illuminations. On the other hand, the detail enhancement and local noise suppression are achieved in the detail-pathway based on local energy weighting. Finally, the outputs of structure-and detail-pathway are integrated to achieve the low-light image enhancement. In addition, the proposed model can also be used for tone mapping of HDR images with some fine-tuning steps. Extensive experiments on three datasets (two LDR image datasets and one HDR scene dataset) show that the proposed model can handle the visual enhancement tasks mentioned above efficiently and outperform the related state-of-the-art methods.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62586368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Receptive Field Size vs. Model Depth for Single Image Super-resolution
Pub Date: 2019-09-25 | DOI: 10.1109/TIP.2019.2941327
Ruxin Wang, Mingming Gong, Dacheng Tao
The performance of single image super-resolution (SISR) has been largely improved by innovative designs of deep architectures. An important claim raised by these designs is that deep models have a large receptive field size and strong nonlinearity. However, it remains unclear which factor, receptive field size or model depth, is more critical for SISR. To reveal the answer, in this paper we propose a strategy based on dilated convolution to investigate how the two factors affect the performance of SISR. Our findings from exhaustive investigations suggest that SISR is more sensitive to changes in receptive field size than to variations in model depth, and that the model depth must be congruent with the receptive field size to produce improved performance. These findings inspire us to design a shallower architecture that saves computational and memory cost while remaining comparably effective with respect to a much deeper architecture.
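The dilation-based strategy can be made concrete with standard receptive-field arithmetic: a 3×3 convolution with dilation d adds 2d to the receptive field, so depth and receptive field size can be varied independently. The block below, a sketch with assumed channel widths, builds two stacks with equal receptive fields but different depths.

```python
# A minimal sketch: dilated convolutions decouple receptive field from depth.
import torch.nn as nn

def dilated_block(depth, dilation):
    layers, rf = [], 1
    for _ in range(depth):
        layers += [nn.Conv2d(64, 64, 3, padding=dilation, dilation=dilation),
                   nn.ReLU(inplace=True)]
        rf += 2 * dilation            # each 3x3 dilated conv adds 2*dilation
    return nn.Sequential(*layers), rf

# Same receptive field (17), different depths: 8 plain layers vs 4 dilated.
deep, rf1 = dilated_block(depth=8, dilation=1)
shallow, rf2 = dilated_block(depth=4, dilation=2)
assert rf1 == rf2 == 17
```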
{"title":"Receptive Field Size vs. Model Depth for Single Image Super-resolution.","authors":"Ruxin Wang, Mingming Gong, Dacheng Tao","doi":"10.1109/TIP.2019.2941327","DOIUrl":"10.1109/TIP.2019.2941327","url":null,"abstract":"<p><p>The performance of single image super-resolution (SISR) has been largely improved by innovative designs of deep architectures. An important claim raised by these designs is that the deep models have large receptive field size and strong nonlinearity. However, we are concerned about the question that which factor, receptive field size or model depth, is more critical for SISR. Towards revealing the answers, in this paper, we propose a strategy based on dilated convolution to investigate how the two factors affect the performance of SISR. Our findings from exhaustive investigations suggest that SISR is more sensitive to the changes of receptive field size than to the model depth variations, and that the model depth must be congruent with the receptive field size to produce improved performance. These findings inspire us to design a shallower architecture which can save computational and memory cost while preserving comparable effectiveness with respect to a much deeper architecture.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62587513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Intermediate Deep Feature Compression: Toward Intelligent Sensing
Pub Date: 2019-09-25 | DOI: 10.1109/TIP.2019.2941660
Zhuo Chen, Kui Fan, Shiqi Wang, Lingyu Duan, Weisi Lin, Alex C Kot
Recent advances in hardware technology have made front-end intelligent analysis with deep learning more prevalent and practical. To better enable intelligent sensing at the front end, instead of compressing and transmitting visual signals or the ultimately utilized top-layer deep learning features, we propose to compactly represent and convey the intermediate-layer deep learning features, which have high generalization capability, to facilitate collaboration between the front end and the cloud. This strategy achieves a good balance among the computational load, the transmission load, and the generalization ability of cloud servers when deploying deep neural networks for large-scale cloud-based visual analysis. Moreover, the presented strategy makes the standardization of deep feature coding more feasible and promising, as a series of tasks can simultaneously benefit from the transmitted intermediate-layer features. We also present evaluation results for both lossless and lossy deep feature compression, which provide meaningful investigations and baselines for future research and standardization activities.
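As a toy end-to-end illustration of lossy intermediate-feature transmission, the sketch below uniformly quantizes a mid-layer feature map to 8 bits and entropy-codes it with zlib as a stand-in codec; the split point, the network, and the codec are assumptions, not the paper's configuration.

```python
# A toy sketch of front-end feature extraction, 8-bit quantization,
# zlib coding, and cloud-side dequantization.
import zlib
import numpy as np
import torch
import torch.nn as nn

front = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())  # edge side
x = torch.rand(1, 3, 224, 224)
feat = front(x).detach().numpy()

# Uniform 8-bit quantization of the intermediate feature map.
lo, hi = feat.min(), feat.max()
q = np.round((feat - lo) / (hi - lo + 1e-8) * 255).astype(np.uint8)

payload = zlib.compress(q.tobytes())   # transmit payload plus (lo, hi)
print(len(payload) / q.size)           # compressed bytes per feature element

# Cloud side: decode and dequantize before running the remaining layers.
deq = np.frombuffer(zlib.decompress(payload), np.uint8).reshape(q.shape)
recon = deq.astype(np.float32) / 255 * (hi - lo) + lo
```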
{"title":"Intermediate Deep Feature Compression: Toward Intelligent Sensing.","authors":"Zhuo Chen, Kui Fan, Shiqi Wang, Lingyu Duan, Weisi Lin, Alex C Kot","doi":"10.1109/TIP.2019.2941660","DOIUrl":"10.1109/TIP.2019.2941660","url":null,"abstract":"<p><p>The recent advances of hardware technology have made the intelligent analysis equipped at the front-end with deep learning more prevailing and practical. To better enable the intelligent sensing at the front-end, instead of compressing and transmitting visual signals or the ultimately utilized top-layer deep learning features, we propose to compactly represent and convey the intermediate-layer deep learning features with high generalization capability, to facilitate the collaborating approach between front and cloud ends. This strategy enables a good balance among the computational load, transmission load and the generalization ability for cloud servers when deploying the deep neural networks for large scale cloud based visual analysis. Moreover, the presented strategy also makes the standardization of deep feature coding more feasible and promising, as a series of tasks can simultaneously benefit from the transmitted intermediate layer features. We also present the results for evaluations of both lossless and lossy deep feature compression, which provide meaningful investigations and baselines for future research and standardization activities.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62587991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}