
IEEE Transactions on Image Processing: Latest Articles

Geometry Coding for Dynamic Voxelized Point Clouds Using Octrees and Multiple Contexts.
IF 10.6 · Zone 1 (Computer Science) · Q1 Computer Science · Pub Date: 2019-08-01 · DOI: 10.1109/TIP.2019.2931466
Diogo C Garcia, Tiago A Fonseca, Renan U Ferreira, Ricardo L de Queiroz

We present a method to compress the geometry information of point clouds that exploits redundancies across consecutive frames of a sequence. It uses octrees and works by progressively increasing the resolution of the octree. At each branch of the tree, we generate approximations of the child nodes using several methods, and these approximations are used as contexts to drive an arithmetic coder. The best approximation, i.e., the context that yields the fewest encoding bits, is selected, and the chosen method is signaled as side information so that the decoder can replicate it. The core of our method is a context-based arithmetic coder in which a reference octree is used to encode the current octree, thus providing 255 contexts for each output octet. The resulting 255×255 frequency histogram is viewed as a discrete 3D surface and is conveyed to the decoder using another octree. We present two methods that generate the predictions (contexts) from adjacent frames in the sequence (inter-frame) and one method that works purely intra-frame. The encoder continuously switches to the best of the three modes and conveys this choice to the decoder. Since an intra-frame prediction is available, our coder can also operate in a purely intra-frame mode. Extensive results show the method's potential against many alternatives for compressing the geometry information of dynamic voxelized point clouds.
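To make the octet-and-context idea concrete, here is a minimal Python sketch, assuming integer voxel coordinates in a 2^depth cube. It builds the 8-bit occupancy octet of every occupied node level by level, then estimates the cost of coding the current frame's octets when each one is conditioned on the octet of the same node in a reference octree. The function names, the add-one initialization, and the use of context 0 for nodes missing from the reference are illustrative assumptions, not the authors' implementation; the paper transmits the 255×255 frequency histogram, whereas this sketch uses an ideal adaptive model only to stay short.

```python
import math
from collections import defaultdict


def occupancy_octets(points, depth):
    """Return {(level, node_xyz): octet}; bit c of the octet marks child c as occupied.

    points are integer voxel coordinates in [0, 2**depth); level 0 is the root.
    """
    octets = {}
    for level in range(depth):
        shift = depth - level - 1  # bit position that selects the child at this level
        for x, y, z in points:
            node = (x >> (shift + 1), y >> (shift + 1), z >> (shift + 1))
            child = (((x >> shift) & 1) << 2) | (((y >> shift) & 1) << 1) | ((z >> shift) & 1)
            key = (level, node)
            octets[key] = octets.get(key, 0) | (1 << child)
    return octets


def adaptive_code_length(current, reference):
    """Ideal adaptive code length, in bits, of the current octets.

    Each octet uses the frequency table selected by the reference octree's octet at
    the same node (hypothetical context 0 when that node is empty in the reference).
    """
    counts = defaultdict(lambda: [1] * 256)  # add-one initialized table per context
    totals = defaultdict(lambda: 256)
    bits = 0.0
    for key in sorted(current):  # level by level: an order the decoder can reproduce
        ctx = reference.get(key, 0)
        sym = current[key]
        bits -= math.log2(counts[ctx][sym] / totals[ctx])
        counts[ctx][sym] += 1
        totals[ctx] += 1
    return bits


# Usage: two consecutive "frames" on a tiny 8x8x8 voxel grid (depth 3).
frame0 = [(1, 2, 3), (1, 2, 2), (6, 5, 4), (7, 7, 7)]
frame1 = [(1, 2, 3), (1, 2, 2), (6, 5, 5), (7, 7, 7)]
ref = occupancy_octets(frame0, depth=3)
cur = occupancy_octets(frame1, depth=3)
inter_bits = adaptive_code_length(cur, ref)  # contexts taken from the previous frame
intra_bits = adaptive_code_length(cur, {})   # no reference: one shared context
mode = "inter" if inter_bits < intra_bits else "intra"  # keep the cheaper mode
```

On data this small either mode may come out cheaper, because every new context starts from a flat table; the choice is made per frame here, while the paper switches among its three prediction modes during coding.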

Citations: 0
Sparse Representation based Video Quality Assessment for Synthesized 3D Videos.
IF 10.6 · Zone 1 (Computer Science) · Q1 Computer Science · Pub Date: 2019-07-29 · DOI: 10.1109/TIP.2019.2929433
Yun Zhang, Huan Zhang, Mei Yu, Sam Kwong, Yo-Sung Ho

Temporal flicker is one of the most annoying distortions in synthesized virtual-view videos rendered from compressed multi-view video plus depth in three-dimensional (3D) video systems. To assess the quality of synthesized-view videos and further optimize compression techniques in 3D video systems, an objective video quality assessment method that can accurately measure flicker distortion is needed. In this paper, we propose a full-reference, sparse-representation-based video quality assessment method for synthesized 3D videos. First, a synthesized video, treated as 3D volume data with spatial (X-Y) and temporal (T) domains, is reorganized and decomposed into a number of spatially neighboring temporal layers, i.e., X-T or Y-T planes. Gradient features in the temporal layers of the synthesized video and strong edges of the depth maps are used as key features for locating flicker distortions. Second, dictionary learning and sparse representation are applied to the temporal layers to effectively represent the temporal flicker distortion. Third, a rank pooling method pools all the temporal-layer scores into a single flicker distortion score. Finally, the temporal flicker measurement is combined with a conventional spatial distortion measurement to assess the quality of synthesized 3D videos. Experimental results on a synthesized video quality database demonstrate that the proposed method significantly outperforms other state-of-the-art methods, especially on view synthesis distortions induced by depth videos.
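The temporal-layer decomposition in the first step can be illustrated with a short Python sketch, assuming a grayscale video stored as nested lists indexed video[t][y][x]. It slices the volume into X-T planes (one per row y) and Y-T planes (one per column x), and scores each plane with a mean absolute temporal difference, a simple stand-in for the gradient features the abstract mentions. The helper names and the forward-difference score are assumptions; the paper's depth-edge features, dictionary learning, and rank pooling are not reproduced.

```python
def xt_planes(video):
    """One X-T plane per row y of a video stored as video[t][y][x]; plane[t][x]."""
    T, H = len(video), len(video[0])
    return [[list(video[t][y]) for t in range(T)] for y in range(H)]


def yt_planes(video):
    """One Y-T plane per column x; plane[t][y]."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    return [[[video[t][y][x] for y in range(H)] for t in range(T)] for x in range(W)]


def mean_temporal_gradient(plane):
    """Mean absolute forward difference along T; large values hint at temporal flicker."""
    diffs = [abs(plane[t + 1][s] - plane[t][s])
             for t in range(len(plane) - 1) for s in range(len(plane[0]))]
    return sum(diffs) / len(diffs)


# Usage: 3 frames of a 2x2 video in which only the top row flickers.
video = [
    [[10, 10], [50, 50]],
    [[30, 30], [50, 50]],
    [[10, 10], [50, 50]],
]
scores = [mean_temporal_gradient(p) for p in xt_planes(video)]  # [20.0, 0.0]
```

Only the X-T plane of the flickering row receives a nonzero score, which is why slicing along time localizes flicker better than per-frame spatial measures.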

Citations: 0
Enhanced Fuzzy-based Local Information Algorithm for Sonar Image Segmentation.
IF 10.6 · Zone 1 (Computer Science) · Q1 Computer Science · Pub Date: 2019-07-29 · DOI: 10.1109/TIP.2019.2930148
Avi Abu, Roee Diamant

The recent boost in undersea operations has led to the development of high-resolution sonar systems mounted on autonomous vehicles. These vehicles scan the seafloor in search of objects such as sunken ships, archaeological sites, and submerged mines. An important part of the detection operation is the segmentation of sonar images, in which an object's highlight and shadow are distinguished from the seabed background. In this work, we focus on the automatic segmentation of sonar images. We present our enhanced fuzzy-based with kernel metric (EnFK) algorithm for sonar image segmentation, which introduces two new fuzzy terms of local spatial and statistical information to improve segmentation accuracy. The algorithm includes a preliminary de-noising step whose output, together with the original image, feeds into the segmentation procedure to avoid being trapped in local minima and to improve convergence. The result is a segmentation procedure specifically suited to the intensity inhomogeneity and complex seabed texture of sonar images. We tested our approach on simulated images, real sonar images, and sonar images we created in two different sea experiments using multibeam sonar and synthetic aperture sonar. The results show accurate segmentation performance that is far beyond state-of-the-art results.
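EnFK builds on fuzzy clustering. As a point of reference only, the Python sketch below implements plain fuzzy c-means (FCM) on a list of pixel intensities, the baseline that fuzzy local-information methods extend; it contains none of EnFK's local spatial or statistical terms, its kernel metric, or its de-noised input. The fuzzifier m = 2, the iteration count, the evenly spaced initial centers, and the three intensity groups (shadow, seabed, highlight) are assumptions for illustration.

```python
def fuzzy_c_means(pixels, n_clusters=3, m=2.0, n_iter=50, eps=1e-9):
    """Plain FCM on scalar intensities; returns (centers, memberships[i][k])."""
    lo, hi = min(pixels), max(pixels)
    # Spread the initial centers evenly over the intensity range.
    centers = [lo + (hi - lo) * (k + 0.5) / n_clusters for k in range(n_clusters)]
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)).
        memberships = []
        for x in pixels:
            dists = [abs(x - c) + eps for c in centers]
            memberships.append([1.0 / sum((dk / dj) ** power for dj in dists) for dk in dists])
        # Center update: mean of the pixels weighted by u_ik^m.
        for k in range(n_clusters):
            w = [u[k] ** m for u in memberships]
            centers[k] = sum(wi * x for wi, x in zip(w, pixels)) / sum(w)
    return centers, memberships


# Usage: hypothetical shadow, seabed, and highlight intensities.
pixels = [5, 7, 6, 100, 102, 98, 230, 235, 228]
centers, u = fuzzy_c_means(pixels, n_clusters=3)
labels = [max(range(3), key=lambda k: row[k]) for row in u]  # hard labels from memberships
```

On real sonar images, plain FCM is sensitive to speckle and intensity inhomogeneity, which is the gap the abstract's extra local terms and preliminary de-noising are meant to close.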

Citations: 0
Homologous Component Analysis for Domain Adaptation.
IF 10.6 · Zone 1 (Computer Science) · Q1 Computer Science · Pub Date: 2019-07-29 · DOI: 10.1109/TIP.2019.2929421
Youfa Liu, Weiping Tu, Bo Du, Lefei Zhang, Dacheng Tao

Domain adaptation approaches based on the covariate shift assumption usually use a single common transformation to align marginal distributions while preserving conditional distributions. However, a single common transformation may lose useful information, such as variances and neighborhood relationships in both the source and target domains. To address this problem, we propose a novel method called homologous component analysis (HCA), which seeks two totally different but homologous transformations that align the distributions with side information while preserving conditional distributions. Because a closed-form solution to the corresponding optimization problem is hard to find, we solve it with the alternating direction method of multipliers (ADMM) on Stiefel manifolds. We also provide a generalization error bound for semi-supervised domain adaptation and show that two transformations can decrease this upper bound more than a single common transformation. Extensive experiments on synthetic and real data demonstrate the effectiveness of the proposed method through comparisons of classification accuracy with state-of-the-art methods, and numerical evidence based on chordal and Frobenius distances shows that the resulting optimal transformations are indeed different.
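The abstract compares the two learned transformations using chordal and Frobenius distances. A minimal Python sketch of both follows, assuming each transformation is a d × k matrix with orthonormal columns (a point on the Stiefel manifold) stored as nested lists. The chordal distance here is between the column spaces, computed as sqrt(Σ sin²θᵢ) = sqrt(k − ‖UᵀV‖²_F), which equals ‖UUᵀ − VVᵀ‖_F / √2; that normalization is one common convention and an assumption, as is everything about how HCA obtains the matrices.

```python
import math


def frobenius_distance(a, b):
    """||A - B||_F for two equally sized matrices."""
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def chordal_distance(u, v):
    """Chordal distance between the column spaces of orthonormal d x k matrices U, V:
    sqrt(sum_i sin^2(theta_i)) = sqrt(k - ||U^T V||_F^2)."""
    k = len(u[0])
    # Squared entries of U^T V: dot products between columns of U and columns of V.
    utv_sq = sum(
        sum(u[r][i] * v[r][j] for r in range(len(u))) ** 2
        for i in range(k) for j in range(k)
    )
    return math.sqrt(max(k - utv_sq, 0.0))  # clamp tiny negative round-off


# Usage: one-column "transformations" in R^3.
e1 = [[1.0], [0.0], [0.0]]
flipped = [[-1.0], [0.0], [0.0]]  # same subspace, opposite sign
e2 = [[0.0], [1.0], [0.0]]        # orthogonal subspace
```

The two measures answer different questions: `flipped` is at Frobenius distance 2 from `e1` but at chordal distance 0, because it spans the same line, while `e2` is far from `e1` under both.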

Citations: 0
Unambiguous Scene Text Segmentation with Referring Expression Comprehension.
IF 10.6 · Zone 1 (Computer Science) · Q1 Computer Science · Pub Date: 2019-07-26 · DOI: 10.1109/TIP.2019.2930176
Xuejian Rong, Chucai Yi, Yingli Tian

Text instances provide valuable information for understanding and interpreting natural scenes. The rich, precise high-level semantics embodied in text can help us understand the world around us and enable a wide range of real-world applications. While most recent visual phrase grounding approaches focus on general objects, this paper explores extracting designated texts and predicting an unambiguous scene text segmentation mask, i.e., segmenting scene text from natural language descriptions (referring expressions) such as "orange text on a little boy in black swinging a bat". Solving this novel problem enables accurate segmentation of scene text instances from complex backgrounds. In the proposed framework, a unified deep network jointly models visual and linguistic information by encoding both region-level and pixel-level visual features of natural scene images into spatial feature maps and then decoding them into saliency response maps of text instances. For quantitative evaluation, we establish a new scene text referring expression segmentation dataset, COCO-CharRef. Experimental results demonstrate the effectiveness of the proposed framework on the text instance segmentation task. By combining image-based visual features with language-based textual explanations, our framework outperforms baselines derived from state-of-the-art text localization and natural language object retrieval methods on the COCO-CharRef dataset.
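As a rough, hypothetical illustration of how a referring expression can single out one text instance, the Python sketch below scores each pixel's visual feature vector against a sentence embedding with a dot product followed by a sigmoid, producing a response map that is thresholded into a mask. The feature vectors, the query embedding, the function names, and the 0.5 threshold are all assumptions; the paper's network learns its visual and linguistic encoders and its decoder end to end, and this is not its architecture.

```python
import math


def response_map(pixel_features, query):
    """Per-pixel sigmoid(<feature, query>); pixel_features[y][x] has len(query) floats."""
    return [[1.0 / (1.0 + math.exp(-sum(f * q for f, q in zip(feat, query)))) for feat in row]
            for row in pixel_features]


def threshold_mask(resp, tau=0.5):
    """Binary segmentation mask: 1 where the response exceeds tau."""
    return [[1 if r > tau else 0 for r in row] for row in resp]


# Usage: a 2x2 "image" whose hypothetical feature dimension 0 fires on orange text
# and dimension 1 on other text; the query embedding favors dimension 0.
feats = [[[3.0, 0.0], [0.0, 3.0]],
         [[3.0, 0.0], [-1.0, 1.0]]]
query = [1.0, -1.0]
resp = response_map(feats, query)
mask = threshold_mask(resp)  # [[1, 0], [1, 0]]
```

Only pixels whose features agree with the expression survive thresholding, which is the sense in which the language input disambiguates among several visible text instances.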

Citations: 0
Hyperspectral Image Denoising via Matrix Factorization and Deep Prior Regularization.
IF 10.6 · Zone 1 (Computer Science) · Q1 Computer Science · Pub Date: 2019-07-19 · DOI: 10.1109/TIP.2019.2928627
Baihong Lin, Xiaoming Tao, Jianhua Lu

Deep learning has been successfully introduced for 2D image denoising, but it remains unsatisfactory for hyperspectral image (HSI) denoising because of the unacceptable computational complexity of end-to-end training and the difficulty of building a universal 3D image training dataset. In this paper, instead of developing an end-to-end deep learning denoising network, we propose an HSI denoising framework for removing mixed Gaussian-impulse noise, in which the denoising problem is modeled as a convolutional neural network (CNN) constrained non-negative matrix factorization problem. Using proximal alternating linearized minimization (PALM), the optimization is divided into three steps: updating the spectral matrix, updating the abundance matrix, and estimating the sparse noise. We then design the CNN architecture and propose two training schemes that allow the CNN to be trained on a 2D image dataset. Compared with state-of-the-art denoising methods, the proposed method performs relatively well in removing Gaussian and mixed Gaussian-impulse noise. More importantly, the proposed model needs to be trained only once, on a 2D image dataset, yet can denoise HSIs with different numbers of spectral bands.
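The three-step optimization can be sketched in Python under simplifying assumptions: the noisy HSI is a bands × pixels matrix Y modeled as E·A + S (spectral matrix E, abundance matrix A, sparse impulse noise S), the objective is ½‖Y − EA − S‖²_F + λ‖S‖₁ with E, A ≥ 0, and the CNN prior on the abundances is omitted entirely, so only non-negativity is enforced. Each block takes a proximal gradient step with step size at most 1/L, using ‖A‖²_F and ‖E‖²_F as upper bounds on the Lipschitz constants, which is how PALM keeps the objective from increasing. The helper names, λ, and the toy data are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def residual(Y, E, A, S):
    """R = E A + S - Y, the gradient of 0.5 * ||Y - E A - S||_F^2 w.r.t. (E A + S)."""
    EA = matmul(E, A)
    return [[ea + s - y for y, ea, s in zip(ry, rea, rs)] for ry, rea, rs in zip(Y, EA, S)]


def objective(Y, E, A, S, lam):
    R = residual(Y, E, A, S)
    return 0.5 * sum(r * r for row in R for r in row) + lam * sum(abs(s) for row in S for s in row)


def soft_threshold(x, t):
    return math.copysign(max(abs(x) - t, 0.0), x)


def frob_sq(m):
    return sum(v * v for row in m for v in row)


def palm_step(Y, E, A, S, lam):
    """One sweep: projected gradient steps on E and A, then the exact prox step on S."""
    # 1) Spectral matrix: gradient R A^T, step 1/||A||_F^2, projection onto E >= 0.
    L_E = frob_sq(A) + 1e-12
    G = matmul(residual(Y, E, A, S), transpose(A))
    E = [[max(e - g / L_E, 0.0) for e, g in zip(re, rg)] for re, rg in zip(E, G)]
    # 2) Abundance matrix: gradient E^T R, projection onto A >= 0 (CNN prior omitted).
    L_A = frob_sq(E) + 1e-12
    G = matmul(transpose(E), residual(Y, E, A, S))
    A = [[max(a - g / L_A, 0.0) for a, g in zip(ra, rg)] for ra, rg in zip(A, G)]
    # 3) Sparse noise: the l1 prox of the data residual is soft thresholding.
    EA = matmul(E, A)
    S = [[soft_threshold(y - ea, lam) for y, ea in zip(ry, rea)] for ry, rea in zip(Y, EA)]
    return E, A, S


# Usage: 3 bands x 4 pixels, rank-2 ground truth plus one impulse outlier.
E_true = [[1.0, 0.2], [0.5, 0.8], [0.1, 1.0]]
A_true = [[1.0, 0.0, 0.5, 0.3], [0.0, 1.0, 0.5, 0.7]]
Y = matmul(E_true, A_true)
Y[1][2] += 5.0
E = [[0.5, 0.5] for _ in range(3)]
A = [[0.5, 0.4, 0.3, 0.2], [0.2, 0.3, 0.4, 0.5]]
S = [[0.0] * 4 for _ in range(3)]
history = [objective(Y, E, A, S, lam=0.5)]
for _ in range(200):
    E, A, S = palm_step(Y, E, A, S, lam=0.5)
    history.append(objective(Y, E, A, S, lam=0.5))
```

In the paper the abundance step is additionally regularized by the trained CNN, which is what lets a single 2D-trained model serve HSIs with any number of bands: the network acts on abundance maps rather than on the band dimension.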

Citations: 0
Weighted Guided Image Filtering with Steering Kernel.
IF 10.6 · Zone 1 (Computer Science) · Q1 Computer Science · Pub Date: 2019-07-19 · DOI: 10.1109/TIP.2019.2928631
Zhonggui Sun, Bo Han, Jie Li, Jin Zhang, Xinbo Gao

Because it operates locally, the guided image filter (GIF) generally suffers from halo artifacts near edges. To remedy this deficiency, the weighted guided image filter (WGIF) was recently proposed, which incorporates an edge-aware weighting into the filtering process. It takes advantage of both local and global operations and achieves better edge preservation. However, edge direction, a vital property of the guidance image, is not fully considered in these guided filters. To overcome this drawback, we propose a novel version of GIF that leverages edge direction more fully. In particular, we use a steering kernel to adaptively learn the edge direction and incorporate the result into the filtering process to improve the filter's behavior. Theoretical analysis shows that the proposed method preserves edges better and reduces halo artifacts more effectively. Thorough experiments on edge-aware smoothing, detail enhancement, denoising, and dehazing lead to similar conclusions.
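For reference, the plain guided filter that WGIF and the proposed method extend fits in a few lines. The Python sketch below is the standard GIF applied to a 1-D signal with a box window of radius r: within each window the output is modeled as q = a·I + b, with a = cov(I, p) / (var(I) + ε), and the per-window coefficients are then averaged. It does not include WGIF's edge-aware weighting (which replaces the single ε shared by all windows) or the steering kernel this paper adds; the radius, ε, and test signal are illustrative assumptions.

```python
def box_mean(signal, r):
    """Mean over the window [i - r, i + r], clipped at the borders."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def guided_filter_1d(guide, src, r=2, eps=1e-2):
    """Standard guided filter: fit q = a*I + b in every window, then average a and b."""
    mean_i = box_mean(guide, r)
    mean_p = box_mean(src, r)
    corr_ip = box_mean([i * p for i, p in zip(guide, src)], r)
    corr_ii = box_mean([i * i for i in guide], r)
    var_i = [c - m * m for c, m in zip(corr_ii, mean_i)]
    cov_ip = [c - mi * mp for c, mi, mp in zip(corr_ip, mean_i, mean_p)]
    a = [cv / (v + eps) for cv, v in zip(cov_ip, var_i)]     # eps regularizes flat windows
    b = [mp - ak * mi for mp, ak, mi in zip(mean_p, a, mean_i)]
    mean_a, mean_b = box_mean(a, r), box_mean(b, r)
    return [ma * i + mb for ma, mb, i in zip(mean_a, mean_b, guide)]


# Usage: self-guided filtering of a flat signal and of a sharp step edge.
flat = guided_filter_1d([3.0] * 8, [3.0] * 8)
step = [0.0] * 6 + [10.0] * 6
filtered_step = guided_filter_1d(step, step, r=2, eps=1e-4)
```

With a small ε, windows that straddle the step have variance much larger than ε, so a ≈ 1 and the edge passes through almost unchanged; flat windows get a = 0 and are replaced by their mean.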

Citations: 0
Re-Caption: Saliency-Enhanced Image Captioning through Two-Phase Learning.
IF 10.6 · Zone 1 (Computer Science) · Q1 Computer Science · Pub Date: 2019-07-17 · DOI: 10.1109/TIP.2019.2928144
Lian Zhou, Yuejie Zhang, Yugang Jiang, Tao Zhang, Weiguo Fan

Visual and semantic saliency are important in image captioning. However, single-phase image captioning gains little from limited saliency information when no saliency predictor is used. In this paper, we propose a novel saliency-enhanced re-captioning framework based on two-phase learning to improve single-phase image captioning. In the framework, visual saliency and semantic saliency are distilled from the first-phase model and fused into the second-phase model, so that the model boosts itself. The visual saliency mechanism can generate a saliency map and a saliency mask for an image without learning a saliency map predictor. The semantic saliency mechanism sheds some light on the properties of words in a caption whose part of speech is noun. In addition, a third type of saliency, sample saliency, is proposed to explicitly compute the saliency degree of each sample, which makes image captioning more robust. We also examine how to combine these three types of saliency to further boost performance. Our framework can treat an image captioning model as a saliency extractor, which may benefit other captioning models and related tasks. Experimental results on both the Flickr30k and MSCOCO datasets show that the saliency-enhanced models achieve promising performance gains.

Citations: 0
Learning Modality-Specific Representations for Visible-Infrared Person Re-Identification.
IF 10.6, Zone 1 (Computer Science), Q1 Computer Science, Pub Date: 2019-07-17, DOI: 10.1109/TIP.2019.2928126
Zhanxiang Feng, Jianhuang Lai, Xiaohua Xie

Traditional person re-identification (re-id) methods perform poorly under changing illumination. This problem can be addressed by using dual cameras that capture visible images in bright environments and infrared images in dark environments. However, this scheme must solve the visible-infrared matching problem, which remains largely understudied. Matching pedestrians across heterogeneous modalities is extremely challenging because the modalities have different visual characteristics. In this paper, we propose a novel framework that employs modality-specific networks to tackle the heterogeneous matching problem. The proposed framework exploits modality-related information and extracts modality-specific representations (MSR) by constructing an individual network for each modality. In addition, a cross-modality Euclidean constraint is introduced to narrow the gap between the different networks. We also integrate modality-shared layers into the modality-specific networks to extract shareable information, and use a modality-shared identity loss to facilitate the extraction of modality-invariant features. A modality-specific discriminant metric is then learned for each domain to strengthen the discriminative power of MSR. Finally, we use a view classifier to learn view information. Experiments demonstrate that MSR effectively improves the performance of deep networks on visible-infrared person re-identification (VI-REID) and remarkably outperforms state-of-the-art methods.
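The abstract introduces a cross-modality Euclidean constraint to narrow the gap between the modality-specific networks, but it does not give the formula. The sketch below assumes the constraint is the mean squared Euclidean distance between the features that the visible and infrared networks produce for paired samples of the same identity. Each network is reduced to a single hypothetical linear layer so the example stays dependency-free; the names `linear` and `cross_modality_euclidean_loss` are illustrative, not from the paper.

```python
from typing import List, Sequence

Matrix = List[List[float]]
Vector = Sequence[float]


def linear(weights: Matrix, x: Vector) -> List[float]:
    """Stand-in for a modality-specific network: one linear layer (hypothetical)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def squared_euclidean(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cross_modality_euclidean_loss(
    w_visible: Matrix,
    w_infrared: Matrix,
    visible_batch: List[Vector],
    infrared_batch: List[Vector],
) -> float:
    """Assumed form of the constraint: mean squared Euclidean distance between
    visible-network and infrared-network features of paired same-identity samples."""
    if len(visible_batch) != len(infrared_batch) or not visible_batch:
        raise ValueError("need equally many, non-empty paired samples")
    total = 0.0
    for xv, xi in zip(visible_batch, infrared_batch):
        fv = linear(w_visible, xv)   # feature from the visible-specific network
        fi = linear(w_infrared, xi)  # feature from the infrared-specific network
        total += squared_euclidean(fv, fi)
    return total / len(visible_batch)


w_vis = [[1.0, 0.0], [0.0, 1.0]]
w_ir = [[0.5, 0.0], [0.0, 0.5]]
loss = cross_modality_euclidean_loss(w_vis, w_ir, [[2.0, 0.0]], [[2.0, 0.0]])
print(loss)  # 1.0
```

Minimizing this term pulls the two networks' embeddings of the same person together while still letting each network keep its own weights. In a full system it would presumably be weighted and added to the identity losses the abstract mentions; that weighting is not specified there.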

Citations: 0
Inpainting vs denoising for dose reduction in scanning-beam microscopies.
IF 10.6, Zone 1 (Computer Science), Q1 Computer Science, Pub Date: 2019-07-17, DOI: 10.1109/TIP.2019.2928133
Toby Sanders, Christian Dwyer

We consider sampling strategies for reducing the radiation dose during image acquisition in scanning-beam microscopies such as SEM, STEM, and STXM. Our basic assumption is that we may acquire subsampled image data (with some pixels missing) and then inpaint the missing data using a compressed-sensing approach. Our noise model consists of Poisson noise plus random Gaussian noise. We include the possibility of acquiring fully sampled image data, in which case the inpainting approach reduces to a denoising procedure. We use numerical simulations to compare the accuracy of the reconstructed images against the ground truths. The results generally indicate that, for sufficiently high radiation doses, higher sampling rates achieve greater accuracy, consistent with the well-established literature. However, for very low radiation doses, where Poisson noise and/or random Gaussian noise begin to dominate, our results indicate that subsampling/inpainting can yield smaller reconstruction errors. We also present an information-theoretic analysis, which allows us to quantify the amount of information gained through the different sampling strategies and enables a broader discussion of the main results.
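The dose trade-off described above can be made concrete with the stated noise model (Poisson counts plus additive Gaussian noise). The sketch assumes something the abstract does not specify: a fixed total dose is spread evenly over whichever pixels are actually scanned, so scanning fewer pixels raises the expected counts per scanned pixel. `measured_pixel_snr`, `acquire`, and all parameter values are hypothetical, and the compressed-sensing inpainting step itself is not implemented.

```python
import math
import random
from typing import List, Optional


def measured_pixel_snr(total_dose: float, n_pixels: int, fraction: float, sigma: float) -> float:
    """SNR of one scanned pixel when a fixed total dose is spread evenly over
    the scanned fraction of the pixels (assumed allocation).

    Mean counts per scanned pixel: lam = total_dose / (fraction * n_pixels).
    Noise variance: lam (Poisson) + sigma**2 (additive Gaussian).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    lam = total_dose / (fraction * n_pixels)
    return lam / math.sqrt(lam + sigma ** 2)


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate only for small means like these."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def acquire(image: List[float], fraction: float, total_dose: float,
            sigma: float, seed: int = 0) -> List[Optional[float]]:
    """Simulate one subsampled scan: each pixel is visited with probability
    `fraction`; visited pixels record Poisson counts (mean proportional to the
    pixel value) plus Gaussian noise. Unvisited pixels are None, i.e. left for
    an inpainting step that this sketch does not implement."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    dose_per_pixel = total_dose / (fraction * len(image))
    out: List[Optional[float]] = []
    for value in image:
        if rng.random() < fraction:
            out.append(poisson(value * dose_per_pixel, rng) + rng.gauss(0.0, sigma))
        else:
            out.append(None)
    return out


# Same total dose (100 counts over 100 pixels): full vs quarter sampling.
for sigma in (0.0, 3.0):
    full = measured_pixel_snr(100, 100, 1.0, sigma)
    quarter = measured_pixel_snr(100, 100, 0.25, sigma)
    print(sigma, round(quarter / full, 2))
# 0.0 2.0
# 3.0 3.51
```

With purely Poisson noise, concentrating the dose on a quarter of the pixels doubles the per-pixel SNR. With additional Gaussian noise the gain is larger (about 3.51x here), in line with the abstract's observation that subsampling can pay off when the dose is very low. This per-pixel view ignores the error introduced by inpainting the unscanned pixels, so it illustrates only one side of the trade-off.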

Citations: 0
Journal: IEEE Transactions on Image Processing