
Latest publications in IEEE Transactions on Image Processing

CycleDiff: Cycle Diffusion Models for Unpaired Image-to-image Translation
IF 10.6 | Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-30 | DOI: 10.1109/tip.2026.3657240
Shilong Zou, Yuhang Huang, Renjiao Yi, Chenyang Zhu, Kai Xu
Citations: 0
Broadcast-Gated Attention with Identity Adaptive Integration for Efficient Image Super-Resolution.
IF 10.6 | Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-29 | DOI: 10.1109/tip.2026.3657640
Qian Wang, Yanyu Mao, Ruilong Guo, Mengyang Wang, Jing Wei, Han Pan
Efficient image super-resolution (SR) models are essential for achieving high-quality image reconstruction with reduced computational complexity, particularly in resource-constrained environments. In this paper, we introduce a novel self-attention mechanism, Broadcast-Gated Attention with Identity Adaptive Integration (BGAI). Then, based on this mechanism, we design a lightweight super-resolution network that achieves state-of-the-art performance with minimal computational cost. By observing the sparsity and convergence properties of self-attention, BGAI optimizes computational resource utilization through the effective broadcasting of meaningful features across attention heads and network layers. A key innovation in BGAI is the Broadcast-Gated Multi-head Self-Attention (BGMSA) mechanism, which employs a dedicated head to capture and integrate long-range dependencies, broadcasting this broader contextual information to local attention heads. This design enhances long-range interaction modeling while minimizing redundant computations. Additionally, the Identity Attention Adaptive Integration (IAAI) mechanism facilitates efficient feature propagation by leveraging the continuity in dependencies across layers, with a focus on dynamic variations to improve representational efficiency and accelerate convergence. Comprehensive experiments on standard benchmarks demonstrate that BGAI achieves high-fidelity super-resolution while reducing the number of parameters and FLOPs by up to 35% compared with existing lightweight methods. These results establish BGAI as a robust and scalable solution for resource-efficient SR, with significant potential for deployment in real-world, high-resolution image processing applications. The code and trained models are publicly available at https://github.com/bbbolt/BGAI.
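The broadcast idea in BGMSA can be pictured with a toy NumPy sketch: one dedicated head computes global context, which is broadcast to the local heads through a gate. This is a minimal illustration only; the random matrices stand in for learned per-head projections, and the sigmoid gating form is an assumption, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention on (tokens, dim) arrays
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def broadcast_gated_attention(x, n_heads=4, seed=0):
    """Toy sketch: a dedicated head computes global context, which is
    broadcast to the remaining local heads through a sigmoid gate.
    Random matrices stand in for the learned per-head projections."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    dh = d // n_heads
    proj = lambda: x @ rng.standard_normal((d, dh))
    global_ctx = attention(proj(), proj(), proj())   # the dedicated head
    heads = []
    for _ in range(n_heads - 1):                     # local heads
        local = attention(proj(), proj(), proj())
        # gate decides how much global context each token receives
        gate = 1.0 / (1.0 + np.exp(-(local * global_ctx).sum(-1, keepdims=True)))
        heads.append(local + gate * global_ctx)
    heads.append(global_ctx)
    return np.concatenate(heads, axis=-1)            # (tokens, d)

out = broadcast_gated_attention(np.full((6, 8), 0.1))
```

The point of the structure is that the expensive long-range interaction is computed once and shared, rather than redundantly in every head.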
Citations: 0
SACMark: Spatial-Angle Consistency Watermarking Network for Light Field Image Copyright Protection.
IF 10.6 | Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-29 | DOI: 10.1109/tip.2026.3657635
Junfeng Guo, Hui Wang, Shouxin Liu, Yushu Zhang, Zhongyun Hua, Seok-Tae Kim, Xiaowei Li
Light Field (LF) images provide rich visual representations of 3D scenes by capturing both spatial and angular information of light rays. However, their high dimensions present substantial challenges for conventional 2D image watermarking techniques in effectively ensuring copyright protection. In this work, we propose a deep learning-based Spatial-Angular Consistency waterMarking (SACMark) network, designed to address the unique challenges of watermark embedding and extraction in LF images. SACMark employs a spatial-angular feature extraction module to capture the multidimensional information of LF images and introduces consistency matching and fusion strategies to enhance feature utilization. The network adopts an encoder-noise-decoder architecture, optimized through adversarial training to improve the imperceptibility and robustness of the watermark. Experimental results demonstrate that SACMark maintains high visual quality across various embedding capacities and has minimal impact on depth estimation. Compared to traditional LF watermarking approaches and existing deep learning-based methods for 2D images, SACMark demonstrates improved resilience to noise while preserving essential LF characteristics. These findings suggest that SACMark holds promise for practical applications and may contribute to future developments in secure and adaptive LF image protection.
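The encoder-noise-decoder pattern can be illustrated with a classical spread-spectrum correlation watermark standing in for SACMark's learned spatial-angular network. Everything here is an illustrative assumption: the key `42` plays the role of a shared secret, the embedding strength and noise level are arbitrary, and a flat 2D image replaces the 4D light field.

```python
import numpy as np

def carriers(shape, n_bits, key=42):
    # pseudo-random unit-norm carrier patterns derived from a shared key
    g = np.random.default_rng(key)
    c = g.standard_normal((n_bits, *shape))
    return c / np.linalg.norm(c.reshape(n_bits, -1), axis=1)[:, None, None]

def embed(image, bits, strength=0.05):
    # encoder: superimpose one signed carrier per watermark bit
    signs = 2 * np.asarray(bits) - 1            # {0,1} -> {-1,+1}
    mark = np.tensordot(signs, carriers(image.shape, len(bits)), axes=1)
    return image + strength * mark

def extract(image, n_bits):
    # decoder: correlate against each carrier; the sign recovers the bit
    corr = (carriers(image.shape, n_bits) * image).sum(axis=(1, 2))
    return (corr > 0).astype(int).tolist()

bits = [1, 0, 1, 1]
marked = embed(np.zeros((32, 32)), bits)
# "noise layer": a small distortion between encoder and decoder
noisy = marked + 0.001 * np.random.default_rng(7).standard_normal((32, 32))
recovered = extract(noisy, 4)                   # -> [1, 0, 1, 1]
```

The learned network replaces the fixed carriers with features adapted to the cover content, which is what buys imperceptibility at higher capacities.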
Citations: 0
Deep G-PCC Geometry Preprocessing via Joint Optimization with a Differentiable Codec Surrogate for Enhanced Compression Efficiency.
IF 10.6 | Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-28 | DOI: 10.1109/tip.2026.3655187
Wanhao Ma, Wei Zhang, Shuai Wan, Fuzheng Yang
Geometry-based point cloud compression (G-PCC), an international standard designed by MPEG, provides a generic framework for compressing diverse types of point clouds while ensuring interoperability across applications and devices. However, G-PCC underperforms compared to recent deep learning-based PCC methods despite its lower computational power consumption. To enhance the efficiency of G-PCC without sacrificing its interoperability or computational flexibility, we propose the first compression-oriented point cloud voxelization network jointly optimized with a differentiable G-PCC surrogate model. The surrogate model mimics the rate-distortion behavior of the non-differentiable G-PCC codec, enabling end-to-end gradient propagation. The versatile voxelization network adaptively transforms input point clouds using learning-based voxelization and effectively manipulates point clouds via global scaling, fine-grained pruning, and point-level editing for rate-distortion trade-off. During inference, only the lightweight voxelization network is prepended to the G-PCC encoder, requiring no modifications to the decoder, thus introducing no computational overhead for end users. Extensive experiments demonstrate a 38.84% average BD-rate reduction over G-PCC. By bridging classical codecs with deep learning, this work offers a practical pathway to enhance legacy compression standards while preserving their backward compatibility, making it ideal for real-world deployment.
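Why a differentiable surrogate is needed at all can be shown in miniature. The paper's surrogate is a learned network mimicking G-PCC's rate-distortion behavior; the sketch below uses the simpler straight-through-estimator idea on a rounding step (a stand-in for voxelization/coding) purely to illustrate how gradients can be routed past a non-differentiable codec to the preprocessing stage.

```python
import numpy as np

def voxelize(points, step=0.25):
    # non-differentiable, codec-like rounding: true gradient is zero a.e.
    return np.round(points / step) * step

def surrogate_loss_grad(points, target, step=0.25):
    """Run the real (non-differentiable) voxelizer in the forward pass,
    but back-propagate as if it were the identity (straight-through),
    so the preprocessing network upstream still receives a signal."""
    recon = voxelize(points, step)
    loss = float(((recon - target) ** 2).mean())
    grad = 2 * (recon - target) / recon.size    # identity backward pass
    return loss, grad

loss, grad = surrogate_loss_grad(np.array([0.12, 0.47, 0.88]), np.zeros(3))
# the exact derivative of `voxelize` is 0 almost everywhere,
# yet the surrogate gradient is non-zero and points toward the target
```

A learned surrogate goes further than this sketch by also approximating the codec's bit-rate term, so the rate-distortion trade-off itself becomes trainable end to end.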
Citations: 0
Domain-Complementary Prior with Fine-Grained Feedback for Scene Text Image Super-Resolution.
IF 10.6 | Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-28 | DOI: 10.1109/tip.2026.3657246
Shen Zhang, Yang Li, Pengwen Dai, Xiaozhou Zhou, Guotao Xie
Enhancing the resolution of scene text images is a critical preprocessing step that can substantially improve the accuracy of downstream text recognition in low-quality images. Existing methods primarily rely on auxiliary text features to guide the super-resolution process. However, these features often lack rich low-level information, making them insufficient for faithfully reconstructing both the global structure and fine-grained details of text. Moreover, previous methods often learn suboptimal feature representations from the original low-quality landmark images, which cannot provide precise guidance for super-resolution. In this study, we propose a Fine-Grained Feedback Domain-Complementary Network (FDNet) for scene text image super-resolution. Specifically, we first employ a fine-grained feedback mechanism to selectively refine landmark images, thereby enhancing feature representations. Then, we introduce a novel domain-trace prior interaction generator, which integrates domain-specific traces with a text prior, comprehensively complementing the clear edges and structural coverage of the text. Finally, motivated by the limitations of existing datasets, which often exhibit limited scene scales and a lack of challenging scenarios, we introduce a new dataset, MDRText. The proposed MDRText features multi-scale and diverse characteristics and is designed to support challenging text image recognition and super-resolution tasks. Extensive experiments on the MDRText and TextZoom datasets demonstrate that our method achieves superior performance in scene text image super-resolution and further improves the accuracy of subsequent recognition tasks.
Citations: 0
StealthMark: Harmless and Stealthy Ownership Verification for Medical Segmentation via Uncertainty-Guided Backdoors
IF 10.6 | Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-28 | DOI: 10.1109/tip.2026.3655563
Qinkai Yu, Chong Zhang, Gaojie Jin, Tianjin Huang, Wei Zhou, Wenhui Li, Xiaobo Jin, Bo Huang, Yitian Zhao, Guang Yang, Gregory Y.H. Lip, Yalin Zheng, Aline Villavicencio, Yanda Meng
Citations: 0
AttriPrompt: Class Attribute-aware Prompt Tuning for Vision-Language Model.
IF 10.6 | Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-28 | DOI: 10.1109/tip.2026.3657216
Yuling Su, Xueliang Liu, Zhen Huang, Yunwei Zhao, Richang Hong, Meng Wang
Prompt tuning has proven to be an effective alternative for fine-tuning pre-trained vision-language models (VLMs) on downstream tasks. Among existing approaches, class-shared prompts learn a unified prompt shared across all classes, while sample-specific prompts generate distinct prompts tailored to each individual sample. However, both approaches often struggle to adequately capture the unique characteristics of underrepresented classes, particularly in imbalanced scenarios where data for tail classes is scarce. To alleviate this issue, we propose an attribute-aware prompt tuning framework that promotes a more balanced understanding of imbalanced tasks by explicitly modeling critical class-level attributes. The key intuition is that, from the perspective of a class, essential attributes tend to be relatively consistent across classes, regardless of sample sizes. Specifically, we build an attribute pool to learn potential semantic attributes of classes based on VLMs. For each input sample, we generate a unique attribute-aware prompt by selecting relevant class attributes from this pool through a matching mechanism. This design enables the model to capture essential class semantics and generate informative prompts, even for classes with limited data. Additionally, we introduce a ProAdapter module to facilitate the transfer of foundational knowledge from VLMs while enhancing generalization to underrepresented classes in imbalanced settings. Extensive experiments on standard and imbalanced few-shot tasks demonstrate that our model achieves superior performance, especially in tail classes.
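The pool-and-match step can be sketched in a few lines of NumPy. This is a hypothetical reading of the mechanism: cosine similarity as the matching score, top-k selection, and mean pooling are all assumptions standing in for whatever AttriPrompt actually learns; random vectors replace CLIP features and the learnable pool.

```python
import numpy as np

rng = np.random.default_rng(0)

def attribute_prompt(image_feat, pool, k=3):
    """Hypothetical matching step: score every pooled attribute
    embedding against the image feature by cosine similarity and
    average the top-k matches into a sample-specific prompt."""
    a = pool / np.linalg.norm(pool, axis=1, keepdims=True)
    f = image_feat / np.linalg.norm(image_feat)
    top = np.argsort(a @ f)[-k:]         # indices of best-matching attributes
    return pool[top].mean(axis=0), top

pool = rng.standard_normal((16, 8))      # 16 learnable attribute vectors (dim 8)
feat = rng.standard_normal(8)            # stand-in for a CLIP image feature
prompt, chosen = attribute_prompt(feat, pool)
```

Because the pool is shared across all samples, a tail-class image can still select well-trained attribute vectors that head classes helped shape, which is the intended balancing effect.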
Citations: 0
Domain-aware Adversarial Domain Augmentation Network for Hyperspectral Image Classification.
IF 10.6 | Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-28 | DOI: 10.1109/tip.2026.3657203
Yi Huang, Jiangtao Peng, Weiwei Sun, Na Chen, Zhijing Ye, Qian Du
Classifying hyperspectral remote sensing images across different scenes has recently emerged as a significant challenge. When only historical labeled images (source domain, SD) are available, it is crucial to leverage these images effectively to train a model with strong generalization ability that can be directly applied to classify unseen samples (target domain, TD). To address these challenges, this paper proposes a novel single-domain generalization (SDG) network, termed the domain-aware adversarial domain augmentation network (DADAnet), for cross-scene hyperspectral image classification (HSIC). DADAnet involves two stages: adversarial domain augmentation (ADA) and task-specific training. ADA employs a progressive adversarial generation strategy to construct an augmented domain (AD). To enhance variability in both spatial and spectral dimensions, a domain-aware spatial-spectral mask (DSSM) encoder is constructed to increase the diversity of the generated adversarial samples. Furthermore, a two-level contrastive loss (TCC) is designed and incorporated into the ADA to ensure both the diversity and effectiveness of AD samples. Finally, DADAnet performs supervised learning jointly on the SD and AD during the task-specific training stage. Experimental results on two public hyperspectral image datasets and a new Hangzhouwan (HZW) dataset demonstrate that the proposed DADAnet outperforms existing domain adaptation (DA) and domain generalization (DG) methods, achieving overall accuracies of 80.69%, 63.75%, and 87.61% on three datasets, respectively.
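The core of adversarial domain augmentation is perturbing a source sample along the loss gradient to synthesize a harder, domain-shifted variant. A minimal FGSM-style sketch makes the step explicit; a one-layer logistic model replaces the real network (an assumption for clarity) so the gradient can be written by hand.

```python
import numpy as np

def fgsm_augment(x, w, y, eps=0.1):
    """FGSM-style sketch of adversarial sample generation: nudge a
    source-domain sample x along the sign of the loss gradient.
    A linear logistic model stands in for the actual classifier."""
    p = 1.0 / (1.0 + np.exp(-x @ w))   # predicted probability of class 1
    grad_x = (p - y) * w               # d(BCE)/dx for logistic regression
    return x + eps * np.sign(grad_x)   # every coordinate shifted by eps

x = np.array([0.5, -0.2, 0.1])         # a source-domain spectrum (toy)
w = np.array([1.0, -1.0, 0.5])         # frozen classifier weights (toy)
x_adv = fgsm_augment(x, w, y=1.0)
```

DADAnet's progressive strategy iterates this kind of step under learned spatial-spectral masks, so the augmented domain drifts away from the source while the labels stay valid.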
Citations: 0
A Few-Shot Class Incremental Learning Method Using Graph Neural Networks.
IF 10.6 | Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-28 | DOI: 10.1109/tip.2026.3657170
Yuqian Ma,Youfa Liu,Bo Du
Few-shot class incremental learning (FSCIL) aims to continuously learn new classes from limited training samples while retaining previously acquired knowledge. Existing approaches are not fully capable of balancing stability and plasticity in dynamic scenarios. To overcome this limitation, we introduce a novel FSCIL framework that leverages graph neural networks (GNNs) to model interdependencies between different categories and enhance cross-modal alignment. Our framework incorporates three key components: (1) a Graph Isomorphism Network (GIN) to propagate contextual relationships among prompts; (2) a Hamiltonian Graph Network with Energy Conservation (HGN-EC) to stabilize training dynamics via energy conservation constraints; and (3) an Adversarially Constrained Graph Autoencoder (ACGA) to enforce latent space consistency. By integrating these components with a parameter-efficient CLIP backbone, our method dynamically adapts graph structures to model semantic correlations between textual and visual modalities. Additionally, contrastive learning with energy-based regularization is employed to mitigate catastrophic forgetting and improve generalization. Comprehensive experiments on benchmark datasets validate the framework's incremental accuracy and stability compared to state-of-the-art baselines. This work advances FSCIL by unifying graph-based relational reasoning with physics-inspired optimization, offering a scalable and interpretable framework.
{"title":"A Few-Shot Class Incremental Learning Method Using Graph Neural Networks.","authors":"Yuqian Ma,Youfa Liu,Bo Du","doi":"10.1109/tip.2026.3657170","DOIUrl":"https://doi.org/10.1109/tip.2026.3657170","url":null,"abstract":"Few-shot class incremental learning (FSCIL) aims to continuously learn new classes from limited training samples while retaining previously acquired knowledge. Existing approaches are not fully capable of balancing stability and plasticity in dynamic scenarios. To overcome this limitation, we introduce a novel FSCIL framework that leverages graph neural networks (GNNs) to model interdependencies between different categories and enhance cross-modal alignment. Our framework incorporates three key components: (1) a Graph Isomorphism Network (GIN) to propagate contextual relationships among prompts; (2) a Hamiltonian Graph Network with Energy Conservation (HGN-EC) to stabilize training dynamics via energy conservation constraints; and (3) an Adversarially Constrained Graph Autoencoder (ACGA) to enforce latent space consistency. By integrating these components with a parameter-efficient CLIP backbone, our method dynamically adapts graph structures to model semantic correlations between textual and visual modalities. Additionally, contrastive learning with energy-based regularization is employed to mitigate catastrophic forgetting and improve generalization. Comprehensive experiments on benchmark datasets validate the framework's incremental accuracy and stability compared to state-of-the-art baselines. 
This work advances FSCIL by unifying graph-based relational reasoning with physics-inspired optimization, offering a scalable and interpretable framework.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"52 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146069925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
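The GIN named in the abstract above follows the standard Graph Isomorphism Network update of Xu et al. (2019): sum-aggregate neighbor features, add (1 + ε) times the node's own feature, and pass the result through a shared MLP. A minimal NumPy sketch of one such layer (not the paper's code; `W1` and `W2` are hypothetical MLP weights):

```python
import numpy as np

def gin_layer(H, A, W1, W2, eps=0.0):
    """One GIN update: h_v <- MLP((1 + eps) * h_v + sum_{u in N(v)} h_u).
    H: (n, d) node features; A: (n, n) adjacency without self-loops;
    W1, W2: weights of a two-layer MLP with ReLU, shared across nodes."""
    agg = (1.0 + eps) * H + A @ H          # injective sum aggregation
    return np.maximum(agg @ W1, 0.0) @ W2  # row-wise MLP
```

The sum aggregator makes the layer permutation-equivariant: relabeling the nodes simply permutes the output rows in the same way, which is what lets it model category interdependencies independently of node ordering.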
BP-NeRF: End-to-End Neural Radiance Fields for Sparse Images without Camera Pose in Complex Scenes. BP-NeRF:复杂场景中无相机姿态的稀疏图像的端到端神经辐射场。
IF 10.6 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-01-28 DOI: 10.1109/tip.2026.3657188
Yaru Qiu,Guoxia Wu,Yuanyuan Sun
Synthesizing high-quality novel views of complex scenes from sparse image sequences, especially sequences without camera poses, is a challenging task. The key to enhancing accuracy in such scenarios lies in sufficient prior knowledge and accurate camera-motion constraints. Therefore, we propose an end-to-end novel view synthesis network named BP-NeRF. It uses sequences of sparse images captured in complex indoor and outdoor scenes to estimate camera motion trajectories and generate novel view images. First, to address inaccurate depth-map prediction caused by insufficient overlapping features in sparse images, we designed the RDP-Net module, which generates depth maps for sparse image sequences and calculates their depth accuracy, providing the network with a reliable depth prior. Second, to enhance the accuracy of camera pose estimation, we construct a loss function based on the geometric consistency of 2D and 3D feature variations between frames, improving the accuracy and robustness of the network's estimations.
{"title":"BP-NeRF: End-to-End Neural Radiance Fields for Sparse Images without Camera Pose in Complex Scenes.","authors":"Yaru Qiu,Guoxia Wu,Yuanyuan Sun","doi":"10.1109/tip.2026.3657188","DOIUrl":"https://doi.org/10.1109/tip.2026.3657188","url":null,"abstract":"Synthesizing novel perspectives of complex scenes in high quality using sparse image sequences, especially for those without camera poses, is a challenging task. The key to enhancing accuracy in such scenarios lies in sufficient prior knowledge and accurate camera motion constraints. Therefore, we propose an end-to-end novel view synthesis network named BP-NeRF. It is capable of using sequences of sparse images captured in indoor and outdoor complex scenes to estimate camera motion trajectories and generate novel view images. Firstly, to address the issue of inaccurate prediction of depth map caused by insufficient overlapping features in sparse images, we designed the RDP-Net module to generate depth maps for sparse image sequences and calculate the depth accuracy of these maps, providing the network with a reliable depth prior. Secondly, to enhance the accuracy of camera pose estimation, we construct a loss function based on the geometric consistency of 2D and 3D feature variations between frames, improving the accuracy and robustness of the network's estimations. 
We conducted experimental evaluations on the LLFF and Tanks datasets, and the results show that, compared to the current mainstream methods, BP-NeRF can generate more accurate novel views without camera poses.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"31 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146069922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
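A 2D-3D geometric-consistency loss of the kind described above is ultimately built on reprojection: lift a pixel into 3D through its depth, apply the relative camera pose, and project into the other frame; the pose is then penalized by the distance between the reprojected point and its matched feature. A minimal pinhole-model sketch of that building block (`reproject` and its arguments are illustrative, not BP-NeRF's API):

```python
import numpy as np

def reproject(uv, depth, K, R, t):
    """Back-project pixel uv (frame A) through its depth, transform by
    the relative pose (R, t), and project into frame B's image plane.
    K: (3, 3) pinhole intrinsics; R: (3, 3) rotation; t: (3,) translation."""
    x = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0]) * depth  # 3D point in A
    x_b = R @ x + t                                               # rigid transform
    p = K @ x_b                                                   # project into B
    return p[:2] / p[2]                                           # perspective divide
```

With an identity pose the pixel maps back onto itself, and a pure x-translation shifts it horizontally by f·tx/Z, which gives a quick sanity check on the sign conventions before using the residual as a pose loss.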