Quality Measurement of Images on Mobile Streaming Interfaces Deployed at Scale.
Pub Date : 2019-09-11 DOI: 10.1109/TIP.2019.2939733
Zeina Sinno, Anush Moorthy, Jan De Cock, Zhi Li, Alan C Bovik
With the growing use of smart cellular devices for entertainment, audio and video streaming services now provide a wide variety of popular mobile applications that offer portable and accessible ways to consume content. The user interfaces of these applications have become increasingly visual in nature and are commonly loaded with dense multimedia content such as thumbnail images, animated GIFs, and short videos. To render these efficiently and to aid rapid download to the client display, it is necessary to compress, scale, and color-subsample them. These operations introduce distortions, reducing the appeal of the application. It is therefore desirable to automatically monitor and govern the visual quality of these images, which are usually small. However, while a variety of high-performing image quality assessment (IQA) algorithms exists, none has been designed for this particular use case. This kind of content often has unique characteristics, such as overlaid graphics, intentional brightness gradients, text, and warping. We describe a study we conducted on the subjective and objective quality of images embedded in the displayed user interfaces of mobile streaming applications. We created a database of typical "billboard" and "thumbnail" images viewed on such services. Using the collected data, we studied the effects of compression, scaling, and chroma subsampling on perceived quality by conducting a subjective study. We also evaluated the performance of leading picture quality prediction models on the new database. We report some surprising results regarding algorithm performance and find that there remains ample scope for future model development.
{"title":"Quality Measurement of Images on Mobile Streaming Interfaces Deployed at Scale.","authors":"Zeina Sinno, Anush Moorthy, Jan De Cock, Zhi Li, Alan C Bovik","doi":"10.1109/TIP.2019.2939733","DOIUrl":"10.1109/TIP.2019.2939733","url":null,"abstract":"<p><p>With the growing use of smart cellular devices for entertainment purposes, audio and video streaming services now offer an increasingly wide variety of popular mobile applications that offer portable and accessible ways to consume content. The user interfaces of these applications have become increasingly visual in nature, and are commonly loaded with dense multimedia content such as thumbnail images, animated GIFs, and short videos. To efficiently render these and to aid rapid download to the client display, it is necessary to compress, scale and color subsample them. These operations introduce distortions, reducing the appeal of the application. It is desirable to be able to automatically monitor and govern the visual qualities of these small images, which are usually small images. However, while there exists a variety of high-performing image quality assessment (IQA) algorithms, none have been designed for this particular use case. This kind of content often has unique characteristics, such as overlaid graphics, intentional brightness, gradients, text, and warping. We describe a study we conducted on the subjective and objective quality of images embedded in the displayed user interfaces of mobile streaming applications. We created a database of typical \"billboard\" and \"thumbnail\" images viewed on such services. Using the collected data, we studied the effects of compression, scaling and chroma-subsampling on perceived quality by conducting a subjective study. We also evaluated the performance of leading picture quality prediction models on the new database. We report some surprising results regarding algorithm performance, and find that there remains ample scope for future model development.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62586518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Color Control Functions for Multiprimary Displays I: Robustness Analysis and Optimization Formulations.
Pub Date : 2019-09-06 DOI: 10.1109/TIP.2019.2937067
Carlos Eduardo Rodriguez-Pardo, Gaurav Sharma
Color management for a multiprimary display requires, as a fundamental step, the determination of a color control function (CCF) that specifies control values for reproducing each color in the display's gamut. Multiprimary displays offer alternative choices of control values for reproducing a color in the interior of the gamut, and accordingly alternative choices of CCFs. Under ideal conditions, alternative CCFs render colors identically. However, deviations in the spectral distributions of the primaries and the diversity of cone sensitivities among observers impact alternative CCFs differently and, in particular, make some CCFs prone to artifacts in rendered images. We develop a framework for analyzing the robustness of CCFs for multiprimary displays against primary and observer variations, incorporating a common model of human color perception. Using the framework, we propose analytical and numerical approaches for determining robust CCFs. First, via analytical development, we: (a) demonstrate that linearity of the CCF in tristimulus space endows it with resilience to variations; in particular, linearity can ensure invariance of the gray axis; (b) construct an axially linear CCF that is defined by the property of linearity over constant chromaticity loci; and (c) obtain an analytical form for the axially linear CCF that demonstrates it is continuous, but suffers from the limitation of not having continuous derivatives. Second, to overcome the limitation of the axially linear CCF, we motivate and develop two variational objective functions for the optimization of multiprimary CCFs: the first aims to preserve color transitions in the presence of primary/observer variations, and the second combines this objective with desirable invariance along the gray axis by incorporating the axially linear CCF. A companion Part II paper presents an algorithmic approach for numerically computing optimal CCFs for the two alternative variational objective functions proposed here, and presents results comparing alternative CCFs for several different 4-, 5-, and 6-primary designs.
{"title":"Color Control Functions for Multiprimary Displays I: Robustness Analysis and Optimization Formulations.","authors":"Carlos Eduardo Rodriguez-Pardo, Gaurav Sharma","doi":"10.1109/TIP.2019.2937067","DOIUrl":"10.1109/TIP.2019.2937067","url":null,"abstract":"<p><p>Color management for a multiprimary display requires, as a fundamental step, the determination of a color control function (CCF) that specifies control values for reproducing each color in the display's gamut. Multiprimary displays offer alternative choices of control values for reproducing a color in the interior of the gamut and accordingly alternative choices of CCFs. Under ideal conditions, alternative CCFs render colors identically. However, deviations in the spectral distributions of the primaries and the diversity of cone sensitivities among observers impact alternative CCFs differently, and, in particular, make some CCFs prone to artifacts in rendered images. We develop a framework for analyzing robustness of CCFs for multiprimary displays against primary and observer variations, incorporating a common model of human color perception. Using the framework, we propose analytical and numerical approaches for determining robust CCFs. First, via analytical development, we: (a) demonstrate that linearity of the CCF in tristimulus space endows it with resilience to variations, particularly, linearity can ensure invariance of the gray axis, (b) construct an axially linear CCF that is defined by the property of linearity over constant chromaticity loci, and (c) obtain an analytical form for the axially linear CCF that demonstrates it is continuous but suffers from the limitation that it does not have continuous derivatives. Second, to overcome the limitation of the axially linear CCF, we motivate and develop two variational objective functions for optimization of multiprimary CCFs, the first aims to preserve color transitions in the presence of primary/observer variations and the second combines this objective with desirable invariance along the gray axis, by incorporating the axially linear CCF. A companion Part II paper, presents an algorithmic approach for numerically computing optimal CCFs for the two alternative variational objective functions proposed here and presents results comparing alternative CCFs for several different 4,5, and 6 primary designs.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62585578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tunable VVC Frame Partitioning based on Lightweight Machine Learning.
Pub Date : 2019-09-06 DOI: 10.1109/TIP.2019.2938670
Thomas Amestoy, Alexandre Mercat, Wassim Hamidouche, Daniel Menard, Cyril Bergeron
Block partition structure is a critical module in video coding schemes for achieving significant compression performance gains. During the exploration of the future video coding standard, named Versatile Video Coding (VVC), a new Quad Tree Binary Tree (QTBT) block partition structure was introduced. In addition to the QT block partitioning defined in the High Efficiency Video Coding (HEVC) standard, new horizontal and vertical BT partitions are enabled, which drastically increases the encoding time compared to HEVC. In this paper, we propose a lightweight and tunable QTBT partitioning scheme based on a Machine Learning (ML) approach. The proposed solution uses Random Forest classifiers to determine, for each coding block, the most probable partition modes. To minimize the encoding loss induced by misclassification, risk intervals for classifier decisions are introduced in the proposed solution. By varying the size of the risk intervals, a tunable trade-off between encoding complexity reduction and coding loss is achieved. Implemented in the JEM-7.0 software, the proposed solution offers encoding complexity reductions ranging from 30% to 70% on average for only a 0.7% to 3.0% Bjøntegaard Delta Rate (BD-BR) increase in the Random Access (RA) coding configuration, with very slight overhead induced by the Random Forest. The proposed solution based on Random Forest classifiers is also efficient at reducing the complexity of the Multi-Type Tree (MTT) partitioning scheme in the VTM-5.0 software, with complexity reductions ranging from 25% to 61% on average for only a 0.4% to 2.2% BD-BR increase.
{"title":"Tunable VVC Frame Partitioning based on Lightweight Machine Learning.","authors":"Thomas Amestoy, Alexandre Mercat, Wassim Hamidouche, Daniel Menard, Cyril Bergeron","doi":"10.1109/TIP.2019.2938670","DOIUrl":"10.1109/TIP.2019.2938670","url":null,"abstract":"<p><p>Block partition structure is a critical module in video coding scheme to achieve significant gap of compression performance. Under the exploration of the future video coding standard, named Versatile Video Coding (VVC), a new Quad Tree Binary Tree (QTBT) block partition structure has been introduced. In addition to the QT block partitioning defined in High Efficiency Video Coding (HEVC) standard, new horizontal and vertical BT partitions are enabled, which drastically increases the encoding time compared to HEVC. In this paper, we propose a lightweight and tunable QTBT partitioning scheme based on a Machine Learning (ML) approach. The proposed solution uses Random Forest classifiers to determine for each coding block the most probable partition modes. To minimize the encoding loss induced by misclassification, risk intervals for classifier decisions are introduced in the proposed solution. By varying the size of risk intervals, tunable trade-off between encoding complexity reduction and coding loss is achieved. The proposed solution implemented in the JEM-7.0 software offers encoding complexity reductions ranging from 30average for only 0.7% to 3.0% Bjxntegaard Delta Rate (BDBR) increase in Random Access (RA) coding configuration, with very slight overhead induced by Random Forest. The proposed solution based on Random Forest classifiers is also efficient to reduce the complexity of the Multi-Type Tree (MTT) partitioning scheme under the VTM-5.0 software, with complexity reductions ranging from 25% to 61% in average for only 0.4% to 2.2% BD-BR increase.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62586590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploiting Related and Unrelated Tasks for Hierarchical Metric Learning and Image Classification.
Pub Date : 2019-09-05 DOI: 10.1109/TIP.2019.2938321
Yu Zheng, Jianping Fan, Ji Zhang, Xinbo Gao
In multi-task learning, multiple interrelated tasks are jointly learned to achieve better performance. In many cases, if we can identify which tasks are related, we can also clearly identify which tasks are unrelated. In the past, most researchers emphasized exploiting correlations among interrelated tasks while completely ignoring the unrelated tasks that may provide valuable prior knowledge for multi-task learning. In this paper, a new approach is developed to hierarchically learn a tree of multi-task metrics by leveraging prior knowledge about both the related tasks and unrelated tasks. First, a visual tree is constructed to hierarchically organize large numbers of image categories in a coarse-to-fine fashion. Over the visual tree, a multi-task metric classifier is learned for each node by exploiting both the related and unrelated tasks, where the learning tasks for training the classifiers for the sibling child nodes under the same parent node are treated as the interrelated tasks, and the others are treated as the unrelated tasks. In addition, the node-specific metric for the parent node is propagated to its sibling child nodes to control inter-level error propagation. Our experimental results demonstrate that our hierarchical metric learning algorithm achieves better results than other state-of-the-art algorithms.
{"title":"Exploiting Related and Unrelated Tasks for Hierarchical Metric Learning and Image Classification.","authors":"Yu Zheng, Jianping Fan, Ji Zhang, Xinbo Gao","doi":"10.1109/TIP.2019.2938321","DOIUrl":"10.1109/TIP.2019.2938321","url":null,"abstract":"<p><p>In multi-task learning, multiple interrelated tasks are jointly learned to achieve better performance. In many cases, if we can identify which tasks are related, we can also clearly identify which tasks are unrelated. In the past, most researchers emphasized exploiting correlations among interrelated tasks while completely ignoring the unrelated tasks that may provide valuable prior knowledge for multi-task learning. In this paper, a new approach is developed to hierarchically learn a tree of multi-task metrics by leveraging prior knowledge about both the related tasks and unrelated tasks. First, a visual tree is constructed to hierarchically organize large numbers of image categories in a coarse-to-fine fashion. Over the visual tree, a multi-task metric classifier is learned for each node by exploiting both the related and unrelated tasks, where the learning tasks for training the classifiers for the sibling child nodes under the same parent node are treated as the interrelated tasks, and the others are treated as the unrelated tasks. In addition, the node-specific metric for the parent node is propagated to its sibling child nodes to control inter-level error propagation. Our experimental results demonstrate that our hierarchical metric learning algorithm achieves better results than other state-of-the-art algorithms.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62586099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Cascade Model based Face Recognition: When Deep-layered Learning Meets Small Data.
Pub Date : 2019-09-05 DOI: 10.1109/TIP.2019.2938307
Lei Zhang, Ji Liu, Bob Zhang, David Zhang, Ce Zhu
Sparse representation based classification (SRC), nuclear-norm matrix regression (NMR), and deep learning (DL) have achieved great success in face recognition (FR). However, there still exist some intrinsic limitations among them. SRC and NMR based coding methods are one-step models, so the latent discriminative information of the coding error vector cannot be fully exploited. DL, as a multi-step model, can learn powerful representations, but relies on large-scale data and computational resources to train numerous parameters with complicated back-propagation. Straightforward training of deep neural networks from scratch on small-scale data is almost infeasible. Therefore, in order to develop efficient algorithms that are specifically adapted to small-scale data, we propose to derive deep models of SRC and NMR. Specifically, in this paper, we propose an end-to-end deep cascade model (DCM) based on SRC and NMR, with hierarchical learning, nonlinear transformations, and a multi-layer structure, for corrupted face recognition. The contributions include four aspects. First, an end-to-end deep cascade model for small-scale data without back-propagation is proposed. Second, a multi-level pyramid structure is integrated for local feature representation. Third, to introduce nonlinear transformations into layer-wise learning, softmax vector coding of the errors with class discrimination is proposed. Fourth, existing representation methods can be easily integrated into our DCM framework. Experiments on a number of small-scale benchmark FR datasets demonstrate the superiority of the proposed model over state-of-the-art counterparts. Additionally, the perspective that deep-layered learning does not have to be a convolutional neural network with back-propagation optimization is consolidated. The demo code is available at https://github.com/liuji93/DCM.
{"title":"Deep Cascade Model based Face Recognition: When Deep-layered Learning Meets Small Data.","authors":"Lei Zhang, Ji Liu, Bob Zhanga, David Zhangb, Ce Zhu","doi":"10.1109/TIP.2019.2938307","DOIUrl":"10.1109/TIP.2019.2938307","url":null,"abstract":"<p><p>Sparse representation based classification (SRC), nuclear-norm matrix regression (NMR), and deep learning (DL) have achieved a great success in face recognition (FR). However, there still exist some intrinsic limitations among them. SRC and NMR based coding methods belong to one-step model, such that the latent discriminative information of the coding error vector cannot be fully exploited. DL, as a multi-step model, can learn powerful representation, but relies on large-scale data and computation resources for numerous parameters training with complicated back-propagation. Straightforward training of deep neural networks from scratch on small-scale data is almost infeasible. Therefore, in order to develop efficient algorithms that are specifically adapted for small-scale data, we propose to derive the deep models of SRC and NMR. Specifically, in this paper, we propose an end-to-end deep cascade model (DCM) based on SRC and NMR with hierarchical learning, nonlinear transformation and multi-layer structure for corrupted face recognition. The contributions include four aspects. First, an end-to-end deep cascade model for small-scale data without back-propagation is proposed. Second, a multi-level pyramid structure is integrated for local feature representation. Third, for introducing nonlinear transformation in layer-wise learning, softmax vector coding of the errors with class discrimination is proposed. Fourth, the existing representation methods can be easily integrated into our DCM framework. Experiments on a number of small-scale benchmark FR datasets demonstrate the superiority of the proposed model over state-of-the-art counterparts. Additionally, a perspective that deep-layered learning does not have to be convolutional neural network with back-propagation optimization is consolidated. The demo code is available in https://github.com/liuji93/DCM.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62586446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multiple Cycle-in-Cycle Generative Adversarial Networks for Unsupervised Image Super-Resolution.
Pub Date : 2019-09-05 DOI: 10.1109/TIP.2019.2938347
Yongbing Zhang, Siyuan Liu, Chao Dong, Xinfeng Zhang, Yuan Yuan
With the help of convolutional neural networks (CNN), the single image super-resolution problem has been widely studied. Most of these CNN-based methods focus on learning a model that maps a low-resolution (LR) image to a high-resolution (HR) image, where the LR image is downsampled from the HR image with a known model. However, in the more general case where the down-sampling process is unknown and the LR input is degraded by noise and blurring, it is difficult to acquire the LR and HR image pairs needed for traditional supervised learning. Inspired by recent unsupervised image-style translation applications using unpaired data, we propose a multiple Cycle-in-Cycle network structure to deal with this more general case, using multiple generative adversarial networks (GAN) as basis components. The first network cycle aims at mapping the noisy and blurry LR input to a noise-free LR space; then a new cycle with a well-trained ×2 network model is introduced to super-resolve the intermediate output of the former cycle. The total number of cycles depends on the up-sampling factor (×2, ×4, ×8). Finally, all modules are trained in an end-to-end manner to obtain the desired HR output. Quantitative indexes and qualitative results show that our proposed method achieves performance comparable with state-of-the-art supervised models.
{"title":"Multiple Cycle-in-Cycle Generative Adversarial Networks for Unsupervised Image Super-Resolution.","authors":"Yongbing Zhang, Siyuan Liu, Chao Dong, Xinfeng Zhang, Yuan Yuan","doi":"10.1109/TIP.2019.2938347","DOIUrl":"10.1109/TIP.2019.2938347","url":null,"abstract":"<p><p>With the help of convolutional neural networks (CNN), the single image super-resolution problem has been widely studied. Most of these CNN based methods focus on learning a model to map a low-resolution (LR) image to a highresolution (HR) image, where the LR image is downsampled from the HR image with a known model. However, in a more general case when the process of the down-sampling is unknown and the LR input is degraded by noises and blurring, it is difficult to acquire the LR and HR image pairs for traditional supervised learning. Inspired by the recent unsupervised imagestyle translation applications using unpaired data, we propose a multiple Cycle-in-Cycle network structure to deal with the more general case using multiple generative adversarial networks (GAN) as the basis components. The first network cycle aims at mapping the noisy and blurry LR input to a noise-free LR space, then a new cycle with a well-trained ×2 network model is orderly introduced to super-resolve the intermediate output of the former cycle. The number of total cycles depends on the different up-sampling factors (×2, ×4, ×8). Finally, all modules are trained in an end-to-end manner to get the desired HR output. Quantitative indexes and qualitative results show that our proposed method achieves comparable performance with the state-of-the-art supervised models.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62586945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Edge-Sensitive Human Cutout with Hierarchical Granularity and Loopy Matting Guidance.
Pub Date : 2019-09-05 DOI: 10.1109/TIP.2019.2930146
Jingwen Ye, Yongcheng Jing, Xinchao Wang, Kairi Ou, Dacheng Tao, Mingli Song
Human parsing and matting play important roles in various applications, such as dress collocation, clothing recommendation, and image editing. In this paper, we propose a lightweight hybrid model that unifies the fully-supervised hierarchical-granularity parsing task and the unsupervised matting one. Our model comprises two parts: an extensible hierarchical semantic segmentation block using a CNN, and a matting module composed of guided filters. Given a human image, stage 1 of the segmentation block first obtains a primitive segmentation map that separates the human from the background. The primitive segmentation is then fed into stage 2, together with the original image, to give a rough segmentation of the human body. This procedure is repeated in stage 3 to acquire a refined segmentation. The matting module takes as input the estimated segmentation maps and produces the matting map in a fully unsupervised manner. The obtained matting map is then fed back to the CNN in the first block to refine the semantic segmentation results.
{"title":"Edge-Sensitive Human Cutout with Hierarchical Granularity and Loopy Matting Guidance.","authors":"Jingwen Ye, Yongcheng Jing, Xinchao Wang, Kairi Ou, Dacheng Tao, Mingli Song","doi":"10.1109/TIP.2019.2930146","DOIUrl":"10.1109/TIP.2019.2930146","url":null,"abstract":"<p><p>Human parsing and matting play important roles in various applications, such as dress collocation, clothing recommendation, and image editing. In this paper, we propose a lightweight hybrid model that unifies the fully-supervised hierarchical-granularity parsing task and the unsupervised matting one. Our model comprises two parts, the extensible hierarchical semantic segmentation block using CNN and the matting module composed of guided filters. Given a human image, the segmentation block stage-1 first obtains a primitive segmentation map to separate the human and the background. The primitive segmentation is then fed into stage-2 together with the original image to give a rough segmentation of human body. This procedure is repeated in the stage-3 to acquire a refined segmentation. The matting module takes as input the above estimated segmentation maps and produces the matting map, in a fully unsupervised manner. The obtained matting map is then in turn fed back to the CNN in the first block for refining the semantic segmentation results.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62583550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-View Video Synopsis via Simultaneous Object-Shifting and View-Switching Optimization.
Pub Date : 2019-09-04 DOI: 10.1109/TIP.2019.2938086
Zhensong Zhang, Yongwei Nie, Hanqiu Sun, Qing Zhang, Qiuxia Lai, Guiqing Li, Mingyu Xiao
We present a method for synopsizing multiple videos captured by a set of surveillance cameras with overlapping fields of view. Currently, object-based approaches that directly shift objects along the time axis are already able to compute compact synopsis results for multiple surveillance videos. The challenge is how to present the multiple synopsis results in a more compact and understandable way. Previous approaches show them side by side on the screen, which, however, is difficult for users to comprehend. In this paper, we solve the problem by joint object-shifting and camera view-switching. First, we synchronize the input videos and group the same object in different videos together. Then we shift the groups of objects along the time axis to obtain multiple synopsis videos. Instead of showing them simultaneously, we show only one of them at a time and allow switching among the views of the different synopsis videos. In this view-switching way, we obtain a single synopsis result consisting of content from all the input videos, which is much easier for users to follow and understand. To obtain the best synopsis result, we construct a simultaneous object-shifting and view-switching optimization framework instead of solving the two problems separately. We also present an alternating optimization strategy composed of graph cuts and dynamic programming to solve the unified optimization. Experiments demonstrate that the single synopsis video generated from multiple input videos is compact, complete, and easy to understand.
{"title":"Multi-View Video Synopsis via Simultaneous Object-Shifting and View-Switching Optimization.","authors":"Zhensong Zhang, Yongwei Nie, Hanqiu Sun, Qing Zhang, Qiuxia Lai, Guiqing Li, Mingyu Xiao","doi":"10.1109/TIP.2019.2938086","DOIUrl":"10.1109/TIP.2019.2938086","url":null,"abstract":"<p><p>We present a method for synopsizing multiple videos captured by a set of surveillance cameras with some overlapped field-of-views. Currently, object-based approaches that directly shift objects along the time axis are already able to compute compact synopsis results for multiple surveillance videos. The challenge is how to present the multiple synopsis results in a more compact and understandable way. Previous approaches show them side by side on the screen, which however is difficult for user to comprehend. In this paper, we solve the problem by joint object-shifting and camera view-switching. Firstly, we synchronize the input videos, and group the same object in different videos together. Then we shift the groups of objects along the time axis to obtain multiple synopsis videos. Instead of showing them simultaneously, we just show one of them at each time, and allow to switch among the views of different synopsis videos. In this view switching way, we obtain just a single synopsis results consisting of content from all the input videos, which is much easier for user to follow and understand. To obtain the best synopsis result, we construct a simultaneous object-shifting and view-switching optimization framework instead of solving them separately. We also present an alternative optimization strategy composed of graph cuts and dynamic programming to solve the unified optimization. Experiments demonstrate that our single synopsis video generated from multiple input videos is compact, complete, and easy to understand.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62586355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust Seismic Image Interpolation with Mathematical Morphological Constraint.
Pub Date : 2019-08-30 DOI: 10.1109/TIP.2019.2936744
Weilin Huang, Jianxin Liu
Seismic image interpolation is currently a popular research subject in modern reflection seismology. The interpolation problem is generally treated as a process of inversion. Under the compressed sensing framework, various methods based on sparse transformations and low-rank constraints perform well in recovering irregularly missing traces. However, in the case of regularly missing traces, their applicability is limited by strong spatial aliasing energies. In addition, erratic noise seriously impacts the interpolation results obtained by sparse-transformation and low-rank-constraint methods, because such noise is far from satisfying the statistical assumptions behind these methods. In this study, we propose a mathematical morphology-based interpolation technique, which constrains the morphological scale of the model in the inversion process. The inversion problem is solved by the shaping regularization approach. The mathematical morphological constraint (MMC)-based interpolation technique shows satisfactory robustness to spatial aliasing and erratic energies. We provide a detailed algorithmic framework and discuss the extension from 2D to higher-dimensional versions, as well as the backward operator in the shaping inversion. A group of numerical examples demonstrates the successful performance of the proposed technique.
{"title":"Robust Seismic Image Interpolation with Mathematical Morphological Constraint.","authors":"Weilin Huang, Jianxin Liu","doi":"10.1109/TIP.2019.2936744","DOIUrl":"10.1109/TIP.2019.2936744","url":null,"abstract":"<p><p>Seismic image interpolation is a currently popular research subject in modern reflection seismology. The interpolation problem is generally treated as a process of inversion. Under the compressed sensing framework, various sparse transformations and low-rank constraints based methods have great performances in recovering irregularly missing traces. However, in the case of regularly missing traces, their applications are limited because of the strong spatial aliasing energies. In addition, the erratic noise always poses a serious impact on the interpolation results obtained by the sparse transformations and low-rank constraints-based methods. This is because the erratic noise is far from satisfying the statistical assumption behind these methods. In this study, we propose a mathematical morphology-based interpolation technique, which constrains the morphological scale of the model in the inversion process. The inversion problem is solved by the shaping regularization approach. The mathematical morphological constraint (MMC)-based interpolation technique has a satisfactory robustness to the spatial aliasing and erratic energies. We provide a detailed algorithmic framework and discuss the extension from 2D to higher dimensional version and the back operator in the shaping inversion. A group of numerical examples demonstrates the successful performance of the proposed technique.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62585953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
From pairwise comparisons and rating to a unified quality scale.
Pub Date : 2019-08-28 DOI: 10.1109/TIP.2019.2936103
Maria Perez-Ortiz, Aliaksei Mikhailiuk, Emin Zerman, Vedad Hulusic, Giuseppe Valenzise, Rafal K Mantiuk
The goal of psychometric scaling is the quantification of perceptual experiences: understanding the relationship between an external stimulus, the internal representation, and the response. In this paper, we propose a probabilistic framework to fuse the outcomes of different psychophysical experimental protocols, namely rating and pairwise comparison experiments. Such a method can be used for merging existing datasets of a subjective nature and for experiments in which both kinds of measurements are collected. We analyze and compare the outcomes of both types of experimental protocols in terms of time and accuracy in a set of simulations and experiments with benchmark and real-world image quality assessment datasets, showing the necessity of scaling and the advantages of each protocol and of mixing them. Although most of our examples focus on image quality assessment, our findings generalize to any other subjective quality-of-experience task.
{"title":"From pairwise comparisons and rating to a unified quality scale.","authors":"Maria Perez-Ortiz, Aliaksei Mikhailiuk, Emin Zerman, Vedad Hulusic, Giuseppe Valenzise, Rafal K Mantiuk","doi":"10.1109/TIP.2019.2936103","DOIUrl":"10.1109/TIP.2019.2936103","url":null,"abstract":"<p><p>The goal of psychometric scaling is the quantification of perceptual experiences, understanding the relationship between an external stimulus, the internal representation and the response. In this paper, we propose a probabilistic framework to fuse the outcome of different psychophysical experimental protocols, namely rating and pairwise comparisons experiments. Such a method can be used for merging existing datasets of subjective nature and for experiments in which both measurements are collected. We analyze and compare the outcomes of both types of experimental protocols in terms of time and accuracy in a set of simulations and experiments with benchmark and real-world image quality assessment datasets, showing the necessity of scaling and the advantages of each protocol and mixing. Although most of our examples focus on image quality assessment, our findings generalize to any other subjective quality-of-experience task.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62585086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}