Pub Date: 2016-09-01 | DOI: 10.1109/ICIP.2016.7532803
Joonsoo Kim, He Li, Jiaju Yue, E. Delp
In this paper we introduce a shape descriptor, known as the Self Similar Affine Invariant (SSAI) descriptor, for shape retrieval. The SSAI descriptor is based on the property that if two sets of points are related by an affine transformation, then corresponding subsets of those points are related by the same affine transformation. The SSAI descriptor is also insensitive to local shape distortions. We use multiple SSAI descriptors based on different sets of neighbor points to improve shape recognition accuracy, and we describe an efficient image matching method for the multiple SSAI descriptors. Experimental results show that our approach achieves very good performance on two publicly available shape datasets.
{"title":"Shape matching using a self similar affine invariant descriptor","authors":"Joonsoo Kim, He Li, Jiaju Yue, E. Delp","doi":"10.1109/ICIP.2016.7532803","DOIUrl":"https://doi.org/10.1109/ICIP.2016.7532803","url":null,"abstract":"In this paper we introduce a shape descriptor known as Self Similar Affine Invariant (SSAI) descriptor for shape retrieval. The SSAI descriptor is based on the property that two sets of points are transformed by an affine transform, then subsets of each set of points are also related by the same affine transformation. Also, the SSAI descriptor is insensitive to local shape distortions. We use multiple SSAI descriptors based on different sets of neighbor points to improve shape recognition accuracy. We also describe an efficient image matching method for the multiple SSAI descriptors. Experimental results show that our approach achieves very good performance on two publicly available shape datasets.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"46 1","pages":"2470-2474"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86885975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-09-01 | DOI: 10.1109/ICIP.2016.7532663
Baihong Lin, Xiaoming Tao, Linhao Dong, Jianhua Lu
High resolution hyper-spectral imaging is a scheme for obtaining images with both high spatial and high spectral resolution by merging a low spatial resolution hyper-spectral image (HSI) with a high spatial resolution multi-spectral image (MSI). In this paper, we propose a novel method based on probabilistic matrix factorization under a Bayesian framework. First, Gaussian distributions are placed on the two observed images of the HSI-MSI pair, with the two variances sharing the same hyper-parameter to ensure fair and effective constraints on both observations. Second, to avoid manual tuning and to learn a better setting automatically, hyper-priors are adopted for all hyper-parameters. A variational expectation-maximization (EM) approach, chosen for its simplicity and effectiveness, is then devised to compute the resulting expectations. Extensive experiments on two different cases show that our algorithm outperforms many state-of-the-art methods.
{"title":"Variational EM approach for high resolution hyper-spectral imaging based on probabilistic matrix factorization","authors":"Baihong Lin, Xiaoming Tao, Linhao Dong, Jianhua Lu","doi":"10.1109/ICIP.2016.7532663","DOIUrl":"https://doi.org/10.1109/ICIP.2016.7532663","url":null,"abstract":"High resolution hyper-spectral imaging works as a scheme to obtain images with high spatial and spectral resolutions by merging a low spatial resolution hyper-spectral image (HSI) with a high spatial resolution multi-spectral image (MSI). In this paper, we propose a novel method based on probabilistic matrix factorization under Bayesian framework: First, Gaussian priors, as observations' distributions, are given upon two HSI-MSI-pair-based images, in which two variances share the same hyper-parameter to ensure fair and effective constraints on two observations. Second, to avoid the manual tuning process and learn a better setting automatically, hyper-priors are adopted for all hyper-parameters. To that end, a variational expectation-maximization (EM) approach is devised to figure out the result expectation for its simplicity and effectiveness. Exhaustive experiments of two different cases prove that our algorithm outperforms many state-of-the-art methods.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"31 1","pages":"1774-1778"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86258858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-09-01 | DOI: 10.1109/ICIP.2016.7533165
Zhaoju Li, Zhenjun Han, Qixiang Ye
Matching specific persons across scenes, known as person re-identification, is an important yet unsolved computer vision problem. Feature representation and metric learning are two fundamental factors in person re-identification. However, current methods that use a single handcrafted feature with a corresponding metric may not be powerful enough to cope with illumination, viewpoint and pose variations, and thus inevitably produce suboptimal ranking lists. In this paper, we propose incorporating multiple features with their metrics to build weak learners, and aggregating the base ranking lists by AdaBoost Ranking. Experiments on two commonly used datasets, VIPeR and CUHK01, show that our proposed approach greatly improves recognition rates over state-of-the-art methods.
{"title":"Person re-identification via adaboost ranking ensemble","authors":"Zhaoju Li, Zhenjun Han, Qixiang Ye","doi":"10.1109/ICIP.2016.7533165","DOIUrl":"https://doi.org/10.1109/ICIP.2016.7533165","url":null,"abstract":"Matching specific persons across scenes, known as person re-identification, is an important yet unsolved computer vision problem. Feature representation and metric learning are two fundamental factors in person re-identification. However, current person re-identification methods, which use single handcrafted feature with corresponding metric, could be not powerful enough when facing illumination, viewpoint and pose variations. Thus it inevitably produces suboptimal ranking lists. In this paper, we propose incorporating multiple features with metrics to build weak learners, and aggregate the base ranking lists by AdaBoost Ranking. Experiments on two commonly used datasets, VIPeR and CUHK01, show that our proposed approach greatly improves recognition rates over the state-of-the-art methods.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"21 6 1","pages":"4269-4273"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83658683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-09-01 | DOI: 10.1109/ICIP.2016.7532925
Quan-Qi Chen, Feng Liu, Xue Li, Baodi Liu, Yujin Zhang
Recently, very deep two-stream ConvNets have achieved great discriminative power for video classification, especially the temporal ConvNets when trained on multi-frame optical flow. However, action recognition in videos often falls prey to wild camera motion, which poses challenges for the extraction of reliable optical flow for the human body. In light of this, we propose a novel method to remove the global camera motion, which explicitly calculates a homography between two consecutive frames without human detection. Given the homography estimated from camera motion, background motion can be canceled out of the warped optical flow. We take this a step further and design a new architecture called Saliency-Context two-stream ConvNets, where the context two-stream ConvNets are employed to recognize the entire scene in video frames, whilst the saliency streams are trained on salient human motion regions detected from the warped optical flow. Finally, the Saliency-Context two-stream ConvNets allow us to capture complementary information and achieve state-of-the-art performance on the UCF101 dataset.
{"title":"Saliency-context two-stream convnets for action recognition","authors":"Quan-Qi Chen, Feng Liu, Xue Li, Baodi Liu, Yujin Zhang","doi":"10.1109/ICIP.2016.7532925","DOIUrl":"https://doi.org/10.1109/ICIP.2016.7532925","url":null,"abstract":"Recently, very deep two-stream ConvNets have achieved great discriminative power for video classification, which is especially the case for the temporal ConvNets when trained on multi-frame optical flow. However, action recognition in videos often fall prey to the wild camera motion, which poses challenges on the extraction of reliable optical flow for human body. In light of this, we propose a novel method to remove the global camera motion, which explicitly calculates a homography between two consecutive frames without human detection. Given the estimated homography due to camera motion, background motion can be canceled out from the warped optical flow. We take this a step further and design a new architecture called Saliency-Context two-stream ConvNets, where the context two-stream ConvNets are employed to recognize the entire scene in video frames, whilst the saliency streams are trained on salient human motion regions that are detected from the warped optical flow. Finally, the Saliency-Context two-stream ConvNets allow us to capture complementary information and achieve state-of-the-art performance on UCF101 dataset.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"32 1","pages":"3076-3080"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82963847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-09-01 | DOI: 10.1109/ICIP.2016.7532796
Alex Mackin, K. Noland, D. Bull
The visibility of motion artifacts in a video sequence, e.g. motion blur and temporal aliasing, affects perceived motion quality. The frame rate required to render these motion artifacts imperceptible is far higher than is currently feasible or specified in current video formats. This paper investigates the perception of temporal aliasing and its associated artifacts below this frame rate, along with their influence on motion quality, with the aim of making suitable frame rate recommendations for future formats. Results show that impairment in motion quality due to temporal aliasing can be tolerated to a degree, and that it may be acceptable to sample at frame rates 50% lower than those needed to eliminate perceptible temporal aliasing.
{"title":"The visibility of motion artifacts and their effect on motion quality","authors":"Alex Mackin, K. Noland, D. Bull","doi":"10.1109/ICIP.2016.7532796","DOIUrl":"https://doi.org/10.1109/ICIP.2016.7532796","url":null,"abstract":"The visibility of motion artifacts in a video sequence e.g. motion blur and temporal aliasing, affects perceived motion quality. The frame rate required to render these motion artifacts imperceptible is far higher than is currently feasible or specified in current video formats. This paper investigates the perception of temporal aliasing and its associated artifacts below this frame rate, along with their influence on motion quality, with the aim of making suitable frame rate recommendations for future formats. Results show impairment in motion quality due to temporal aliasing can be tolerated to a degree, and that it may be acceptable to sample at frame rates 50% lower than those needed to eliminate perceptible temporal aliasing.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"17 1","pages":"2435-2439"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88873894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-09-01 | DOI: 10.1109/ICIP.2016.7532485
S. Croci, T. Aydin, N. Stefanoski, M. Gross, A. Smolic
Subjective studies showed that most HDR video tone mapping operators either produce disturbing temporal artifacts, or are limited in their local contrast reproduction capability. Recently, both these issues have been addressed by a novel temporally coherent local HDR tone mapping method, which has been shown, both qualitatively and through a subjective study, to be advantageous compared to previous methods. However, this method's high-quality results came at the cost of a computationally expensive workflow that could only be executed offline. In this paper, we present a modified algorithm which builds upon the previous work by redesigning key components to achieve real-time performance. We accomplish this by replacing the optical flow based per-pixel temporal coherency with a tone-curve-space alternative. This way we eliminate the main computational burden of the original method with little sacrifice in visual quality.
{"title":"Real-time temporally coherent local HDR tone mapping","authors":"S. Croci, T. Aydin, N. Stefanoski, M. Gross, A. Smolic","doi":"10.1109/ICIP.2016.7532485","DOIUrl":"https://doi.org/10.1109/ICIP.2016.7532485","url":null,"abstract":"Subjective studies showed that most HDR video tone mapping operators either produce disturbing temporal artifacts, or are limited in their local contrast reproduction capability. Recently, both these issues have been addressed by a novel temporally coherent local HDR tone mapping method, which has been shown, both qualitatively and through a subjective study, to be advantageous compared to previous methods. However, this method's high-quality results came at the cost of a computationally expensive workflow that could only be executed offline. In this paper, we present a modified algorithm which builds upon the previous work by redesigning key components to achieve real-time performance. We accomplish this by replacing the optical flow based per-pixel temporal coherency with a tone-curve-space alternative. This way we eliminate the main computational burden of the original method with little sacrifice in visual quality.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"30 1","pages":"889-893"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83720014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-09-01 | DOI: 10.1109/ICIP.2016.7532959
Ting Zhang, Qiulei Dong, Zhanyi Hu
Learning view-invariant facial representations is an important task for view-invariant face recognition. The recent work [1] discovered that the macaque monkey brain has a face-processing network in which some neurons are view-specific. Motivated by this discovery, this paper proposes a deep convolutional learning model for face recognition that explicitly enforces this view-specific mechanism when learning view-invariant facial representations. The proposed model consists of two concatenated modules: the first is a convolutional neural network (CNN) that estimates the viewing pose of the input face image; the second consists of multiple CNNs, each of which learns to recover the frontal image from an image under a specific viewing pose. The method has low computational cost and can be trained well with a relatively small number of samples. Experimental results on the MultiPIE dataset demonstrate the effectiveness of our proposed convolutional model in comparison with three state-of-the-art methods.
{"title":"Pursuing face identity from view-specific representation to view-invariant representation","authors":"Ting Zhang, Qiulei Dong, Zhanyi Hu","doi":"10.1109/ICIP.2016.7532959","DOIUrl":"https://doi.org/10.1109/ICIP.2016.7532959","url":null,"abstract":"How to learn view-invariant facial representations is an important task for view-invariant face recognition. The recent work [1] discovered that the brain of the macaque monkey has a face-processing network, where some neurons are view-specific. Motivated by this discovery, this paper proposes a deep convolutional learning model for face recognition, which explicitly enforces this view-specific mechanism for learning view-invariant facial representations. The proposed model consists of two concatenated modules: the first one is a convolutional neural network (CNN) for learning the corresponding viewing pose to the input face image; the second one consists of multiple CNNs, each of which learns the corresponding frontal image of an image under a specific viewing pose. This method is of low computational cost, and it can be well trained with a relatively small number of samples. The experimental results on the MultiPIE dataset demonstrate the effectiveness of our proposed convolutional model in contrast to three state-of-the-art works.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"24 1","pages":"3244-3248"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83224578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-09-01 | DOI: 10.1109/ICIP.2016.7532363
Qiurui Wang, C. Yuan
Existing deep neural networks, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), typically treat volumetric video data as a collection of single images and process one frame at a time, so the correlation between frames can hardly be fully exploited. Moreover, depth context plays a unique role in how primates perceive motion scenes, but it is seldom used when no depth labels are available. In this paper, we use a more suitable architecture, Multi-Scale Pyramidal Multi-Dimensional Long Short-Term Memory (MSPMD-LSTM), to exploit the strong correlation between video frames. Furthermore, depth context is extracted and refined to enhance the performance of the model. Experiments demonstrate that our models yield competitive results on the YouTube-Objects and SegTrack v2 datasets.
{"title":"Video object segmentation by Multi-Scale Pyramidal Multi-Dimensional LSTM with generated depth context","authors":"Qiurui Wang, C. Yuan","doi":"10.1109/ICIP.2016.7532363","DOIUrl":"https://doi.org/10.1109/ICIP.2016.7532363","url":null,"abstract":"Existing deep neural networks, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), typically treat volumetric video data as several single images and deal with one frame at one time, thus the relevance to frames can hardly be fully exploited. Besides, depth context plays the unique role in motion scenes for primates, but is seldom used in no depth label situations. In this paper, we use a more suitable architecture Multi-Scale Pyramidal Multi-Dimensional Long Short Term Memory (MSPMD-LSTM) to reveal the strong relevance within video frames. Furthermore, depth context is extracted and refined to enhance the performance of the model. Experiments demonstrate that our models yield competitive results on Youtube-Objects dataset and Segtrack v2 dataset.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"115 1","pages":"281-285"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83530185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-09-01 | DOI: 10.1109/ICIP.2016.7532752
Utsav B. Gewali, S. Monteiro
Remotely extracting information about the biochemical properties of the materials in an environment from airborne- or satellite-based hyperspectral sensors has a variety of applications in forestry, agriculture, mining, environmental monitoring and space exploration. In this paper, we propose a new non-stationary covariance function, called the exponential spectral angle mapper (ESAM), for predicting the biochemistry of vegetation from hyperspectral imagery using Gaussian processes. The proposed covariance function is based on the angle between spectra, which is known to be a better measure of similarity for hyperspectral data due to its robustness to illumination variations. We demonstrate the efficacy of the proposed method with experiments on a real-world hyperspectral dataset.
{"title":"A novel covariance function for predicting vegetation biochemistry from hyperspectral imagery with Gaussian processes","authors":"Utsav B. Gewali, S. Monteiro","doi":"10.1109/ICIP.2016.7532752","DOIUrl":"https://doi.org/10.1109/ICIP.2016.7532752","url":null,"abstract":"Remotely extracting information about the biochemical properties of the materials in an environment from airborne- or satellite-based hyperspectral sensor has a variety of applications in forestry, agriculture, mining, environmental monitoring and space exploration. In this paper, we propose a new non-stationary covariance function, called exponential spectral angle mapper (ESAM) for predicting the biochemistry of vegetation from hyperspectral imagery using Gaussian processes. The proposed covariance function is based on the angle between the spectra, which is known to be a better measure of similarity for hyperspectral data due to its robustness to illumination variations. We demonstrate the efficacy of the proposed method with experiments on a real-world hy-perspectral dataset.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"58 1","pages":"2216-2220"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90559051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-09-01 | DOI: 10.1109/ICIP.2016.7532460
Khalid Tahboub, Blanca Delgado, E. Delp
Person re-identification is the process of recognizing a person across a network of cameras with non-overlapping fields of view. In this paper we present an unsupervised multi-shot approach based on a patch-based dynamic appearance model. We use deformable graph matching for person re-identification, with histograms of color and texture as node features. Each graph model spans multiple images, and each node is a rectangular local patch. We evaluate our proposed method on the publicly available PRID 2011 and iLIDS-VID databases.
{"title":"Person re-identification using a patch-based appearance model","authors":"Khalid Tahboub, Blanca Delgado, E. Delp","doi":"10.1109/ICIP.2016.7532460","DOIUrl":"https://doi.org/10.1109/ICIP.2016.7532460","url":null,"abstract":"Person re-identification is the process of recognizing a person across a network of cameras with non-overlapping fields of view. In this paper we present an unsupervised multi-shot approach based on a patch-based dynamic appearance model. We use deformable graph matching for person re-identification using histograms of color and texture as features of nodes. Each graph model spans multiple images and each node is a local patch in the shape of a rectangle. We evaluate our proposed method on publicly available PRID 2011 and iLIDS-VID databases.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"29 1","pages":"764-768"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86659930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}