Motion pattern interpretation and detection for tracking moving vehicles in airborne video
Qian Yu, G. Medioni
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206541
Detection and tracking of moving vehicles in airborne videos is a challenging problem. Many approaches have been proposed to improve motion segmentation on a frame-by-frame and pixel-by-pixel basis; however, little attention has been paid to analyzing long-term motion patterns, which are a distinctive property of moving vehicles in airborne videos. In this paper, we provide a straightforward geometric interpretation of a general motion pattern in the 4D space (x, y, vx, vy). We propose to use the tensor voting computational framework to detect and segment such motion patterns in 4D space. Specifically, in airborne videos, we analyze the essential difference between the motion patterns caused by parallax and those caused by independently moving objects, which leads to a practical method for segmenting the motion patterns (flows) created by moving vehicles in stabilized airborne videos. The flows are used in turn to facilitate detection and tracking of each individual object in the flow. Conceptually, this approach is similar to "track-before-detect" techniques, which incorporate temporal information into the process as early as possible. As shown in the experiments, many difficult cases in airborne videos, such as parallax, noisy background modeling, and long-term occlusions, can be addressed by our approach.
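The 4D representation is concrete enough to sketch. Below is a minimal, hypothetical Python sketch of assembling the (x, y, vx, vy) samples from point tracks in a stabilized sequence; it substitutes a generic density-based clustering (DBSCAN) for the paper's tensor voting, so it illustrates only the data layout, not the actual voting framework.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def motion_samples(tracks):
    """tracks: (num_points, num_frames, 2) stabilized image positions.
    Returns an (N, 4) array of (x, y, vx, vy) samples."""
    pos = tracks[:, :-1, :]                      # position at frame t
    vel = tracks[:, 1:, :] - tracks[:, :-1, :]   # finite-difference velocity
    return np.concatenate([pos, vel], axis=-1).reshape(-1, 4)

# Coherently moving vehicles trace smooth, elongated structures in this 4D
# space, while parallax-induced samples follow camera-dependent directions;
# the clustering below is only a crude stand-in for tensor voting.
tracks = np.cumsum(np.random.rand(50, 20, 2), axis=1)  # toy drifting tracks
X = motion_samples(tracks)
labels = DBSCAN(eps=3.0, min_samples=10).fit_predict(X)
print("flows found:", len(set(labels) - {-1}))
```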
Fast multiple shape correspondence by pre-organizing shape instances
B. Munsell, Andrew Temlyakov, Song Wang
Pub Date: 2009-06-20 | DOI: 10.1109/cvpr.2009.5206611
Accurately identifying corresponding landmarks across a population of shape instances is the major challenge in constructing statistical shape models. In general, shape-correspondence methods fall into one of two categories: global methods and pair-wise methods. In this paper, we develop a new method that attempts to address the limitations of both. In particular, we reorganize the input population into a tree structure that incorporates global information about the population of shape instances, where each node in the tree represents a shape instance and each edge connects two very similar shape instances. Using this organized tree, neighboring shape instances can be corresponded efficiently and accurately by a pair-wise method. In the experiments, we evaluate the proposed method, compare its performance to that of five available shape-correspondence methods, and show that the proposed method achieves the accuracy of a global method at the speed of a pair-wise method.
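The abstract does not specify how the tree is built; a minimum spanning tree over pairwise shape dissimilarities is one natural instantiation. The sketch below assumes pre-aligned landmark shapes and a crude mean-distance metric, both illustrative choices rather than the paper's method.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def organize_shapes(shapes):
    """shapes: (num_shapes, num_points, 2) landmark coordinates.
    Returns (i, j) edge pairs of a tree linking similar shapes."""
    n = len(shapes)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # crude dissimilarity: mean landmark distance (assumes shapes
            # are pre-aligned; a real system would use a proper shape metric)
            d[i, j] = d[j, i] = np.mean(np.linalg.norm(shapes[i] - shapes[j], axis=1))
    mst = minimum_spanning_tree(d).tocoo()
    return list(zip(mst.row, mst.col))

# Pair-wise correspondence is then run only along the tree's edges,
# propagating landmarks from a root shape to all others.
shapes = np.random.rand(8, 32, 2)
print(organize_shapes(shapes))
```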
{"title":"Fast multiple shape correspondence by pre-organizing shape instances","authors":"B. Munsell, Andrew Temlyakov, Song Wang","doi":"10.1109/cvpr.2009.5206611","DOIUrl":"https://doi.org/10.1109/cvpr.2009.5206611","url":null,"abstract":"Accurately identifying corresponded landmarks from a population of shape instances is the major challenge in constructing statistical shape models. In general, shape-correspondence methods can be grouped into one of two categories: global methods and pair-wise methods. In this paper, we develop a new method that attempts to address the limitations of both the global and pair-wise methods. In particular, we reorganize the input population into a tree structure that incorporates global information about the population of shape instances, where each node in the tree represents a shape instance and each edge connects two very similar shape instances. Using this organized tree, neighboring shape instances can be corresponded efficiently and accurately by a pair-wise method. In the experiments, we evaluate the proposed method and compare its performance to five available shape correspondence methods and show the proposed method achieves the accuracy of a global method with speed of a pair-wise method.","PeriodicalId":386532,"journal":{"name":"2009 IEEE Conference on Computer Vision and Pattern Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123982331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards a practical face recognition system: Robust registration and illumination by sparse representation
Andrew Wagner, John Wright, Arvind Ganesh, Zihan Zhou, Yi Ma
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206654
Most contemporary face recognition algorithms work well under laboratory conditions but degrade when tested in less-controlled environments. This is mostly due to the difficulty of simultaneously handling variations in illumination, alignment, pose, and occlusion. In this paper, we propose a simple and practical face recognition system that achieves a high degree of robustness and stability to all these variations. We demonstrate how to use tools from sparse representation to align a test face image with a set of frontal training images in the presence of significant registration error and occlusion. We thoroughly characterize the region of attraction for our alignment algorithm on public face datasets such as Multi-PIE. We further study how to obtain a sufficient set of training illuminations for linearly interpolating practical lighting conditions. We have implemented a complete face recognition system, including a projector-based training acquisition system, in order to evaluate how our algorithms work under practical testing conditions. We show that our system can efficiently and effectively recognize faces under a variety of realistic conditions, using only frontal images under the proposed illuminations as training.
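The sparse-representation machinery behind such systems can be sketched in a few lines. This is a hedged toy of the classification step only: code a test image as a sparse combination of training images, then pick the class with the smallest residual. The Lasso solver and synthetic data are stand-ins, and the paper's alignment and occlusion handling are omitted.

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(A, labels, y, alpha=0.01):
    """A: (d, n) matrix of stacked training images (columns normalized),
    labels: length-n class labels, y: length-d test image."""
    x = Lasso(alpha=alpha, max_iter=10000).fit(A, y).coef_  # sparse code
    residuals = {}
    for c in set(labels):
        mask = np.array(labels) == c
        residuals[c] = np.linalg.norm(y - A[:, mask] @ x[mask])
    return min(residuals, key=residuals.get)

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 40)); A /= np.linalg.norm(A, axis=0)
labels = [i // 10 for i in range(40)]          # 4 classes, 10 images each
y = A[:, 3] + 0.05 * rng.normal(size=100)      # noisy copy of a class-0 image
print(src_classify(A, labels, y))              # expected: 0
```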
{"title":"Towards a practical face recognition system: Robust registration and illumination by sparse representation","authors":"Andrew Wagner, John Wright, Arvind Ganesh, Zihan Zhou, Yi Ma","doi":"10.1109/CVPR.2009.5206654","DOIUrl":"https://doi.org/10.1109/CVPR.2009.5206654","url":null,"abstract":"Most contemporary face recognition algorithms work well under laboratory conditions but degrade when tested in less-controlled environments. This is mostly due to the difficulty of simultaneously handling variations in illumination, alignment, pose, and occlusion. In this paper, we propose a simple and practical face recognition system that achieves a high degree of robustness and stability to all these variations. We demonstrate how to use tools from sparse representation to align a test face image with a set of frontal training images in the presence of significant registration error and occlusion. We thoroughly characterize the region of attraction for our alignment algorithm on public face datasets such as Multi-PIE. We further study how to obtain a sufficient set of training illuminations for linearly interpolating practical lighting conditions. We have implemented a complete face recognition system, including a projector-based training acquisition system, in order to evaluate how our algorithms work under practical testing conditions. We show that our system can efficiently and effectively recognize faces under a variety of realistic conditions, using only frontal images under the proposed illuminations as training.","PeriodicalId":386532,"journal":{"name":"2009 IEEE Conference on Computer Vision and Pattern Recognition","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123988885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A nonparametric Riemannian framework for processing high angular resolution diffusion images (HARDI)
A. Goh, C. Lenglet, P. Thompson, R. Vidal
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206843
High angular resolution diffusion imaging has become an important magnetic resonance technique for in vivo imaging. Most current research in this field focuses on developing methods for computing the orientation distribution function (ODF), which is the probability distribution function of water molecule diffusion along any angle on the sphere. In this paper, we present a Riemannian framework to carry out computations on an ODF field. The proposed framework does not require that the ODFs be represented by any fixed parameterization, such as a mixture of von Mises-Fisher distributions or a spherical harmonic expansion. Instead, we use a non-parametric representation of the ODF, and exploit the fact that under the square-root re-parameterization, the space of ODFs forms a Riemannian manifold, namely the unit Hilbert sphere. Specifically, we use Riemannian operations to perform various geometric data processing algorithms, such as interpolation, convolution and linear and nonlinear filtering. We illustrate these concepts with numerical experiments on synthetic and real datasets.
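The square-root re-parameterization the abstract describes has a simple closed form: an ODF p maps to psi = sqrt(p), a point on the unit Hilbert sphere, where geodesics are great-circle arcs. The sketch below interpolates between two ODFs discretized over K sphere directions; the discretization itself is an assumption made for illustration.

```python
import numpy as np

def odf_interp(p1, p2, t):
    """Geodesic interpolation between two discretized ODFs at parameter t."""
    psi1, psi2 = np.sqrt(p1), np.sqrt(p2)        # points on the unit sphere
    theta = np.arccos(np.clip(psi1 @ psi2, -1.0, 1.0))  # geodesic distance
    if theta < 1e-8:
        return p1
    psi = (np.sin((1 - t) * theta) * psi1 + np.sin(t * theta) * psi2) / np.sin(theta)
    return psi ** 2                              # map back to an ODF

p1 = np.full(16, 1 / 16)                         # uniform ODF over 16 directions
p2 = np.zeros(16); p2[0] = 1.0                   # fully anisotropic ODF
mid = odf_interp(p1, p2, 0.5)
print(mid.sum())                                 # stays 1 (psi stays unit norm)
```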
{"title":"A nonparametric Riemannian framework for processing high angular resolution diffusion images (HARDI)","authors":"A. Goh, C. Lenglet, P. Thompson, R. Vidal","doi":"10.1109/CVPR.2009.5206843","DOIUrl":"https://doi.org/10.1109/CVPR.2009.5206843","url":null,"abstract":"High angular resolution diffusion imaging has become an important magnetic resonance technique for in vivo imaging. Most current research in this field focuses on developing methods for computing the orientation distribution function (ODF), which is the probability distribution function of water molecule diffusion along any angle on the sphere. In this paper, we present a Riemannian framework to carry out computations on an ODF field. The proposed framework does not require that the ODFs be represented by any fixed parameterization, such as a mixture of von Mises-Fisher distributions or a spherical harmonic expansion. Instead, we use a non-parametric representation of the ODF, and exploit the fact that under the square-root re-parameterization, the space of ODFs forms a Riemannian manifold, namely the unit Hilbert sphere. Specifically, we use Riemannian operations to perform various geometric data processing algorithms, such as interpolation, convolution and linear and nonlinear filtering. We illustrate these concepts with numerical experiments on synthetic and real datasets.","PeriodicalId":386532,"journal":{"name":"2009 IEEE Conference on Computer Vision and Pattern Recognition","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129725605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient planar graph cuts with applications in Computer Vision
Frank R. Schmidt, Eno Töppe, D. Cremers
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206863
We present a fast graph cut algorithm for planar graphs. It builds on graph-theoretical results for planar graphs and leads to an efficient method that we apply to shape matching and image segmentation. In contrast to methods currently used in computer vision, the presented approach provides an upper bound on its runtime that is almost linear. In particular, we are able to match two different planar shapes of N points in O(N² log N) time and segment a given image of N pixels in O(N log N) time. We present two experimental benchmark studies which demonstrate that the presented method is also faster in practice than previously proposed graph cut methods: on planar shape matching and image segmentation we observe a speed-up of an order of magnitude, depending on resolution.
{"title":"Efficient planar graph cuts with applications in Computer Vision","authors":"Frank R. Schmidt, Eno Töppe, D. Cremers","doi":"10.1109/CVPR.2009.5206863","DOIUrl":"https://doi.org/10.1109/CVPR.2009.5206863","url":null,"abstract":"We present a fast graph cut algorithm for planar graphs. It is based on the graph theoretical work and leads to an efficient method that we apply on shape matching and image segmentation. In contrast to currently used methods in computer vision, the presented approach provides an upper bound for its runtime behavior that is almost linear. In particular, we are able to match two different planar shapes of N points in O(N2 log N) and segment a given image of N pixels in O(N log N). We present two experimental benchmark studies which demonstrate that the presented method is also in practice faster than previously proposed graph cut methods: On planar shape matching and image segmentation we observe a speed-up of an order of magnitude, depending on resolution.","PeriodicalId":386532,"journal":{"name":"2009 IEEE Conference on Computer Vision and Pattern Recognition","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127013163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A robust parametric method for bias field estimation and segmentation of MR images
Chunming Li, Chris Gatenby, Li Wang, J. Gore
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206553
This paper proposes a new energy minimization framework for simultaneous estimation of the bias field and segmentation of tissues for magnetic resonance images. The bias field is modeled as a linear combination of a set of basis functions, and thereby parameterized by the coefficients of the basis functions. We define an energy that depends on the coefficients of the basis functions, the membership functions of the tissues in the image, and the constants approximating the true signal from the corresponding tissues. This energy is convex in each of its variables. Bias field estimation and image segmentation are simultaneously achieved as the result of minimizing this energy. We provide an efficient iterative algorithm for energy minimization, which converges to the optimal solution at a fast rate. A salient advantage of our method is that its result is independent of initialization, which allows robust and fully automated application. The proposed method has been successfully applied to 3-Tesla MR images with desirable results. Comparisons with other approaches demonstrate the superior performance of this algorithm.
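Since the abstract names the ingredients (a bias field that is a linear combination of basis functions, tissue membership functions, and per-tissue constants), a toy version of the alternating minimization can be sketched. The 1-D signal, polynomial basis, hard memberships, and multiplicative image model below are all illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 400)
G = np.stack([np.ones_like(x), x, x**2])         # basis functions (3, 400)
true_c = np.array([1.0, 3.0])                    # two tissue intensities
seg_true = (rng.random(400) < 0.5).astype(int)
I = (1 + 0.4 * x) * true_c[seg_true] + 0.05 * rng.normal(size=400)

w = np.array([1.0, 0.0, 0.0])                    # init: flat bias field
c = np.array([I.min(), I.max()])
for _ in range(20):
    b = w @ G                                    # current bias field
    # hard memberships: assign each pixel to the best-fitting tissue
    seg = np.argmin((I[:, None] - b[:, None] * c[None, :]) ** 2, axis=1)
    for k in (0, 1):                             # closed-form update of c_k
        m = seg == k
        c[k] = (I[m] * b[m]).sum() / (b[m] ** 2).sum()
    A = G * c[seg]                               # least-squares update of w
    w = np.linalg.lstsq(A @ A.T, A @ I, rcond=None)[0]
# fitted values match the truth up to a global scale between bias and tissues
print(np.round(w, 2), np.round(c, 2))
```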
{"title":"A robust parametric method for bias field estimation and segmentation of MR images","authors":"Chunming Li, Chris Gatenby, Li Wang, J. Gore","doi":"10.1109/CVPR.2009.5206553","DOIUrl":"https://doi.org/10.1109/CVPR.2009.5206553","url":null,"abstract":"This paper proposes a new energy minimization framework for simultaneous estimation of the bias field and segmentation of tissues for magnetic resonance images. The bias field is modeled as a linear combination of a set of basis functions, and thereby parameterized by the coefficients of the basis functions. We define an energy that depends on the coefficients of the basis functions, the membership functions of the tissues in the image, and the constants approximating the true signal from the corresponding tissues. This energy is convex in each of its variables. Bias field estimation and image segmentation are simultaneously achieved as the result of minimizing this energy. We provide an efficient iterative algorithm for energy minimization, which converges to the optimal solution at a fast rate. A salient advantage of our method is that its result is independent of initialization, which allows robust and fully automated application. The proposed method has been successfully applied to 3-Tesla MR images with desirable results. Comparisons with other approaches demonstrate the superior performance of this algorithm.","PeriodicalId":386532,"journal":{"name":"2009 IEEE Conference on Computer Vision and Pattern Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127544805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic fetal face detection from ultrasound volumes via learning 3D and 2D information
Shaolei Feng, S. Zhou, Sara Good, D. Comaniciu
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206527
3D ultrasound imaging has been increasingly used in clinics for fetal examination. However, manually searching for the optimal view of the fetal face in 3D ultrasound volumes is cumbersome and time-consuming even for expert physicians and sonographers. In this paper we propose a learning-based approach which combines both 3D and 2D information for automatic and fast fetal face detection from 3D ultrasound volumes. Our approach applies a new technique - constrained marginal space learning - for 3D face mesh detection, and combines a boosting-based 2D profile detection to refine 3D face pose. To enhance the rendering of the fetal face, an automatic carving algorithm is proposed to remove all obstructions in front of the face based on the detected face mesh. Experiments are performed on a challenging 3D ultrasound data set containing 1010 fetal volumes. The results show that our system not only achieves excellent detection accuracy but also runs very fast - it can detect the fetal face from the 3D data in 1 second on a dual-core 2.0 GHz computer.
{"title":"Automatic fetal face detection from ultrasound volumes via learning 3D and 2D information","authors":"Shaolei Feng, S. Zhou, Sara Good, D. Comaniciu","doi":"10.1109/CVPR.2009.5206527","DOIUrl":"https://doi.org/10.1109/CVPR.2009.5206527","url":null,"abstract":"3D ultrasound imaging has been increasingly used in clinics for fetal examination. However, manually searching for the optimal view of the fetal face in 3D ultrasound volumes is cumbersome and time-consuming even for expert physicians and sonographers. In this paper we propose a learning-based approach which combines both 3D and 2D information for automatic and fast fetal face detection from 3D ultrasound volumes. Our approach applies a new technique - constrained marginal space learning - for 3D face mesh detection, and combines a boosting-based 2D profile detection to refine 3D face pose. To enhance the rendering of the fetal face, an automatic carving algorithm is proposed to remove all obstructions in front of the face based on the detected face mesh. Experiments are performed on a challenging 3D ultrasound data set containing 1010 fetal volumes. The results show that our system not only achieves excellent detection accuracy but also runs very fast - it can detect the fetal face from the 3D data in 1 second on a dual-core 2.0 GHz computer.","PeriodicalId":386532,"journal":{"name":"2009 IEEE Conference on Computer Vision and Pattern Recognition","volume":"1969 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129973997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning semantic scene models by object classification and trajectory clustering
Tianzhu Zhang, Hanqing Lu, S. Li
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206809
Activity analysis is a basic task in video surveillance and has become an active research area. However, due to the diversity of moving-object categories and their motion patterns, developing robust semantic scene models for activity analysis remains a challenging problem in traffic scenarios. This paper proposes a novel framework to learn semantic scene models. In this framework, detected moving objects are first classified as pedestrians or vehicles via a co-trained classifier which takes advantage of multi-view information about the objects. As a result, the framework can automatically learn separate motion patterns for pedestrians and vehicles. Then, a graph is proposed to learn and cluster the motion patterns. To this end, each trajectory is parameterized and the image is divided into multiple blocks, which are taken as the nodes of the graph. Based on the trajectory parameters, the primary motion patterns in each node (block) are extracted via a Gaussian mixture model (GMM) and supplied to the graph. A graph cut algorithm is finally employed to group the motion patterns together, and trajectories are clustered to learn the semantic scene models. Experimental results and applications to real-world scenes show the validity of our proposed method.
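One stage of this pipeline is easy to make concrete: fitting a GMM to the motion observed inside each image block to extract that block's primary motion patterns. In the sketch below, the block size, the use of raw flow vectors as "trajectory parameters", and K=2 components are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def block_motion_patterns(points, flows, img_size, block=64, k=2):
    """points: (N, 2) trajectory positions, flows: (N, 2) motion vectors.
    Returns {block_id: GMM means} for blocks with enough observations."""
    patterns = {}
    bx = points[:, 0] // block
    by = points[:, 1] // block
    cols = img_size[0] // block
    ids = by * cols + bx                         # flat block index per sample
    for b in np.unique(ids):
        m = ids == b
        if m.sum() >= 10:                        # need enough samples to fit
            gmm = GaussianMixture(n_components=k).fit(flows[m])
            patterns[int(b)] = gmm.means_        # primary motions in block
    return patterns

pts = np.random.rand(500, 2) * 256
flw = np.random.randn(500, 2)
print(len(block_motion_patterns(pts, flw, (256, 256))))
```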
{"title":"Learning semantic scene models by object classification and trajectory clustering","authors":"Tianzhu Zhang, Hanqing Lu, S. Li","doi":"10.1109/CVPR.2009.5206809","DOIUrl":"https://doi.org/10.1109/CVPR.2009.5206809","url":null,"abstract":"Activity analysis is a basic task in video surveillance and has become an active research area. However, due to the diversity of moving objects category and their motion patterns, developing robust semantic scene models for activity analysis remains a challenging problem in traffic scenarios. This paper proposes a novel framework to learn semantic scene models. In this framework, the detected moving objects are first classified as pedestrians or vehicles via a co-trained classifier which takes advantage of the multiview information of objects. As a result, the framework can automatically learn motion patterns respectively for pedestrians and vehicles. Then, a graph is proposed to learn and cluster the motion patterns. To this end, trajectory is parameterized and the image is cut into multiple blocks which are taken as the nodes in the graph. Based on the parameters of trajectories, the primary motion patterns in each node (block) are extracted via Gaussian mixture model (GMM), and supplied to this graph. The graph cut algorithm is finally employed to group the motion patterns together, and trajectories are clustered to learn semantic scene models. Experimental results and applications to real world scenes show the validity of our proposed method.","PeriodicalId":386532,"journal":{"name":"2009 IEEE Conference on Computer Vision and Pattern Recognition","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129134467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mutual information-based stereo matching combined with SIFT descriptor in log-chromaticity color space
Y. S. Heo, Kyoung Mu Lee, Sang Uk Lee
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206507
Radiometric variations between input images can seriously degrade the performance of stereo matching algorithms. In this situation, mutual information is a popular and powerful measure, since it can capture any global relationship between the intensities of two input images taken from unknown sources. Mutual information-based methods, however, remain ambiguous or erroneous under local radiometric variations, since mutual information only accounts for global variation between images and does not properly incorporate spatial information. In this paper, we present a new method based on mutual information combined with the SIFT descriptor to find correspondences for images that undergo local as well as global radiometric variations. We transform the input color images to a log-chromaticity color space, in which a linear relationship between images can be established. To incorporate spatial information into the mutual information measure, we utilize the SIFT descriptor, which encodes gradient histograms of nearby pixels, to construct a joint probability in log-chromaticity color space. By combining mutual information as an appearance measure with the SIFT descriptor as a geometric measure, we devise a robust and accurate stereo system. Experimental results show that our method is superior to state-of-the-art algorithms, including conventional mutual information-based methods and window correlation methods, under various radiometric changes.
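The log-chromaticity transform itself is simple to demonstrate. Under a diagonal (per-channel gain) lighting model, a radiometric change becomes a constant additive offset in this space, which is the linear relationship the abstract exploits; the epsilon guard and toy data below are implementation details, not from the paper.

```python
import numpy as np

def log_chromaticity(rgb, eps=1e-6):
    """rgb: (..., 3) float image. Returns (..., 2) log-chromaticity values
    (log R/G, log B/G), which cancel overall intensity."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return np.stack([np.log((r + eps) / (g + eps)),
                     np.log((b + eps) / (g + eps))], axis=-1)

img = 0.1 + 0.9 * np.random.rand(4, 4, 3)        # keep channels well above eps
gains = np.array([1.5, 0.8, 1.2])                # per-channel radiometric change
delta = log_chromaticity(img * gains) - log_chromaticity(img)
# the gain change shows up as the same offset at every pixel
print(np.allclose(delta, delta.reshape(-1, 2)[0], atol=1e-4))  # True
```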
{"title":"Mutual information-based stereo matching combined with SIFT descriptor in log-chromaticity color space","authors":"Y. S. Heo, Kyoung Mu Lee, Sang Uk Lee","doi":"10.1109/CVPR.2009.5206507","DOIUrl":"https://doi.org/10.1109/CVPR.2009.5206507","url":null,"abstract":"Radiometric variations between input images can seriously degrade the performance of stereo matching algorithms. In this situation, mutual information is a very popular and powerful measure which can find any global relationship of intensities between two input images taken from unknown sources. The mutual information-based method, however, is still ambiguous or erroneous as regards local radiometric variations, since it only accounts for global variation between images, and does not contain spatial information properly. In this paper, we present a new method based on mutual information combined with SIFT descriptor to find correspondence for images which undergo local as well as global radiometric variations. We transform the input color images to log-chromaticity color space from which a linear relationship can be established. To incorporate spatial information in mutual information, we utilize the SIFT descriptor which includes near pixel gradient histogram to construct a joint probability in log-chromaticity color space. By combining the mutual information as an appearance measure and the SIFT descriptor as a geometric measure, we devise a robust and accurate stereo system. Experimental results show that our method is superior to the state-of-the art algorithms including conventional mutual information-based methods and window correlation methods under various radiometric changes.","PeriodicalId":386532,"journal":{"name":"2009 IEEE Conference on Computer Vision and Pattern Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132861938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive image and video retargeting technique based on Fourier analysis
Jun-Seong Kim, Jin-Hwan Kim, Chang-Su Kim
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206666
An adaptive image and video retargeting algorithm based on Fourier analysis is proposed in this work. We first divide an input image into several strips using gradient information, so that each strip consists of textures of similar complexity. We then scale each strip adaptively according to its importance measure. More specifically, the distortions generated by the scaling procedure are formulated in the frequency domain using the Fourier transform. The objective is then to determine the sizes of the scaled strips that minimize the sum of distortions, subject to the constraint that the sum of their sizes equal the size of the target output image. We solve this constrained optimization problem using the Lagrange multiplier technique. Moreover, we extend the approach to the retargeting of video sequences. Simulation results demonstrate that the proposed algorithm provides reliable retargeting performance efficiently.
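The constrained optimization has a clean closed form under a simple distortion model. The sketch below uses a quadratic per-strip distortion D_i(s_i) = w_i (s_i - s0_i)^2 as an illustrative stand-in for the paper's Fourier-domain distortion: the Lagrangian conditions give s_i = s0_i - lambda / (2 w_i), with lambda fixed by the target-size constraint.

```python
import numpy as np

def strip_sizes(s0, w, target):
    """s0: original strip sizes, w: importance weights (higher = keep
    closer to original), target: total output size. Returns scaled sizes."""
    # from sum_i s_i = target with s_i = s0_i - lam / (2 w_i):
    lam = 2 * (s0.sum() - target) / (1 / w).sum()
    return s0 - lam / (2 * w)

s0 = np.array([120.0, 300.0, 80.0, 140.0])       # strip widths (pixels)
w = np.array([1.0, 8.0, 1.0, 4.0])               # texture complexity per strip
s = strip_sizes(s0, w, target=480.0)             # shrink 640 -> 480
print(np.round(s, 1), s.sum())                   # complex strips shrink less
```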
{"title":"Adaptive image and video retargeting technique based on Fourier analysis","authors":"Jun-Seong Kim, Jin-Hwan Kim, Chang-Su Kim","doi":"10.1109/CVPR.2009.5206666","DOIUrl":"https://doi.org/10.1109/CVPR.2009.5206666","url":null,"abstract":"An adaptive image and video retargeting algorithm based on Fourier analysis is proposed in this work. We first divide an input image into several strips using the gradient information so that each strip consists of textures of similar complexities. Then, we scale each strip adaptively according to its importance measure. More specifically, the distortions, generated by the scaling procedure, are formulated in the frequency domain using the Fourier transform. Then, the objective is to determine the sizes of scaled strips to minimize the sum of distortions, subject to the constraint that the sum of their sizes should equal the size of the target output image. We solve this constrained optimization problem using the Lagrangian multiplier technique. Moreover, we extend the approach to the retargeting of video sequences. Simulation results demonstrate that the proposed algorithm provides reliable retargeting performance efficiently.","PeriodicalId":386532,"journal":{"name":"2009 IEEE Conference on Computer Vision and Pattern Recognition","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126228766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}