Pub Date: 2007-11-12, DOI: 10.1109/ICIP.2007.4379003
Charles Florin, N. Paragios, G. Funka-Lea, James P. Williams
Title: Time-Varying Linear Autoregressive Models for Segmentation (2007 IEEE International Conference on Image Processing)
Tracking highly deforming structures in space and time arises in numerous applications in computer vision. Static models are often expressed as linear combinations of a mean model and modes of variation learned from training examples. In dynamic modeling, the shape is represented as a function of the shapes at previous time steps. In this paper, we introduce a novel technique that uses both the spatial and the temporal information on the object deformation. We reformulate tracking as a high-order time series prediction mechanism that adapts itself on-line to the newest results. For dimensionality reduction, samples are represented in an orthogonal basis and fed into an autoregressive (AR) model that is determined through an optimization process in appropriate metric spaces. To capture evolving deformations as well as cases that were not part of the learning stage, we propose a process that updates on-line both the orthogonal basis decomposition and the parameters of the autoregressive model. Experimental results with a nonstationary dynamic system show that adaptive AR models give better results than both stationary models and models learned over the whole sequence.
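The on-line adaptation idea can be illustrated with a deliberately small sketch. It is not the authors' method: it assumes a scalar series, a first-order model x[t] ≈ a·x[t-1], and an exponential forgetting factor `lam`. The class name `AdaptiveAR1`, the parameter `lam`, and the toy series in `make_series` are all hypothetical and introduced only for illustration.

```python
class AdaptiveAR1:
    """First-order AR coefficient estimated with exponential forgetting.

    Illustrative assumption: a scalar series and a single coefficient,
    not the shape-space model described in the abstract.
    """

    def __init__(self, lam=0.9):
        self.lam = lam      # forgetting factor in (0, 1]; 1.0 means no forgetting
        self.num = 0.0      # discounted sum of x[t] * x[t-1]
        self.den = 0.0      # discounted sum of x[t-1] ** 2
        self.prev = None    # last observed sample

    def update(self, x):
        """Fold in a new sample and return the current coefficient estimate."""
        if self.prev is not None:
            self.num = self.lam * self.num + x * self.prev
            self.den = self.lam * self.den + self.prev * self.prev
        self.prev = x
        return self.coefficient()

    def coefficient(self):
        """Least-squares estimate of a under the discounted weights."""
        return self.num / self.den if self.den > 0.0 else 0.0

    def predict(self):
        """One-step-ahead prediction from the last sample."""
        last = self.prev if self.prev is not None else 0.0
        return self.coefficient() * last


def make_series():
    """Nonstationary toy series: coefficient +1 for 49 steps, then -1."""
    xs = [1.0]
    for t in range(1, 100):
        a = 1.0 if t < 50 else -1.0
        xs.append(a * xs[-1])
    return xs
```

Feeding `make_series()` into `AdaptiveAR1(lam=0.8)` lets the estimate follow the sign change, whereas `lam=1.0` (a single model fitted over the whole sequence) averages the two regimes and ends up close to zero. This mirrors, in miniature, the comparison between adaptive models and models learned over the whole sequence.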
Pub Date: 2007-11-12, DOI: 10.1109/ICIP.2007.4379827
Loren Merritt, R. Vanam
Title: Improved Rate Control and Motion Estimation for H.264 Encoder (2007 IEEE International Conference on Image Processing)
In this paper, we describe rate control and motion estimation in x264, an open-source H.264/AVC encoder. We compare the rate control methods of x264 with those of the JM reference encoder and show that our approach performs well in both PSNR and bitrate. In motion estimation, we describe our implementation of initialization and show that it improves PSNR. We also propose an early termination for the simplified uneven cross multi-hexagon grid search (UMH) in x264 and show that it improves the speed by a factor of 1.5. Finally, we show that x264 runs 50 times faster than the JM reference encoder and provides bitrates within 5% of it for the same PSNR.
Pub Date: 2007-11-12, DOI: 10.1109/ICIP.2007.4379275
Zhihai He
Title: Peak Transform - A Nonlinear Transform for Efficient Image Representation and Coding (2007 IEEE International Conference on Image Processing)
In this work, we introduce a nonlinear geometric transform, called peak transform, for efficient image representation and coding. Coupled with wavelet transform and subband decomposition, the peak transform is able to significantly reduce signal energy in high-frequency subbands and achieve a significant transform coding gain. This has important applications in efficient data representation and compression. Based on peak transform (PT), we design an image encoder, called PT encoder, for efficient image compression. Our extensive experimental results demonstrate that, in wavelet-based subband decomposition, the signal energy in high-frequency subbands can be reduced by up to 60% if a peak transform is applied. The PT image encoder outperforms state-of-the-art JPEG2000 and H.264 (INTRA) encoders by up to 2-3 dB in PSNR (peak signal-to-noise ratio), especially for images with a significant amount of high-frequency components.
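Several entries in this listing report gains in PSNR, so a short reminder of how that figure is computed may help. The sketch below uses the standard definition, PSNR = 10·log10(peak² / MSE); the function name `psnr` and the default 8-bit peak value of 255 are assumptions made for illustration and are not taken from any of the papers.

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally long sample lists."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the reference and the test signal.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / mse)
```

Because the scale is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error at a fixed peak value.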
Pub Date: 2007-11-12, DOI: 10.1109/ICIP.2007.4379550
J. Lee, K. Yow
Title: Image Recognition for Mobile Applications (2007 IEEE International Conference on Image Processing)
Our paper presents a system for efficient recognition of landmarks photographed with camera phones. Information such as the tutorial rooms within the captured landmarks is returned to the user within seconds. The system matches each query against a database of images taken from multiple viewpoints. Various navigational aids and sensors are used to optimize accuracy and retrieval time by providing complementary information about the relative position and viewpoint of each query image. This makes our system less sensitive to orientation, scale and perspective distortion. A multi-scale approach and a reliability score model are proposed for this application. Our system is validated by several experiments on campus, with images taken with camera phones of different resolutions, from different positions and at different times of day.
Pub Date: 2007-11-12, DOI: 10.1109/ICIP.2007.4379318
Chang Su, A. Amer
Title: Topological-Stabilization Based Threshold Quantization for Robust Change Detection (2007 IEEE International Conference on Image Processing)
A threshold quantization algorithm for robust change detection is proposed in this paper. According to the threshold distribution of difference frames, a 4-level Lloyd-Max quantizer is designed; then, based on the topological stabilization of video frames, the Lloyd-Max quantizer is refined by a linear adjusting function to form the proposed threshold quantizer. Objective and subjective experiments show that the proposed quantizer greatly improves the robustness of thresholding methods for change detection and thus significantly improves the quality of change masks without increasing the computational load.
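For readers unfamiliar with the Lloyd-Max design step, the following is a minimal sketch of the generic scalar algorithm run on empirical samples. It covers only the textbook quantizer, not the topological refinement or the linear adjusting function of the paper; the function name `lloyd_max`, the uniform initialization, and the iteration cap are assumptions.

```python
def lloyd_max(samples, levels=4, iterations=50):
    """Return (thresholds, representatives) of a scalar Lloyd-Max quantizer."""
    if not samples:
        raise ValueError("need at least one sample")
    data = sorted(samples)
    lo, hi = data[0], data[-1]
    # Start from uniformly spaced representatives across the data range.
    reps = [lo + (hi - lo) * (i + 0.5) / levels for i in range(levels)]
    for _ in range(iterations):
        # Decision thresholds are midpoints between neighbouring representatives.
        thresholds = [(reps[i] + reps[i + 1]) / 2.0 for i in range(levels - 1)]
        # Assign every sample to the cell whose interval contains it.
        cells = [[] for _ in range(levels)]
        for x in data:
            idx = sum(1 for t in thresholds if x >= t)
            cells[idx].append(x)
        # Each representative moves to the centroid (mean) of its cell;
        # an empty cell keeps its previous representative.
        new_reps = [sum(c) / len(c) if c else reps[i] for i, c in enumerate(cells)]
        if new_reps == reps:
            break  # converged
        reps = new_reps
    thresholds = [(reps[i] + reps[i + 1]) / 2.0 for i in range(levels - 1)]
    return thresholds, reps
```

The two alternating steps (midpoint thresholds, centroid representatives) are the necessary conditions for a minimum mean-squared-error scalar quantizer; the loop stops once the representatives no longer move.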
Pub Date: 2007-11-12, DOI: 10.1109/ICIP.2007.4379278
Yu Liu, K. Ngan
Title: Weighted Adaptive Lifting-Based Wavelet Transform (2007 IEEE International Conference on Image Processing)
In this paper, we propose a new weighted adaptive lifting (WAL)-based wavelet transform that is designed to solve the problems of the previous adaptive directional lifting (ADL) approach. The proposed approach uses a weighted function to make sure that the prediction and update stages are consistent, directional interpolation to improve the orientation property of the interpolated image, and an adaptive interpolation filter to adjust to the statistical properties of each image. Experimental results show that the proposed WAL-based wavelet transform for image coding outperforms the conventional lifting-based wavelet transform by up to 3.02 dB in PSNR, and a significant improvement in subjective quality is also observed. Compared with the ADL approach, an improvement of up to 1.18 dB in PSNR is reported.
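The prediction and update stages mentioned above are easiest to see in a conventional lifting scheme. The sketch below is the well-known reversible integer 5/3 lifting step in one dimension with symmetric boundary extension; it is the baseline kind of transform, not the weighted or directional variant proposed in the paper, and the function names `forward_53` and `inverse_53` are assumptions.

```python
def forward_53(x):
    """One level of the integer 5/3 lifting transform on an even-length list."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("input length must be even and at least 2")
    even, odd = x[0::2], x[1::2]
    n = len(odd)
    # Predict: each odd sample is predicted from its two even neighbours;
    # the high-pass coefficient d is the prediction error.
    d = [odd[i] - (even[i] + even[min(i + 1, n - 1)]) // 2 for i in range(n)]
    # Update: even samples are corrected with neighbouring details so that
    # the low-pass band s keeps the local average of the signal.
    s = [even[i] + (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(n)]
    return s, d


def inverse_53(s, d):
    """Undo forward_53 exactly by running the lifting steps in reverse."""
    n = len(s)
    even = [s[i] - (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(n)]
    odd = [d[i] + (even[i] + even[min(i + 1, n - 1)]) // 2 for i in range(n)]
    x = [0] * (2 * n)
    x[0::2] = even
    x[1::2] = odd
    return x
```

Because the inverse subtracts exactly what the forward pass added, reconstruction is lossless for integer input regardless of the rounding in each step. On smooth regions the prediction errors `d` are small, which is where the coding gain of a lifting-based transform comes from.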
Pub Date: 2007-11-12, DOI: 10.1109/ICIP.2007.4379762
Jaideep Jeyakar, R. Venkatesh Babu, K. Ramakrishnan
Title: Robust Object Tracking using Local Kernels and Background Information (2007 IEEE International Conference on Image Processing)
The mean shift algorithm has proven to be efficient for tracking 2D blobs through a video sequence. Even so, this algorithm has certain inherent disadvantages. In this paper, we propose a robust tracking algorithm that overcomes the drawbacks of tracking based on a global color histogram. We track using only reliable colors, which are identified by separating the object from its background. A fast yet robust model update is employed to cope with illumination changes. The algorithm is computationally simple enough to run in real time and was tested on several complex video sequences. The proposed technique could easily be extended to other tracking algorithms as well.
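As background, the core mean shift iteration can be sketched in a few lines. The version below is a generic flat-kernel mode seeker over weighted 2D points, not the local-kernel tracker of the paper. In a tracker, the weights would typically come from how well each pixel's color matches the target model; here they are simply passed in. The function name `mean_shift` and its parameters `radius`, `max_iter` and `tol` are assumptions.

```python
def mean_shift(points, weights, start, radius, max_iter=20, tol=1e-3):
    """Move a circular window to the weighted centroid of the points inside it."""
    cx, cy = start
    for _ in range(max_iter):
        total = sum_x = sum_y = 0.0
        for (x, y), w in zip(points, weights):
            # Flat kernel: only points inside the window contribute.
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                total += w
                sum_x += w * x
                sum_y += w * y
        if total == 0.0:
            break  # empty window, nothing to follow
        new_cx, new_cy = sum_x / total, sum_y / total
        shift = ((new_cx - cx) ** 2 + (new_cy - cy) ** 2) ** 0.5
        cx, cy = new_cx, new_cy
        if shift < tol:
            break  # window has settled on a local mode
    return cx, cy
```

Setting the weight of unreliable points (for example, colors shared with the background) to zero removes their pull on the window, which is one simple way to picture the effect of restricting tracking to reliable colors.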
Pub Date: 2007-11-12, DOI: 10.1109/ICIP.2007.4379946
A. Czyżewski, P. Maziewski
Title: Some Techniques for Wow Effect Reduction (2007 IEEE International Conference on Image Processing)
Wow distortion reduction has not attracted adequate scientific attention so far. Only a few papers on the subject are available, mostly concerning archival gramophone records, wax cylinders, and magnetic tapes affected by wow. This paper outlines the wow reduction algorithms we have studied for archival movie soundtracks or, more generally, audio recordings accompanying archival visual content. The methods presented here are based on pilot tone tracking, on the spectral analysis of genuine audio components, and on non-uniform resampling. The paper provides only a short overview of the concepts underlying those methods; other approaches to wow processing that were studied, as well as a more detailed description of the presented ones, can be found in the referenced papers.
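The non-uniform resampling step can be pictured with a small sketch. It assumes that some earlier stage (pilot tone tracking, for instance) has already produced a relative speed value per recorded sample, and it uses plain linear interpolation, which a real restoration system would likely replace with a higher-quality interpolator. The function name `remove_wow` and the meaning given to `speed` (how far the medium advanced along the original time axis between one recorded sample and the next, in nominal sample periods) are assumptions.

```python
def remove_wow(recorded, speed):
    """Resample a wow-distorted signal back onto a uniform time grid."""
    if len(recorded) != len(speed) or len(recorded) < 2:
        raise ValueError("need two or more samples and one speed value per sample")
    if any(p <= 0 for p in speed):
        raise ValueError("speed values must be positive")
    # Position of each recorded sample on the original (undistorted) time axis.
    positions = [0.0]
    for p in speed[:-1]:
        positions.append(positions[-1] + p)
    restored = []
    j = 0                      # current segment [positions[j], positions[j + 1]]
    last = len(positions) - 2  # index of the final segment
    m = 0                      # uniform output time instant
    while m <= positions[-1]:
        # Advance to the segment that contains the output instant m.
        while j < last and positions[j + 1] < m:
            j += 1
        frac = (m - positions[j]) / (positions[j + 1] - positions[j])
        restored.append(recorded[j] + frac * (recorded[j + 1] - recorded[j]))
        m += 1
    return restored
```

With a constant speed of 1.0 the function returns the input unchanged; any deviation in `speed` stretches or compresses the corresponding part of the recording back to its nominal duration.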
Pub Date: 2007-11-12, DOI: 10.1109/ICIP.2007.4379104
M. Brucher, C. Heinrich, F. Heitz, J. Armspach
Title: Unsupervised Nonlinear Manifold Learning (2007 IEEE International Conference on Image Processing)
This communication deals with data reduction and regression. A set of high-dimensional data (e.g., images) usually has only a few degrees of freedom, with corresponding variables that are used to parameterize the original data set. Data understanding, visualization and classification are the usual goals. The proposed method reduces data using a unique set of low-dimensional variables and a user-defined cost function in the multidimensional scaling framework. Mapping the reduced variables back to the original data is also addressed, which is another contribution of this work. Typical data reduction methods, such as Isomap or LLE, do not deal with this important aspect of manifold learning. We also tackle the inversion of the mapping, which makes it possible to project high-dimensional noisy points onto the manifold, as PCA does with linear models. We present an application of our approach to several standard data sets such as the SwissRoll.
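To make the notion of a cost function in the multidimensional scaling framework concrete, the sketch below evaluates the classical raw stress of a candidate embedding: the sum of squared differences between target distances and distances in the low-dimensional space. This is only one common choice, and the paper's cost is user-defined, so the function name `raw_stress` and the specific formula are assumptions for illustration.

```python
import math


def raw_stress(target_dist, embedding):
    """Raw MDS stress: sum over pairs of (embedded distance - target distance)^2."""
    n = len(embedding)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(embedding[i], embedding[j])  # Euclidean distance
            total += (d - target_dist[i][j]) ** 2
    return total
```

An optimizer would adjust the coordinates in `embedding` to reduce this value. Replacing the squared difference with another penalty, or the target matrix with geodesic distances as Isomap does, changes the cost without changing the overall framework.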
Pub Date: 2007-11-12, DOI: 10.1109/ICIP.2007.4379106
P. Kisilev, D. Shaked, Suk Hwan Lim
Title: Noise and Signal Activity Maps for Better Imaging Algorithms (2007 IEEE International Conference on Image Processing)
In this work, we propose a noise and signal activity estimation method that discriminates noise from signal based on local and global properties of the image data. The method yields pixel-wise maps of the noise variance and of the signal activity. Using these maps to guide imaging algorithms such as image enhancement and print defect detection improves their performance. The proposed method does not assume a white Gaussian noise model; it is computationally very efficient and, as such, is useful for a wide variety of applications.
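A pixel-wise map of this kind can be pictured with the simplest possible baseline: the sample variance inside a small window around each pixel. The sketch below computes only that baseline. Unlike the method described in the abstract, it does not separate noise from signal activity, since edges and texture raise the local variance just as noise does. The function name `local_variance_map` and the `radius` parameter are assumptions.

```python
def local_variance_map(image, radius=1):
    """Per-pixel population variance over a (2*radius+1)^2 window, clipped at borders."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the neighbourhood, truncated at the image boundary.
            window = [
                image[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            mean = sum(window) / len(window)
            out[r][c] = sum((v - mean) ** 2 for v in window) / len(window)
    return out
```

On a perfectly flat patch the map is zero everywhere, while pixels next to a step edge receive a large value even when no noise is present. That ambiguity is what a method that discriminates noise from signal has to resolve.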