Encoder and decoder side global and local motion estimation for Distributed Video Coding
F. Dufaux, T. Ebrahimi
2010 IEEE International Workshop on Multimedia Signal Processing
Pub Date: 2010-12-10 | DOI: 10.1109/MMSP.2010.5662043
In this paper, we propose a new Distributed Video Coding (DVC) architecture where motion estimation is performed both at the encoder and the decoder, effectively combining global and local motion models. We show that the proposed approach significantly improves the quality of the Side Information (SI), especially for sequences with complex motion patterns. In turn, it leads to rate-distortion gains of up to 1 dB compared to the state-of-the-art DISCOVER DVC codec.
An objective metric for assessing quality of experience on stereoscopic images
Liyuan Xing, Junyong You, T. Ebrahimi, A. Perkis
Pub Date: 2010-12-10 | DOI: 10.1109/MMSP.2010.5662049
Most quality models for stereoscopic presentations are dedicated to measuring quality degradation caused by compression artefacts. However, non-compression distortions induced during acquisition and presentation usually have a significant influence on the 3D viewing experience. In this paper, we propose an objective metric for viewing experience assessment by taking camera baseline and binocular distortion crosstalk into consideration. In particular, the proposed metric is based on our previous work on both subjective evaluation and objective assessment of crosstalk perception. Results on a publicly available stereoscopic quality database demonstrate that the proposed metric can achieve more than 87% correlation with subjective assessment of viewing experience.
Efficient error control in 3D mesh coding
D. Cernea, A. Munteanu, A. Alecu, J. Cornelis, P. Schelkens, F. Morán
Pub Date: 2010-12-10 | DOI: 10.1109/MMSP.2010.5662035
Our recently proposed wavelet-based L-infinite-constrained coding approach for meshes ensures that the maximum error between the vertex positions in the original and decoded meshes is guaranteed to be lower than a given upper bound. Instantiations of both L-2 and L-infinite coding approaches are demonstrated for MESHGRID, which is a scalable 3D object encoding system, part of MPEG-4 AFX. In this survey paper, we compare the novel L-infinite distortion estimator against the L-2 distortion estimator which is typically employed in 3D mesh coding systems. In addition, we show that, under certain conditions, the L-infinite estimator can be exploited to approximate the Hausdorff distance in real-time implementations.
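The two error criteria the paper compares can be made concrete in a few lines. The sketch below (function names are illustrative, not from the paper) computes an L-2 distortion (mean squared vertex error) and an L-infinite distortion (largest single coordinate deviation) between corresponding vertices of an original and a decoded mesh; an L-infinite-constrained codec guarantees the latter stays below a chosen bound.

```python
import numpy as np

def l2_distortion(orig, decoded):
    """L-2 criterion: mean squared error over corresponding vertices."""
    d = np.asarray(orig, dtype=float) - np.asarray(decoded, dtype=float)
    return float(np.mean(np.sum(d * d, axis=1)))

def linf_distortion(orig, decoded):
    """L-infinite criterion: the largest coordinate-wise deviation
    between corresponding vertices of the two meshes."""
    d = np.abs(np.asarray(orig, dtype=float) - np.asarray(decoded, dtype=float))
    return float(d.max())

# A decoded mesh satisfies an L-infinite bound `eps` iff
# linf_distortion(orig, decoded) <= eps.
orig = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
dec = [[0.01, 0.0, 0.0], [1.0, 0.98, 1.0]]
print(linf_distortion(orig, dec))  # → 0.02
```

Note how the L-2 value averages errors away while the L-infinite value exposes the single worst vertex, which is what a guaranteed upper bound must control.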
Side information enhancement using an adaptive hash-based genetic algorithm in a Wyner-Ziv context
Thomas Maugey, C. Yaacoub, J. Farah, Marco Cagnazzo, B. Pesquet-Popescu
Pub Date: 2010-12-10 | DOI: 10.1109/MMSP.2010.5662036
Side information construction in Wyner-Ziv video coding is a sensitive task which strongly influences the final rate-distortion performance of the scheme. This side information is usually generated through an interpolation of the previous and next images. Some zones of a scene, however, such as occlusions, cannot be estimated from other frames. In this paper we propose to avoid this problem by sending some hash information for these unpredictable zones of the image. The resulting algorithm is described and tested here. The obtained results show the advantages of using localized hash information for the high-error zones in distributed video coding.
A new image projection method for panoramic image stitching
Beom Su Kim, H. Koo, N. Cho
Pub Date: 2010-12-10 | DOI: 10.1109/MMSP.2010.5662006
We propose a new image projection method in an attempt to reduce the perceptual distortion in panoramic image mosaics. Specifically, we reduce the stretching distortion of some image patches and the bending of straight lines. Since the stretching distortion usually occurs when projecting a viewing sphere to the cylindrical image surface in an oblique direction, we propose to use an adjustable cylindrical surface to match the viewing direction with the equator of the cylindrical surface. In addition, to find the trade-off between the stretching distortion and the bending of straight lines, we adjust the curvature of the cylindrical surface according to the object of interest in the image. The warping function from the viewing sphere to the adjustable image surface is derived and the amount of distortion caused by this warping function is also defined. From the measure of distortion, the optimal pose of the cylindrical image plane and its curvature are determined, and the image on the viewing sphere is projected on the optimal plane. The experimental results show that the proposed method produces panoramic images with less distortion than the existing methods.
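The stretching behaviour motivating the paper can be seen in the standard sphere-to-cylinder warping (this is the textbook fixed-cylinder projection, not the paper's adjustable surface): a viewing ray on the equator maps without vertical stretching, while rays tilted away from the equator are increasingly stretched, which is why aligning the cylinder's equator with the viewing direction helps.

```python
import math

def project_to_cylinder(X, Y, Z, f=1.0):
    """Project a viewing ray (X, Y, Z) onto a cylinder of radius f whose
    axis is vertical and whose equator lies in the Y = 0 plane.
    Along the equator (Y = 0) there is no vertical stretching; the
    stretching grows as the ray tilts away from the equator."""
    theta = math.atan2(X, Z)       # azimuth angle around the cylinder axis
    h = Y / math.hypot(X, Z)       # height on the cylinder surface
    return f * theta, f * h

# The forward viewing direction maps to the origin of the panorama.
print(project_to_cylinder(0.0, 0.0, 1.0))  # → (0.0, 0.0)
```

A ray 45° off-axis in the equatorial plane, `project_to_cylinder(1.0, 0.0, 1.0)`, lands at azimuth π/4 with zero height, consistent with the equator being distortion-free.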
On joint distribution modeling in distributed video coding systems
Y. Priziment, D. Malah
Pub Date: 2010-12-10 | DOI: 10.1109/MMSP.2010.5662037
Performance of a distributed video coding system depends, to a large extent, on the accuracy of joint source and side information distribution modeling. In this work we first examine a family of stationary joint distribution models. As one of our findings, we propose to use the double-Gamma model as an alternative to the widely adopted Laplace model, due to its superior performance. In addition, we suggest a new spatially adaptive model, which makes it possible to track the spatially varying joint statistics of the source and side information. We present two methods, class-based and neighborhood-based, for estimation of the spatially varying model parameters. We then show how the obtained pixel-domain model can be used in the transform domain to facilitate utilization of frame spatial redundancy. Integration of the proposed models into a distributed video coding system resulted in improved performance.
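Why can a double-Gamma model beat the Laplace model? The Laplace distribution is the shape-1 special case of the zero-mean double-Gamma family, so fitting the shape parameter can only improve the likelihood. The sketch below (function names, the synthetic residual, and the shape grid are illustrative assumptions, not the paper's estimation procedure) compares the two fits on simulated correlation noise, using the closed-form scale MLE b = mean(|x|)/a for a given shape a.

```python
import math
import random

def dgamma_mean_loglik(xs, a):
    """Mean log-likelihood of a zero-mean double-Gamma with shape `a`,
    density |x|^(a-1) exp(-|x|/b) / (2 b^a Gamma(a)), with the scale set
    to its closed-form MLE b = mean(|x|) / a."""
    n = len(xs)
    b = sum(abs(x) for x in xs) / n / a
    return (sum((a - 1) * math.log(abs(x)) - abs(x) / b for x in xs) / n
            - a * math.log(b) - math.log(2.0) - math.lgamma(a))

# Simulated source/side-information residual: two-sided Gamma samples
# (shape 2), i.e. heavier concentration away from zero than Laplace.
random.seed(0)
xs = [random.gammavariate(2.0, 1.0) * random.choice((-1, 1)) for _ in range(4000)]

ll_laplace = dgamma_mean_loglik(xs, 1.0)                       # Laplace = shape 1
ll_best = max(dgamma_mean_loglik(xs, a) for a in (0.5, 1.0, 1.5, 2.0, 2.5))
print(ll_best >= ll_laplace)  # → True: the extra shape parameter pays off
```

Since the grid contains a = 1, the best double-Gamma fit is never worse than the Laplace fit; on data that is genuinely non-Laplacian it is strictly better.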
Spectral EEG features and tasks selection process: Some considerations toward BCI applications
Monica-Claudia Dobrea, D. Dobrea, D. Alexa
Pub Date: 2010-12-10 | DOI: 10.1109/MMSP.2010.5662010
In this paper, we further develop the idea of a subject-specific mental task selection process as a necessary prerequisite in any EEG-based brain computer interface (BCI) application. In two previous studies, using EEG-extracted auto-regressive (AR) parameters and twelve different mental tasks, we demonstrated the major gains in task classification performance that can be obtained simply by selecting the proper tasks; here we investigate the putative relation between each (subject, given EEG features) pair and the corresponding individual optimum set of cognitive tasks. With this in mind, a set of three different spectral relative power parameters was considered. The classification performances achieved with these latter EEG features are comparatively presented for two subjects and for two sets of tasks: i) the Keirn and Aunon set of tasks, frequently used in the BCI field, and ii) the previously determined (AR-based) optimum individual set of tasks.
Face hallucination using Bayesian global estimation and local basis selection
Chih-Chung Hsu, Chia-Wen Lin, Chiou-Ting Hsu, H. Liao, Jen-Yu Yu
Pub Date: 2010-12-10 | DOI: 10.1109/MMSP.2010.5662063
This paper proposes a two-step prototype-face-based scheme for hallucinating the high-resolution detail of a low-resolution input face image. The proposed scheme is mainly composed of two steps: the global estimation step and the local facial-parts refinement step. In the global estimation step, the initial high-resolution face image is hallucinated via a linear combination of the global prototype faces with a coefficient vector. Instead of estimating the coefficient vector in the high-dimensional raw image domain, we propose a maximum a posteriori (MAP) estimator to estimate the optimum set of coefficients in the low-dimensional coefficient domain. In the local refinement step, the facial parts (i.e., eyes, nose and mouth) are further refined using a basis selection method based on overcomplete nonnegative matrix factorization (ONMF). Experimental results demonstrate that the proposed method can achieve significant subjective and objective improvement over state-of-the-art face hallucination methods, especially when an input face does not belong to a person in the training data set.
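The global estimation step has the flavour of ridge-regularized least squares: with a Gaussian observation model y ≈ Pc + noise and a zero-mean Gaussian prior on the coefficients c, the MAP estimate has a closed form. The sketch below shows this generic estimator (an assumption for illustration; the paper's exact prior and coefficient domain may differ), where the columns of P are prototype faces and y is the observed low-resolution face, both as flat vectors.

```python
import numpy as np

def map_coefficients(y, prototypes, sigma2=1.0, tau2=10.0):
    """MAP estimate of coefficients c for y ≈ P c, assuming Gaussian
    observation noise N(0, sigma2 I) and a Gaussian prior c ~ N(0, tau2 I).
    The posterior mode is the ridge-regression solution
    (P^T P + (sigma2/tau2) I)^(-1) P^T y."""
    P = np.asarray(prototypes, dtype=float)      # columns = prototype faces
    lam = sigma2 / tau2                          # prior strength
    A = P.T @ P + lam * np.eye(P.shape[1])
    return np.linalg.solve(A, P.T @ np.asarray(y, dtype=float))
```

With a very flat prior (large `tau2`) the estimate approaches plain least squares; a tighter prior shrinks the coefficients toward zero, stabilizing the hallucination when the observation is noisy.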
Probabilistic framework for template-based chord recognition
L. Oudre, C. Févotte, Y. Grenier
Pub Date: 2010-12-10 | DOI: 10.1109/MMSP.2010.5662016
This paper describes a method for chord recognition from audio signals. Our method provides a coherent and relevant probabilistic framework for template-based transcription. The only information needed for the transcription is the definition of the chords: in particular, neither annotated audio data nor music theory knowledge is required. We extract from the signal a succession of chroma vectors, which are our model observations. We propose a generative model for these observations from chord distribution probabilities and fixed chord templates. The parameters are estimated through an EM algorithm. In order to capture the temporal structure, we apply some post-processing filtering methods before detecting the chords. Our method is evaluated on two audio corpora. Results show that our method outperforms state-of-the-art chord recognition methods and also gives more relevant chord transcriptions.
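The core idea of template-based transcription can be sketched with a deterministic stand-in: build 12-bin binary chord templates (one bin per pitch class) by rotating major/minor triad patterns, then label each chroma vector with the best-matching template. This cosine-score matcher is only an illustration; the paper replaces the hard decision with a generative model whose parameters are learned by EM, and all names below are assumptions.

```python
import numpy as np

PITCH = "C C# D D# E F F# G G# A A# B".split()

def chord_templates():
    """24 binary triad templates: 12 major + 12 minor, as rotations of
    the C-major {C, E, G} and C-minor {C, Eb, G} patterns."""
    major = np.zeros(12); major[[0, 4, 7]] = 1.0   # root, major third, fifth
    minor = np.zeros(12); minor[[0, 3, 7]] = 1.0   # root, minor third, fifth
    names, temps = [], []
    for r in range(12):
        names.append(PITCH[r]);       temps.append(np.roll(major, r))
        names.append(PITCH[r] + "m"); temps.append(np.roll(minor, r))
    return names, np.array(temps)

def recognize(chroma):
    """Label a single chroma vector with the template of highest cosine score."""
    names, T = chord_templates()
    c = np.asarray(chroma, dtype=float)
    scores = (T @ c) / (np.linalg.norm(T, axis=1) * (np.linalg.norm(c) + 1e-12))
    return names[int(np.argmax(scores))]

# A chroma vector with energy on pitch classes C, E and G is a C-major triad.
print(recognize([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]))  # → C
```

Because no audio is annotated and no theory beyond the chord definitions is used, this matcher shares the paper's minimal-prior-knowledge setting; what the probabilistic framework adds is a principled score and temporal smoothing.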
Content identification based on digital fingerprint: What can be done if ML decoding fails?
F. Farhadzadeh, S. Voloshynovskiy, O. Koval
Pub Date: 2010-12-10 | DOI: 10.1109/MMSP.2010.5661995
In this paper, the performance of content identification based on digital fingerprinting and order statistic list decoding is analyzed by evaluating the probabilities of correct identification and false acceptance, as well as the probability mass function of the queried binary fingerprint's position on the list of candidates. Particular attention is dedicated to the cases where the traditional maximum likelihood decoder fails to produce a reliable content identification. Maximum likelihood decoding is shown to be a particular case of order statistic list decoding when the list size equals 1. We demonstrate the efficiency of the proposed content identification system by investigating the probability mass function behavior and imposing a constraint on the cardinality of the list.
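The relation between ML decoding and order-statistic list decoding stated above is simple to sketch: rank the database fingerprints by their likelihood score for the query and keep the top L; with L = 1 this degenerates to the maximum-likelihood decision. (Illustrative sketch with hypothetical names, not the paper's decoder or its score function.)

```python
def list_decode(query_scores, L):
    """Order-statistic list decoding: return the indices of the L database
    fingerprints with the highest scores for the query, best first.
    L = 1 recovers the maximum-likelihood (ML) decision."""
    ranked = sorted(range(len(query_scores)),
                    key=lambda i: query_scores[i], reverse=True)
    return ranked[:L]

# Likelihood scores of four database entries for one query fingerprint.
scores = [0.2, 0.9, 0.4, 0.85]
print(list_decode(scores, 1))  # → [1]        (the ML decision)
print(list_decode(scores, 3))  # → [1, 3, 2]  (candidate list)
```

When the ML decision (list position 1) is unreliable, the correct item often still appears near the top of the list, which is what the paper's analysis of the position's probability mass function quantifies.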