Decoder side motion vector derivation for inter frame video coding
Pub Date: 2009-05-06 | DOI: 10.1109/PCS.2009.5167453
S. Kamp, B. Bross, M. Wien
In this paper, a decoder side motion vector derivation scheme for inter frame video coding is proposed. Using a template matching algorithm, motion information is derived at the decoder instead of explicitly coding the information into the bitstream. Based on Lagrangian rate-distortion optimisation, the encoder locally signals whether motion derivation or forward motion coding is used. While our method exploits multiple reference pictures for improved prediction performance and bitrate reduction, only a small template matching search range is required. Derived motion information is reused to improve the performance of predictive motion vector coding in subsequent blocks. An efficient conditional signalling scheme for motion derivation in Skip blocks is employed. The motion vector derivation method has been implemented as an extension to H.264/AVC. Simulation results show that a bitrate reduction of up to 10.4% over H.264/AVC is achieved by the proposed scheme.
{"title":"Decoder side motion vector derivation for inter frame video coding","authors":"S. Kamp, B. Bross, M. Wien","doi":"10.1109/PCS.2009.5167453","DOIUrl":"https://doi.org/10.1109/PCS.2009.5167453","url":null,"abstract":"In this paper, a decoder side motion vector derivation scheme for inter frame video coding is proposed. Using a template matching algorithm, motion information is derived at the decoder instead of explicitly coding the information into the bitstream. Based on Lagrangian rate-distortion optimisation, the encoder locally signals whether motion derivation or forward motion coding is used. While our method exploits multiple reference pictures for improved prediction performance and bitrate reduction, only a small template matching search range is required. Derived motion information is reused to improve the performance of predictive motion vector coding in subsequent blocks. An efficient conditional signalling scheme for motion derivation in Skip blocks is employed. The motion vector derivation method has been implemented as an extension to H.264/AVC. Simulation results show that a bitrate reduction of up to 10.4% over H.264/AVC is achieved by the proposed scheme.","PeriodicalId":247944,"journal":{"name":"2008 15th IEEE International Conference on Image Processing","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128039073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analysis of human attractiveness using manifold kernel regression
Pub Date: 2008-12-12 | DOI: 10.1109/ICIP.2008.4711703
B. Davis, S. Lazebnik
This paper uses a recently introduced manifold kernel regression technique to explore the relationship between facial shape and attractiveness on a heterogeneous dataset of over three thousand images gathered from the Web. Using the concept of the Fréchet mean of images under a diffeomorphic transformation model, we evolve the average face as a function of attractiveness ratings. Examining these averages and associated deformation maps enables us to discern aggregate shape change trends for male and female faces.
{"title":"Analysis of human attractiveness using manifold kernel regression","authors":"B. Davis, S. Lazebnik","doi":"10.1109/ICIP.2008.4711703","DOIUrl":"https://doi.org/10.1109/ICIP.2008.4711703","url":null,"abstract":"This paper uses a recently introduced manifold kernel regression technique to explore the relationship between facial shape and attractiveness on a heterogeneous dataset of over three thousand images gathered from the Web. Using the concept of the Frechet mean of images under a diffeomorphic transformation model, we evolve the average face as a function of attractiveness ratings. Examining these averages and associated deformation maps enables us to discern aggregate shape change trends for male and female faces.","PeriodicalId":247944,"journal":{"name":"2008 15th IEEE International Conference on Image Processing","volume":"518 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123118931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Linear ego-motion recovery algorithm based on quasi-parallax
Pub Date: 2008-12-12 | DOI: 10.1109/ICIP.2008.4711734
Chuanxin Hu, L. Cheong
A parallel camera array resembles a large class of biological visual systems. It consists of two cameras moving in tandem, which have parallel viewing directions and no overlap in their visual fields. Without requiring correspondences, we leverage pairs of parallel visual rays to remove rotational flows and obtain a quasi-parallax motion field, which leads to an accurate and parsimonious solution for translation recovery. The rotation is subsequently recovered using the epipolar constraints and benefits greatly from the good translation estimate. Experimental results show that the linear and the bundle adjustment methods achieve comparable performance.
{"title":"Linear ego-motion recovery algorithm based on quasi-parallax","authors":"Chuanxin Hu, L. Cheong","doi":"10.1109/ICIP.2008.4711734","DOIUrl":"https://doi.org/10.1109/ICIP.2008.4711734","url":null,"abstract":"A parallel camera array resembles a large class of biological visual systems. It consists of two cameras moving in tandem, which have parallel viewing directions and no overlap in the visual fields. Without correspondences, we leverage on pair of parallel visual rays to remove rotational flows and obtain a quasi-parallax motion field, which leads to an accurate and parsimonious solution for translation recovery. The rotation is subsequently recovered using the epipolar constraints and benefits greatly from the good translation estimate. Experimental results show that the linear and the bundle adjustment methods achieve comparable performances.","PeriodicalId":247944,"journal":{"name":"2008 15th IEEE International Conference on Image Processing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116656316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Motion detection with an unstable camera
Pub Date: 2008-12-12 | DOI: 10.1109/ICIP.2008.4711733
Pierre-Marc Jodoin, J. Konrad, Venkatesh Saligrama, Vincent Veilleux-Gaboury
Fast and accurate motion detection in the presence of camera jitter is known to be a difficult problem. Existing statistical methods often produce abundant false positives since jitter-induced motion is difficult to differentiate from scene-induced motion. Although frame alignment by means of camera motion compensation can help resolve such ambiguities, the additional steps of motion estimation and compensation increase the complexity of the overall algorithm. In this paper, we address camera jitter by applying background subtraction to scene dynamics instead of scene photometry. In our method, an object is assumed moving if its dynamical behavior is different from the average dynamics observed in a reference sequence. Our method is conceptually simple, fast, requires little memory, and is easy to train, even on videos containing moving objects. It has been tested and performs well on indoor and outdoor sequences with strong camera jitter.
{"title":"Motion detection with an unstable camera","authors":"Pierre-Marc Jodoin, J. Konrad, Venkatesh Saligrama, Vincent Veilleux-Gaboury","doi":"10.1109/ICIP.2008.4711733","DOIUrl":"https://doi.org/10.1109/ICIP.2008.4711733","url":null,"abstract":"Fast and accurate motion detection in the presence of camera jitter is known to be a difficult problem. Existing statistical methods often produce abundant false positives since jitter-induced motion is difficult to differentiate from scene-induced motion. Although frame alignment by means of camera motion compensation can help resolve such ambiguities, the additional steps of motion estimation and compensation increase the complexity of the overall algorithm. In this paper, we address camera jitter by applying background subtraction to scene dynamics instead of scene photometry. In our method, an object is assumed moving if its dynamical behavior is different from the average dynamics observed in a reference sequence. Our method is conceptually simple, fast, requires little memory, and is easy to train, even on videos containing moving objects. It has been tested and performs well on indoor and outdoor sequences with strong camera jitter.","PeriodicalId":247944,"journal":{"name":"2008 15th IEEE International Conference on Image Processing","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121067192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi bearer channel resource allocation for optimised transmission of video objects
Pub Date: 2008-12-12 | DOI: 10.1109/ICIP.2008.4712451
S. Nasir, S. Worrall, M. Mrak, A. Kondoz
This paper presents a novel channel optimisation scheme that enhances the quality of object-based video transmitted over a fixed-bandwidth channel. The optimisation methodology is based on an accurate modelling of video packet distortion at the encoder. Video packets are ranked according to their expected distortion and are then mapped to one of a number of different-priority radio bearers. In the proposed scheme, the video compression technique uses motion-compensated prediction, and video frames are split into a number of video packets. The algorithm's performance is demonstrated for object-based MPEG-4 video transmission over a UMTS/FDD system. The results demonstrate that the performance gain achieved with the proposed scheme can reach 2 dB, compared with an equal error protection scheme, for video transmission over a fixed-bandwidth channel.
{"title":"Multi bearer channel resource allocation for optimised transmission of video objects","authors":"S. Nasir, S. Worrall, M. Mrak, A. Kondoz","doi":"10.1109/ICIP.2008.4712451","DOIUrl":"https://doi.org/10.1109/ICIP.2008.4712451","url":null,"abstract":"This paper presents a novel channel optimisation scheme that enhances the quality of object based video, transmitted over a fixed bandwidth channel. The optimisation methodology is based on an accurate modelling of video packet distortion at the encoder. Video packets are ranked according to their expected distortion and are then mapped to one of a number of different priority radio bearers. In the proposed scheme the video compression technique uses motion compensated prediction and video frames are split into a number of video packets. The algorithm performance is demonstrated for object based MPEG-4 video transmission over a UMTS/FDD system. The results demonstrate that the performance gain achieved with the proposed scheme can reach 2 dB, compared with the equal error protection scheme for video transmission over a fixed bandwidth channel.","PeriodicalId":247944,"journal":{"name":"2008 15th IEEE International Conference on Image Processing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121129570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Texture modulation-constrained image decomposition
Pub Date: 2008-12-12 | DOI: 10.1109/ICIP.2008.4711874
Georgios Evangelopoulos, P. Maragos
Texture modeling and separation of structure in images are treated in synergy. A variational image decomposition scheme is formulated using explicit texture reconstruction constraints from the outputs of linear filters tuned to different spatial frequencies and orientations. Information relevant to the texture part of the image is reconstructed using modulation modeling and component selection. The general formulation leads to a u + Kv model of K + 1 image components, with multiple texture subcomponents.
{"title":"Texture modulation-constrained image decomposition","authors":"Georgios Evangelopoulos, P. Maragos","doi":"10.1109/ICIP.2008.4711874","DOIUrl":"https://doi.org/10.1109/ICIP.2008.4711874","url":null,"abstract":"Texture modeling and separation of structure in images are treated in synergy. A variational image decomposition scheme is formulated using explicit texture reconstruction constraints from the outputs of linear filters tuned to different spatial frequencies and orientations. Relevant to the texture image part information is reconstructed using modulation modeling and component selection. The general formulation leads to a u + Kv model of K + 1 image components, with multiple texture subcomponents.","PeriodicalId":247944,"journal":{"name":"2008 15th IEEE International Conference on Image Processing","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121259946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Capturing light field textures for video coding
Pub Date: 2008-12-12 | DOI: 10.1109/ICIP.2008.4712077
W. Mantzel, J. Romberg
There is a significant amount of redundancy between video frames or images that can be explained by considering these observations as samples of a light field function. By using a compact depth-augmented representation for such a light field function, it may even be possible to tie together inter-frame dependencies in a more meaningful way than conventional 2-D intensity-based motion compensation methods. We propose a depth-augmented layered orthographic light field representation and show how it may be constructed from actual data, at a basic level, as the solution to an over-determined linear inverse problem. We finally demonstrate the potential utility of such information in video coding with a compression example in which this light field side information is given as a simple texture map.
{"title":"Capturing light field textures for video coding","authors":"W. Mantzel, J. Romberg","doi":"10.1109/ICIP.2008.4712077","DOIUrl":"https://doi.org/10.1109/ICIP.2008.4712077","url":null,"abstract":"There is a significant amount of redundancy between video frames or images that can be explained by considering these observations as samples of a light field function. By using a compact depth- augmented representation for such a light field function, it may even be possible to tie-together inter-frame dependencies in a more meaningful way than conventional 2-D intensity based motion compensation methods. We propose a depth-augmented layered orthographic light field representation and show how it may be constructed from actual data at a basic level as the solution to an over-determined linear inverse problem. We finally demonstrate the potential utility of such information in video coding with a compression example when this light field side information is given as a simple texture map.","PeriodicalId":247944,"journal":{"name":"2008 15th IEEE International Conference on Image Processing","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127145741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-graph similarity reinforcement for image annotation refinement
Pub Date: 2008-12-12 | DOI: 10.1109/ICIP.2008.4711924
Jimin Jia, Nenghai Yu, Xiaoguang Rui, Mingjing Li
In image annotation refinement, word correlations among candidate annotations are used to retain highly relevant words and remove irrelevant ones. Existing methods build word correlations on the textual annotations of images. In this paper, the visual content of images is utilized to derive better word correlations by using a multi-graph similarity reinforcement method. First, an image visual-similarity graph and a word-correlation graph are built. Second, the two graphs iteratively reinforce each other through an image-word transfer matrix. Once the two graphs converge to steady states, the new word-correlation graph is used to refine the candidate annotations. The experiments show that our method performs better than methods that do not consider the visual content of images.
{"title":"Multi-graph similarity reinforcement for image annotation refinement","authors":"Jimin Jia, Nenghai Yu, Xiaoguang Rui, Mingjing Li","doi":"10.1109/ICIP.2008.4711924","DOIUrl":"https://doi.org/10.1109/ICIP.2008.4711924","url":null,"abstract":"In image annotation refinement, word correlations among candidate annotations are used to reserve high relevant words and remove irrelevant words. Existing methods build word correlations on textual annotations of images. In this paper, visual contents of images are utilized to explore better word correlations by using multi-graph similarity reinforcement method. Firstly, image visual similarity graph and word correlations graph are built respectively. Secondly, the two graphs are iteratively reinforced by each other through image-word transfer matrix. Once the two graphs converge to steady states, the new word correlations graph is used to refine the candidate annotations. The experiments show that our method performs better than method not considering visual content of images.","PeriodicalId":247944,"journal":{"name":"2008 15th IEEE International Conference on Image Processing","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127291209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Peer-to-peer multicast live video streaming with interactive virtual pan/tilt/zoom functionality
Pub Date: 2008-12-12 | DOI: 10.1109/ICIP.2008.4712250
Aditya Mavlankar, Jeonghun Noh, Pierpaolo Baccichet, B. Girod
Video streaming with virtual pan/tilt/zoom functionality allows the viewer to watch arbitrary regions of a high-spatial-resolution scene. In our proposed system, the user controls his region-of-interest (ROI) interactively during the streaming session. The relevant portion of the scene is rendered on his screen immediately, and an additional thumbnail overview aids his navigation. We design a peer-to-peer (P2P) multicast live video streaming system that provides interactive region-of-interest (IROI) control to large populations of viewers while exploiting the overlap of ROIs for efficient and scalable delivery. Our P2P overlay is altered on-the-fly in a distributed manner as the ROIs of the peers change. The main challenges for such a system are posed by the stringent latency constraint, the churn in the ROIs of peers, and the limited bandwidth at the server hosting the IROI video session. Experimental results with a network simulator indicate that the delivered quality is close to that of a traditional unicast client-server delivery mechanism, while requiring less uplink capacity at the server.
{"title":"Peer-to-peer multicast live video streaming with interactive virtual pan/tilt/zoom functionality","authors":"Aditya Mavlankar, Jeonghun Noh, Pierpaolo Baccichet, B. Girod","doi":"10.1109/ICIP.2008.4712250","DOIUrl":"https://doi.org/10.1109/ICIP.2008.4712250","url":null,"abstract":"Video streaming with virtual pan/tilt/zoom functionality allows the viewer to watch arbitrary regions of a high-spatial-resolution scene. In our proposed system, the user controls his region-of-interest (ROI) interactively during the streaming session. The relevant portion of the scene is rendered on his screen immediately. An additional thumbnail overview aids his navigation. We design a peer-to-peer (P2P) multicast live video streaming system to provide the control of interactive region-of-interest (IROI) to large populations of viewers while exploiting the overlap of ROIs for efficient and scalable delivery. Our P2P overlay is altered on-the-fly in a distributed manner with the changing ROIs of the peers. The main challenges for such a system are posed by the stringent latency constraint, the churn in the ROIs of peers and the limited bandwidth at the server hosting the IROI video session. Experimental results with a network simulator indicate that the delivered quality is close to that of an alternative traditional unicast client-server delivery mechanism yet requiring less uplink capacity at the server.","PeriodicalId":247944,"journal":{"name":"2008 15th IEEE International Conference on Image Processing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125136368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive-neighborhood best mean rank vector filter for impulsive noise removal
Pub Date: 2008-12-12 | DOI: 10.1109/ICIP.2008.4711879
M. Ciuc, V. Vrabie, M. Herbin, C. Vertan, P. Vautrot
Rank-order-based filters are usually implemented using reduced ordering, since there is no natural way to order vector data such as color pixel values. This paper proposes a new statistic for multivariate data: a mean rank obtained by aggregating partial-ordering ranks. This statistic is then used for the reduced ordering of vector data; the median is characterized by the best mean rank vector (BMRV). We devise two filtering structures based on the BMRV statistic: one that uses a classical square neighborhood, and one that is based on adaptive neighborhoods. We show that the proposed filters are highly effective for filtering color images heavily corrupted by impulsive noise, and compare favorably to state-of-the-art filtering structures.
{"title":"Adaptive-neighborhood best mean rank vector filter for impulsive noise removal","authors":"M. Ciuc, V. Vrabie, M. Herbin, C. Vertan, P. Vautrot","doi":"10.1109/ICIP.2008.4711879","DOIUrl":"https://doi.org/10.1109/ICIP.2008.4711879","url":null,"abstract":"Rank-order based filters are usually implemented using reduced ordering, since there is no natural way to order vector data, such as color pixel values. This paper proposes a new statistics for multivariate data which is a mean rank obtained by aggregating partial ordering ranks. This statistics is then used for the reduced ordering of vector data; the median statistic is characterized by the best mean rank vector (BMRV). We devise two filtering structures based on the BMRV statistics: one that uses a classical square neighborhood, and one which is based on adaptive neighborhoods. We show that the proposed filters are highly effective for filtering color images heavily corrupted by impulsive noise, and compare favorably to state-of-the-art filtering structures.","PeriodicalId":247944,"journal":{"name":"2008 15th IEEE International Conference on Image Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125151476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}