Pub Date: 2005-07-06; DOI: 10.1109/ICME.2005.1521704
T. Urruty, F. Belkouch, C. Djeraba
Motivated by the need for efficient indexing structures suited to real video database applications, we present a new indexing structure named Kpyr. In Kpyr, we use a clustering algorithm to partition the data space into subspaces, to each of which we apply the Pyramid technique (S. Berchtold et al., 1998). This reduces the search space examined by a query and improves performance. Experiments show that the approach performs well for both K-nearest-neighbor and window queries.
Title: KPYR: An Efficient Indexing Method. Published in: 2005 IEEE International Conference on Multimedia and Expo.
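The Pyramid technique that Kpyr applies inside each cluster maps a d-dimensional point to a single value that a one-dimensional index such as a B+-tree can store. The following is a minimal sketch of that mapping as described by Berchtold et al., assuming points are normalized to the unit hypercube. The function name `pyramid_value` is illustrative, and the clustering step of Kpyr is not shown.

```python
def pyramid_value(point):
    """Map a point in [0, 1]^d to its one-dimensional pyramid value."""
    d = len(point)
    # The dimension that deviates most from the center (0.5) decides
    # which of the 2*d pyramids contains the point.
    j_max = max(range(d), key=lambda j: abs(0.5 - point[j]))
    # The height is the distance from the center along that dimension.
    height = abs(0.5 - point[j_max])
    # Pyramids 0..d-1 lie on the low side, d..2d-1 on the high side.
    pyramid = j_max if point[j_max] < 0.5 else j_max + d
    return pyramid + height
```

Because the height never exceeds 0.5, the values of pyramid i all fall in the interval [i, i + 0.5], so a range scan over that interval touches only points of one pyramid.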
Pub Date: 2005-07-06; DOI: 10.1109/ICME.2005.1521404
Rong Zhang, R. Yu, Qibin Sun, L. Wong
Compression ratio and computational complexity are two major factors for a successful image coder. By exploiting the Laplacian distribution of the wavelet coefficients, a new bit-plane entropy coder is proposed in this paper. Compared with the state-of-the-art JPEG2000 entropy coder (EBCOT), the proposed coder achieves 0.75% better lossless performance for a 5-level 5/3 wavelet decomposition at block size 64 × 64 and 2.56% at block size 16 × 16. Experimental results also show average PSNR improvements of about 0.13 dB at 1 bpp and 0.25 dB at 2 bpp for lossy compression. Moreover, this gain in coding performance does not come at the cost of higher computational complexity; complexity is instead reduced by using a static arithmetic coder, which avoids a complicated adaptive procedure.
Title: A New Bit-Plane Entropy Coder for Scalable Image Coding. Published in: 2005 IEEE International Conference on Multimedia and Expo.
Pub Date: 2005-07-06; DOI: 10.1109/ICME.2005.1521541
J. Nesvadba, Fabian Ernst, Jernej Perhavc, J. Benois-Pineau, L. Primaux
A video cut detector (CD), a member of the shot boundary detector (SBD) group, is an essential element for spatio-temporal audiovisual (AV) segmentation and various video-processing technologies. Platform, processing and performance constraints have forced the development of various dedicated CDs. Future platforms will allow the use of advanced CD algorithms with higher reliability. To enable an appropriate trade-off between reliability and the required processing power, four CD algorithms were benchmarked on a generic, culture-diverse, multi-genre AV corpus. In terms of the complexity/performance trade-off, a field-difference-based CD proved to be optimal.
Title: Comparison of shot boundary detectors. Published in: 2005 IEEE International Conference on Multimedia and Expo.
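To illustrate the general idea behind a difference-based cut detector, here is a minimal sketch that declares a cut whenever the mean absolute luminance difference between consecutive pictures exceeds a threshold. It is a generic illustration, not one of the four benchmarked algorithms: the function name `detect_cuts`, the flat-list picture representation and the fixed threshold are all assumptions.

```python
def detect_cuts(frames, threshold):
    """Return indices i where a cut is declared between frame i-1 and frame i.

    Each frame is a flat list of luminance samples of equal length.
    """
    cuts = []
    for i in range(1, len(frames)):
        prev, curr = frames[i - 1], frames[i]
        # Mean absolute difference between co-located samples.
        diff = sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)
        if diff > threshold:
            cuts.append(i)
    return cuts
```

A fixed threshold keeps the cost per picture to one pass over the samples, which is the kind of low-complexity operating point the complexity/performance comparison is concerned with.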
Pub Date: 2005-07-06; DOI: 10.1109/ICME.2005.1521546
Yu-Chun Peng, Meng-Ting Lu, Homer H. Chen
A digital image stabilization system compensates for the image movement caused by hand jiggle in an image sequence captured by a hand-held video camera. In this paper, a simplified stabilization algorithm based on our previous work is presented. The algorithm performs block-based motion estimation on 16 local 16 × 16 blocks and uses a median filter to estimate the global motion. It reduces complexity by confining the motion estimation to a small number of blocks of the image. This greatly facilitates the implementation of the algorithm on the BF561, a DSP from Analog Devices. Details of the DSP implementation are described.
Title: DSP implementation of digital image stabilizer. Published in: 2005 IEEE International Conference on Multimedia and Expo.
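The median step described in the abstract can be sketched as follows: given the motion vectors estimated for the local blocks, the global motion is taken as the median of the vectors. Applying the median separately to the horizontal and vertical components is an assumption made here for illustration, as is the function name `global_motion`; the block matching itself is not shown.

```python
from statistics import median


def global_motion(block_vectors):
    """Estimate global motion as the component-wise median of block vectors.

    block_vectors is a sequence of (dx, dy) pairs, one per local block.
    """
    dx = median(v[0] for v in block_vectors)
    dy = median(v[1] for v in block_vectors)
    return dx, dy
```

The median makes the estimate robust to a few blocks that follow a moving object instead of the camera, since such outliers do not shift the middle of the sorted values.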
Pub Date: 2005-07-06; DOI: 10.1109/ICME.2005.1521643
Yinpeng Chen, H. Sundaram
In this paper, we present an efficient 3D shape rejection algorithm for unlabeled 3D markers. The problem is important in domains such as rehabilitation and the performing arts. There are three key innovations in our approach: (a) a multi-resolution shape representation using Haar wavelets for unlabeled markers, (b) a multi-resolution shape metric and (c) a shape rejection algorithm predicated on the simple idea that we do not need to compute the entire distance to conclude that two shapes are dissimilar. We tested the approach on a real-world pose classification problem and achieved a classification accuracy of 98%, with an order-of-magnitude reduction in computational complexity relative to a baseline shape matching algorithm.
Title: A computationally efficient 3D shape rejection algorithm. Published in: 2005 IEEE International Conference on Multimedia and Expo.
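The rejection idea in (c) can be sketched as an early-terminating distance computation. The sketch assumes each shape is already represented by a list of wavelet coefficients ordered from coarse to fine, and that the metric is a sum of squared coefficient differences; both are assumptions for illustration, and `is_rejected` is a hypothetical name, not the paper's algorithm.

```python
def is_rejected(coeffs_a, coeffs_b, threshold):
    """Reject a shape pair as soon as the partial distance exceeds threshold.

    Returns (rejected, terms_used), where terms_used counts how many
    coefficient pairs were examined before the decision was made.
    """
    partial = 0.0
    terms_used = 0
    for a, b in zip(coeffs_a, coeffs_b):
        # Every term is non-negative, so the running sum is a lower
        # bound on the full distance and can only grow.
        partial += (a - b) ** 2
        terms_used += 1
        if partial > threshold:
            return True, terms_used
    return False, terms_used
```

Because the running sum never decreases, a pair that exceeds the threshold at a coarse level can be discarded without touching the finer coefficients, which is where the computational saving comes from.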
Pub Date: 2005-07-06; DOI: 10.1109/ICME.2005.1521373
S. Boughorbel, Jean-Philippe Tarel, N. Boujemaa
Kernel-based methods such as support vector machines (SVM) have provided successful tools for solving many recognition problems. One of the reasons for this success is the use of kernels. Positive definiteness has to be checked for kernels to be suitable for most of these methods. For instance, for SVM, the use of a positive definite kernel ensures that the optimization problem is convex and thus that the obtained solution is unique. An alternative class of kernels, called conditionally positive definite, has long been studied from a theoretical point of view but has drawn attention from the community only in the last decade. We propose a new kernel, named the log kernel, which seems particularly interesting for images. Moreover, we prove that this new kernel is conditionally positive definite, as is the power kernel. Finally, we show experimentally that using conditionally positive definite kernels allows us to outperform classical positive definite kernels.
Title: Conditionally Positive Definite Kernels for SVM Based Image Recognition. Published in: 2005 IEEE International Conference on Multimedia and Expo.
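The abstract does not state the kernel formulas. As a hedged sketch, the forms commonly given for these two kernels are the power kernel, -||x - y||^beta, and the log kernel, -log(1 + ||x - y||^beta), with 0 < beta <= 2; these definitions, the Euclidean norm and the function names below are assumptions rather than text from the source.

```python
import math


def power_kernel(x, y, beta=1.0):
    """Power kernel (assumed form): -||x - y||^beta."""
    return -(math.dist(x, y) ** beta)


def log_kernel(x, y, beta=1.0):
    """Log kernel (assumed form): -log(1 + ||x - y||^beta)."""
    return -math.log(1.0 + math.dist(x, y) ** beta)
```

Both functions reach their maximum of zero when the two vectors coincide and decrease as the vectors move apart; the logarithm makes the log kernel fall off much more slowly for distant points.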
Pub Date: 2005-07-06; DOI: 10.1109/ICME.2005.1521467
Qiong Liu, Xiaojin Shi, Don Kimber, F. Zhao, Frank Raab
This paper presents an information-driven online video composition system. The composition work handled by the system includes dynamically setting multiple pan/tilt/zoom (PTZ) cameras to proper poses and selecting the best close-up view for passive viewers. The main idea of the composition system is to maximize the captured video information with a limited number of cameras. Unlike video composition based on heuristic rules, our video composition is formulated as a process of minimizing distortions between ideal signals (i.e., signals with infinite spatial-temporal resolution) and displayed signals. The formulation is consistent with many well-known empirical approaches widely used in previous systems and may provide analytical explanations for those approaches. Moreover, it provides a novel approach for studying video composition tasks systematically. The composition system allows each user to select a personal close-up view. It manages PTZ cameras and a video switcher based on both signal characteristics and users' view selections. Additionally, it can automate the video composition process based on users' past view selections when immediate selections are not available. We demonstrate the performance of this system with real meetings.
Title: An Online Video Composition System. Published in: 2005 IEEE International Conference on Multimedia and Expo.
Pub Date: 2005-07-06; DOI: 10.1109/ICME.2005.1521448
C. A. Rahman, Wael Badawy
This paper presents a novel quarter-pel full-search block motion estimation architecture for an H.264/AVC encoder. The proposed architecture can calculate, in parallel, all 41 motion vectors required by the variable-size blocks supported by H.264/AVC. The architecture has been prototyped in Verilog HDL, simulated and synthesized for a Xilinx Virtex2 FPGA. The experimental results show that the architecture can process CIF frame sequences in real time with 5 reference frames and a search range of -3.75 to +4.00 at a clock speed of 120 MHz. The maximum speed of the architecture is around 150 MHz.
Title: A quarter pel full search block motion estimation architecture for H.264/AVC. Published in: 2005 IEEE International Conference on Multimedia and Expo.
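The figure of 41 motion vectors follows from the seven block sizes into which H.264/AVC can partition a 16 × 16 macroblock. The short sketch below reproduces that count and also enumerates the quarter-pel offsets in the stated search range; the helper names are illustrative, and interpreting the range as one axis sampled in steps of 0.25 is an assumption.

```python
MACROBLOCK = 16

# Block sizes (width, height) allowed by H.264/AVC inter prediction.
PARTITION_SIZES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def blocks_per_macroblock(width, height):
    """Number of blocks of the given size that tile one macroblock."""
    return (MACROBLOCK // width) * (MACROBLOCK // height)


def total_motion_vectors():
    """One motion vector per block, summed over all partition sizes."""
    return sum(blocks_per_macroblock(w, h) for w, h in PARTITION_SIZES)


def quarter_pel_offsets(low=-3.75, high=4.00):
    """Offsets along one axis in quarter-pel steps, both ends included."""
    start, stop = round(low * 4), round(high * 4)
    return [q / 4 for q in range(start, stop + 1)]
```

The per-size counts are 1, 2, 2, 4, 8, 8 and 16, which sum to 41, and the range from -3.75 to +4.00 contains 32 quarter-pel positions per axis.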
Pub Date: 2005-07-06; DOI: 10.1109/ICME.2005.1521397
Jianfeng Chen, L. Shue, Hanwu Sun, K. Phua
In this paper, a microphone array with a 3-D focal zone is proposed. The array consists of one omni-directional and two uni-directional microphones. It is constructed so that a cross zone is formed: only sound within this zone is captured, and any interference outside the zone is effectively cancelled. The proposed framework is flexible in defining the location and size of the closed volume where the sound source of interest is located. Simulations have been carried out to demonstrate the 3-D spatial selectivity as well as the noise cancellation performance. The most important feature that distinguishes this work from previous work is that the super volumetric selectivity is realized by strategically using only three microphones, so that the overall apparatus acts as a virtual wireless close-talking microphone whose position is constrained in both distance and direction.
Title: An adaptive microphone array with local acoustic sensitivity. Published in: 2005 IEEE International Conference on Multimedia and Expo.
Pub Date: 2005-07-06; DOI: 10.1109/ICME.2005.1521369
Berlin Chen
This paper considers dynamic language model adaptation for Mandarin broadcast news recognition. Both contemporary newswire texts and in-domain automatic transcripts were exploited in language model adaptation. A topical mixture model was presented to dynamically explore the long-span latent topical information for language model adaptation. The underlying characteristics and different kinds of model structures were extensively investigated, and their performance was analyzed and verified by comparison with conventional MAP-based adaptation approaches, which are devoted to extracting short-span n-gram information. The fusion of global topical and local contextual information was investigated as well. The speech recognition experiments were conducted on broadcast news collected in Taiwan. Initial results showed very promising reductions in both perplexity and character error rate.
Title: Dynamic language model adaptation using latent topical information and automatic transcripts. Published in: 2005 IEEE International Conference on Multimedia and Expo.