Separable bilateral filtering for fast video preprocessing
Tuan Q. Pham, L. Vliet
2005 IEEE International Conference on Multimedia and Expo | Pub Date: 2005-07-06 | DOI: 10.1109/icme.2005.1521458
Bilateral filtering is an edge-preserving filtering technique that employs both the geometric closeness and the photometric similarity of neighboring pixels to construct its filter kernel. Multi-dimensional bilateral filtering is computationally expensive because the adaptive kernel has to be recomputed at every pixel. In this paper, we present a separable implementation of the bilateral filter. The separable implementation offers equivalent adaptive filtering capability at a fraction of the execution time of the traditional filter. Because of this efficiency, the separable bilateral filter can be used for fast preprocessing of images and videos. Experiments show that better image quality and higher compression efficiency are achievable if the original video is preprocessed with the separable bilateral filter.
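As a rough illustration of the idea (a sketch, not the authors' implementation), a 2-D bilateral filter can be approximated separably by running a 1-D bilateral filter along each row and then along each column of the intermediate result. The parameter names (`sigma_s`, `sigma_r`) and the clamped border handling are assumptions:

```python
import numpy as np

def bilateral_1d(signal, radius, sigma_s, sigma_r):
    """Bilateral-filter a 1-D signal: each output sample is a weighted mean
    whose weights combine spatial closeness and photometric similarity."""
    n = len(signal)
    offsets = np.arange(-radius, radius + 1)
    spatial = np.exp(-offsets**2 / (2.0 * sigma_s**2))
    out = np.empty(n)
    for i in range(n):
        idx = np.clip(i + offsets, 0, n - 1)        # clamp at the borders
        window = signal[idx]
        photometric = np.exp(-(window - signal[i])**2 / (2.0 * sigma_r**2))
        w = spatial * photometric
        out[i] = np.sum(w * window) / np.sum(w)
    return out

def separable_bilateral(image, radius=2, sigma_s=1.0, sigma_r=0.1):
    """Approximate 2-D bilateral filtering with a row pass then a column pass."""
    rows = np.apply_along_axis(bilateral_1d, 1, image, radius, sigma_s, sigma_r)
    return np.apply_along_axis(bilateral_1d, 0, rows, radius, sigma_s, sigma_r)
```

With a small range sigma, a sharp step edge gets near-zero photometric weight across the edge, so the edge survives both passes while flat regions are smoothed.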
Segmentation of 3D Objects Using Pulse-Coupled Oscillator Networks
Eva Ceccarelli, A. Bimbo, P. Pala
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521383
Along with image and video libraries, archives of 3D models have recently gained increasing attention. Accordingly, there is an increasing demand for solutions enabling retrieval of 3D models based on global properties as well as properties of object parts. In particular, retrieval based on object parts relies on segmentation of 3D objects into their constituent parts. This is a challenging task, as the identification of object parts should conform to human perceptual judgement. Therefore, the definition of models and solutions that enable decomposition of 3D objects into perceptually relevant parts is a fundamental step toward effective retrieval based on object parts. However, only a few approaches have been proposed to support segmentation of 3D meshes into perceptually relevant parts. In this paper, we propose a model based on pulse-coupled oscillator networks. Preliminary experiments are reported to demonstrate the validity and potential of the proposed solution.
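The paper's model is pulse-coupled; as a loose illustration of how coupled oscillator networks can segment data, the sketch below uses a Kuramoto-style continuous relaxation on a toy 1-D "mesh": coupling strength follows feature similarity, so oscillators in the same homogeneous region phase-lock while dissimilar regions remain effectively uncoupled. All names and parameter values are assumptions, not the authors' model:

```python
import numpy as np

def segment_by_oscillators(features, sigma=0.5, dt=0.05, steps=2000, seed=0):
    """Relax feature-coupled phase oscillators; elements with similar
    features synchronize, so phase clusters indicate candidate parts."""
    f = np.asarray(features, dtype=float)
    n = len(f)
    # coupling: strong between similar elements, ~0 between dissimilar ones
    w = np.exp(-(f[:, None] - f[None, :])**2 / (2.0 * sigma**2))
    deg = w.sum(axis=1)
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 1.0, n)      # initial phases within a half-circle
    for _ in range(steps):
        # attractive sine coupling, normalized by each node's total weight
        theta += dt * (w * np.sin(theta[None, :] - theta[:, None])).sum(axis=1) / deg
    return theta
```

Elements whose features match phase-lock to a common value; reading off the clusters of final phases then yields a part labeling.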
An integrated approach for generic object detection using kernel PCA and boosting
Saad Ali, M. Shah
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521600
In this paper, we present a novel framework for generic object class detection that integrates kernel PCA with AdaBoost. The classifier obtained in this way is invariant to changes in appearance, illumination conditions, and surrounding clutter. A nonlinear shape subspace is learned for the positive and negative object classes using kernel PCA. Features are derived by projecting example images onto the learned subspaces. Base learners are modeled using a Bayes classifier. AdaBoost is then employed to discover the features that are most relevant for the object detection task at hand. The proposed method has been successfully tested on a wide range of object classes (cars, airplanes, pedestrians, motorcycles, etc.) using standard data sets and has shown good performance. Using a small training set, the classifier learned in this way was able to generalize over intra-class variation while still maintaining a high detection rate. In most object categories, we achieved detection rates above 95% with minimal false-alarm rates. We demonstrate the comparative performance of our method against current state-of-the-art approaches.
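The feature-derivation step can be sketched as follows: fit kernel PCA on training samples (images flattened to vectors) and use the projections onto the leading components as the features handed to the boosted base learners. This is a generic RBF-kernel PCA sketch, not the paper's exact pipeline; `gamma` and the component count are assumptions, and the AdaBoost stage is omitted:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """RBF (Gaussian) kernel matrix between the rows of X and the rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :])**2).sum(-1)
    return np.exp(-gamma * d2)

def kpca_features(X, n_components=2, gamma=0.5):
    """Project training samples onto the leading kernel-PCA components."""
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # center in feature space
    vals, vecs = np.linalg.eigh(Kc)              # eigh returns ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]       # re-sort descending
    vals = np.clip(vals[:n_components], 1e-12, None)
    alphas = vecs[:, :n_components] / np.sqrt(vals)
    return Kc @ alphas                           # projections = features
```

For two well-separated classes, the leading component alone already separates the training samples, which is exactly the kind of discriminative feature AdaBoost would then select.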
Context-Aware Dynamic Presentation Synthesis for Exploratory Multimodal Environments
H. Sridharan, Ankur Mani, H. Sundaram, J. Brungart, David Birchfield
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521596
In this paper, we develop a novel real-time, interactive, automatic multimodal exploratory environment that dynamically adapts the media presented to the user context. This paper makes two key contributions: (a) the development of a multimodal user-context model and (b) the modeling of presentation dynamics to maximize coherence. We develop a novel user-context model comprising interests, media history, interaction behavior, and tasks that evolves with the specific interaction. We also develop novel metrics between media elements and the user context. The presentation environment dynamically adapts to the current user context. We develop an optimal media selection and display framework that maximizes coherence while constrained by the user context, user goals, and the structure of the knowledge in the exploratory environment. The experimental results indicate that the system performs well. The results also show that user-context models significantly improve presentation coherence.
A Constraint-Based Approach for the Authoring of Multi-Topic Multimedia Presentations
E. Bertino, E. Ferrari, A. Perego, Diego Santi
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521489
Synchronized multimedia applications play an important role in a digital library environment, since they allow one to efficiently disseminate knowledge among differently skilled users through an approach that is more direct than classic 'static' documents. In this paper, we propose a new authoring approach based on an innovative presentation structure and a new class of content-based constraints. Thanks to a flexible heuristic process, these features allow the author to easily combine several multimedia objects into a multi-topic presentation, whose different contents can be freely chosen by end users according to their preferences or skills.
Fast Search Method for Image Vector Quantization Based on Equal-Average Equal-Variance and Partial Sum Concept
Z. Pan, K. Kotani, T. Ohmi
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521702
The encoding process of image vector quantization (VQ) is computationally heavy because it performs many k-dimensional Euclidean distance computations. To speed up VQ encoding, it is essential to avoid as many unnecessary exact Euclidean distance computations as possible by first using simple features of a vector to estimate how large the distance is, so as to reject most unlikely codewords. The mean, the variance, the L2 norm, and partial sums of a vector have been proposed as effective features in previous work on fast VQ encoding. Recently, Z. Lu et al. (2003) combined three features (the mean, the variance, and the L2 norm) to derive the EEENNS search method, which is very search-efficient but still has obvious computational redundancy. This paper improves the EEENNS method further by replacing the L2-norm feature with a partial-sum feature so as to prune more of the search space. Mathematical analysis and experimental results confirm that the proposed method is more search-efficient than that of Z. Lu et al. (2003).
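The rejection idea rests on a provable lower bound: decomposing each vector into its mean component plus a residual gives ||x − c||² = k(m_x − m_c)² + ||r_x − r_c||² ≥ k(m_x − m_c)² + (v_x − v_c)², where v is the residual norm. A sketch of an equal-average equal-variance search using this bound (illustrative only; the paper's partial-sum refinement adds a further bound not shown here):

```python
import numpy as np

def vq_encode_eev(x, codebook):
    """Nearest-codeword search with equal-average equal-variance rejection.

    Skips the full distance to codeword c whenever the cheap bound
    k*(m_x - m_c)**2 + (v_x - v_c)**2  <=  ||x - c||**2
    already exceeds the best squared distance found so far."""
    k = len(x)
    mx = x.mean()
    vx = np.linalg.norm(x - mx)
    means = codebook.mean(axis=1)
    vs = np.linalg.norm(codebook - means[:, None], axis=1)
    order = np.argsort(np.abs(means - mx))   # visit mean-closest codewords first
    best, dmin2, full_computations = -1, np.inf, 0
    for i in order:
        lower = k * (means[i] - mx)**2 + (vs[i] - vx)**2
        if lower >= dmin2:
            continue                          # rejected by the cheap bound
        d2 = float(np.sum((x - codebook[i])**2))
        full_computations += 1
        if d2 < dmin2:
            dmin2, best = d2, int(i)
    return best, dmin2, full_computations
```

Because rejected codewords provably cannot beat the current best, the result always matches brute-force search while computing far fewer exact distances when a good match is found early.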
Spatiotemporal saliency for human action recognition
A. Oikonomopoulos, I. Patras, M. Pantic
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521452
This paper addresses the problem of human action recognition by introducing a sparse representation of image sequences as a collection of spatiotemporal events that are localized at points that are salient both in space and time. We detect the spatiotemporal salient points by measuring changes in the information content of pixel neighborhoods not only in space but also in time. We introduce an appropriate distance metric between two collections of spatiotemporal salient points that is based on the Chamfer distance and an iterative linear time warping technique that deals with time expansion or time compression issues. We propose a classification scheme that is based on relevance vector machines and on the proposed distance measure. We present results on real image sequences from a small database depicting people performing 19 aerobic exercises.
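The matching step between two collections of salient points can be sketched with a symmetric Chamfer distance: each point is matched to its nearest neighbour in the other set, in both directions. (The iterative linear time-warping refinement described above is omitted from this sketch.)

```python
import numpy as np

def chamfer_distance(A, B):
    """Symmetric Chamfer distance between point sets A (na, d) and B (nb, d):
    the mean nearest-neighbour distance from A to B plus that from B to A."""
    d2 = ((A[:, None, :] - B[None, :, :])**2).sum(axis=-1)   # pairwise squared
    return np.sqrt(d2.min(axis=1)).mean() + np.sqrt(d2.min(axis=0)).mean()
```

For salient points the rows of `A` and `B` would be (x, y, t) coordinates, so the metric penalizes both spatial and temporal misalignment of the two event collections.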
Aggregating signatures of MPEG-4 elementary streams
Yongdong Wu
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521390
A complete MPEG-4 stream consists of many elementary streams (ESs), which may be generated by different authors. In the scenario of this paper, each author signs his own authentic elementary stream independently, and an untrusted distributor then aggregates these signatures into a single one. Based on this unique signature, a client is able to verify the received MPEG-4 stream using the certificates of all the authors rather than the certificate of the distributor. In addition, no author can deny what he has signed, even if he is willing to admit a signature on another ES. This aggregated signature scheme is efficient in terms of transmission overhead and verification time, since only one signature is processed on the client side.
Watermarking based Image Authentication using Feature Amplification
Shuiming Ye, E. Chang, Qibin Sun
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521497
In a typical content- and watermarking-based image authentication approach, a feature is extracted from the given image and then embedded back into the image using a watermarking method. Since the entropy of the feature might be higher than the capacity of the watermarking scheme, or the feature is represented in a continuous domain, it has to be quantized before embedding. The loss of information during quantization potentially degrades the overall performance of the authentication scheme. This paper proposes a simple but effective approach that avoids feature quantization by means of an additive feature: the feature is first added to the image before watermark embedding, and later subtracted from the watermarked image. In our experiments, the proposed approach obtains a larger achievable robustness/sensitivity region and a smaller fuzzy region of authenticity than the typical approach.
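The core contrast can be illustrated numerically: a continuous feature forced through a finite payload incurs quantization error, while the additive path (add the feature before embedding, subtract it afterwards) carries it without loss. A schematic sketch with a toy additive spread-spectrum mark; every concrete choice here (the zero carrier, the 8-level quantizer, the correlation detector) is illustrative, not the paper's actual scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
feature = rng.random(64)                 # continuous feature vector
carrier = np.zeros(64)                   # stand-in for the host image signal
wm = rng.choice([-1.0, 1.0], 64)         # toy spread-spectrum watermark

# Classic path: the feature must be quantized to fit the payload.
levels = 8
quantized = np.round(feature * (levels - 1)) / (levels - 1)
quant_err = np.abs(feature - quantized).max()   # information irreversibly lost

# Additive path: feature added before embedding, subtracted afterwards.
pre = carrier + feature                  # amplify the host with the feature
watermarked = pre + 0.05 * wm            # embed the mark additively
output = watermarked - feature           # feature removed exactly, no payload cost

detect = float(np.dot(output - carrier, wm))    # correlation detector
```

The watermark remains detectable in `output` while the feature never had to be squeezed through the embedding capacity.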
Infolink: Analysis of Dutch Broadcast News and Cross-Media Browsing
Jeroen Morang, R. Ordelman, F. D. Jong, A. V. Hessen
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521738
In this paper, a cross-media browsing demonstrator named InfoLink is described. InfoLink automatically links the content of Dutch broadcast news videos to related information sources in parallel collections containing text and/or video. Automatic segmentation, speech recognition, and available metadata are used to index and link items. The concept is visualized using SMIL scripts to present the streaming broadcast news video together with the information links.