{"title":"Practical voltage scaling for mobile multimedia devices","authors":"Wanghong Yuan, K. Nahrstedt","doi":"10.1145/1027527.1027737","DOIUrl":"https://doi.org/10.1145/1027527.1027737","url":null,"abstract":"This paper presents the design, implementation, and evaluation of a <i>practical</i> voltage scaling (PDVS) algorithm for mobile devices primarily running multimedia applications. PDVS seeks to minimize the total energy of the whole device while meeting multimedia timing requirements. To do this, PDVS extends traditional real-time scheduling by deciding <i>what execution speed</i> in addition to when to execute what applications. PDVS makes these decisions based on the discrete speed levels of the CPU, the total power of the device at different speeds, and the probability distribution of CPU demand of multimedia applications. We have implemented PDVS in the Linux kernel and evaluated it on an HP laptop. Our experimental results show that PDVS saves energy substantially without affecting multimedia performance. It saves energy by 14.4% to 37.2% compared to scheduling algorithms without voltage scaling and by up to 10.4% compared to previous voltage scaling algorithms that assume an ideal CPU with continuous speeds and cubic power-speed relationship.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115128375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonparametric motion model with applications to camera motion pattern classification","authors":"Ling-yu Duan, Mingliang Xu, Q. Tian, Changsheng Xu","doi":"10.1145/1027527.1027603","DOIUrl":"https://doi.org/10.1145/1027527.1027603","url":null,"abstract":"Motion information is a powerful cue for visual perception. In the context of video indexing and retrieval, motion content serves as a useful source for compact video representation. There has been a lot of literature about parametric motion models. However, it is hard to secure a proper parametric assumption in a wide range of video scenarios. Diverse camera shots and frequent occurrences of bad optical flow estimation motivate us to develop nonparametric motion models. In this paper, we employ the mean shift procedure to propose a novel nonparametric motion representation. With this compact representation, various motion characterization tasks can be achieved by machine learning. Such a learning mechanism can not only capture the domain-independent parametric constraints, but also acquire the domain-dependent knowledge to tolerate the influence of bad dense optical flow vectors or block-based MPEG motion vector fields (MVF). The proposed nonparametric motion model has been applied to camera motion pattern classification on 23191 MVF extracted from MPEG-7 dataset.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116795280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incremental semi-supervised subspace learning for image retrieval","authors":"Xiaofei He","doi":"10.1145/1027527.1027530","DOIUrl":"https://doi.org/10.1145/1027527.1027530","url":null,"abstract":"Subspace learning techniques are widespread in pattern recognition research. They include Principal Component Analysis (PCA), Locality Preserving Projection (LPP), etc. These techniques are generally unsupervised which allows them to model data in the absence of labels or categories. In relevance feedback driven image retrieval system, the user provided information can be used to better describe the intrinsic semantic relationships between images. In this paper, we propose a semi-supervised subspace learning algorithm which incrementally learns an adaptive subspace by preserving the semantic structure of the image space, based on user interactions in a relevance feedback driven query-by-example system. Our algorithm is capable of accumulating knowledge from users, which could result in new feature representations for images in the database so that the system's future retrieval performance can be enhanced. Experiments on a large collection of images have shown the effectiveness and efficiency of our proposed algorithm.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125913857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time background music monitoring based on content-based retrieval","authors":"Yoshiharu Suga, N. Kosugi, M. Morimoto","doi":"10.1145/1027527.1027550","DOIUrl":"https://doi.org/10.1145/1027527.1027550","url":null,"abstract":"In this paper, we describe music monitoring in TV broadcasting based on content-based retrieval. A part of audio signals is sequentially extracted from TV broadcasting as a retrieval key, and a music DB that stores a great number of musical pieces is retrieved by this key based on content-based retrieval, and a musical piece is identified sequentially. In this way, we are able to carry out music monitoring. There are three necessary requirements important for realization of the music monitoring. They are robustness against non-stationary noise, real-time processing of large-scale music DB retrieval, and high granularity of the retrieval key. As a method of realizing robustness against non-stationary noise, we propose a partially similar retrieval method which improves retrieval accuracy by using the moment in which no superfluous noise is produced during the existence of non-stationary noise. In order to realize real-time processing of large-scale music DB retrieval, we adopt a coarse-to-fine strategy, and propose a spectral peaks hashing method which performs high-speed refining by using hashing. To calculate a hash value in this hashing, frequency channel numbers of the spectral peaks are used. In order to realize high granularity of the retrieval key, it is necessary to solve the problem of retrieval accuracy degradation associated with heightening the granularity. To improve this accuracy, we propose a detection-by-continuity method which uses music continuity. Moreover, by using music continuity to correct the starting point and the terminal point of a musical piece in TV broadcasting, the retrieval accuracy is improved further. 
In order to evaluate the effectiveness of the proposed methods, we performed experiments using a music DB which stores over 28,000 musical pieces (over 1800 hours) and TV broadcasting audio signals containing music and background music (BGM). The granularity of the retrieval key was set at about 0.5 seconds. Through these experiments, We verified that music monitoring was possible for over 90% of the total time of music and BGM used in TV broadcasting, and that real-time processing was possible.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125564324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
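The record above states that hash values are computed from the frequency channel numbers of spectral peaks. A hypothetical sketch of such a key computation; the peak-picking rule, the packing scheme, and both function names are invented for illustration, not taken from the paper:

```python
# Hypothetical sketch: hash one spectral frame by its peak frequency channels.
# Take the k strongest local maxima and pack their bin numbers into one
# integer key, suitable for a hash-table lookup in a coarse retrieval stage.
def peak_channels(frame, k=3):
    """Return the bin numbers of the k strongest local maxima in `frame`."""
    peaks = [i for i in range(1, len(frame) - 1)
             if frame[i] > frame[i - 1] and frame[i] > frame[i + 1]]
    peaks.sort(key=lambda i: frame[i], reverse=True)
    return sorted(peaks[:k])

def peak_hash(frame, n_bins, k=3):
    """Pack the sorted peak channel numbers into a single integer key."""
    key = 0
    for ch in peak_channels(frame, k):
        key = key * n_bins + ch  # base-n_bins positional encoding
    return key
```

Using only channel numbers (not magnitudes) in the key is what makes such a scheme cheap and fairly robust to level changes, at the cost of sensitivity to added noise peaks.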
{"title":"GURU: a multimedia distance-learning framework for users with disabilities","authors":"Vidhya Balasubramanian, N. Venkatasubramanian","doi":"10.1145/1027527.1027698","DOIUrl":"https://doi.org/10.1145/1027527.1027698","url":null,"abstract":"GURU is a distance-learning environment that renders multimedia information to users with disabilities in an accessible manner. It is an implementation framework developed as part of an effort to provide accessible multimedia information to end users with perceptual (visual and auditory), cognitive or motor impairments. GURU is based on the MPEG-4 standard, and it modifies MP4 content and the presentation of the different objects in the scene dynamically based on users' visual, auditory and motor abilities. This paper briefly describes the implementation of the prototype framework and illustrates sample adaptations as implemented in this framework.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116417675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive manipulation of replay speed while listening to speech recordings","authors":"Wolfgang Hürst, T. Lauer, Georg Götz","doi":"10.1145/1027527.1027645","DOIUrl":"https://doi.org/10.1145/1027527.1027645","url":null,"abstract":"Today's interfaces for time-scaled audio replay have limitations especially regarding highly interactive tasks such as skimming and searching, which require quick temporary speed changes. Motivated by this shortcoming, we introduce a new interaction technique for speech skimming based on the so called rubber-band metaphor. We propose an \"elastic\" audio slider which is especially useful for temporary manipulation of replay speed and which integrates seamlessly into standard interface designs. The feasibility of this concept is proven by an initial user study.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116529023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonparametric motion model","authors":"Ling-yu Duan, Mingliang Xu, Q. Tian, Changsheng Xu","doi":"10.1145/1027527.1027700","DOIUrl":"https://doi.org/10.1145/1027527.1027700","url":null,"abstract":"Motion information is a powerful cue for visual perception. In the context of video indexing and retrieval, motion content serves as a useful source for compact video representation. There has been a lot of literature about parametric motion models. However, it is hard to secure a proper parametric assumption in a wide range of video scenarios. Diverse camera shots and frequent occurrences of improper optical flow estimation or block matching motivate us to develop nonparametric motion models. In this demonstration, we present a novel nonparametric motion model. The unique features mainly include: 1) Instead of computationally expensive and vulnerable parametric regression our proposed model bases the motion characterization on the classification of motion patterns; 2) we employ machine learning to capture the knowledge of recognizing camera motion patterns from bad motion vector fields (MVF); and 3) with the mean shift filtering our proposed motion representation elegantly incorporates the spatial-range information for noise removal and discontinuity preserving smoothing of MVF. 
Promising results have been achieved on two tasks: 1) camera motion pattern recognition on 23191 MVFs and 2) recognition of the intensity of motion activity on 622 video segments culled from the MPEG-7 dataset.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114565448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
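The record above mentions mean shift filtering that uses joint spatial-range information for discontinuity-preserving smoothing of a motion vector field. A simplified sketch of one such smoothing pass; reducing the procedure to a single averaging step, the radii, and the `smooth_mvf` name are assumptions for illustration (the real procedure iterates to convergence):

```python
# Hypothetical sketch: one discontinuity-preserving smoothing pass over a
# motion vector field (MVF). Each vector is replaced by the mean of spatially
# nearby vectors whose values also lie within a range radius, so averaging
# never crosses a motion discontinuity.
def smooth_mvf(mvf, spatial_r=1, range_r=2.0):
    h, w = len(mvf), len(mvf[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vx, vy = mvf[y][x]
            sx = sy = n = 0.0
            for dy in range(-spatial_r, spatial_r + 1):
                for dx in range(-spatial_r, spatial_r + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        ux, uy = mvf[ny][nx]
                        # range-domain gate: only average similar vectors
                        if (ux - vx) ** 2 + (uy - vy) ** 2 <= range_r ** 2:
                            sx += ux; sy += uy; n += 1
            out[y][x] = (sx / n, sy / n)  # n >= 1 (the center always qualifies)
    return out
```

Because the range gate excludes dissimilar neighbors, an isolated outlier vector is neither smeared into its neighbors nor pulled toward them, which is the discontinuity-preserving property the record refers to.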
{"title":"A semi-naïve Bayesian method incorporating clustering with pair-wise constraints for auto image annotation","authors":"Wanjun Jin, Rui Shi, Tat-Seng Chua","doi":"10.1145/1027527.1027605","DOIUrl":"https://doi.org/10.1145/1027527.1027605","url":null,"abstract":"We propose a novel approach for auto image annotation. In our approach, we first perform the segmentation of images into regions, followed by clustering of regions, before learning the relationship between concepts and region clusters using the set of training images with pre-assigned concepts. The main focus of this paper is two-fold. First, in the learning stage, we perform clustering of regions into region clusters by incorporating pair-wise constraints which are derived by considering the language model underlying the annotations assigned to training images. Second, in the annotation stage, we employ a semi-naïve Bayes model to compute the posterior probability of concepts given the region clusters. Experiment results show that our proposed system utilizing these two strategies outperforms the state-of-the-art techniques in annotating large image collection.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122142348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparative study on attributed relational gra matching algorithms for perceptual 3-D shape descriptor in MPEG-7","authors":"Duck Hoon Kim, I. Yun, Sang Uk Lee","doi":"10.1145/1027527.1027686","DOIUrl":"https://doi.org/10.1145/1027527.1027686","url":null,"abstract":"Nowadays, the demand on user-friendly querying interface such as query-by-sketch and query-by-editing is an important issue in the content-based retrieval system for 3-D object database. Especially in MPEG-7, P3DS (Perceptual 3-D Shape) descriptor has been developed in order to provide the user-friendly querying, which can not be covered by an existing international standard for description and browsing of 3-D object database. Since the P3DS descriptor is based on the part-based representation of 3-D object, it is a kind of attributed relational gra (ARG) so that the ARG matching algorithm naturally follows as the core procedure for the similarity matching of the P3DS descriptor. In this paper, given a P3DS database from the corresponding 3-D object database, we bring focus into investigating the pros and cons of the target ARG matching algorithms. In order to demonstrate the objective evidence of our conclusion, we have conducted the experiments based on the database of 480 3-D objects with 33 categories in terms of the bull's eye performance, average normalized modified retrieval rate, and <i>precision/recall</i> curve.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117048346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"When code is content: experiments with a whistling machine","authors":"M. Böhlen, J. Rinker","doi":"10.1145/1027527.1027761","DOIUrl":"https://doi.org/10.1145/1027527.1027761","url":null,"abstract":"The Universal Whistling Machine (U.W.M) senses the presence of people in its vicinity and attracts them with a signature whistle. Given a response whistle, U.W.M. counters with its own composition, based on a time-frequency analysis of the original.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129593579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}