Pub Date: 2006-07-09 | DOI: 10.1109/ICME.2006.262553
Wen Wu, Jie Yang, Jing Zhang
Trip planning and in-vehicle navigation are crucial tasks for easier and safer driving. Existing navigation systems are based on machine intelligence and do not allow the incorporation of human knowledge. These systems give turn guidance with abstract visual instructions and have not reached the potential of minimizing the driver's cognitive load, i.e., the amount of mental processing power required. In this paper, we describe the development of a multimedia system that makes driving and navigation safer and easier by offering tools for route sharing in trip planning and video-based route guidance during driving. The system provides a multimodal interface that lets a user share his or her route with others by drawing on a digital map, naturally incorporating human knowledge into the trip planning process. It gives driving instructions by overlaying navigational arrows onto live video and providing synthesized voice to reduce the driver's cognitive load, in addition to presenting landmark images for key maneuvers. We describe the observations that motivated the development of the system, detail its architecture and user interfaces, and finally discuss our initial test findings from real-road driving.
Title: A Multimedia System for Route Sharing and Video-Based Navigation
Pub Date: 2006-07-09 | DOI: 10.1109/ICME.2006.262407
A. Shoa, S. Shirani
The distortion of matching pursuit (MP) is calculated in terms of the MP encoder parameters for uniformly distributed signals and dictionaries. The MP encoder is then optimized using the analytically derived approximation of MP distortion. Our simulation results show that this optimized MP encoder exhibits optimal performance for nonuniform signal and dictionary distributions as well.
Title: Optimization of Matching Pursuit Encoder Based on Analytical Approximation of Matching Pursuit Distortion
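As background for the abstract above, the core matching pursuit loop can be sketched as follows. This is a minimal pure-Python illustration with a random toy dictionary, not the authors' optimized encoder; the iteration count stands in for the encoder parameters that trade rate against distortion.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def matching_pursuit(signal, dictionary, n_iters):
    """Greedy MP: at each step pick the unit-norm atom with the
    largest inner product with the residual and subtract it."""
    residual = list(signal)
    atoms, coeffs = [], []
    for _ in range(n_iters):
        best = max(range(len(dictionary)),
                   key=lambda i: abs(dot(residual, dictionary[i])))
        c = dot(residual, dictionary[best])
        residual = [r - c * d for r, d in zip(residual, dictionary[best])]
        atoms.append(best)
        coeffs.append(c)
    return atoms, coeffs, residual

# toy dictionary of random unit-norm atoms
rng = random.Random(0)
dim = 8
dictionary = [normalize([rng.uniform(-1, 1) for _ in range(dim)])
              for _ in range(32)]
signal = [rng.uniform(-1, 1) for _ in range(dim)]

# residual energy (distortion) shrinks as more atoms are spent
atoms, coeffs, res1 = matching_pursuit(signal, dictionary, 1)
_, _, res8 = matching_pursuit(signal, dictionary, 8)
```

Spending more atoms lowers distortion but raises the bit cost of coding the atom indices and coefficients, which is the trade-off the paper's optimization targets.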
Pub Date: 2006-07-09 | DOI: 10.1109/ICME.2006.262563
O. Pietquin
Because of the great variability of factors to take into account, designing a spoken dialogue system is still a tailoring task; rapid design and reuse of previous work are very difficult. For these reasons, the application of machine learning methods to dialogue strategy optimization has become a leading research subject over the last decade. Yet techniques such as reinforcement learning are very demanding in training data, while obtaining a substantial amount of data for spoken dialogues is time-consuming and therefore expensive. To expand existing data sets, dialogue simulation techniques are becoming a standard solution. In this paper we describe a user modeling technique for realistic simulation of man-machine goal-directed spoken dialogues. Unlike previously proposed models, this model, based on a stochastic description of man-machine communication, remains consistent throughout the interaction with respect to its history and a predefined user goal.
Title: Consistent Goal-Directed User Model for Realistic Man-Machine Task-Oriented Spoken Dialogue Simulation
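The notion of a goal-consistent simulated user can be illustrated with a small sketch. Everything here (the slot names, the `SimulatedUser` class, the extra-slot probability) is hypothetical; the point is only that sampled behavior never contradicts the fixed goal or the dialogue history, which is the consistency property the abstract emphasizes.

```python
import random

class SimulatedUser:
    """Toy goal-directed user model: answers are sampled behavior
    (the user may volunteer extra slots) but always stay consistent
    with a fixed goal and with the dialogue history."""
    def __init__(self, goal, extra_slot_prob=0.3, rng=None):
        self.goal = dict(goal)        # e.g. {"city": "Paris", ...}
        self.history = {}             # slots already communicated
        self.p = extra_slot_prob
        self.rng = rng or random.Random(42)

    def answer(self, asked_slot):
        response = {}
        if asked_slot in self.goal:
            response[asked_slot] = self.goal[asked_slot]
        # occasionally volunteer another goal slot (mixed initiative),
        # but never invent a value outside the goal
        remaining = [s for s in self.goal
                     if s not in self.history and s != asked_slot]
        if remaining and self.rng.random() < self.p:
            s = self.rng.choice(remaining)
            response[s] = self.goal[s]
        self.history.update(response)
        return response

user = SimulatedUser({"city": "Paris", "date": "Friday", "time": "noon"})
dialogue = [user.answer(slot) for slot in ("city", "date", "time")]
```

A reinforcement-learning dialogue manager can then be trained against such a simulator without the answers drifting away from the user's goal over long episodes.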
Pub Date: 2006-07-09 | DOI: 10.1109/ICME.2006.262911
Kazunori Matsumoto, Masaki Naito, K. Hoashi, F. Sugaya
This paper describes our new algorithm for shot boundary detection and its evaluation. We adopt a two-stage data fusion approach with SVMs to decide whether a boundary exists within a given video sequence. This approach helps avoid huge feature-space problems, even when we adopt many promising features extracted from a video sequence. We also introduce a novel feature to improve detection, consisting of two values extracted from a local frame sequence: the image difference between the target frame and the frame synthesized from its neighbors, and the difference between the neighbors themselves. This feature can be extracted quickly with a least-squares technique. Evaluation of our algorithm is conducted within the TRECVID evaluation framework; our system achieved high performance on the shot boundary detection task in TRECVID 2005.
Title: SVM-Based Shot Boundary Detection with a Novel Feature
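The novel feature described above (least-squares synthesis of the target frame from its neighbors) can be sketched as follows. Frames are reduced to plain vectors and the SVM fusion stages are omitted, so this is an illustrative toy rather than the authors' system.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def synth_error(prev, target, nxt):
    """Least-squares synthesis of the target frame from its two
    neighbours: target ~ a*prev + b*nxt, solved via the 2x2 normal
    equations; returns the residual energy."""
    pp, nn, pn = dot(prev, prev), dot(nxt, nxt), dot(prev, nxt)
    pt, nt = dot(prev, target), dot(nxt, target)
    det = pp * nn - pn * pn
    if det == 0:  # degenerate neighbours
        return dot(target, target)
    a = (pt * nn - nt * pn) / det
    b = (nt * pp - pt * pn) / det
    resid = [t - a * p - b * n for t, p, n in zip(target, prev, nxt)]
    return dot(resid, resid)

def boundary_feature(prev, target, nxt):
    """Two-component feature: synthesis error plus the plain
    difference energy between the two neighbours."""
    d = [p - n for p, n in zip(prev, nxt)]
    return synth_error(prev, target, nxt), dot(d, d)

# inside a shot: the middle frame is a blend of its neighbours
a = [1.0, 2.0, 3.0, 4.0]
b = [1.1, 2.1, 3.1, 4.1]
mid = [(x + y) / 2 for x, y in zip(a, b)]
smooth = boundary_feature(a, mid, b)

# across a cut: the middle frame comes from different content
cut = boundary_feature(a, [9.0, -3.0, 7.0, 0.5], b)
```

A classifier (the SVMs in the paper) would then separate the low synthesis error of in-shot frames from the high error at cuts.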
Pub Date: 2006-07-09 | DOI: 10.1109/ICME.2006.262601
D. Lelescu
The use of block transforms for coding intra-frames in video coding may preclude higher coding performance due to residual correlation across block boundaries and insufficient energy compaction, which translates into unrealized rate-distortion gains. Subjectively, blocking artifacts are common. Post-filters and lapped transforms offer good solutions to these problems; lapped transforms provide a more general framework that can incorporate coordinated pre- and post-filtering operations. Most common are fixed lapped transforms (such as lapped orthogonal transforms) and transforms with adaptive basis-function length. In contrast, in this paper we determine a lapped transform that non-linearly adapts its basis functions to local image statistics and the quantization regime. This transform was incorporated into the H.264/AVC codec and its performance evaluated. Significant rate-distortion gains of up to 0.45 dB (average 0.35 dB) PSNR were obtained compared to the H.264/AVC codec alone.
Title: Nonlinearly-Adapted Lapped Transforms for Intra-Frame Coding
Pub Date: 2006-07-09 | DOI: 10.1109/ICME.2006.262848
Akihiro Matsuoka, Kiyoshi Tanaka, A. Yoneyama, Y. Nakajima
In this work, we propose a data embedding scheme for the MPEG-1/Audio Layer II compressed domain. Data embedding is conducted in every audio access unit (AAU) by using side information (the locations of sub-bands allocated an audio signal) as the data carrier. In general, non-zero signals concentrate in the low and middle frequency bands, so we use high-frequency sub-bands that are allocated no audio signal to embed information. The proposed scheme can increase the payload while achieving rewritable (reversible) data embedding by choosing appropriate parameters. We verify the basic performance of our scheme through computer simulation using voice and music signals.
Title: Data Embedding in MPEG-1/Audio Layer II Compressed Domain using Side Information
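A toy model of side-information embedding may help make the idea concrete. This sketch reduces the AAU to a bare bit-allocation table and assumes (hypothetically) that all sub-bands above a cutoff are unallocated; real MPEG-1 Layer II framing is considerably more involved.

```python
CUTOFF = 24  # toy assumption: sub-bands from here up carry no audio

def embed(alloc, bits):
    """Encode one payload bit per high sub-band by toggling its
    allocation flag; reversible because the original high bands
    are all unallocated."""
    assert all(a == 0 for a in alloc[CUTOFF:]), "high bands must be empty"
    if len(bits) > len(alloc) - CUTOFF:
        raise ValueError("payload exceeds capacity")
    out = list(alloc)
    for k, bit in enumerate(bits):
        out[CUTOFF + k] = 1 if bit else 0
    return out

def extract(alloc, n_bits):
    """Read the payload back from the high-band allocation flags."""
    return [1 if alloc[CUTOFF + k] else 0 for k in range(n_bits)]

def restore(alloc):
    """Rewritable (reversible) step: clear the high bands again."""
    return alloc[:CUTOFF] + [0] * (len(alloc) - CUTOFF)

# 32-sub-band allocation table: audio energy in the low/mid bands
table = [8, 8, 6, 6, 4, 4, 3, 3] + [2] * 16 + [0] * 8
payload = [1, 0, 1, 1, 0]
marked = embed(table, payload)
```

The cutoff plays the role of the paper's tunable parameter: moving it down raises capacity at the cost of touching bands that might occasionally carry audio.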
Pub Date: 2006-07-09 | DOI: 10.1109/ICME.2006.262620
ZhenQiu Zhang, G. Potamianos, Stephen M. Chu, J. Tu, Thomas S. Huang
We present a robust vision system for single-person tracking inside a smart room using multiple synchronized, calibrated, stationary cameras. The system consists of two main components, namely initialization and tracking, assisted by an additional component that detects tracking drift. The main novelty lies in the adaptive tracking mechanism, which is based on subspace learning of the tracked person's appearance in selected two-dimensional camera views. The subspace is learned on the fly during tracking, but in contrast to the traditional approach in the literature, an additional "forgetting" mechanism is introduced as a means to reduce drift. The proposed algorithm replaces the mean-shift tracking previously employed in our work. By combining the proposed technique with a robust initialization component based on face detection and spatio-temporal dynamic programming, the resulting vision system significantly outperforms previously reported systems on the task of tracking a seminar presenter in data collected as part of the CHIL project.
Title: Person Tracking in Smart Rooms using Dynamic Programming and Adaptive Subspace Learning
Pub Date: 2006-07-09 | DOI: 10.1109/ICME.2006.262793
R. Eslami, J. Deller, H. Radha
Blind multiplicative watermarking schemes for speech signals using wavelets and the discrete cosine transform are presented. Watermarked signals are modeled using a generalized Gaussian distribution (GGD) and a Cauchy probability model. Detectors are developed employing a generalized likelihood ratio test (GLRT) and a locally most powerful (LMP) approach. The LMP scheme is used for the Cauchy distribution, while the GLRT estimates the gain factor as an unknown parameter in the GGD model. The detectors are tested using Monte Carlo simulation, and the results show the superiority of the proposed LMP/Cauchy detector in some experiments.
Title: On the Detection of Multiplicative Watermarks for Speech Signals in the Wavelet and DCT Domains
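The multiplicative embedding rule is easy to sketch. The detector below is a simplified correlation-style statistic, not the GLRT/LMP detectors derived in the paper, and the Laplacian source is only a stand-in for transform-domain speech coefficients.

```python
import random

rng = random.Random(7)

def laplacian(n, scale=1.0):
    """Heavy-tailed stand-in for wavelet/DCT speech coefficients."""
    return [rng.choice((-1, 1)) * rng.expovariate(1.0 / scale)
            for _ in range(n)]

def embed(coeffs, wm, gamma=0.1):
    """Multiplicative embedding: y_i = x_i * (1 + gamma * w_i),
    with w_i in {-1, +1}."""
    return [x * (1 + gamma * w) for x, w in zip(coeffs, wm)]

def detect(coeffs, wm):
    """Simplified blind statistic: the embedding scales |y_i| up
    where w_i = +1 and down where w_i = -1, so the mean of
    w_i * |y_i| concentrates above zero when the mark is present."""
    n = len(coeffs)
    return sum(w * abs(y) for w, y in zip(wm, coeffs)) / n

n = 5000
x = laplacian(n)
wm = [rng.choice((-1, 1)) for _ in range(n)]
marked = embed(x, wm)
stat_marked = detect(marked, wm)
stat_clean = detect(x, wm)
```

A decision threshold between the two statistic distributions (set from a false-alarm target, as in the paper's Monte Carlo tests) completes the detector.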
Pub Date: 2006-07-09 | DOI: 10.1109/ICME.2006.262717
C. Poucet, David Atienza Alonso, F. Catthoor
Modern multimedia applications make highly dynamic use of the memory hierarchy depending on the actual input, and therefore require run-time profiling techniques to enable optimizations. Because such applications can contain hundreds of thousands of lines of complex object-oriented specifications, profiling constitutes a tedious, time-consuming task, since profiling code is usually added manually. In this paper, we present a high-level, library-based approach for profiling both statically and dynamically defined variables using templates in C++. Our results on the visual texture coder of the MPEG-4 standard show that, using the information our approach provides, we can achieve 70.56% energy savings and a 19.22% reduction in memory accesses.
Title: Template-Based Semi-Automatic Profiling of Multimedia Applications
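The library-based idea can be illustrated in Python, with the caveat that the paper's approach relies on C++ templates; this analogue merely wraps a container so that reads and writes are counted without hand-inserted profiling code. The class and variable names are hypothetical.

```python
class ProfiledList:
    """Drop-in list wrapper that counts element reads and writes,
    so memory-access hot-spots can be found without manually
    instrumenting every access site."""
    def __init__(self, data, name):
        self._data = list(data)
        self.name = name
        self.reads = 0
        self.writes = 0

    def __getitem__(self, i):
        self.reads += 1
        return self._data[i]

    def __setitem__(self, i, value):
        self.writes += 1
        self._data[i] = value

    def __len__(self):
        return len(self._data)

def report(*tracked):
    """Collect per-variable access counts after a run."""
    return {t.name: (t.reads, t.writes) for t in tracked}

# declaring the variable through the library is the only change
buf = ProfiledList([0] * 8, "dct_buffer")
for i in range(len(buf)):     # the loop under study
    buf[i] = buf[i] + 1       # one read + one write per element

stats = report(buf)
```

In the C++ setting the same effect is obtained at compile time via templated wrapper types, so the profiled build keeps near-native performance.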
Pub Date: 2006-07-09 | DOI: 10.1109/ICME.2006.262710
Cheng-Hung Li, Chih-Yi Chiu, Chun-Rong Huang, Chu-Song Chen, Lee-Feng Chien
The rapid growth of digital photography in recent years has spurred the need for photo management tools. In this study, we propose an automatic organization framework for photo collections based on image content, providing a novel browsing experience for users. For each photograph, human faces, together with the corresponding clothes and nearby regions, are located, and color histograms of these regions are extracted as the image content feature. A similarity matrix of the photo collection is then generated from the temporal and content features of the photographs. We perform hierarchical clustering based on this matrix and extract duplicate subjects within a cluster by introducing the contrast context histogram (CCH) technique. The experimental results show that the developed framework provides promising results for photo management.
Title: Image Content Clustering and Summarization for Photo Collections
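The clustering stage of such a framework can be sketched with coarse color histograms and greedy single-linkage merging; face/clothes detection, temporal features, and the CCH duplicate-subject step are omitted, and the threshold is illustrative.

```python
def histogram(pixels, bins=4):
    """Coarse, normalised colour histogram of an (r, g, b) pixel list."""
    h = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        idx = ((r * bins // 256) * bins * bins
               + (g * bins // 256) * bins
               + (b * bins // 256))
        h[idx] += 1
    total = sum(h) or 1.0
    return [v / total for v in h]

def hist_distance(h1, h2):
    """L1 distance between two normalised histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def single_linkage(items, dist, threshold):
    """Greedy agglomerative clustering: repeatedly merge the two
    closest clusters until the closest pair exceeds the threshold."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > 1:
        d, i, j = min((dist(items[a], items[b]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters))
                      for a in clusters[i] for b in clusters[j])
        if d > threshold:
            break
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

# toy "photos": two mostly-red shots and one mostly-blue shot
red1 = [(250, 10, 10)] * 90 + [(10, 10, 250)] * 10
red2 = [(240, 20, 15)] * 85 + [(20, 240, 20)] * 15
blue = [(10, 10, 250)] * 95 + [(250, 10, 10)] * 5
hists = [histogram(p) for p in (red1, red2, blue)]
groups = single_linkage(hists, hist_distance, threshold=0.5)
```

In the full system the pairwise distance would mix region-based color features with capture-time proximity before the hierarchical clustering step.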