Comparing Feature Sets for Acted and Spontaneous Speech in View of Automatic Emotion Recognition
2005 IEEE International Conference on Multimedia and Expo, Pub Date: 2005-07-06, DOI: 10.1109/ICME.2005.1521463
Thurid Vogt, E. André
We present a data-mining experiment on feature selection for automatic emotion recognition. Starting from more than 1000 features derived from pitch, energy and MFCC time series, the features most relevant to the data are selected from this set by removing correlated features. The features selected for acted and realistic emotions are analyzed and show significant differences. All features are computed automatically, and we also contrast automatically defined with manually defined units of analysis. A higher degree of automation did not prove to be a disadvantage in terms of recognition accuracy.
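The abstract above does not spell out the selection algorithm, so the following is only a minimal sketch of one simple way to "remove correlated features": a greedy filter that keeps a feature only if its absolute Pearson correlation with every feature already kept stays below a threshold. The function names, the threshold of 0.9 and the toy feature values are assumptions made for illustration; they do not come from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(features, threshold=0.9):
    """Greedy filter: keep a feature only if |r| < threshold against all kept ones.

    `features` maps a feature name to its list of values (one per utterance).
    The threshold is a hypothetical choice, not a value from the paper.
    """
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data (invented): the second feature is a rescaled copy of the first.
features = {
    "pitch_mean": [1.0, 2.0, 3.0, 4.0],
    "pitch_mean_hz": [10.0, 20.0, 30.0, 40.0],
    "energy_max": [2.0, 1.0, 4.0, 3.0],
}
print(drop_correlated(features))
```

With this toy input, `pitch_mean_hz` is discarded because it is perfectly correlated with `pitch_mean`, while `energy_max` (correlation 0.6 with `pitch_mean`) is kept. The result of a greedy filter depends on the order in which features are visited, which is one reason more elaborate subset-selection methods exist.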
Using rhetorical annotations for generating video documentaries
2005 IEEE International Conference on Multimedia and Expo, Pub Date: 2005-07-06, DOI: 10.1109/ICME.2005.1521610
S. Bocconi, F. Nack, L. Hardman
We use rhetorical annotations to specify a generation process that can assemble meaningful video sequences with a communicative goal and an argumentative progression. Our annotation schema encodes the verbal information contained in the audio channel, identifying the claims the interviewees make and the argumentation structures they use to make those claims. Based on this schema, we construct a semantic graph which is traversed by rhetoric-based strategies selecting video segments. The selected video segments are edited to form a meaningful video sequence.
Evaluation of the Interleaved Source Coding (ISC) Under Packet Correlation
2005 IEEE International Conference on Multimedia and Expo, Pub Date: 2005-07-06, DOI: 10.1109/ICME.2005.1521711
Jin Young Lee, H. Radha
Network impairments such as delay and packet losses have a severe impact on the presentation quality of many predictive video sources. Prior research has developed packet-loss-resilient coding methods to overcome such impairments in real-time streaming applications. Interleaved source coding (ISC) is one of these error-resilient coding methods; it is based on an optimum interleaving of predictive-coded video frames transmitted over a single erasure channel. ISC employs a Markov decision process (MDP) and a corresponding dynamic programming algorithm to identify the optimal interleaving pattern for a given channel model and transmitted sequence. ISC has been shown to significantly improve the overall quality of a predictive-coded video stream over a lossy channel without complex modifications to standard video coders. In this paper, ISC is evaluated over channels with memory. In particular, we analyze the impact of the packet correlation of the popular Gilbert model on ISC-based packet video over a wide range of packet loss probabilities. Simulations show that ISC outperforms the traditional method as either the loss rate increases or the packet correlation decreases.
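To make the "channel with memory" concrete, here is a minimal sketch of a two-state Gilbert loss channel. The abstract does not give the parametrization used in the paper, so this assumes the common simple form in which every packet sent in the bad state is lost, the average loss rate is p / (p + q), and the packet correlation is 1 - p - q, where p is the good-to-bad and q the bad-to-good transition probability. The function names are hypothetical.

```python
import random


def gilbert_params(loss_rate, correlation):
    """Map (average loss rate, packet correlation) to transition probabilities.

    Assumed definitions: loss_rate = p / (p + q), correlation = 1 - p - q.
    """
    p = loss_rate * (1.0 - correlation)          # P(good -> bad)
    q = (1.0 - loss_rate) * (1.0 - correlation)  # P(bad -> good)
    return p, q


def simulate_losses(n, loss_rate, correlation, seed=0):
    """Return a list of n booleans; True means the packet was lost."""
    rng = random.Random(seed)
    p, q = gilbert_params(loss_rate, correlation)
    lost = rng.random() < loss_rate  # start from the stationary distribution
    trace = []
    for _ in range(n):
        trace.append(lost)
        if lost:
            lost = rng.random() >= q  # remain in the bad state with prob. 1 - q
        else:
            lost = rng.random() < p   # enter the bad state with prob. p
    return trace


trace = simulate_losses(100_000, loss_rate=0.1, correlation=0.5)
print(sum(trace) / len(trace))  # empirical loss rate, close to 0.1
```

Under these assumptions the mean loss-burst length is 1 / q, so for a fixed loss rate a higher correlation produces longer bursts, which is the regime where spreading dependent frames apart by interleaving has less room to help.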
IP Multicast Video Broadcasting System with User Authentication
2005 IEEE International Conference on Multimedia and Expo, Pub Date: 2005-07-06, DOI: 10.1109/ICME.2005.1521639
Hiroki Onishi, T. Satoh, T. Uehara, K. Yamaoka
This report describes a pay broadcasting system for the Internet. The system would enable tens of thousands of people to access an identical video stream simultaneously. In the proposed system, content is broadcast to all terminals using IP multicast. Content is encrypted so that legitimate users can decode it with a private key and session keys. As a key management scheme, the Tracing Traitor scheme is adopted because it offers advantages in scalability. The system can also embed digital watermarks, which act as a psychological deterrent to illegal copying and distribution of copyrighted content. Finally, the implementation of an application system is described, and efficient broadcasting of content with this system is demonstrated.
Quality-Temporal Transcoder Driven by the Jerkiness
2005 IEEE International Conference on Multimedia and Expo, Pub Date: 2005-07-06, DOI: 10.1109/ICME.2005.1521705
G. Iacovoni, S. Morsa, R. Felice
We propose a new DCT-based homogeneous video transcoding architecture that relies on both quality and temporal reduction techniques. The frame-layer control is driven by a new indicator, the jerkiness, which represents how the user perceives the movement in a video stream. The proposed transcoder can meet the constraints of real-time communication and has been extensively tested under different conditions.
Automatic Segmentation of Home Videos
2005 IEEE International Conference on Multimedia and Expo, Pub Date: 2005-07-06, DOI: 10.1109/ICME.2005.1521347
Y. Zhai, M. Shah
Temporal video segmentation is one of the fundamental tasks in video processing, understanding and management. In this paper, we present an automatic method for segmenting home videos into logical temporal units. We have developed a statistical framework using the Markov chain Monte Carlo (MCMC) technique. The temporal scene boundaries are detected by maximizing the posterior probability of the model parameters, which comprise the number of scenes and their boundary locations. The proposed method has been demonstrated on several home videos, and high accuracy has been obtained.
Hidden auxiliary media channels in audio signals by perceptually insignificant component replacement
2005 IEEE International Conference on Multimedia and Expo, Pub Date: 2005-07-06, DOI: 10.1109/ICME.2005.1521349
Tim D. Jackson, Francis F. Li, Keith Yates
This paper proposes a method for the formation of an auxiliary media channel within a host signal. Using a psychoacoustic frequency masking model, perceptually insignificant subband components of the host audio signal are identified and removed. The auxiliary channel data are placed in the empty subbands in the host signal and scaled to a level below the audible threshold. An implementation is given along with results suggesting that the proposed method can effectively hide an auxiliary media channel in a normal audio signal without degrading the perceived sound quality.
Speaker Independent Speech Emotion Recognition by Ensemble Classification
2005 IEEE International Conference on Multimedia and Expo, Pub Date: 2005-07-06, DOI: 10.1109/ICME.2005.1521560
Björn Schuller, S. Reiter, R. Müller, M. Al-Hames, M. Lang, G. Rigoll
Emotion recognition is becoming an important factor in future media retrieval and man-machine interfaces. However, even human judges often have difficulty recognizing a person's emotion, especially that of a stranger. In this work we strive to recognize emotion independently of the person, concentrating on the speech channel. The relevance of individual acoustic features is a critical point, which we address by filter-based gain ratio calculation starting from a basis of 276 features. Because optimizing a minimal feature set as a whole generally saves more extraction effort, we furthermore apply an SVM-SFFS wrapper-based search. For a more robust estimation we also integrate spoken content information through a Bayesian net analysis of ASR outputs. Overall classification is realized in an early feature fusion by stacked ensembles of diverse base classifiers. Tests were run on a database of 3,947 movie and automotive-interaction dialog turns from 35 speakers. Remarkable overall performance can be reported in the discrimination of the seven discrete emotions named in the MPEG-4 standard with added neutrality.
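The filter step mentioned above ranks features one at a time by gain ratio. As a minimal sketch, assuming the feature has already been discretized into a handful of bins (the abstract does not say how the continuous acoustic features were binned), gain ratio is the information gain of the feature with respect to the class labels divided by the entropy of the feature's own value distribution. The function names and the toy data are invented for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of a discrete feature divided by its split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy data (invented): four dialog turns with two emotion labels.
labels = ["anger", "anger", "joy", "joy"]
pitch_bin = ["high", "high", "low", "low"]   # separates the classes perfectly
noise_bin = ["a", "b", "a", "b"]             # carries no class information

print(gain_ratio(pitch_bin, labels))  # 1.0
print(gain_ratio(noise_bin, labels))  # 0.0
```

A filter of this kind scores each feature in isolation, which is cheap but blind to redundancy between features; that limitation is what a wrapper search over whole feature sets, such as the SVM-SFFS search named in the abstract, is meant to address.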
Comparative evaluation of Web image search engines for multimedia applications
2005 IEEE International Conference on Multimedia and Expo, Pub Date: 2005-07-06, DOI: 10.1109/ICME.2005.1521641
Keon Stevenson, C. Leung
While text-oriented document searching is relatively mature on the Internet, image searching, which requires much more than text matching, lags significantly behind. Image search engines significantly enlarge the range of images accessible to users. This paper provides an understanding of current technologies in image searching on the Internet and points to future areas of improvement for multimedia applications. We develop a systematic set of image queries to assess the competence and performance of the major image search engines. We find that current technology is only able to deliver an average precision of around 42% and an average recall of around 12%, while the best performers are capable of producing over 70% for precision and around 27% for recall. The reasons for such differences, and mechanisms for search improvement, are also indicated.
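For readers less familiar with the two measures quoted above: precision is the fraction of returned images that are relevant, and recall is the fraction of all relevant images that were returned. The sketch below computes both for a single query and averages them over queries. The image identifiers are invented, and the plain per-query mean is an assumption, since the abstract does not state how the averages were formed.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given iterables of item ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision_recall(queries):
    """Unweighted mean of per-query precision and recall.

    `queries` is a non-empty list of (retrieved, relevant) pairs.
    """
    scores = [precision_recall(ret, rel) for ret, rel in queries]
    mean_p = sum(p for p, _ in scores) / len(scores)
    mean_r = sum(r for _, r in scores) / len(scores)
    return mean_p, mean_r


# Toy query (invented ids): 4 images returned, 5 relevant overall, 2 in common.
retrieved = ["img1", "img2", "img3", "img4"]
relevant = ["img2", "img4", "img7", "img8", "img9"]
print(precision_recall(retrieved, relevant))  # (0.5, 0.4)
```

Measuring recall requires knowing the full set of relevant images for each query, which is hard to establish on the open Web; any concrete evaluation has to fix that ground truth by some convention.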
A Reversible Watermarking Scheme for JPEG-2000 Compressed Images
2005 IEEE International Conference on Multimedia and Expo, Pub Date: 2005-07-06, DOI: 10.1109/ICME.2005.1521362
S. Emmanuel, C. K. Heng, A. Das
In this paper, we present a novel reversible watermarking scheme for authenticating JPEG/JPEG-2000 coded images. Since the watermarking scheme is reversible, the exact original image can be recovered from the watermarked image. The watermarking scheme makes use of finite state machine principles. The proposed scheme is asymmetric, as the watermark extraction key is different from its embedding key. The algorithm is implemented and tested for its visual quality, compression overhead, execution time overhead and payload capacity. It is found that the algorithm has high visual quality, high payload capacity, low compression overhead and low execution time overhead.