Separation of Voice and Music by Harmonic Structure Stability Analysis
Yungang Zhang, Changshui Zhang
Pub Date: 2005-07-06, DOI: 10.1109/ICME.2005.1521485
Separating voice from music is an interesting but difficult problem, and it is useful for many other areas of research, such as audio content analysis. This paper carefully studies the differences between voice and music signals and proposes that harmonic structure stability is the key difference between them. A separation algorithm based on this observation is presented: the average harmonic structure of the music is learned, and is then used to distinguish voice harmonic structures from music harmonic structures during separation. Experimental results show that the algorithm separates mixed signals with not only a very high signal-to-noise ratio (SNR) but also rather good subjective audio quality.
A Probabilistic Description of Man-Machine Spoken Communication
O. Pietquin
Pub Date: 2005-07-06, DOI: 10.1109/ICME.2005.1521447
Speech-enabled interfaces and spoken dialog systems are mostly based on statistical speech and language processing modules. Their behavior is therefore non-deterministic and hard to predict, which makes simulating and optimizing the performance of such systems difficult, as does reusing previous work to build new systems. Aiming at a partially automated optimization of such systems, this paper presents an attempt at a formalism for describing man-machine spoken communication in the framework of spoken dialog systems. The formalization is based partly on a probabilistic description of the information processing occurring in each module of a spoken dialog system and partly on a stochastic user model. Finally, some possible applications of this theoretical framework are proposed.
Neighborhood issue in single-frame image super-resolution
Xu Su, Q. Tian, Q. Xue, N. Sebe, Jingsheng Ma
Pub Date: 2005-07-06, DOI: 10.1109/ICME.2005.1521623
Super-resolution is the problem of generating one or more high-resolution images from one or a sequence of low-resolution frames. Most proposed methods perform super-resolution from multiple low-resolution images of the same scene, which is called multiple-frame super-resolution. Only a few approaches produce a high-resolution image from a single low-resolution image, with the help of one or more training images from scenes of the same or different types; this is referred to as single-frame super-resolution. This article reviews a variety of single-frame super-resolution methods proposed in recent years. It introduces a manifold learning method, locally linear embedding (LLE), and its relation to single-frame super-resolution. A detailed study of a critical issue, the "neighborhood issue," is presented together with related experimental results and analysis, and possible future research directions are given.
Non-linear image enhancement for digital TV applications using Gabor filters
Yue Yang, Baoxin Li
Pub Date: 2005-07-06, DOI: 10.1109/ICME.2005.1521597
We propose a non-linear image enhancement method based on Gabor filters that allows selective enhancement guided by the contrast sensitivity function of the human visual system. We also propose an evaluation method for measuring the performance of the algorithm and comparing it with existing approaches. The selective enhancement of the proposed approach is especially suitable for digital television applications, improving the perceived visual quality of images whose source contains an unsatisfactory amount of high-frequency content for various reasons, including the interpolation used to convert standard-definition sources into high-definition images.
Fuzzy image segmentation using shape information
Mohammed Ameer Ali, G. Karmakar, L. Dooley
Pub Date: 2005-07-06, DOI: 10.1109/ICME.2005.1521529
The results of any clustering algorithm are highly sensitive to the chosen features, which limits their generalization and provides strong motivation for integrating shape information into the algorithm. Existing fuzzy shape-based clustering algorithms consider only circular and elliptical shape information and consequently do not segment arbitrarily shaped objects well. To address this issue, this paper introduces a new shape-based algorithm, called fuzzy image segmentation using shape information (FISS), that incorporates general shape information. Both qualitative and quantitative analyses demonstrate the superiority of the new FISS algorithm over other well-established shape-based fuzzy clustering algorithms, including Gustafson-Kessel, ring-shaped, circular-shell, c-ellipsoidal-shells, and elliptic ring-shaped clustering.
Comparative evaluation of Web image search engines for multimedia applications
Keon Stevenson, C. Leung
Pub Date: 2005-07-06, DOI: 10.1109/ICME.2005.1521641
While text-oriented document search is relatively mature on the Internet, image search, which requires much more than text matching, lags significantly behind. Image search engines greatly enlarge the range of images accessible to users. This paper provides an overview of current technologies for image search on the Internet and points to future areas of improvement for multimedia applications. We develop a systematic set of image queries to assess the competence and performance of the major image search engines. We find that current technology delivers only an average precision of around 42% and an average recall of around 12%, while the best performers achieve over 70% precision and around 27% recall. The reasons for these differences, and mechanisms for improving search, are also discussed.
Analysis of expressing audiences in a cyber-theater
Dong-Wan Kang, K. Huang, J. Ohya
Pub Date: 2005-07-06, DOI: 10.1109/ICME.2005.1521526
This paper studies how audiences should be represented in a cyber-theater, a networked virtual environment in which remotely located people can direct plays as directors, perform as performers, and watch performances as audience members. The audience effect is widely acknowledged in real-world theater: audience reactions significantly affect the actors' performances and the play itself. However, few works address audiences in the cyber-theater. This paper studies whether the audience effect also exists in the cyber-theater. By constructing a system in which two actors are shown the avatar of a remotely located audience member, through which that audience member can display emotional actions, we show that interaction between actors and audiences is effective.
Content-based block watermarking against cumulative and temporal attack
Ju Wang, Jonathan C. L. Liu
Pub Date: 2005-07-06, DOI: 10.1109/ICME.2005.1521364
This paper presents a block-selection-based video watermarking scheme designed to be resilient against two dangerous attacks: the cumulative attack and the temporal attack. We use content-based block selection to counteract the cumulative attack by spreading out the locations of marked blocks. The block selection algorithm also leads to a novel frame synchronization method that can effectively re-synchronize suspect video frames to their original positions. Our scheme has low computational overhead and robust detection performance for moderately compressed video.
A Reversible Watermarking Scheme for JPEG-2000 Compressed Images
S. Emmanuel, C. K. Heng, A. Das
Pub Date: 2005-07-06, DOI: 10.1109/ICME.2005.1521362
In this paper, we present a novel reversible watermarking scheme for authenticating JPEG/JPEG-2000 coded images. Since the watermarking scheme is reversible, the exact original image can be recovered from the watermarked image. The scheme is based on finite-state-machine principles and is asymmetric, in that the watermark extraction key differs from the embedding key. The algorithm is implemented and tested for visual quality, compression overhead, execution time overhead, and payload capacity. It is found to offer high visual quality and payload capacity with low compression and execution time overhead.
Automatic Segmentation of Home Videos
Y. Zhai, M. Shah
Pub Date: 2005-07-06, DOI: 10.1109/ICME.2005.1521347
Temporal video segmentation is one of the fundamental and essential tasks in video processing, understanding, and management. In this paper, we present an automatic method for segmenting home videos into temporal logical units. We develop a statistical framework using the Markov chain Monte Carlo (MCMC) technique: temporal scene boundaries are detected by maximizing the posterior probability of the model parameters, which comprise the number of scenes and the locations of their boundaries. The proposed method has been demonstrated on several home videos, and high accuracy has been obtained.