IRUS: image retrieval using shape
Meirav Adoram, M. Lew
Proceedings IEEE International Conference on Multimedia Computing and Systems
Pub Date: 1999-06-07  DOI: 10.1109/MMCS.1999.778552

Finding shapes in image databases is a challenging topic in content-based retrieval. In this paper the goal is to find database images which contain shapes similar to the user's query. Unlike most solutions to this problem, the algorithm presented here is designed to cope with changes in rotation, scale, and translation, as well as lossy-compression noise. A Java application was built which uses snakes and invariant moments. The GVF snake was used because it has two significant advantages over the traditional snake formulation: it can fit into concavities, and it can fit itself to objects through both expansion and contraction. The objects in the images were segmented with the active contours, then invariant moments were calculated and compared with a minimum-distance classifier. Retrieval quality was measured on original, rotated, scaled, and noisy images, and on combinations of those distortions.
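The moment-matching half of the pipeline the abstract describes (segment the object, compute invariant moments, compare with a minimum-distance classifier) can be sketched as follows. The GVF-snake segmentation step is omitted; `hu_moments` and `nearest` are illustrative names, and Hu's seven moments are one standard choice of rotation-, scale- and translation-invariant moments — the abstract does not specify which set the authors used.

```python
import math

def hu_moments(img):
    """Hu's seven invariant moments of a 2-D image (list of rows of floats).
    Invariant to translation, scale and rotation of the depicted shape."""
    h, w = len(img), len(img[0])
    def m(p, q):  # raw moment m_pq
        return sum(img[y][x] * x ** p * y ** q
                   for y in range(h) for x in range(w))
    m00 = m(0, 0)
    xbar, ybar = m(1, 0) / m00, m(0, 1) / m00
    def mu(p, q):  # central moment (translation invariant)
        return sum(img[y][x] * (x - xbar) ** p * (y - ybar) ** q
                   for y in range(h) for x in range(w))
    def eta(p, q):  # normalized central moment (adds scale invariance)
        return mu(p, q) / m00 ** (1 + (p + q) / 2)
    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        (n30 + n12) ** 2 + (n21 + n03) ** 2,
        (n30 - 3 * n12) * (n30 + n12) * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
            + (3 * n21 - n03) * (n21 + n03) * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2),
        (n20 - n02) * ((n30 + n12) ** 2 - (n21 + n03) ** 2)
            + 4 * n11 * (n30 + n12) * (n21 + n03),
        (3 * n21 - n03) * (n30 + n12) * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
            - (n30 - 3 * n12) * (n21 + n03) * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2),
    ]

def nearest(query, database):
    """Minimum-distance classifier: return the key of the database image
    whose moment vector is closest (Euclidean) to the query's."""
    q = hu_moments(query)
    return min(database, key=lambda k: math.dist(q, hu_moments(database[k])))
```

Because the moments are computed from central, normalized quantities, a translated copy of a shape yields the same feature vector, which is what makes the simple Euclidean comparison meaningful.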
Two types of sound tool for editing speech signal: sound cutter and symbolic sound editor
Tomoko Sagisaka, T. Munakata
Pub Date: 1999-06-07  DOI: 10.1109/MMCS.1999.778642

In recent years multimedia has been used in almost every part of educational settings, and sound in particular has proven effective in a variety of educational activities. In this demonstration we show two novel sound tools for editing speech signals, called "Sound Cutter" and "Symbolic Sound Editor". These tools not only facilitate the production of hypermedia teaching and study materials but also effectively support language education. Sound Cutter automatically decomposes a continuous speech signal into short segments based on pause positions and assigns each a serial ID for registration in a database; this allows users to easily rearrange the segments and produce a new file as they like. Symbolic Sound Editor automatically splits a speech signal into much smaller segments and assigns index numbers to them; users can edit the sounds easily by referring to the index numbers instead of the waveform image.
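Sound Cutter's pause-based decomposition can be illustrated with a minimal energy-threshold sketch: frames whose mean-square energy stays low for long enough count as a pause, and the signal is cut at pause boundaries, with each segment registered under a serial ID. The function name, thresholds and frame sizes are assumptions for illustration, not values from the demonstration.

```python
def cut_at_pauses(samples, rate, frame_ms=20, threshold=0.01, min_pause_ms=200):
    """Split a mono signal (floats in [-1, 1]) at pauses: a pause is a run of
    low-energy frames lasting at least min_pause_ms. Returns a dict mapping
    serial IDs (1, 2, ...) to the speech segments between pauses."""
    frame = max(1, rate * frame_ms // 1000)
    n_frames = len(samples) // frame
    # mean-square energy per frame
    energy = [sum(s * s for s in samples[i * frame:(i + 1) * frame]) / frame
              for i in range(n_frames)]
    silent = [e < threshold for e in energy]
    min_run = max(1, min_pause_ms // frame_ms)
    segments, seg_start, run, next_id = {}, 0, 0, 1
    for i, is_silent in enumerate(silent):
        run = run + 1 if is_silent else 0
        if run == min_run:  # pause confirmed: close the current segment
            end = (i - min_run + 1) * frame
            if end > seg_start:
                segments[next_id] = samples[seg_start:end]
                next_id += 1
        if run >= min_run:  # while the pause lasts, push the next start forward
            seg_start = (i + 1) * frame
    if len(samples) - seg_start >= frame:  # flush trailing speech, if any
        segments[next_id] = samples[seg_start:]
    return segments
```

With the segments keyed by ID, reconstructing a new file in any order is just concatenation of the chosen IDs, which matches the rearrangement workflow the abstract describes.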
PowerDriverDTSS: The advanced demand responsive transport service system
C. Lastrucci, L. Lastrucci, F. Casati
Pub Date: 1999-06-07  DOI: 10.1109/MMCS.1999.778639

Powersoft has developed PowerDriverDTSS(R), a hardware and software architecture based on a proprietary algorithm for demand-responsive transport services with no fixed timetable, route, or stops. PowerDriverDTSS(R) solves the major problems of this type of service, such as real-time customer booking combined with optimization of the vehicle path, while guaranteeing a high quality standard for the service. The system is under evaluation by an Italian public bus transport operator.
The THISL spoken document retrieval project
S. Renals
Pub Date: 1999-06-07  DOI: 10.1109/MMCS.1999.778655

THISL is an ESPRIT Long Term Research Project focused on the automatic indexing and retrieval of broadcast television and radio programmes. In particular, it is concerned with the production of a demonstrator news-on-demand system for navigating an archive of BBC news broadcasts. Prototype systems based on both British and North American broadcast news have been constructed. The North American system has been successfully evaluated within the TREC-6 and TREC-7 spoken document retrieval tracks, and the system based on BBC TV and radio news archives will be evaluated by BBC R&D.
The design of multimedia languages based on teleaction objects
G. Polese, Shi-Kuo Chang, G. Tortora
Pub Date: 1999-06-07  DOI: 10.1109/MMCS.1999.778585

We present a design methodology for multidimensional languages to be used in multimedia applications. The design framework extends methodologies for visual language design and relies on Teleaction Objects as a model for specifying and controlling multimedia presentations.
Three dimensional wavelet transform video compression
Ian Karl Levy, R. Wilson
Pub Date: 1999-06-07  DOI: 10.1109/MMCS.1999.778612

A new approach to vector quantizer (VQ) codebook design for video data compression is described. It is based on the observation that symmetries in the data, which are seldom captured exactly in any training set, are both perceptually important and can lead to a more robust and effective codebook. The idea is illustrated using a 3D wavelet-transformed video sequence. After discussing the relevant symmetries, a codebook design method based on a modification of the Linde-Buzo-Gray (1980) algorithm is presented and applied to various video sequences. Comparisons with other work in the area show that the scheme has potential and merits further investigation.
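The Linde-Buzo-Gray algorithm that the paper's codebook design modifies proceeds by codeword splitting and Lloyd-style refinement. Below is a plain-Python sketch of the unmodified LBG baseline; the authors' symmetry-based modification is not reproduced here, and the perturbation factor and iteration count are conventional choices, not values from the paper.

```python
def lbg_codebook(vectors, size, eps=0.01, iters=20):
    """LBG codebook design: start from the global centroid, split every
    codeword into a perturbed pair c*(1+eps), c*(1-eps), then refine by
    alternating nearest-codeword assignment and centroid updates.
    `vectors` is a list of equal-length float lists; `size` a power of two."""
    dim = len(vectors[0])

    def centroid(vs):
        return [sum(v[i] for v in vs) / len(vs) for i in range(dim)]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    book = [centroid(vectors)]
    while len(book) < size:
        # splitting step: double the codebook size
        book = [[x * (1 + s) for x in c] for c in book for s in (eps, -eps)]
        for _ in range(iters):
            # assignment step: partition training vectors by nearest codeword
            cells = [[] for _ in book]
            for v in vectors:
                j = min(range(len(book)), key=lambda k: dist2(v, book[k]))
                cells[j].append(v)
            # update step: move each codeword to the centroid of its cell
            book = [centroid(c) if c else book[j] for j, c in enumerate(cells)]
    return book
```

Encoding a wavelet subband block then amounts to replacing each vector by the index of its nearest codeword, which is where exploiting symmetries can shrink the codebook without hurting quality.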
Adding expressiveness in musical performance in real time
A. Rodà, S. Canazza
Pub Date: 1999-06-07  DOI: 10.1109/MMCS.1999.778645

Musical performance introduces deviations from the nominal values specified in the score; music reproduced without such variations is usually perceived as mechanical. Most investigations explore how musical structure influences performance, and there are only a few studies on how the musician's expressive intentions are reflected in it. The purpose of this work is to develop a model for modifying the expressiveness of a musical performance in real time. Perceptual analyses were conducted on performances played with different intentions, correlated with a set of sensorial adjectives. From these analyses two distinct expressive directions emerged: the first correlated with the "energy" and the second with the "kinetics" of the pieces. The resulting two-dimensional space (Perceptual Parametric Space, PPS) represents how the subjects arranged the pieces in their own minds. Acoustical analysis allowed us to correlate the expressive directions of the PPS with the main acoustic parameters, so that each point of the PPS is associated with a set of acoustic parameters. An analysis-by-synthesis method was used to validate the model, and real-time software was developed to carry out computer-generated performances.
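One plausible way to associate every point of a two-dimensional space like the PPS with acoustic parameters is to interpolate between a few measured anchor performances (e.g. those labelled with the sensorial adjectives). The inverse-distance scheme below is an illustration of that idea only; the function name, the weighting rule and the parameter names are assumptions, not the authors' model.

```python
def pps_to_params(x, y, anchors, power=2):
    """Map a point (x, y) of a 2-D expressive space to acoustic parameters
    by inverse-distance weighting of anchor points. `anchors` maps (x, y)
    positions to parameter dicts that all share the same keys."""
    weights = {}
    for (ax, ay), params in anchors.items():
        d2 = (x - ax) ** 2 + (y - ay) ** 2
        if d2 == 0:
            return dict(params)  # exactly on an anchor: use it verbatim
        weights[(ax, ay)] = 1.0 / d2 ** (power / 2)
    total = sum(weights.values())
    keys = next(iter(anchors.values())).keys()
    # each parameter is the weighted average of the anchors' values
    return {k: sum(w * anchors[p][k] for p, w in weights.items()) / total
            for k in keys}
```

Moving the (x, y) point continuously then yields continuously varying parameters, which is the kind of smooth real-time control the abstract's software would need.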
On the automated interpretation and indexing of American Football
M. Lazarescu, S. Venkatesh, G. West, T. Caelli
Pub Date: 1999-06-07  DOI: 10.1109/MMCS.1999.779303

We combine natural language understanding and image processing with incremental learning to develop a system that can automatically interpret and index American Football. We have developed a model for representing the spatio-temporal characteristics of multiple objects in dynamic scenes in this domain; the representation combines expert, domain, spatial, and temporal knowledge. We also present an incremental learning algorithm that improves the knowledge base while keeping previously developed concepts consistent with new data. Its advantages are that it does not split concepts and that it generates a compact conceptual hierarchy which does not store instances.
Relevance feedback techniques for image retrieval using multiple attributes
Tat-Seng Chua, Chun-Xin Chu, M. Kankanhalli
Pub Date: 1999-06-07  DOI: 10.1109/MMCS.1999.779320

The paper proposes a relevance feedback (RF) approach to content-based image retrieval using multiple attributes, applied here to images' text and color attributes. To ensure that meaningful features are extracted, a pseudo-object model based on color coherence vectors is adopted to model color content. The RF approach employs techniques from the fields of information retrieval and machine learning to extract pertinent features from each attribute, then uses the user's relevance judgments to estimate the importance of the different attributes in an integrated content-based retrieval. The system has been tested on a large collection of over 12,000 images; the results demonstrate that the proposed RF approach and the pseudo-object-based color model are effective.
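A classic information-retrieval technique that a relevance-feedback loop like this can build on is the Rocchio update: move the query vector toward the centroid of images the user marked relevant and away from the non-relevant ones. The sketch below uses textbook parameter defaults; the paper does not state that it uses Rocchio specifically, so treat this as an illustration of the feedback mechanism, not the authors' formula.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio relevance-feedback update on sparse feature vectors
    (dicts mapping feature -> weight): q' = a*q + b*mean(rel) - g*mean(nonrel).
    alpha/beta/gamma are the usual textbook defaults."""
    def centroid(vs):
        acc = {}
        for v in vs:
            for k, w in v.items():
                acc[k] = acc.get(k, 0.0) + w / len(vs)
        return acc
    r = centroid(relevant) if relevant else {}
    n = centroid(nonrelevant) if nonrelevant else {}
    keys = set(query) | set(r) | set(n)
    new = {k: alpha * query.get(k, 0.0) + beta * r.get(k, 0.0)
              - gamma * n.get(k, 0.0)
           for k in keys}
    # negative weights are conventionally clamped away
    return {k: w for k, w in new.items() if w > 0}
```

Running the same kind of update independently per attribute (text, color) and weighting the attributes by how well each one separates the relevant from the non-relevant examples is one way to realize the integrated multi-attribute feedback the abstract describes.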
Joint audio-video processing of MPEG encoded sequences
Giuseppe Boccignone, M. D. Santo, G. Percannella
Pub Date: 1999-06-07  DOI: 10.1109/MMCS.1999.778288

Current research efforts in video parsing and analysis focus on pictorial information while neglecting an important supplementary source of content information: the embedded audio, or soundtrack. In contrast, we address scene change detection using both video and audio information, and we discuss how joint exploitation of audio and video can be performed directly on MPEG-encoded video sequences. First experimental results are presented and discussed.
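The fusion idea, declaring a scene boundary only where the audio and video evidence agree, can be shown with a toy rule over precomputed per-frame dissimilarity series. The features and thresholds here are illustrative only, and the paper's method works in the MPEG compressed domain rather than on decoded frames.

```python
def hist_diff(h1, h2):
    """Half-L1 distance between two normalized color histograms: 0 for
    identical distributions, 1 for disjoint ones. A common visual cue
    for shot/scene boundaries between consecutive frames."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) / 2

def scene_changes(video_diffs, audio_diffs, v_thresh=0.4, a_thresh=0.5):
    """Toy audio-video fusion: frame i is a scene boundary only when both
    the visual dissimilarity (e.g. hist_diff of consecutive frames) and the
    audio dissimilarity (e.g. change in short-time energy) are high.
    Thresholds are illustrative, not values from the paper."""
    return [i for i, (v, a) in enumerate(zip(video_diffs, audio_diffs))
            if v > v_thresh and a > a_thresh]
```

Requiring agreement between the two modalities suppresses false alarms that a single cue would raise, such as a camera flash (video-only spike) or a loud sound effect within a scene (audio-only spike).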