Trends of learning technology standard
K. Nakabayashi
Pub Date : 2001-08-22 · DOI: 10.1109/ICME.2001.1237803
In order to promote computer-based education and training, it is crucial to establish interoperability of learning content, learner information, and learning system components. In the US, Europe, and Asia, government, industry, and academia are paying attention and making efforts in this direction. Several learning technology standardization initiatives are developing specifications that cover a broad field, including platforms, multimedia data, learning content, learner information, and competency definitions. This paper discusses the need for learning technology standards, summarizes the efforts of each initiative, and describes the future direction of standardization efforts.
{"title":"Trends of learning technology standard","authors":"K. Nakabayashi","doi":"10.1109/ICME.2001.1237803","DOIUrl":"https://doi.org/10.1109/ICME.2001.1237803","url":null,"abstract":"In order to promote computer-based education and training, it is crucial to establish interoperability of learning contents, learner information, and learning system components. In the US, Europe and Asia, government, industry and academia are paying attention and making effort toward this direction. Several learning technology standardization initiatives are developing specifications covering quite large field such as platform, multimedia data, learning contents, learner information, and competency definitions. This paper discusses the needs of learning technology standards, summarizes the efforts in each initiative, and describes the future direction of standardization effort.","PeriodicalId":405589,"journal":{"name":"IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127077634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A hierarchical graph model for probing multimedia applications
Baochun Li
Pub Date : 2001-08-22 · DOI: 10.1109/ICME.2001.1237725
In order to achieve the best application-level quality of service (QoS), complex multimedia applications need to be dynamically tuned and reconfigured to adapt to the unpredictable open environments offered by general-purpose systems. We believe that the objective of such adaptations should be to maintain a stable QoS with respect to a set of critical application QoS parameters. However, we have observed that only a limited set of parameters may be used as “tuning knobs” to affect the application behavior. In this paper, we present a hierarchical graph model to discover the relationships between the sets of tunable and critical QoS parameters. Based on this model, we propose a polynomial-complexity QoS probing algorithm to quantitatively capture the run-time relationships between the two sets of parameters. Our probing algorithm is integrated into our broader framework, Agilos, which uses a configurable visual tracking application to verify the effectiveness of adaptations.
{"title":"A hierarchical graph model for probing multimedia applications","authors":"Baochun Li","doi":"10.1109/ICME.2001.1237725","DOIUrl":"https://doi.org/10.1109/ICME.2001.1237725","url":null,"abstract":"In order to achieve the best application-level Quality-of-Service (QoS), complex multimedia applications need to be dynamically tuned and reconfigured to adapt to unpredictable open environments offered by general-purpose systems. We believe that the objective of such adaptations should be to maintain a stable QoS with respect to a set of critical application QoS parameters. However, we have observed that only a limited set of parameters may be used as “tuning knobs” to affect the application behavior. In this paper, we present a hierarchical graph model to discover the relationships between the sets of tunable and critical QoS parameters. Based on such a model, we propose a polynomialcomplexity QoS probing algorithm to quantitatively capture the run-time relationships between the two sets of parameters. Our probing algorithm is integrated into our broader framework, Agilos, which uses a configurable visual tracking application to verify the effectiveness of adaptations.","PeriodicalId":405589,"journal":{"name":"IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124159683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast full search based block matching algorithm from fast kick-off of impossible candidate checking points
Jong-Nam Kim, Sung-Cheal Byun, Byung-Ha Ahn
Pub Date : 2001-08-22 · DOI: 10.1109/ICME.2001.1237815
To reduce the amount of computation of the full search (FS) algorithm for fast motion estimation, we propose a new, fast matching algorithm that yields the same predicted images as conventional FS, with no degradation. The computational reduction without any degradation of the predicted image comes from the fast kick-off of impossible motion vectors. Improper motion vectors are kicked off by sequential rejection using a derived formula and subblock norms, where the sequential rejection of impossible candidates is based on multiple decision boundaries. Our proposed algorithm reduces computation more than recent fast full-search motion estimation algorithms.
{"title":"Fast full search based block matching algorithm from fast kick-off of impossible candidate checking points","authors":"Jong-Nam Kim, Sung-Cheal Byun, Byung-Ha Ahn","doi":"10.1109/ICME.2001.1237815","DOIUrl":"https://doi.org/10.1109/ICME.2001.1237815","url":null,"abstract":"To reduce an amount of computation of full search (FS) algorithm for fast motion estimation, we propose a new and fast matching algorithm without any degradation of predicted images like the conventional FS. The computational reduction without any degradation in predicted image comes from fast kick-off of impossible motion vectors. The fast kick-off of improper motion vectors comes from sequential rejection with derived formula and subblock norms. The sequential rejection of impossible candidates is based on multiple decision boundaries. Our proposed algorithm reduces more the computations than the recent fast full search (FS) motion estimation algorithms.","PeriodicalId":405589,"journal":{"name":"IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122217796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Content-based music retrieval using linear scaling and branch-and-bound tree search
J. Jang, Hong-Ru Lee, M. Kao
Pub Date : 2001-08-22 · DOI: 10.1109/ICME.2001.1237713
This paper presents the use of linear scaling and tree search in a content-based music retrieval system that takes a user's acoustic input (an 8-second clip of singing or humming) via a microphone and retrieves the intended song from over 3000 candidate songs in the database. The system, known as Super MBox, demonstrates the feasibility of real-time content-based music retrieval with a high recognition rate. Super MBox first takes the user's acoustic input from a microphone and converts it into a pitch vector. A fast comparison engine using linear scaling and tree search is then employed to compute the similarity scores. We have tested Super MBox and found that the top-20 recognition rate is about 73% on about 1000 test clips from people with mediocre singing skills.
{"title":"Content-based music retrieval using linear scaling and branch-and-bound tree search","authors":"J. Jang, Hong-Ru Lee, M. Kao","doi":"10.1109/ICME.2001.1237713","DOIUrl":"https://doi.org/10.1109/ICME.2001.1237713","url":null,"abstract":"paper presents the use of linear scaling and tree search in a content-based music retrieval system that can take a user's acoustic input (8-second clip of singing or humming) via a microphone and then retrieve the intended song from over 3000 candidate songs in the database. The system, known as Super MBox, demonstrates the feasibility of real-time content-based music retrieval with a high recognition rate. Super MBox first takes the user's acoustic input from a microphone and converts it into a pitch vector. Then a fast comparison engine using linear scaling and tree search is employed to compute the similarity scores. We have tested Super MBox and found the top-20 recognition rate is about 73% with about 1000 clips of test inputs from people with mediocre singing skills.","PeriodicalId":405589,"journal":{"name":"IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131273987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recovery of motion vectors by detecting homogeneous movements for H.263 video communications
Sungchan Park, NamRye Son, Junghyun Kim, Gueesang Lee
Pub Date : 2001-08-22 · DOI: 10.1109/ICME.2001.1237648
In this paper, a new approach is proposed for recovering lost or erroneous motion vectors (MVs) by classifying the movements of neighboring blocks by their homogeneity. MVs of the neighboring blocks are classified according to their direction, a representative value is determined for each class, and the candidate MV with the minimum distortion is selected. Experimental results show that the proposed algorithm performs better than existing methods in many cases.
{"title":"Recovery of motion vectors by detecting homogeneous movements for H.263 video communications","authors":"Sungchan Park, NamRye Son, Junghyun Kim, Gueesang Lee","doi":"10.1109/ICME.2001.1237648","DOIUrl":"https://doi.org/10.1109/ICME.2001.1237648","url":null,"abstract":"In this paper, a new approach for the recovery of lost or erroneous motion vector(MV)s by classifying the movements of neighboring blocks by their homogeneity is proposed. MVs of the neighboring blocks are classified according to the direction of MVs and a representative value for each class is determined to obtain the candidate MV with the minimum distortion is selected. Experimental results show that the proposed algorithm exhibits better performance in many cases than existing methods.","PeriodicalId":405589,"journal":{"name":"IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122987376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Toy interface for multimodal interaction and communication
K. Mase
Pub Date : 2001-08-22 · DOI: 10.1109/ICME.2001.1237805
The toy interface is a real-world-oriented interface that uses modeled objects with “toy”-like shapes and attributes as the interface between the real world and cyberspace. Toy interfaces can be categorized into three types: the doll type, the miniascape type, and the brick type. We investigate various toy interfaces and present the design details of a doll-type interface prototype for multimodal interaction and communication.
{"title":"Toy interface for multimodal interaction and communication","authors":"K. Mase","doi":"10.1109/ICME.2001.1237805","DOIUrl":"https://doi.org/10.1109/ICME.2001.1237805","url":null,"abstract":"Toy Interface is a real-world oriented interface that uses modeled objects with “toy”-like shapes and attributes as the interface between the real world and cyberspace. Toy-interface can be categorized into one of three types: the doll type, miniascape type and brick type. We investigate various toy interfaces and present the design detail of a doll-type interface prototype for the purpose of multi-modal interaction and communication.","PeriodicalId":405589,"journal":{"name":"IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124157400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wayfinding and navigation in haptic virtual environments
S. Semwal
Pub Date : 2001-08-22 · DOI: 10.1109/ICME.2001.1237781
Cognitive maps are mental models of the relative locations and attributes of phenomena in spatial environments. The ability to form cognitive maps is one of the innate gifts of nature, and its absence can have a crippling effect, for example on the visually impaired, for whom the sense of touch becomes the primary means of forming cognitive maps. Once formed, cognitive maps provide a precise mapping of the physical world so that a visually impaired individual can navigate successfully with minimal assistance. However, traditional mobility training is time consuming, and it is very difficult for the blind to express or revisit the cognitive maps formed after a training session is over. The proposed haptic environment allows the visually impaired individual to express cognitive maps as 3D surface maps, with two PHANToM force-feedback devices guiding them. The 3D representation can be fine-tuned by the caregiver and then felt again by the visually impaired user in order to form precise cognitive maps. In addition to voice commentary, a library of pre-existing shapes familiar to the blind provides orientation and proprioceptive haptic cues during navigation. A graphical display of the cognitive maps provides feedback to the caregiver or trainer. Since the haptic environment can be easily stored and retrieved, the MoVE system also encourages navigation by the blind at their own convenience and with family members.
{"title":"Wayfinding and navigation in haptic virtual environments","authors":"S. Semwal","doi":"10.1109/ICME.2001.1237781","DOIUrl":"https://doi.org/10.1109/ICME.2001.1237781","url":null,"abstract":"Cognitive maps are mental models of the relative locations and attribute phenomena of spatial environments. The ability to form cognitive maps is one of the innate gifts of nature. An absence of this ability can have crippling effect, for example, on the visually impaired. The sense of touch becomes the primary source of forming cognitive maps for the visually impaired. Once formed, cognitive maps provide precise mapping of the physical world so that a visually impaired individual can successfully navigate with minimal assistance. However, traditional mobility training is time consuming, and it is very difficult for the blind to express or revisit the cognitive maps formed after a training session is over. The proposed haptic environment will allow the visually impaired individual to express cognitive maps as 3D surface maps, with two PHANToM force-feedback devices guiding them. The 3D representation can be finetuned by the care-giver, and then felt again by the visually impaired in order to form precise cognitive maps. In addition to voice commentary, a library of pre-existing shapes familiar to the blind will provide orientation and proprioceptive haptic-cues during navigation. A graphical display of cognitive maps will provide feedback to the care-giver or trainer. As the haptic environment can be easily stored and retrieved, the MoVE system will also encourage navigation by the blind at their own convenience, and with family members.","PeriodicalId":405589,"journal":{"name":"IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128990199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multimedia materials for teaching signal processing
X. Huang, G. Woolsey
Pub Date : 2001-08-22 · DOI: 10.1109/ICME.2001.1237885
Rapidly advancing capabilities in PC-based multimedia technology are providing new opportunities for the delivery of educational material. Multimedia technology is being introduced at all levels of the degrees in Electronics and Communications at the University of New England (UNE). In this paper, attention is drawn to the use of multimedia technology through the example of a fourth-year education package on signal processing. We have used this multimedia education package for teaching and learning during formal class periods and to encourage students to use the technology in their own personal study and projects in order to develop their generic engineering skills. The success of the venture has encouraged us to extend the technology to other selected units in the UNE engineering programs.
{"title":"Multimedia materials for teaching signal processing","authors":"X. Huang, G. Woolsey","doi":"10.1109/ICME.2001.1237885","DOIUrl":"https://doi.org/10.1109/ICME.2001.1237885","url":null,"abstract":"Rapidly advancing capabilities in PC-based multimedia technology are providing new opportunities for delivery of educational material. Multimedia technology is being introduced at all levels of the degrees in Electronics and Communications at the University of New England (UNE). In this paper attention is drawn the use of multimedia technology through the example of a fourth-year education package on signal processing. We have used this multimedia education package for teaching and learning during formal class periods and to encourage students to use the technology in their own personal study and projects in order to increase their engineering generic skills. The success of the venture has encouraged us to extend the technology to other selected units in the UNE engineering programs.","PeriodicalId":405589,"journal":{"name":"IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129399092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic based retrieval model for digital audio and video
S. Nepal, Uma Srinivasan, G. Reynolds
Pub Date : 2001-08-22 · DOI: 10.1109/ICME.2001.1237924
Recent content-based retrieval systems such as QBIC [7] and VisualSEEk [8] use low-level audio-visual features such as color, pan, zoom, and loudness for retrieval. However, users prefer to retrieve videos using high-level semantics based on their perception, such as "bright color" and "very loud sound". This results in a gap between what users would like and what systems can deliver. This paper attempts to bridge this gap by mapping users' perception of semantic concepts to low-level feature values. It proposes a model for providing high-level semantics for an audio feature that determines loudness. We first perform a pilot user study to capture the user's perception of loudness on a collection of audio clips of sound effects and map it to five semantic terms. We then describe how the loudness measure in MPEG-1 Layer II audio files can be mapped to user-perceived loudness. Finally, we devise a fuzzy technique for retrieving audio/video clips from the collections using those semantic terms.
{"title":"Semantic based retrieval model for digital audio and video","authors":"S. Nepal, Uma Srinivasan, G. Reynolds","doi":"10.1109/ICME.2001.1237924","DOIUrl":"https://doi.org/10.1109/ICME.2001.1237924","url":null,"abstract":"Recent content-based retrieval systems such as QBIC [7] and VisualSEEk [8] use low-level audio-visual features such as color, pan, zoom, and loudness for retrieval. However, users prefer to retrieve videos using high-level semantics based on their perception such as \"bright color\" and \"very loud sound\". This results in a gap between what users would like and what systems can generate. This paper is an attempt to bridge this gap by mapping users’ perception (of semantic concepts) to lowlevel feature values. This paper proposes a model for providing high-level semantics for an audio feature that determines loudness. We first perform a pilot user study to capture the user perception of loudness level on a collection of audio clips of sound effects, and map them to five different semantic terms. We then describe how the loudness measure in MPEG-1 layer II audio files can be mapped to user perceived loudness. We then devise a fuzzy technique for retrieving audio/video clips from the collections using those semantic terms.","PeriodicalId":405589,"journal":{"name":"IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115725735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic caption localization in videos using salient points
M. Bertini, C. Colombo, A. Bimbo
Pub Date : 2001-08-22 · DOI: 10.1109/ICME.2001.1237657
Broadcasters are showing interest in building digital archives of their assets for the reuse of archive materials in TV programs, on-line availability, and archiving. This requires tools for video indexing and retrieval by content that exploit high-level video information, such as that contained in superimposed text captions. In this paper, we present a method to automatically detect and localize captions in digital video using temporal and spatial local properties of salient points in video frames. Results of experiments on both high-resolution DV sequences and standard VHS videos are presented and discussed.
{"title":"Automatic caption localization in videos using salient points","authors":"M. Bertini, C. Colombo, A. Bimbo","doi":"10.1109/ICME.2001.1237657","DOIUrl":"https://doi.org/10.1109/ICME.2001.1237657","url":null,"abstract":"Broadcasters are demonstrating interest in building digital archives of their assets for reuse of archive materials for TV programs, on-line availability, and archiving. This requires tools for video indexing and retrieval by content exploiting high-level video information such as that contained in super-imposed text captions. In this paper we present a method to automatically detect and localize captions in digital video using temporal and spatial local properties of salient points in video frames. Results of experiments on both high-resolutionDV sequences and standard VHS videos are presented and discussed.","PeriodicalId":405589,"journal":{"name":"IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125552656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}