This article argues for the growing importance of quality metadata and the equation of that quality with precision and semantic grounding. Such semantic grounding requires metadata that derives from intentional human intervention as well as from mechanistic measurement of content media. In both cases, one chief problem in the automatic generation of semantic metadata is ambiguity, which leads to the overgeneration of inaccurate annotations. We look at a particular richly annotated image collection to show how context dramatically reduces the problem of ambiguity over this corpus. In particular, we consider both the abstract measurement of "contextual ambiguity" over the collection and the application of a particular disambiguation algorithm to synthesized keyword searches across the collection.
{"title":"Context for semantic metadata","authors":"K. Haase","doi":"10.1145/1027527.1027574","DOIUrl":"https://doi.org/10.1145/1027527.1027574","url":null,"abstract":"This article argues for the growing importance of quality metadata and the equation of that quality with precision and semantic grounding. Such semantic grounding requires metadata that derives from intentional human intervention as well as mechanistic measurement of content media. In both cases, one chief problem in the automatic generation of semantic metadata is ambiguity leading to the overgeneration of inaccurate annotations. We look at a particular richly annotated image collection to show how context dramatically reduces the problem of ambiguity over this particular corpus. In particular, we consider both the abstract measurement of \"contextual ambiguity\" over the collection and the application of a particular disambiguation algorithm to synthesized keyword searches across the selection.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"361 11","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113956197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we propose an efficient scheme to transport video over wireless networks, specifically cdma2000® 1x. Speech transmission over cdma2000® uses a variable-rate voice coder (vocoder) over a channel with multiple fixed rates. We apply these ideas to compressed video transmission over wireless IP networks. Explicit Bit Rate (EBR) video compression is designed to match the video encoder output to a set of fixed channel rates. We show that, in comparison with variable bit rate (VBR) video transmission over a fixed-rate wireless channel, EBR video transmission provides improved error resilience, reduced latency, and improved efficiency.
{"title":"Video transport over wireless networks","authors":"H. Garudadri, P. Sagetong, S. Nanda","doi":"10.1145/1027527.1027626","DOIUrl":"https://doi.org/10.1145/1027527.1027626","url":null,"abstract":"In this paper, we propose an efficient scheme to transport video over wireless networks, specifically cdma2000® 1x. Speech transmission over cdma2000® uses a variable rate voice coder (vocoder) over a channel with multiple fixed rates. We apply these ideas to compressed video transmission over wireless IP networks. Explicit Bit Rate (EBR) video compression is designed to match the video encoder output to a set of fixed channel rates. We show that in comparison with VBR video transmission over a fixed rate wireless channel, EBR video transmission provides improved error resilience, reduced latency and improved efficiency.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125177922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Colonoscopy is an important screening tool for colorectal cancer. During a colonoscopic procedure, a tiny video camera at the tip of the endoscope generates a video signal of the internal mucosa of the colon. The video data are displayed on a monitor for real-time analysis by the endoscopist. We call videos captured from colonoscopic procedures colonoscopy videos. Because these videos possess unique characteristics, new types of semantic units and parsing techniques are required. In this paper, we define new semantic units called operation shots, each of which is a segment of visual and audio data corresponding to a therapeutic or biopsy operation. We introduce a new spatio-temporal analysis technique to detect operation shots. Our experiments on colonoscopy videos demonstrate that the technique does not miss any meaningful operation shots and produces only a small number of false operation shots. Our prototype parsing software implements the operation shot detection technique along with our other techniques previously developed for colonoscopy videos. Our browsing tool enables users to quickly locate operation shots of interest. The proposed technique and software are useful (1) for post-procedure reviews and analyses of the causes of complications due to biopsy or therapeutic operations, (2) for developing an effective content-based retrieval system for colonoscopy videos to facilitate endoscopic research and education, and (3) for developing a systematic approach to assessing endoscopists' procedural skills.
{"title":"Parsing and browsing tools for colonoscopy videos","authors":"Yu Cao, Dalei Li, Wallapak Tavanapong, Jung-Hwan Oh, J. Wong, P. C. Groen","doi":"10.1145/1027527.1027723","DOIUrl":"https://doi.org/10.1145/1027527.1027723","url":null,"abstract":"Colonoscopy is an important screening tool for colorectal cancer. During a colonoscopic procedure, a tiny video camera at the tip of the endoscope generates a video signal of the internal mucosa of the colon. The video data are displayed on a monitor for real-time analysis by the endoscopist. We call videos captured from colonoscopic procedures <i>colonoscopy videos</i>. Because these videos possess unique characteristics, new types of semantic units and parsing techniques are required. In this paper, we define new semantic units called <i>operation shots</i>, each is a segment of visual and audio data that correspond to a therapeutic or biopsy operation. We introduce a new spatio-temporal analysis technique to detect operation shots. Our experiments on colonoscopy videos demonstrate that the technique does not miss any meaningful operation shots and incurs a small number of false operation shots. Our prototype parsing software implements the operation shot detection technique along with our other techniques previously developed for colonoscopy videos. Our browsing tool enables users to quickly locate operation shots of interest. The proposed technique and software are useful (1) for post-procedure reviews and analyses for causes of complications due to biopsy or therapeutic operations, (2) for developing an effective content-based retrieval system for colonoscopy videos to facilitate endoscopic research and education, and (3) for development of a systematic approach to assess endoscopists' procedural skills.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125100538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The variation of facial texture and surface due to changes of expression is an important cue for analyzing and modeling facial expressions. In this paper, we propose a new approach to representing facial expressions using so-called topographic features. To capture the variation of facial surface structure, facial textures are processed at increased resolution. The topographic structure of the human face is analyzed based on the resolution-enhanced textures. We investigate the relationship between a facial expression and its topographic features, and propose to represent the facial expression by topographic labels. The detected topographic facial surface and the expressive regions reflect the state of facial skin movement. Based on the observation that facial texture and its topographic features change along with facial expressions, we compare the disparity of these features between the neutral face and the expressive face to distinguish a number of universal expressions. The experiment demonstrates the feasibility of the proposed approach for facial expression representation and recognition.
{"title":"Facial expression representation and recognition based on texture augmentation and topographic masking","authors":"L. Yin, J. Loi, Wei Xiong","doi":"10.1145/1027527.1027580","DOIUrl":"https://doi.org/10.1145/1027527.1027580","url":null,"abstract":"The variation of facial texture and surface due to the change of expression is an important cue for analyzing and modeling facial expressions. In this paper, we propose a new approach to represent the facial expression by using a so-called topographic feature. In order to capture the variation of facial surface structure, facial textures are processed by increasing the resolution. The topographical structure of human face is analyzed based on the resolution-enhanced textures. We investigate the relationship between the facial expression and its topographic features, and propose to represent the facial expression by the topographic labels. The detected topographic facial surface and the expressive regions reflect the status of facial skin movement. Based on the observation that the facial texture and its topographic features change along with facial expressions, we compare the disparity of these features between the neutral face and the expressive face to distinguish a number of universal expressions. The experiment demonstrates the feasibility of the proposed approach for facial expression representation and recognition.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"238 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131449945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper describes a novel approach to the construction of a projector-based augmented reality environment. The approach is based on capturing the dynamic changes of surfaces and projecting images within a large real environment using a system that includes a laser range finder and a projector whose optical axes are integrated by mirrors. The proposed method offers two distinct advances: (1) robust 3-D viewing point detection from consecutive range images, and (2) fast view-driven image generation and presentation with view frustum clipping to measured surfaces. A prototype system confirms the feasibility of the method; it generates view-driven images to suit the user's viewing position, which are then projected within a dynamic real environment in real time.
{"title":"Location-aware projection with robust 3-D viewing point detection and fast image deformation","authors":"J. Shimamura, K. Arakawa","doi":"10.1145/1027527.1027595","DOIUrl":"https://doi.org/10.1145/1027527.1027595","url":null,"abstract":"This paper describes a novel approach to the construction of a projector-based augmented reality environment. The approach is based on capturing the dynamic changes of surfaces and projecting the images within a large real environment using a system that includes a laser range finder and a projector, whose optical axes are integrated by mirrors. The proposed method offers two distinct advances: (1) robust 3-D viewing point detection from consecutive range images, and (2) fast view-driven image generation and presentation with view frustum clipping to measured surfaces. A prototype system is shown to confirm the feasibility of the method; it generates view-driven images to suit the user's viewing position that are then projected within dynamic real environment, in real-time.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"204 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134308462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ePic is an integrated presentation authoring and playback system that makes it easy to use a wide range of devices installed in one or multiple multimedia venues.
{"title":"An EPIC enhanced meeting environment","authors":"Qiong Liu, F. Zhao, John Doherty, Don Kimber","doi":"10.1145/1027527.1027743","DOIUrl":"https://doi.org/10.1145/1027527.1027743","url":null,"abstract":"ePic is an integrated presentation authoring and playback system that makes it easy to use a wide range of devices installed in one or multiple multimedia venues.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131486167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we propose an iterative similarity propagation approach to explore the inter-relationships between Web images and their textual annotations for image retrieval. By considering Web images as one type of object, their surrounding texts as another type, and constructing the link structure between them via webpage analysis, we can iteratively reinforce the similarities between images. The basic idea is that if two objects of the same type are both related to one object of another type, these two objects are similar; likewise, if two objects of the same type are related to two different but similar objects of another type, then to some extent these two objects are also similar. The goal of our method is to fully exploit the mutual reinforcement between images and their textual annotations. Our experiments on 10,628 images crawled from the Web show that the proposed approach can significantly improve Web image retrieval performance.
{"title":"Multi-model similarity propagation and its application for web image retrieval","authors":"Xin-Jing Wang, Wei-Ying Ma, Gui-Rong Xue, Xing Li","doi":"10.1145/1027527.1027746","DOIUrl":"https://doi.org/10.1145/1027527.1027746","url":null,"abstract":"In this paper, we propose an iterative similarity propagation approach to explore the inter-relationships between Web images and their textual annotations for image retrieval. By considering Web images as one type of objects, their surrounding texts as another type, and constructing the links structure between them via webpage analysis, we can iteratively reinforce the similarities between images. The basic idea is that if two objects of the same type are both related to one object of another type, these two objects are similar; likewise, if two objects of the same type are related to two different, but similar objects of another type, then to some extent, these two objects are also similar. The goal of our method is to fully exploit the mutual reinforcement between images and their textual annotations. Our experiments based on 10,628 images crawled from the Web show that our proposed approach can significantly improve Web image retrieval performance.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134527011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present a novel method for interactive retrieval of 3D shapes using physical objects. Our method is based on simple physical 3D interaction with a set of tangible blocks. As the user connects blocks, the system automatically recognizes the shape of the constructed physical structure and picks similar 3D shape models from a preset model database, in real time. Our system fully supports interactive retrieval of 3D shape models in an extremely simple fashion, which is completely non-verbal and cross-cultural. These advantages make it an ideal interface for inexperienced users, previously barred from many applications that include 3D shape retrieval tasks.
{"title":"Interactive retrieval of 3D shape models using physical objects","authors":"Hiroyasu Ichida, Yuichi Itoh, Y. Kitamura, F. Kishino","doi":"10.1145/1027527.1027685","DOIUrl":"https://doi.org/10.1145/1027527.1027685","url":null,"abstract":"We present a novel method for interactive retrieval of 3D shapes using ysical objects. Our method is based on simple ysical 3D interaction with a set of tangible blocks. As the user connects blocks, the system automatically recognizes the shape of the constructed ysical structure and picks similar 3D shape models from a preset model database, in real time. Our system fully supports interactive retrieval of 3D shape models in an extremely simple fashion, which is completely non-verbal and cross-cultural. These advantages make it an ideal interface for inexperienced users, previously barred from many applications that include 3D shape retrieval tasks.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133506020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In a distributed collaborative application, a key requirement is that all users see the same copy of a shared window object at any given point in time (WYSIWIS: what you see is what I see). In this paper, we study 'user-assisted causal ordering' of messages as the basis for achieving WYSIWIS. The approach requires specifying the synchronization constraints on access to shared window objects in the form of an order in which messages must be processed and the object state updated. The specifications are made available to the window subsystem based on user-level knowledge about the actions on objects and the current (shared) object state. In contrast with current approaches employing transaction models, our approach allows flexibility in the programming of collaboration-style applications and offers increased levels of concurrency.
{"title":"User-assisted tools for concurrency control in distributed multimedia collaborations","authors":"A. Sabbir, K. Ravindran","doi":"10.1145/1027527.1027652","DOIUrl":"https://doi.org/10.1145/1027527.1027652","url":null,"abstract":"In a distributed collaborative application, a key requirement is that all users see the same copy of a shared window object at any given point in time (WYSIWIS). In this paper, we study 'user-assisted causal ordering' of messages as the basis for achieving WYSIWIS. The approach requires specifying the synchronization constraints on accessing shared window objects in the form of an order in which messages need to be processed and object state updated. The specifications are made available to the window subsystem based on the user-level knowledge about the actions on objects and the current (shared) object state. In contrast with the current approaches employing transaction models, our approach allows flexibility in the programming of collaboration-style applications, and offers increased levels of concurrency.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"219 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133709607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we propose a method for organizing Web image search results to facilitate user browsing. We formalize this problem as a salient image region pattern extraction problem. Given the images returned by a Web search engine, we first segment the images into homogeneous regions and quantize the environmental regions into image codewords. The salient codeword "phrases" are then extracted and ranked based on a regression model learned from human-labeled training data. According to the salient "phrases", images are assigned to different clusters, with the image nearest to the centroid serving as the entry for the corresponding cluster. Encouraging experimental results show the effectiveness of the proposed method.
{"title":"Grouping web image search result","authors":"Xin-Jing Wang, Wei-Ying Ma, Qi-Cai He, Xing Li","doi":"10.1145/1027527.1027632","DOIUrl":"https://doi.org/10.1145/1027527.1027632","url":null,"abstract":"In this paper, we propose a Web image search result organizing method to facilitate user browsing. We formalize this problem as a salient image region pattern extraction problem. Given the images returned by Web search engine, we first segment the images into homogeneous regions and quantize the environmental regions into image codewords. The salient codeword \"phrases\" are then extracted and ranked based on a regression model learned from human labeled training data. According to the salient \"phrases\", images are assigned to different clusters, with the one nearest to the centroid as the entry for the corresponding cluster. Satisfying experimental results show the effectiveness of our proposed method.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133726025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}