Motion based retrieval of dynamic objects in videos. Che-Bin Liu, N. Ahuja. doi:10.1145/1027527.1027593

Most existing video retrieval systems use low-level visual features such as color histogram, shape, texture, or motion. In this paper, we explore the use of higher-level motion representations for video retrieval of dynamic objects. We use three motion representations, which together can retrieve a large variety of motion patterns. Our approach works on top of a tracking unit and assumes that each dynamic object has been tracked and circumscribed in a minimal bounding box in each video frame. We represent the motion attributes of each object in terms of changes in the image context of its circumscribing box. The changes are described via motion templates [4], self-similarity plots [3], and image dynamics [9]. Initially defined criteria of the retrieval process are interactively refined using relevance feedback from the user. Experimental results demonstrate the use of the proposed motion models in retrieving objects undergoing complex motion.
Planet usher: an interactive home movie. Patrick Tarrant. doi:10.1145/1027527.1027763

Planet Usher: An Interactive Home Movie is a CD-ROM that draws inspiration and media from my brother's extensive home video archive, an archive that spans twenty years of family events and non-events [1]. What makes even the non-events remarkable is the fact that my brother went from being a deaf man with a video camera to a deaf-blind man with an extensive audio-visual archive he can no longer see nor hear, due to the effects of Usher Syndrome. So Planet Usher offers an exploration of a sustained and enduring amateur practice. It is also a story about disability and the family as they emerge falteringly from the lost archive. And it is a confrontation with the frailties of memory and narrative as they come face to face with the vicissitudes of both interactivity and lived experience.
A new method to segment playfield and its applications in match analysis in sports video. Shuqiang Jiang, Qixiang Ye, Wen Gao, Tiejun Huang. doi:10.1145/1027527.1027594

With the growing popularity of digitized sports video, automatic analysis is needed to facilitate semantic summarization and retrieval. The playfield plays a fundamental role in the automatic analysis of many sports programs, and many semantic clues can be inferred from the results of playfield segmentation. In this paper, a novel playfield segmentation method based on Gaussian mixture models (GMMs) is proposed. First, training pixels are automatically sampled from frames. Then, assuming that field pixels are the dominant components in most video frames, we build GMMs of the field pixels and use these models to detect playfield pixels. Finally, a region-growing operation is employed to segment the playfield regions from the background. Experimental results show that the proposed method is robust across various sports videos, even under very poor grass field conditions. Based on the results of playfield segmentation, match situation analysis is investigated, which is also desired by sports professionals and longtime fans. The results are encouraging.
SMARXO: towards secured multimedia applications by adopting RBAC, XML and object-relational database. Shu-Ching Chen, M. Shyu, Na Zhao. doi:10.1145/1027527.1027631

In this paper, a framework named SMARXO is proposed to address security issues in multimedia applications by adopting RBAC (Role-Based Access Control), XML, and object-relational databases. Compared with other existing security models or projects, SMARXO can handle more intricate situations. First, image object-level security and video scene/shot-level security can be easily achieved. Second, temporal constraints and IP address restrictions are modeled for access control purposes. Finally, XML queries can be performed so that administrators can efficiently retrieve useful information from the security roles and policies.
The princess series. Roxanne Wolanczyk. doi:10.1145/1027527.1027772

The Princess Series is a narrative of a modern-day princess who found herself grown up without a fortune. Presented as a series of Flash animations, each story depicts her daily struggle to save her soul while trying to survive in the corporate world by making junk mail. She courageously faces the dilemmas, contradictions, and paradoxes of modern life. She questions her own emotions, ideals, psychology, gender, and identity, all in the hope of a happy ending.

The work can be presented on a variety of media such as computer screens, projectors, and plasma screens. Still images can also be scaled to any size for print.
Shibboleth: exploring cultural boundaries in speech. A. Senior. doi:10.1145/1027527.1027771

Shibboleth is a multimedia artwork that explores the cultural barriers created and enforced by accent and pronunciation differences. It is founded on the biblical origin of the idea of a shibboleth, a word or phrase that distinguishes one cultural group from another. The artwork consists of a computer interface through which users are able to see and hear rhythmic audio-visual compositions of shibboleths created from previously recorded data and relevant sounds and imagery. Users can also use the interface to listen to examples of previously recorded shibboleths, as well as to add their own to a growing, geographically indexed database.
Analyzing discussion scene contents in instructional videos. Y. Li, C. Dorai. doi:10.1145/1027527.1027587

This paper describes our current work on analyzing the content of discussion scenes in instructional videos using a clustering technique. Specifically, given a discussion scene pre-detected from an education or training video, we first apply a mode-based clustering approach to group all speech segments into an optimal number of clusters, where each cluster contains speech from one speaker; we then analyze the discussion patterns in the scene and classify it as either a 2-speaker or a multi-speaker discussion. Encouraging classification results have been achieved on 122 discussion scenes detected from five IBM MicroMBA videos. We have also observed good performance from the speaker clustering scheme, which demonstrates the effectiveness of the proposed clustering approach. The discussion scene information output by this analysis scheme facilitates content browsing, searching, and understanding of instructional videos.
An efficient parts-based near-duplicate and sub-image retrieval system. Yan Ke, R. Sukthankar, Larry Huston. doi:10.1145/1027527.1027729

We introduce a system for near-duplicate detection and sub-image retrieval. Such a system is useful for finding copyright violations and detecting forged images. We define near-duplicate as images altered with common transformations such as changing contrast, saturation, scaling, cropping, framing, etc. Our system builds a parts-based representation of images using distinctive local descriptors which give high quality matches even under severe transformations. To cope with the large number of features extracted from the images, we employ locality-sensitive hashing to index the local descriptors. This allows us to make approximate similarity queries that only examine a small fraction of the database. Although locality-sensitive hashing has excellent theoretical performance properties, a standard implementation would still be unacceptably slow for this application. We show that, by optimizing layout and access to the index data on disk, we can efficiently query indices containing millions of keypoints. Our system achieves near-perfect accuracy (100% precision at 99.85% recall) on the tests presented in Meng et al. [16], and consistently strong results on our own, significantly more challenging experiments. Query times are interactive even for collections of thousands of images.
Music artist style identification by semi-supervised learning from both lyrics and content. Tao Li, M. Ogihara. doi:10.1145/1027527.1027612

Efficient and intelligent music information retrieval is an important topic of the 21st century. With the ultimate goal of building personal music information retrieval systems, this paper studies the problem of identifying "similar" artists using both lyrics and acoustic data. We present a semi-supervised approach that uses a small set of labeled samples as seeds to build classifiers, which then improve themselves using unlabeled data. This approach is tested on a data set consisting of 43 artists and 56 albums, using artist similarity provided by All Music Guide as ground truth. Experimental results show that this approach significantly improves the accuracy of artist similarity classifiers and that artist similarity can be efficiently identified.
DiMaS: distributing multimedia on peer-to-peer file sharing networks. Tommo Reti, R. Sarvas. doi:10.1145/1027527.1027560

This demonstration presents the Digital Content Distribution Management System (DiMaS). DiMaS is a proof of concept that multimedia-producing communities can publish their work on highly popular P2P networks while, importantly, the system enables producers to insert content metadata, manage intellectual property and usage rights, and charge for consumption. All this is done without introducing yet another content or metadata file format, or a dedicated client application to read it.