Context data in geo-referenced digital photo collections
Mor Naaman, Susumu Harada, Qianying Wang, H. Garcia-Molina, A. Paepcke
MULTIMEDIA '04, 2004-10-10. DOI: https://doi.org/10.1145/1027527.1027573

Given time and location information about digital photographs, we can automatically generate an abundance of related contextual metadata using off-the-shelf and Web-based data sources. Among these are the local daylight status and weather conditions at the time and place a photo was taken. This metadata has the potential to serve as memory cues and filters when browsing photo collections, especially as these collections grow into the tens of thousands and span dozens of years. We describe the contextual metadata that we automatically assemble for a photograph, given time and location, as well as a browser interface that utilizes that metadata. We then present the results of a user study and a survey that together expose which categories of contextual metadata are most useful for recalling and finding photographs. Among still-unavailable metadata categories, we identify those that are most promising to develop next.
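The daylight-status metadata mentioned above can be derived with a simple classification once sunrise and sunset for the photo's location are known. The sketch below is an illustration, not the paper's implementation: the function name, the fixed twilight window, and the assumption that sunrise/sunset arrive from some external lookup are all ours.

```python
from datetime import datetime, timedelta

def daylight_status(photo_time: datetime, sunrise: datetime, sunset: datetime,
                    twilight: timedelta = timedelta(minutes=30)) -> str:
    """Classify a photo's capture time relative to local sunrise/sunset.
    `sunrise`/`sunset` would come from a web or almanac lookup keyed on
    the photo's date and GPS coordinates (assumed, per the abstract)."""
    if photo_time < sunrise - twilight or photo_time > sunset + twilight:
        return "night"
    if photo_time < sunrise + twilight:
        return "dawn"
    if photo_time > sunset - twilight:
        return "dusk"
    return "daytime"
```

Labels like "dusk" or "night" can then be attached to each photo and used as browse filters alongside weather conditions.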
An approach to interactive media system for mobile devices
Eun-Seok Ryu, C. Yoo
MULTIMEDIA '04, 2004-10-10. DOI: https://doi.org/10.1145/1027527.1027557

Interactive systems that connect humans with computers have long been recognized as one direction of computer development. For example, in a cinema a person can retrieve the information he wants, or play media data while moving about, using a mobile device. Toward this goal, we designed and implemented a system that interacts with users on a small terminal. Our study has three parts. The first is the development of a new interactive media markup language (IML) for authoring interactive media data. The second is the IML translator, which translates IML into the form best suited for playback on a mobile device. The third is the IM player, which plays the translated media data and interacts with the user. IML was designed to control vector graphics and general media objects in detail and to support synchronization. It was also designed to run on small mobile devices as well as on desktop PCs and set-top boxes with high CPU performance. The implemented player runs on a PDA (HP iPAQ) and plays multimedia data consisting of vector graphics (OpenGL), H.264, AAC, etc., according to the user's choice. This system can be used for interactive cinema and interactive games, and can replace existing web services with new interactive web services.
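The abstract does not give IML's actual syntax, so the snippet below is a hypothetical illustration of the kind of document an IML translator would consume: a scene mixing vector graphics with coded video and audio, plus timing attributes for synchronization. The element and attribute names are invented for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical IML-like document (not the real IML syntax): one scene
# combining a vector-graphics object with H.264 video and AAC audio
# that are synchronized via a shared `begin` time.
IML_DOC = """
<iml>
  <scene id="intro">
    <vector src="logo.svg" x="0" y="0"/>
    <video src="trailer.264" codec="H.264" begin="2.0"/>
    <audio src="theme.aac" codec="AAC" begin="2.0"/>
  </scene>
</iml>
"""

def media_objects(doc: str):
    """Collect (tag, src) pairs, as a translator might before mapping
    each object to a device-specific renderer or decoder."""
    root = ET.fromstring(doc)
    return [(el.tag, el.get("src")) for el in root.iter() if el.get("src")]
```

A translator targeting a PDA could walk this list and, for instance, downscale the vector layer or pick a lower-bitrate stream for each `src`.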
ChucK: a programming language for on-the-fly, real-time audio synthesis and multimedia
Ge Wang, P. Cook
MULTIMEDIA '04, 2004-10-10. DOI: https://doi.org/10.1145/1027527.1027716

In this paper, we describe ChucK, a programming language and programming model for writing precisely timed, concurrent audio synthesis and multimedia programs. Precise concurrent audio programming has been an unsolved (and ill-defined) problem. ChucK provides a concurrent programming model that solves this problem and significantly enhances designing, developing, and reasoning about programs with complex audio timing. ChucK employs a novel data-driven timing mechanism and a related time-based synchronization model, both implemented in a virtual machine. We show how these features enable precise, concurrent audio programming and provide a high degree of programmability in writing real-time audio and multimedia programs. As an extension, programmers can use this model to write code on-the-fly, while the program is running. These features provide a powerful programming tool for building and experimenting with complex audio synthesis and multimedia programs.
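The general idea behind ChucK's timing model, concurrent "shreds" that explicitly advance a shared logical clock and are resumed in deterministic time order by a virtual machine, can be sketched in Python. This is our own toy scheduler, not ChucK's VM: the shreds are generators that yield the duration they wish to advance, loosely analogous to ChucK's `dur => now`.

```python
import heapq

def run(shreds, until=10.0):
    """Resume each shred in logical-time order; a shred yields the
    duration it wants to advance (cf. ChucK's `dur => now`)."""
    queue = [(0.0, i, s) for i, s in enumerate(shreds)]  # (wake_time, id, shred)
    heapq.heapify(queue)
    log = []  # (logical_time, shred_id) for every resumption
    while queue:
        now, sid, shred = heapq.heappop(queue)
        if now > until:
            break
        try:
            dur = shred.send(None)        # run the shred at logical time `now`
            log.append((now, sid))
            heapq.heappush(queue, (now + dur, sid, shred))
        except StopIteration:
            pass                          # shred finished; drop it
    return log

def metro(period):
    """A shred that 'ticks' every `period` logical seconds."""
    while True:
        yield period
```

Because ties in wake time are broken by shred id, two metronomes at periods 1.0 and 0.5 interleave deterministically, which is the property that makes sample-accurate concurrent audio code tractable.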
Finding the right shots: assessing usability and performance of a digital video library interface
Michael G. Christel, N. Moraveji
MULTIMEDIA '04, 2004-10-10. DOI: https://doi.org/10.1145/1027527.1027691

The authors developed a system in which visually dense displays of thumbnail imagery in storyboard views are used for shot-based video retrieval. The views allow for effective retrieval, as evidenced by the success achieved by expert users with the system in interactive query for NIST TRECVID 2002 and 2003. This paper demonstrates that novice users also achieve comparatively high retrieval performance with these views using the TRECVID 2003 benchmarks. Through an analysis of the user interaction logs, heuristic evaluation, and think-aloud protocol, the usability of the video information retrieval system is appraised with respect to shot-based retrieval. Design implications are presented based on these TRECVID usability evaluations regarding efficient, effective information retrieval interfaces to locate visual information from video corpora.
Do not zero-pute: an efficient homespun MPEG-audio layer II decoding and optimization strategy
P. Smet, F. Rooms, H. Luong, W. Philips
MULTIMEDIA '04, 2004-10-10. DOI: https://doi.org/10.1145/1027527.1027615

In this paper we point out that the general principle "do not compute what you do not need to compute" can be applied easily and successfully within an MPEG audio decoding strategy. More specifically, we discuss the problem of eliminating costly computation cycles wasted on processing useless zero-valued data; hence the title, "do not zero-pute". At first this may all sound somewhat obvious or trivial, and in many cases it is, but experience gathered in various teaching-related projects over several academic years has led us to believe the opposite. Moreover, a survey of the existing literature quickly reveals that the approach discussed below has not been investigated and documented properly. Although we illustrate our optimization approach only for the MPEG-audio layer II decoding process, we hope the reader will be able to apply, extend, and implement the basic principles presented here in many other applications.
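The core trick is easy to show in miniature. In subband audio decoding many quantized samples are exactly zero, so a cheap zero test can replace a multiply-accumulate. The sketch below is our own minimal illustration of the principle, not the paper's layer II filterbank code.

```python
def weighted_sum(samples, coeffs):
    """Naive version: performs a multiply for every sample, zeros included."""
    return sum(s * c for s, c in zip(samples, coeffs))

def weighted_sum_skip_zeros(samples, coeffs):
    """'Do not zero-pute': skip the multiply-accumulate for zero samples.
    When a large fraction of samples is zero, the comparison is cheaper
    than the arithmetic it avoids."""
    return sum(s * c for s, c in zip(samples, coeffs) if s != 0)
```

Both functions compute the same result; the payoff of the second depends entirely on how sparse the input actually is, which is why the authors measured it on real decoding workloads.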
P-Karaoke: personalized karaoke system
Xiansheng Hua, Lie Lu, HongJiang Zhang
MULTIMEDIA '04, 2004-10-10. DOI: https://doi.org/10.1145/1027527.1027563

In this demonstration, a personalized karaoke system, P-Karaoke, is proposed. In the P-Karaoke system, personal home videos and photographs, automatically selected from the user's multimedia database according to their content, the user's preferences, or the music, are used as the background videos of the karaoke. The selected video clips, photographs, music, and lyrics are aligned to compose a karaoke video, connected by content-based transitions.
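One small piece of such a system is choosing which clips fill the song's running time. The greedy sketch below is purely illustrative, assuming each clip carries a precomputed relevance score (from content analysis or user preference); the abstract does not describe the authors' actual selection algorithm.

```python
def pick_background(clips, song_duration):
    """Greedy illustration: take the highest-scoring clips until their
    combined duration covers the song. Each clip is a
    (duration_seconds, relevance_score, name) triple; the scoring
    scheme is assumed, not taken from the paper."""
    chosen, total = [], 0.0
    for dur, score, name in sorted(clips, key=lambda c: -c[1]):
        if total >= song_duration:
            break
        chosen.append(name)
        total += dur
    return chosen
```

A real system would additionally align clip boundaries with musical structure and lyric timing before inserting the content-based transitions.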
TRECVID: evaluating the effectiveness of information retrieval tasks on digital video
A. Smeaton, P. Over, Wessel Kraaij
MULTIMEDIA '04, 2004-10-10. DOI: https://doi.org/10.1145/1027527.1027678

TRECVID is an annual exercise that encourages research in information retrieval from digital video by providing a large video test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. TRECVID benchmarking covers both interactive and manual searching by end users, as well as supporting technologies including shot boundary detection, extraction of semantic features, and the automatic segmentation of TV news broadcasts into non-overlapping news stories. TRECVID has a broad range of over 40 participating groups from across the world, and as it is now (2004) in its fourth annual cycle, it is opportune to stand back and look at the lessons learned from the cumulative activity. In this paper we present a brief, high-level overview of the TRECVID activity, covering the data, the benchmarked tasks, the overall results obtained by groups to date, and the approaches taken by selected groups in some tasks. While progress from one year to the next cannot be measured directly because of the changing nature of the video data we have been using, we summarize the lessons we have learned from TRECVID and point out those we feel are most important.
A personal projected display
M. Ashdown, P. Robinson
MULTIMEDIA '04, 2004-10-10. DOI: https://doi.org/10.1145/1027527.1027739

User interfaces using windows, keyboard and mouse have been in use for over 30 years, but only offer limited facilities to the user. Conventional displays are small, at least compared with a physical desk; conventional input devices restrict both manual expression and cognitive flexibility; remote collaboration is a poor shadow of sitting in the same room. We show how recent technological advances in large display devices and input devices can address these problems. The Escritoire is a desk-based interface using overlapping projectors to create a large display with a high resolution region in the centre for detailed work. Two pens provide bimanual input over the entire area, and an interface like physical paper addresses some of the affordances not provided by the conventional user interface. Multiple desks can be connected to allow remote collaboration. The system has been tested with single users and collaborating pairs.
Where are the brave new mobile multimedia applications?
Susanne CJ Boll, S. Ahuja, Dirk Friebel, B. Horowitz, N. Raman, S. Nandagopalan
MULTIMEDIA '04, 2004-10-10. DOI: https://doi.org/10.1145/1027527.1027750

PANEL OVERVIEW: With the availability of powerful new mobile devices, with network infrastructures evolving from point-to-point connections toward 3G and wireless networks and the broadcasting of digital TV to set-top boxes, and with a tremendous amount of media now available, such as pictures from camera phones and digital music, multimedia has started its triumphal procession to inform, entertain, and educate users everywhere.
Index-frame audio transmission
J. Parker, Keith Chung
MULTIMEDIA '04, 2004-10-10. DOI: https://doi.org/10.1145/1027527.1027618

Sending audio data over a computer network consumes a large amount of bandwidth, and so compression strategies are regularly built into audio file formats and transmission software. In some environments, the basic nature of the sound does not change significantly; for example, phone lines deal frequently with voice transmission. By matching input audio blocks against those in a table, we can transmit the table indices only, and audio can be synthesized at the receiving end by simple table look-up. This has a number of potentially interesting applications.
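The block-matching scheme described above can be sketched as a nearest-neighbour codebook lookup, essentially vector quantization: sender and receiver share a table of representative blocks, only indices cross the network, and the receiver reconstructs by indexing. The tiny table and distance measure below are our assumptions for illustration; a real system would build the table from typical audio for the channel.

```python
def nearest_index(block, table):
    """Index of the table entry with the smallest squared distance to `block`."""
    def dist(entry):
        return sum((a - b) ** 2 for a, b in zip(block, entry))
    return min(range(len(table)), key=lambda i: dist(table[i]))

def encode(blocks, table):
    """Sender side: each input block becomes a single table index."""
    return [nearest_index(b, table) for b in blocks]

def decode(indices, table):
    """Receiver side: reconstruct audio by simple table look-up."""
    return [table[i] for i in indices]
```

Bandwidth drops from one block of samples per frame to one small integer per frame, at the cost of quantization error wherever the input does not match the table well.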