Web image search engines have become important tools for organizing digital images on the Web. However, most commercial search engines still use a list presentation, and little effort has been devoted to improving their usability. How to present image search results in a more intuitive and effective way remains an open question deserving careful study. In this demo, we present iFind, a scalable Web image search engine that integrates two kinds of search-result browsing interfaces. User study results show that our interfaces are superior to traditional ones.
{"title":"Intuitive and effective interfaces for WWW image search engines","authors":"Zhiwei Li, Xing Xie, Hao Liu, Xiaoou Tang, Mingjing Li, Wei-Ying Ma","doi":"10.1145/1027527.1027697","DOIUrl":"https://doi.org/10.1145/1027527.1027697","url":null,"abstract":"Web image search engine has become an important tool to organize digital images on the Web. However, most commercial search engines still use a list presentation while little effort has been placed on improving their usability. How to present the image search results in a more intuitive and effective way is still an open question to be carefully studied. In this demo, we present iFind, a scalable Web image search engine, in which we integrated two kinds of search result browsing interfaces. User study results have proved that our interfaces are superior to traditional interfaces.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123744681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Peer-to-peer (P2P) media streaming has emerged as a promising solution for media streaming in large distributed systems such as the Internet. Several P2P media streaming solutions have been proposed; however, they all implicitly assume that peers are collaborative, so they suffer from selfish peers that are unwilling to collaborate. In this paper we introduce an incentive mechanism that urges selfish peers to behave collaboratively. It combines a traditional reputation-based approach with an online streaming-behavior monitoring scheme. Our preliminary results show that the overall performance achieved by collaborative peers does not suffer from the presence of non-collaborative peers. The incentive mechanism is orthogonal to existing media streaming solutions and can be integrated into them.
{"title":"Collaboration-aware peer-to-peer media streaming","authors":"S. Ye, F. Makedon","doi":"10.1145/1027527.1027625","DOIUrl":"https://doi.org/10.1145/1027527.1027625","url":null,"abstract":"Peer-to-Peer(P2P) media streaming has emerged as a promising solution to media streaming in large distributed systems such as the Internet. Several P2P media streaming solutions have been proposed by researchers, however they all implicitly assume peers are collaborative, thus they suffer from the selfish peers that are not willing to collaborate. In this paper we introduce an incentive mechanism to urge selfish peers to behave collaboratively. It combines the traditional reputation-based approach and an online streaming behavior monitoring scheme. Our preliminary results show that the overall performance achieved by collaborative peers do not suffer from the existence of non-collaborative peers. The incentive mechanism is orthogonal to the existing media streaming solutions and can be integrated into them.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124193142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
B. Ahn, Sung-Hoon Sohn, Chei-Yol Kim, Gyuil Cha, Y. Baek, Sung-In Jung, Myungjoon Kim
EXT3NS is a scalable file system designed to handle video streaming workloads in large-scale on-demand streaming services. It is based on a special hardware device, called the Network-Storage card (NS card), which accelerates streaming by shortening the data path from storage device to network interface. The design objective of EXT3NS is to minimize the delay and delay variance of I/O requests under sequential workloads on the NS card. The metadata structure, file organization, unit of storage, and so on are carefully tailored to this objective. Further, EXT3NS provides standard APIs to read and write files in the storage unit of the NS card. The streaming server uses them to gain high disk I/O bandwidth, to avoid unnecessary memory copies on the data path from disk to network, and to relieve the CPU by offloading part of the network protocol processing. EXT3NS is a fully functional file system based on the popular EXT3. Performance measurements on our prototype video server show clear improvements: we obtain better results on a file system benchmark, along with gains in disk read and network transmission that lead to an overall increase in streaming performance. In particular, the streaming server shows much lower CPU utilization and less fluctuation in client bit rate, enabling more reliable streaming service.
{"title":"Implementation and evaluation of EXT3NS multimedia file system","authors":"B. Ahn, Sung-Hoon Sohn, Chei-Yol Kim, Gyuil Cha, Y. Baek, Sung-In Jung, Myungjoon Kim","doi":"10.1145/1027527.1027668","DOIUrl":"https://doi.org/10.1145/1027527.1027668","url":null,"abstract":"The EXT3NS is a scalable file system designed to handle video streaming workload in large-scale on-demand streaming services. It is based on a special H/W device, called Network-Storage card (NS card), which aims at accelerating streaming operation by shortening the data path from storage device to network interface. The design objective of EXT3NS is to minimize the delay and the delay variance of I/O request in the sequential workload on NS card. Metadata structure, file organization, metadata structure, unit of storage, etc. are elaborately tailored to achieve this objective. Further, EXT3NS provides the standard API's to read and write files in storage unit of NS card. The streaming server utilizes it to gain high disk I/O bandwidth, to avoid unnecessary memory copies on the data path from disk to network, and to alleviates CPU's burden by offloading parts of network protocol processing, The EXT3NS is a full functional file system based on the popular EXT3. The performance measurements on our prototype video server show obvious performance improvements. Specifically, we obtain better results from file system benchmark program, and obtain performance improvements in disk read and network transmission, which leads to overall streaming performance increase. 
Especially, the streaming server shows much less server's CPU utilization and less fluctuation of client bit rate, hence more reliable streaming service is possible.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124445983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
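The NS card shortens the disk-to-network path in hardware. On a stock Linux host, a rough software analogue of the "avoid unnecessary memory copies" goal is zero-copy transmission with `os.sendfile`, which moves file data to a socket without passing it through user space. The function below is only an illustrative sketch of that idea, not the EXT3NS API.

```python
import os
import socket

def stream_file(sock, path, chunk=64 * 1024):
    """Send a file over a connected socket without user-space copies,
    using the kernel's sendfile path. Returns the number of bytes sent."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            # os.sendfile copies directly from the page cache to the socket.
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, chunk)
            if sent == 0:
                break
            offset += sent
    return offset
```

EXT3NS goes further by also offloading parts of network protocol processing to the NS card, which a purely software path cannot replicate.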
Takeshi Nagamine, A. Jaimes, Kengo Omura, K. Hirata
We present a system based on a new, memory-cue paradigm for retrieving meeting video scenes. The system graphically represents important memory retrieval cues such as room layout, participants' faces, and seating positions. Queries are formulated dynamically: as the user graphically manipulates the cues, the query results are shown. Our system (1) helps users easily express the cues they recall about a particular meeting; (2) helps users remember new cues for meeting video retrieval. We discuss the experiments that motivate this new approach, the implementation, and future work.
{"title":"A visuospatial memory cue system for meeting video retrieval","authors":"Takeshi Nagamine, A. Jaimes, Kengo Omura, K. Hirata","doi":"10.1145/1027527.1027699","DOIUrl":"https://doi.org/10.1145/1027527.1027699","url":null,"abstract":"We present a system based on a new, memory-cue paradigm for retrieving meeting video scenes. The system graically represents important memory retrieval cues such as room layout, participant's faces and sitting positions, etc.. Queries are formulated dynamically: as the user graically manipulates the cues, the query results are shown. Our system (1) helps users easily express the <i>cues</i> they recall about a particular meeting; (2) helps users <i>remember</i> new cues for meeting video retrieval. We discuss the experiments that motivate this new approach, implementation, and future work.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130583429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A potential security problem in frequency-domain video encryption is that seemingly trivial information, such as the distribution of DCT coefficients, may leak secrets. To demonstrate this problem, we performed a successful attack on encrypted video using the distribution of DCT coefficients. Then, based on the weaknesses discovered, we propose a novel video encryption algorithm that works on run-length coded data. It remedies the identified security problems while preserving high efficiency and the ability to cooperate with compression schemes.
{"title":"Enhancing security of frequency domain video encryption","authors":"Zheng Liu, Xue Li, Zhao Yang Dong","doi":"10.1145/1027527.1027597","DOIUrl":"https://doi.org/10.1145/1027527.1027597","url":null,"abstract":"A potential security problem in frequency domain video encryption is that some trivial information such as the distribution of DCT coefficients may leak out secret. To illuminate this problem, we performed a successful attack on video using the distribution information of DCT coefficients. Then, according to the weak points discovered, a novel video encryption algorithm, working on run-length coded data, is proposed. It has amended identified security problems, while preserving high efficiency and the adaptability to cooperate with compression schemes.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130889701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present the Range Multicast protocol and an implemented prototype. We propose to demonstrate the system at the 2004 ACM Multimedia Conference.
{"title":"Range multicast routers for large-scale deployment of multimedia application","authors":"Ning Jiang, Y. H. Ho, K. Hua","doi":"10.1145/1027527.1027558","DOIUrl":"https://doi.org/10.1145/1027527.1027558","url":null,"abstract":"In this paper, we present the Range Multicast protocol and the implemented prototype. We propose to demonstrate the system at the 2004 ACM Multimedia Conference.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130651408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Internet is composed of many kinds of networks, which in turn are composed of network nodes such as routers. Routers spend processor power forwarding every packet regardless of its size, so a node's processor can become a throughput bottleneck when there are too many packets to forward. The authors therefore propose a packet assembly method. It aims to decrease the number of packets, and thereby the processor load, based on the fact that backbone networks carry many packets much smaller than the maximum transferable unit. To examine packet assembly, the authors conducted two experiments. The first applies packet assembly to digital video traffic: it compares video forwarded through routers with and without packet assembly, and tracks the resulting edge-router and core-router loads. The second applies packet assembly to VoIP traffic and investigates its influence on PSQM score, latency, and jitter.
{"title":"Application of packet assembly technology to digital video and VoIP","authors":"T. Kanda, K. Shimamura","doi":"10.1145/1027527.1027620","DOIUrl":"https://doi.org/10.1145/1027527.1027620","url":null,"abstract":"The Internet is composed of many kinds of networks and the networks are composed of network nodes such as routers. Routers use processor power for forwarding each packet with any size. At that time, node processor would be a bottleneck in respect to the high throughput if there would be too many packets to forward. Then, authors propose the packet assembly method. This aims to decrease the number of packets for the reduction of processor load, based on the fact that there are many packets much smaller than maximum transferable unit in backbone network.\u0000 For the examination of the packet assembly, authors conducted two experiments. One is the experiment that conducts the packet assembly method for the traffic of digital video, and it provides the comparison of the image of digital video forwarded via routers without packet assembly with the one with packet assembly, and transition of edge router load and core router load. The other is the experiment that conducts the packet assembly method for the traffic of VoIP, and investigated about the influence on PSQM score, latency, and jitter.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129328962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper investigates the issues in polyphonic popular song retrieval. The problems we consider include singing voice extraction, melodic curve representation, and database indexing. First, polyphonic songs are decomposed into singing voices and instrument sounds in both the time and frequency domains using SVM and ICA. The extracted singing voices are represented as two melodic curves that model the statistical mean and neighborhood similarity of notes. To speed up matching between songs and queries, we further adopt the proportional transportation distance to index songs in vantage point trees. Encouraging results have been obtained through experiments.
{"title":"Indexing and matching of polyphonic songs for query-by-singing system","authors":"Tat-Wan Leung, C. Ngo","doi":"10.1145/1027527.1027598","DOIUrl":"https://doi.org/10.1145/1027527.1027598","url":null,"abstract":"This paper investigates the issues in polyphonic popular song retrieval. The problems that we consider include singing voice extraction, melodic curve representation, and database indexing. Initially, polyphonic songs are decomposed into singing voices and instruments sounds in both time and frequency domains based on SVM and ICA. The extracted singing voices are represented as two melodic curves that model the statistical mean and neighborhood similarity of notes. To speed up the matching between songs and query, we further adopt proportional transportation distance to index the songs as vantage point trees. Encouraging results have been obtained through experiments.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128193679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Most current image retrieval systems and commercial search engines mainly use text annotations to index and retrieve WWW images. This research explores the use of machine learning to automatically annotate WWW images with a predefined list of concepts by fusing evidence from image content and the associated HTML text. One major practical limitation of supervised machine learning is that effective learning requires a large set of labeled training samples. Labeling is tedious, and this severely impedes the development of effective search techniques for WWW images, which are dynamic and fast-changing. Because Web images possess both intrinsic visual content and text annotations, they provide a strong basis for bootstrapping the learning process with a co-training approach involving classifiers based on two orthogonal sets of features: visual and text. The idea of co-training is to start from a small set of labeled training samples and successively annotate a larger set of unlabeled samples using the two orthogonal classifiers. We carry out experiments on a set of over 5,000 images acquired from the Web, exploring different combinations of HTML text and visual representations. We find that our bootstrapping approach achieves performance comparable to that of supervised learning, with an F1 measure of over 54%, while requiring only a small initial set of training samples.
{"title":"A bootstrapping framework for annotating and retrieving WWW images","authors":"Huamin Feng, Rui Shi, Tat-Seng Chua","doi":"10.1145/1027527.1027748","DOIUrl":"https://doi.org/10.1145/1027527.1027748","url":null,"abstract":"Most current image retrieval systems and commercial search engines use mainly text annotations to index and retrieve WWW images. This research explores the use of machine learning approaches to automatically annotate WWW images based on a predefined list of concepts by fusing evidences from image contents and their associated HTML text. One major practical limitation of employing supervised machine learning approaches is that for effective learning, a large set of labeled training samples is needed. This is tedious and severely impedes the practical development of effective search techniques for WWW images, which are dynamic and fast-changing. As web-based images possess both intrinsic visual contents and text annotations, they provide a strong basis to bootstrap the learning process by adopting a co-training approach involving classifiers based on two orthogonal set of features -- visual and text. The idea of co-training is to start from a small set of labeled training samples, and successively annotate a larger set of unlabeled samples using the two orthogonal classifiers. We carry out experiments using a set of over 5,000 images acquired from the Web. We explore the use of different combinations of HTML text and visual representations. We find that our bootstrapping approach can achieve a performance comparable to that of the supervised learning approach with an F1 measure of over 54%. 
At the same time, it offers the added advantage of requiring only a small initial set of training samples.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126185351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
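The co-training loop itself is simple enough to sketch end to end. Below, each sample carries two scalar "views" as stand-ins for the visual and text features, and a trivial nearest-centroid classifier per view replaces the paper's real classifiers; in each round, both classifiers promote their most confident unlabeled samples into the shared training set. Everything here is a hypothetical reduction of the scheme, not the authors' system.

```python
# Co-training sketch: two view-specific classifiers label each other's
# most confident unlabeled samples. Samples are (view0, view1) tuples.

def centroid_classifier(labeled, view):
    """Fit per-class centroids on one feature view; return a predictor."""
    sums, counts = {}, {}
    for features, label in labeled:
        x = features[view]
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: sums[c] / counts[c] for c in sums}

    def predict(features):
        x = features[view]
        label = min(centroids, key=lambda c: abs(x - centroids[c]))
        # Confidence: closer to the winning centroid = higher (less negative).
        return label, -abs(x - centroids[label])
    return predict

def co_train(labeled, unlabeled, rounds=3, per_round=2):
    """Each round, each view-classifier labels its `per_round` most
    confident unlabeled samples and adds them to the training set."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        for view in (0, 1):
            predict = centroid_classifier(labeled, view)
            scored = sorted(((predict(f), f) for f in unlabeled),
                            key=lambda t: t[0][1], reverse=True)
            for (label, _conf), features in scored[:per_round]:
                labeled.append((features, label))
                unlabeled.remove(features)
    return labeled
```

The key requirement, as in the abstract, is that the two views be (approximately) orthogonal: each classifier must contribute labels the other could not have derived from its own features alone.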
Richard J. Anderson, C. Hoyer, Craig Prince, Jonathan Su, F. Videon, S. Wolfman
In this paper, we report on an empirical exploration of digital ink and speech usage in lecture presentation. We studied the video archives of five Master's level Computer Science courses to understand how instructors use ink and speech together while lecturing, and to evaluate techniques for analyzing digital ink. Our interest in understanding how ink and speech are used together is to inform the development of future tools for supporting classroom presentation, distance education, and viewing of archived lectures. We want to make it easier to interact with electronic materials and to extract information from them. We want to provide an empirical basis for addressing challenging problems such as automatically generating full text transcripts of lectures, matching speaker audio with slide content, and recognizing the meaning of the instructor's ink. Our results include an evaluation of handwritten word recognition in the lecture domain, an approach for associating attentional marks with content, an analysis of linkage between speech and ink, and an application of recognition techniques to infer speaker actions.
{"title":"Speech, ink, and slides: the interaction of content channels","authors":"Richard J. Anderson, C. Hoyer, Craig Prince, Jonathan Su, F. Videon, S. Wolfman","doi":"10.1145/1027527.1027713","DOIUrl":"https://doi.org/10.1145/1027527.1027713","url":null,"abstract":"In this paper, we report on an empirical exploration of digital ink and speech usage in lecture presentation. We studied the video archives of five Master's level Computer Science courses to understand how instructors use ink and speech together while lecturing, and to evaluate techniques for analyzing digital ink. Our interest in understanding how ink and speech are used together is to inform the development of future tools for supporting classroom presentation, distance education, and viewing of archived lectures. We want to make it easier to interact with electronic materials and to extract information from them. We want to provide an empirical basis for addressing challenging problems such as automatically generating full text transcripts of lectures, matching speaker audio with slide content, and recognizing the meaning of the instructor's ink. Our results include an evaluation of handwritten word recognition in the lecture domain, an approach for associating attentional marks with content, an analysis of linkage between speech and ink, and an application of recognition techniques to infer speaker actions.","PeriodicalId":292207,"journal":{"name":"MULTIMEDIA '04","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114259673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}