
Latest publications from MULTIMEDIA '04

Intuitive and effective interfaces for WWW image search engines
Pub Date : 2004-10-10 DOI: 10.1145/1027527.1027697
Zhiwei Li, Xing Xie, Hao Liu, Xiaoou Tang, Mingjing Li, Wei-Ying Ma
Web image search engines have become an important tool for organizing digital images on the Web. However, most commercial search engines still use a list presentation, and little effort has been devoted to improving their usability. How to present image search results in a more intuitive and effective way is still an open question that deserves careful study. In this demo, we present iFind, a scalable Web image search engine into which we integrated two kinds of search-result browsing interfaces. User study results show that our interfaces are superior to traditional ones.
Citations: 4
Collaboration-aware peer-to-peer media streaming
Pub Date : 2004-10-10 DOI: 10.1145/1027527.1027625
S. Ye, F. Makedon
Peer-to-peer (P2P) media streaming has emerged as a promising solution for media streaming in large distributed systems such as the Internet. Several P2P media streaming solutions have been proposed, but they all implicitly assume that peers are collaborative, so they suffer from selfish peers that are unwilling to collaborate. In this paper we introduce an incentive mechanism that urges selfish peers to behave collaboratively. It combines the traditional reputation-based approach with an online monitoring scheme for streaming behavior. Our preliminary results show that the overall performance achieved by collaborative peers does not suffer from the existence of non-collaborative peers. The incentive mechanism is orthogonal to existing media streaming solutions and can be integrated into them.
Citations: 22
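The paper does not spell out its reputation formula, so the following is a hypothetical sketch of the reputation-based half of such an incentive mechanism: a peer's score rises when it serves streaming requests and decays when it refuses, and suppliers only admit requesters whose score clears a threshold. The `Peer` class, update rule, and threshold values are all assumptions for illustration.

```python
class Peer:
    def __init__(self, name):
        self.name = name
        self.reputation = 0.5  # neutral starting score

def update_reputation(peer, served_request, alpha=0.1):
    """Exponentially weighted update: serving a request raises the
    score toward 1.0, refusing decays it toward 0.0."""
    outcome = 1.0 if served_request else 0.0
    peer.reputation = (1 - alpha) * peer.reputation + alpha * outcome
    return peer.reputation

def may_receive_stream(peer, threshold=0.4):
    """A supplier admits a requesting peer only if its reputation
    clears the threshold, so selfishness is eventually penalized."""
    return peer.reputation >= threshold
```

Under this rule a peer that consistently refuses to serve others quickly loses access, which is the qualitative behavior the incentive mechanism aims for.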
Implementation and evaluation of EXT3NS multimedia file system
Pub Date : 2004-10-10 DOI: 10.1145/1027527.1027668
B. Ahn, Sung-Hoon Sohn, Chei-Yol Kim, Gyuil Cha, Y. Baek, Sung-In Jung, Myungjoon Kim
EXT3NS is a scalable file system designed to handle video streaming workloads in large-scale on-demand streaming services. It is based on a special hardware device, the Network-Storage card (NS card), which accelerates streaming by shortening the data path from the storage device to the network interface. The design objective of EXT3NS is to minimize the delay and delay variance of I/O requests under sequential workloads on the NS card. The metadata structure, file organization, unit of storage, etc. are elaborately tailored to this objective. Further, EXT3NS provides standard APIs to read and write files in the storage unit of the NS card. The streaming server uses them to obtain high disk I/O bandwidth, to avoid unnecessary memory copies on the data path from disk to network, and to alleviate the CPU's burden by offloading parts of network protocol processing. EXT3NS is a fully functional file system based on the popular EXT3. Performance measurements on our prototype video server show clear improvements: we obtain better results on a file system benchmark program, and disk reads and network transmission both improve, leading to an overall increase in streaming performance. In particular, the streaming server exhibits much lower CPU utilization and less fluctuation in client bit rate, making a more reliable streaming service possible.
Citations: 10
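EXT3NS's NS-card API is hardware-specific, but its key idea of avoiding memory copies on the disk-to-network path has a standard software analogue in `sendfile`, which moves file data to a socket inside the kernel. This is a hedged sketch of that general technique using ordinary files and sockets, not the NS-card API; the fallback loop covers platforms where `os.sendfile` is unavailable.

```python
import os
import socket

def stream_file(conn: socket.socket, path: str, chunk: int = 1 << 16) -> int:
    """Send a whole file over a socket, preferring the kernel's zero-copy
    sendfile path (no user-space buffer) and falling back to a plain
    read/send loop. Returns the number of bytes sent."""
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if hasattr(os, "sendfile"):
            while sent < size:
                # offset-based: safe to resume after short writes
                sent += os.sendfile(conn.fileno(), f.fileno(), sent, chunk)
        else:
            while data := f.read(chunk):
                conn.sendall(data)
                sent += len(data)
    return sent
```

The zero-copy branch keeps file pages out of user space entirely, which is the same CPU-offloading effect the NS card pursues in hardware.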
A visuospatial memory cue system for meeting video retrieval
Pub Date : 2004-10-10 DOI: 10.1145/1027527.1027699
Takeshi Nagamine, A. Jaimes, Kengo Omura, K. Hirata
We present a system based on a new memory-cue paradigm for retrieving meeting video scenes. The system graphically represents important memory-retrieval cues such as the room layout, participants' faces, and sitting positions. Queries are formulated dynamically: as the user graphically manipulates the cues, the query results are shown. Our system (1) helps users easily express the cues they recall about a particular meeting and (2) helps users remember new cues for meeting video retrieval. We discuss the experiments that motivated this new approach, its implementation, and future work.
Citations: 5
Enhancing security of frequency domain video encryption
Pub Date : 2004-10-10 DOI: 10.1145/1027527.1027597
Zheng Liu, Xue Li, Zhao Yang Dong
A potential security problem in frequency-domain video encryption is that seemingly trivial information, such as the distribution of DCT coefficients, may leak secrets. To illustrate this problem, we performed a successful attack on encrypted video using the distribution of DCT coefficients. Then, based on the weaknesses discovered, we propose a novel video encryption algorithm that operates on run-length coded data. It remedies the identified security problems while preserving high efficiency and the adaptability to cooperate with compression schemes.
Citations: 16
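To see why distribution information can defeat a frequency-domain cipher, consider a toy scheme (not the authors' algorithm) that merely permutes the DCT coefficients of a block: any permutation leaves the coefficient value histogram intact, so a statistical fingerprint of the content survives the "encryption" and can be matched against candidate videos.

```python
import random

def encrypt_permute(coeffs, key):
    """Toy frequency-domain 'encryption' that only shuffles the DCT
    coefficients of a block under a keyed permutation. Deliberately
    weak -- used here to illustrate the distribution leak."""
    rng = random.Random(key)
    out = list(coeffs)
    rng.shuffle(out)
    return out

def histogram_fingerprint(coeffs, bins=(-1000, -100, -10, 0, 10, 100, 1000)):
    """Coarse histogram of coefficient values. A permutation cannot
    change it, so it identifies content despite the cipher."""
    counts = [0] * (len(bins) + 1)
    for c in coeffs:
        counts[sum(1 for b in bins if c >= b)] += 1
    return tuple(counts)
```

An attacker who cannot read the key can still compute `histogram_fingerprint` on the ciphertext and compare it with fingerprints of known content, which is the kind of distribution-based attack the abstract describes defending against.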
Range multicast routers for large-scale deployment of multimedia application
Pub Date : 2004-10-10 DOI: 10.1145/1027527.1027558
Ning Jiang, Y. H. Ho, K. Hua
In this paper, we present the Range Multicast protocol and an implemented prototype. We propose to demonstrate the system at the 2004 ACM Multimedia Conference.
Citations: 1
Application of packet assembly technology to digital video and VoIP
Pub Date : 2004-10-10 DOI: 10.1145/1027527.1027620
T. Kanda, K. Shimamura
The Internet is composed of many kinds of networks, which in turn are composed of network nodes such as routers. A router spends processor time forwarding each packet regardless of its size, so the node processor can become a throughput bottleneck when there are too many packets to forward. We therefore propose a packet assembly method. It aims to decrease the number of packets, and thereby the processor load, exploiting the fact that many packets in the backbone network are much smaller than the maximum transferable unit. To examine packet assembly, we conducted two experiments. The first applies the packet assembly method to digital video traffic; it compares the digital video image forwarded by routers with and without packet assembly, and reports the resulting edge-router and core-router loads. The second applies the method to VoIP traffic and investigates its influence on PSQM score, latency, and jitter.
Citations: 1
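The abstract does not detail the assembly procedure, so here is a minimal sketch of the core idea: greedily merging consecutive small packets into aggregates bounded by the MTU, so routers forward fewer (larger) packets. The framing needed to disassemble the aggregates at the far end is omitted, and the `header` overhead value is an assumption.

```python
def assemble_packets(packets, mtu=1500, header=20):
    """Greedily merge consecutive small packets into aggregates that,
    with an assumed per-aggregate header, fit within the MTU. Packets
    already near the MTU pass through unmerged. Framing for later
    disassembly is deliberately omitted in this sketch."""
    aggregates, current = [], b""
    for p in packets:
        if len(p) + header > mtu:            # too big to merge: pass through
            if current:
                aggregates.append(current)
                current = b""
            aggregates.append(p)
        elif len(current) + len(p) + header > mtu:
            aggregates.append(current)       # aggregate full: start a new one
            current = p
        else:
            current += p
    if current:
        aggregates.append(current)
    return aggregates
```

With 64-byte payloads and a 1500-byte MTU, roughly 23 packets fit per aggregate, so per-packet forwarding work at routers drops by about an order of magnitude, which is the effect the experiments measure.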
Indexing and matching of polyphonic songs for query-by-singing system
Pub Date : 2004-10-10 DOI: 10.1145/1027527.1027598
Tat-Wan Leung, C. Ngo
This paper investigates polyphonic popular song retrieval. The problems we consider include singing-voice extraction, melodic curve representation, and database indexing. First, polyphonic songs are decomposed into singing voices and instrument sounds in both the time and frequency domains using SVM and ICA. The extracted singing voices are represented as two melodic curves that model the statistical mean and neighborhood similarity of notes. To speed up matching between songs and queries, we further adopt the proportional transportation distance and index the songs in vantage-point trees. Encouraging results have been obtained in experiments.
Citations: 1
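The abstract names vantage-point trees as the index structure. As a general illustration, independent of the melodic-curve distance actually used in the paper, a minimal VP-tree with exact nearest-neighbour search can be sketched as follows; it works with any metric `dist`, here exercised with 1-D points.

```python
def build_vptree(points, dist):
    """Build a vantage-point tree: take one point as the vantage point,
    split the rest by the median distance to it (inside vs. outside
    the ball of radius mu)."""
    if not points:
        return None
    vp, rest = points[0], points[1:]
    if not rest:
        return {"vp": vp, "mu": 0.0, "in": None, "out": None}
    ds = sorted(dist(vp, p) for p in rest)
    mu = ds[len(ds) // 2]                       # median radius
    inner = [p for p in rest if dist(vp, p) < mu]
    outer = [p for p in rest if dist(vp, p) >= mu]
    return {"vp": vp, "mu": mu,
            "in": build_vptree(inner, dist),
            "out": build_vptree(outer, dist)}

def nearest(node, q, dist, best=None):
    """Exact nearest-neighbour search, pruning the far subtree with the
    triangle inequality when it cannot contain a closer point."""
    if node is None:
        return best
    d = dist(node["vp"], q)
    if best is None or d < best[0]:
        best = (d, node["vp"])
    near, far = (node["in"], node["out"]) if d < node["mu"] else (node["out"], node["in"])
    best = nearest(near, q, dist, best)
    if abs(d - node["mu"]) < best[0]:           # far ball may still intersect
        best = nearest(far, q, dist, best)
    return best
```

Because the tree needs only a metric, the same index applies unchanged whether the distance is Euclidean or the proportional transportation distance between melodic curves.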
A bootstrapping framework for annotating and retrieving WWW images
Pub Date : 2004-10-10 DOI: 10.1145/1027527.1027748
Huamin Feng, Rui Shi, Tat-Seng Chua
Most current image retrieval systems and commercial search engines rely mainly on text annotations to index and retrieve WWW images. This research explores machine learning approaches that automatically annotate WWW images with a predefined list of concepts by fusing evidence from image contents and the associated HTML text. One major practical limitation of supervised machine learning is that effective learning requires a large set of labeled training samples. Producing such labels is tedious and severely impedes the development of effective search techniques for WWW images, which are dynamic and fast-changing. Because web images possess both intrinsic visual content and text annotations, they provide a strong basis for bootstrapping the learning process with a co-training approach involving classifiers based on two orthogonal sets of features: visual and text. The idea of co-training is to start from a small set of labeled training samples and successively annotate a larger set of unlabeled samples using the two orthogonal classifiers. We carried out experiments on a set of over 5,000 images acquired from the Web, exploring different combinations of HTML text and visual representations. We find that our bootstrapping approach achieves performance comparable to that of the supervised learning approach, with an F1 measure of over 54%, while offering the added advantage of requiring only a small initial set of training samples.
Citations: 113
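The co-training loop described above can be sketched generically: two classifiers, each seeing only its own feature view, take turns promoting their most confidently labeled unlabeled samples into the shared labeled pool. The `MeanThreshold` classifier and the confidence/promotion parameters below are illustrative assumptions, not the paper's actual visual and text classifiers.

```python
class MeanThreshold:
    """Toy one-feature binary classifier: threshold at the midpoint of
    the two class means; confidence is distance from the threshold."""
    def fit(self, xs, ys):
        pos = [x for x, y in zip(xs, ys) if y == 1]
        neg = [x for x, y in zip(xs, ys) if y == 0]
        self.t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    def predict(self, x):
        return 1 if x >= self.t else 0
    def confidence(self, x):
        return abs(x - self.t)

def co_train(clf_a, clf_b, labeled, unlabeled, rounds=5, per_round=10):
    """Co-training: each round, fit both view-specific classifiers on the
    labeled pool, then let each promote its most confident unlabeled
    samples (with its own predicted labels) into that pool."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        clf_a.fit([x[0] for x, y in labeled], [y for x, y in labeled])
        clf_b.fit([x[1] for x, y in labeled], [y for x, y in labeled])
        for clf, view in ((clf_a, 0), (clf_b, 1)):
            scored = sorted(unlabeled, key=lambda x: -clf.confidence(x[view]))
            for x in scored[:per_round]:
                if x in unlabeled:
                    unlabeled.remove(x)
                    labeled.append((x, clf.predict(x[view])))
    return clf_a, clf_b
```

The key property is that the two views are (assumed) conditionally independent, so each classifier's confident labels carry new information for the other; with visual and text features this is what lets a small seed set bootstrap a much larger training pool.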
Speech, ink, and slides: the interaction of content channels
Pub Date : 2004-10-10 DOI: 10.1145/1027527.1027713
Richard J. Anderson, C. Hoyer, Craig Prince, Jonathan Su, F. Videon, S. Wolfman
In this paper, we report on an empirical exploration of digital ink and speech usage in lecture presentation. We studied the video archives of five Master's level Computer Science courses to understand how instructors use ink and speech together while lecturing, and to evaluate techniques for analyzing digital ink. Our interest in understanding how ink and speech are used together is to inform the development of future tools for supporting classroom presentation, distance education, and viewing of archived lectures. We want to make it easier to interact with electronic materials and to extract information from them. We want to provide an empirical basis for addressing challenging problems such as automatically generating full text transcripts of lectures, matching speaker audio with slide content, and recognizing the meaning of the instructor's ink. Our results include an evaluation of handwritten word recognition in the lecture domain, an approach for associating attentional marks with content, an analysis of linkage between speech and ink, and an application of recognition techniques to infer speaker actions.
Citations: 38