Video Google: a text retrieval approach to object matching in videos

Proceedings Ninth IEEE International Conference on Computer Vision. Pub Date: 2003-10-13. DOI: 10.1109/ICCV.2003.1238663

Josef Sivic, Andrew Zisserman

Citations: 7002

Abstract

We describe an approach to object and scene retrieval which searches for and localizes all the occurrences of a user-outlined object in a video. The object is represented by a set of viewpoint-invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion. The temporal continuity of the video within a shot is used to track the regions in order to reject unstable regions and reduce the effects of noise in the descriptors. The analogy with text retrieval is in the implementation, where matches on descriptors are pre-computed (using vector quantization), and inverted file systems and document rankings are used. The result is that retrieval is immediate, returning a ranked list of key frames/shots in the manner of Google. The method is illustrated for matching in two full-length feature films.
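The text-retrieval analogy in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a vocabulary of cluster centres is already available, quantizes each descriptor to its nearest centre (a "visual word"), stores an inverted file from words to frames, and ranks frames by cosine similarity of tf-idf vectors. The abstract only says "document rankings", so the tf-idf weighting and cosine score are assumptions here, and all names (`nearest_word`, `build_index`, `query`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def nearest_word(descriptor, vocabulary):
    """Vector quantization: index of the closest centre (squared Euclidean)."""
    return min(
        range(len(vocabulary)),
        key=lambda i: sum((d - c) ** 2 for d, c in zip(descriptor, vocabulary[i])),
    )


def build_index(frames, vocabulary):
    """frames maps frame_id -> list of descriptors.

    Returns (inverted file, per-frame tf-idf vectors, idf weights).
    """
    counts = {
        fid: Counter(nearest_word(d, vocabulary) for d in descs)
        for fid, descs in frames.items()
    }
    # Document frequency: number of frames in which each word occurs.
    doc_freq = Counter(w for c in counts.values() for w in c)
    idf = {w: math.log(len(frames) / df) for w, df in doc_freq.items()}

    inverted = defaultdict(set)  # word -> set of frame ids containing it
    vectors = {}
    for fid, c in counts.items():
        total = sum(c.values())
        vectors[fid] = {w: (n / total) * idf[w] for w, n in c.items()}
        for w in c:
            inverted[w].add(fid)
    return inverted, vectors, idf


def _norm(vec):
    return math.sqrt(sum(v * v for v in vec.values()))


def query(descriptors, vocabulary, inverted, vectors, idf):
    """Rank frames sharing at least one visual word with the query region."""
    c = Counter(nearest_word(d, vocabulary) for d in descriptors)
    total = sum(c.values())
    qvec = {w: (n / total) * idf.get(w, 0.0) for w, n in c.items()}
    q_norm = _norm(qvec)

    # Only frames listed in the inverted file for a query word are scored.
    candidates = set().union(*(inverted.get(w, set()) for w in qvec))
    scored = []
    for fid in candidates:
        d_norm = _norm(vectors[fid])
        if q_norm == 0.0 or d_norm == 0.0:
            continue  # no discriminative words; skip rather than divide by zero
        dot = sum(qvec[w] * vectors[fid].get(w, 0.0) for w in qvec)
        scored.append((fid, dot / (q_norm * d_norm)))
    return sorted(scored, key=lambda t: -t[1])
```

Because quantization and indexing happen once, offline, a query only touches the inverted-file entries for its own words, which is what makes retrieval fast at run time in this kind of scheme. The sketch omits the region detection, descriptor computation, tracking across a shot, and localization steps described in the abstract.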


