
Latest Literature: 2007 IEEE Conference on Computer Vision and Pattern Recognition

Application of the Reeb Graph Technique to Vehicle Occupant's Head Detection in Low-resolution Range Images
Pub Date : 2007-06-17 DOI: 10.1109/CVPR.2007.383450
P. Devarakota, M. Castillo-Franco, R. Ginhoux, B. Mirbach, B. Ottersten
In [3], a low-resolution range sensor was investigated for an occupant classification system that distinguishes a person from child seats or an empty seat. The optimal deployment of vehicle airbags for maximum protection moreover requires information about the occupant's size and position. Detecting the occupant's position involves detecting and localizing the occupant's head. This is a challenging problem, as approaches based on local shape analysis (in 2D or 3D) alone are not robust enough: other parts of the body, such as the shoulders or knees, may have shapes similar to the head. This paper discusses and investigates the potential of a Reeb graph approach to describe the topology of vehicle occupants in terms of a skeleton. The essence of the proposed approach is that an occupant sitting in a vehicle has a typical topology which leads to distinct branches of a Reeb graph; the possible locations of the occupant's head are thus the endpoints of the Reeb graph. The proposed method is applied to real 3D range images and compared to ground-truth information. Results show the feasibility of using topological information to identify the position of the occupant's head.
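To make the topology idea concrete, here is a minimal NumPy/SciPy sketch of a slice-based Reeb-graph approximation on a range image. The band count, the dilation-based linking of components across bands, and the function name are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy import ndimage

def reeb_endpoints(depth, n_slices=12):
    """Approximate Reeb-graph endpoints of a 2.5D range image.

    Slices the height function into bands, labels connected components
    per band, and links components that touch across adjacent bands.
    Components with no successor in the next band are branch endpoints,
    i.e. head candidates in the occupant setting (illustrative sketch).
    """
    valid = np.isfinite(depth)
    edges = np.linspace(depth[valid].min(), depth[valid].max(), n_slices + 1)
    edges[-1] += 1e-6                    # include the maximum in the last band
    nodes, has_next = [], set()
    prev = None                          # labelled image of the previous band
    for b in range(n_slices):
        band = valid & (depth >= edges[b]) & (depth < edges[b + 1])
        lab, n = ndimage.label(band)
        for k in range(1, n + 1):
            comp = lab == k
            nodes.append(((b, k), ndimage.center_of_mass(comp)))
            if prev is not None:
                # link to any previous-band component that the slightly
                # dilated support of this component touches
                grown = ndimage.binary_dilation(comp, iterations=2)
                for o in np.unique(prev[grown]):
                    if o > 0:
                        has_next.add((b - 1, int(o)))
        prev = lab
    return [c for (i, c) in nodes if i not in has_next]   # endpoint centroids
```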
Citations: 4
Unsupervised Clustering using Multi-Resolution Perceptual Grouping
Pub Date : 2007-06-17 DOI: 10.1109/CVPR.2007.382986
T. Syeda-Mahmood, Fei Wang
Clustering is a common operation for data partitioning in many practical applications. Often, such data distributions exhibit higher-level structures that are important for problem characterization but are not explicitly discovered by existing clustering algorithms. In this paper, we introduce multi-resolution perceptual grouping as an approach to unsupervised clustering. Specifically, we use the perceptual grouping constraints of proximity, density, contiguity, and orientation similarity. We apply these constraints in a multi-resolution fashion to group sample points in high-dimensional spaces into salient clusters. We present an extensive evaluation of the clustering algorithm against state-of-the-art supervised and unsupervised clustering methods on large datasets.
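As a rough illustration of one of these constraints, the sketch below groups points by proximity at a sequence of increasing radii, using a union-find over k-d-tree neighbour pairs. The radii and the restriction to the proximity cue alone are assumptions for illustration, not the paper's full algorithm.

```python
import numpy as np
from scipy.spatial import cKDTree

def multires_proximity_groups(X, radii=(0.1, 0.2, 0.4)):
    """Group points (n x d array) by the proximity cue at a sequence of
    increasing radii; returns one label array per resolution level.
    Coarser levels reuse the merges made at finer ones."""
    parent = list(range(len(X)))

    def find(i):                      # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tree = cKDTree(X)
    labels_per_level = []
    for r in radii:
        for i, j in tree.query_pairs(r):
            parent[find(i)] = find(j)        # union neighbouring points
        labels_per_level.append(np.array([find(i) for i in range(len(X))]))
    return labels_per_level
```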
Citations: 5
A Topic-Motion Model for Unsupervised Video Object Discovery
Pub Date : 2007-06-17 DOI: 10.1109/CVPR.2007.383220
David Liu, Tsuhan Chen
The bag-of-words representation has attracted a lot of attention recently in the field of object recognition. Based on the bag-of-words representation, topic models such as probabilistic latent semantic analysis (PLSA) have been applied to unsupervised object discovery in still images. In this paper, we extend topic models from still images to motion videos with the integration of a temporal model. We propose a novel spatial-temporal framework that uses topic models for appearance modeling, and the probabilistic data association (PDA) filter for motion modeling. The spatial and temporal models are tightly integrated so that motion ambiguities can be resolved by appearance, and appearance ambiguities can be resolved by motion. We show promising results that cannot be achieved by appearance or motion modeling alone.
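For the appearance half, a minimal PLSA fit via EM on a document-word count matrix might look as follows. This is a toy dense NumPy version: the topic count, iteration budget, and initialization are illustrative, and the motion/PDA half of the paper's model is omitted.

```python
import numpy as np

def plsa(N, n_topics=2, iters=50, seed=0):
    """Minimal PLSA via EM on a document-word count matrix N (D x W).
    Returns P(w|z) of shape (K, W) and P(z|d) of shape (D, K)."""
    rng = np.random.default_rng(seed)
    D, W = N.shape
    p_wz = rng.random((n_topics, W)); p_wz /= p_wz.sum(1, keepdims=True)
    p_zd = rng.random((D, n_topics)); p_zd /= p_zd.sum(1, keepdims=True)
    for _ in range(iters):
        # E-step: responsibilities P(z | d, w), shape (D, W, K)
        post = p_zd[:, None, :] * p_wz.T[None, :, :]
        post /= post.sum(-1, keepdims=True) + 1e-12
        # M-step: re-estimate both factors from expected counts
        nz = N[:, :, None] * post
        p_wz = nz.sum(0).T
        p_wz /= p_wz.sum(1, keepdims=True) + 1e-12
        p_zd = nz.sum(1)
        p_zd /= p_zd.sum(1, keepdims=True) + 1e-12
    return p_wz, p_zd
```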
Citations: 47
A New Method for Object Tracking Based on Regions Instead of Contours
Pub Date : 2007-06-17 DOI: 10.1109/CVPR.2007.383454
N. Gómez, R. Alquézar, F. Serratosa
This paper presents a new method for object tracking in video sequences that is especially suitable for very noisy environments. In such situations, segmented images from one frame to the next are usually so different that it is very hard, or even impossible, to match the corresponding regions or contours of both images. With the aim of tracking objects in these situations, our approach has two main characteristics. On one hand, we assume that tracking approaches based on contours cannot be applied, and therefore our system uses object recognition results computed from regions (specifically, colour spots from segmented images). On the other hand, we forgo matching the spots of consecutive segmented images and, consequently, the methods that represent objects by structures such as graphs or skeletons, since the structures obtained may be too different in consecutive frames. Instead, we represent the location of tracked objects through images of probabilities that are updated dynamically using both the recognition and tracking results of previous steps. From these probabilities and a simple prediction of the apparent motion of the object in the image, a binary decision can be made for each pixel and object.
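A hedged sketch of one update of such a probability image: shift the previous map by the predicted motion, blur it to model uncertainty, and fuse it with the current recognition likelihood. The fusion weight and blur width are assumptions for illustration, not values from the paper.

```python
import numpy as np
from scipy import ndimage

def update_probability_map(prev_prob, likelihood, shift, blur=3.0, alpha=0.6):
    """One step of a probability-image tracker: shift last frame's map
    by the predicted apparent motion (dy, dx), blur it to model
    prediction uncertainty, and fuse it with the recognition likelihood
    of the current frame. alpha weights prediction vs. measurement."""
    pred = ndimage.shift(prev_prob, shift, order=1, mode='constant')
    pred = ndimage.gaussian_filter(pred, blur)
    fused = alpha * pred + (1.0 - alpha) * likelihood
    return fused / (fused.max() + 1e-12)     # keep values in [0, 1]

# per-pixel binary decision, e.g.:
# mask = update_probability_map(p, lik, (dy, dx)) > 0.5
```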
Citations: 5
Learning and Matching Line Aspects for Articulated Objects
Pub Date : 2007-06-17 DOI: 10.1109/CVPR.2007.383021
Xiaofeng Ren
Traditional aspect graphs are topology-based and are impractical for articulated objects. In this work we learn a small number of aspects, or prototypical views, from video data. Ground-truth segmentations in video sequences are utilized for both training and testing aspect models that operate on static images. We represent aspects of an articulated object as collections of line segments. In learning aspects, where object centers are known, a linear matching based on line location and orientation is used to measure similarity between views. We use K-medoid to find cluster centers. When using line aspects in recognition, matching is based on pairwise cues of relative location, relative orientation, as well as adjacency and parallelism. Matching with pairwise cues leads to a quadratic optimization that we solve with a spectral approximation. We show that our line aspect matching is capable of locating people in a variety of poses. Line aspect matching performs significantly better than an alternative approach using Hausdorff distance, showing the merits of the line representation.
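K-medoid clustering on a precomputed view-to-view distance matrix is standard; a compact sketch follows (the distance matrix itself would come from the linear line matching described above, which is not reproduced here).

```python
import numpy as np

def k_medoids(D, k, iters=20, seed=0):
    """Plain K-medoid on a precomputed (n x n) distance matrix D.
    Alternates assigning points to the nearest medoid with choosing,
    inside each cluster, the member minimising the summed distance
    to the other members."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(D), size=k, replace=False)
    for _ in range(iters):
        assign = D[:, medoids].argmin(axis=1)
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(assign == c)
            if len(members):
                new[c] = members[D[np.ix_(members, members)].sum(1).argmin()]
        if np.array_equal(new, medoids):
            break                      # converged: medoids stable
        medoids = new
    return medoids, assign
```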
Citations: 19
Photometric Self-Calibration of a Projector-Camera System
Pub Date : 2007-06-17 DOI: 10.1109/CVPR.2007.383468
Ray Juang, A. Majumder
In this paper, we present a method for photometric self-calibration of a projector-camera system. In addition to the input transfer functions (commonly called gamma functions), we also reconstruct the spatial intensity fall-off from the center to the fringe (commonly called the vignetting effect) for both the projector and the camera. Projector-camera systems are becoming more popular in a large number of applications such as scene capture, 3D reconstruction, and calibrating multi-projector displays. Our method enables the use of photometrically uncalibrated projectors and cameras in all such applications.
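As a toy illustration of the two quantities being recovered, the sketch below fits a gamma curve in the log domain and an even radial polynomial for the vignetting fall-off from dense samples. The polynomial form and the normalisation are assumptions for illustration, not the paper's model.

```python
import numpy as np

def fit_gamma(x_in, x_out):
    """Fit x_out = x_in ** gamma by least squares in the log domain,
    given paired normalised input/output intensity samples."""
    m = (x_in > 0) & (x_out > 0)
    return np.polyfit(np.log(x_in[m]), np.log(x_out[m]), 1)[0]

def fit_vignetting(img, cx, cy):
    """Fit the fall-off of a flat-field image as an even polynomial in
    the normalised radius r from the optical centre (cx, cy):
    I(r) ~ a0 + a2*r**2 + a4*r**4."""
    yy, xx = np.indices(img.shape)
    r = np.hypot(xx - cx, yy - cy) / np.hypot(cx, cy)
    A = np.stack([np.ones_like(r), r**2, r**4], axis=-1).reshape(-1, 3)
    coef, *_ = np.linalg.lstsq(A, img.ravel().astype(float), rcond=None)
    return coef      # (a0, a2, a4)
```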
Citations: 30
Inferring Grammar-based Structure Models from 3D Microscopy Data
Pub Date : 2007-06-17 DOI: 10.1109/CVPR.2007.383031
J. Schlecht, Kobus Barnard, Ekaterina H. Spriggs, B. Pryor
We present a new method to fit grammar-based stochastic models for biological structure to stacks of microscopic images captured at incremental focal lengths. Providing the ability to quantitatively represent structure and automatically fit it to image data enables important biological research. We consider the case where individuals can be represented as an instance of a stochastic grammar, similar to L-systems used in graphics to produce realistic plant models. In particular, we construct a stochastic grammar of Alternaria, a genus of fungus, and fit instances of it to microscopic image stacks. We express the image data as the result of a generative process composed of the underlying probabilistic structure model together with the parameters of the imaging system. Fitting the model then becomes probabilistic inference. For this we create a reversible-jump MCMC sampler to traverse the parameter space. We observe that incorporating spatial structure helps fit the model parts, and that simultaneously fitting the imaging system is also very helpful.
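The sampler's basic accept/reject rule is the familiar Metropolis-Hastings step, sketched below for a symmetric proposal; the paper's reversible-jump moves generalize this to proposals that change the model's dimension.

```python
import numpy as np

def mh_step(theta, log_post, propose, rng):
    """One Metropolis-Hastings step with a symmetric proposal: accept
    the candidate with probability min(1, post(cand) / post(theta)),
    evaluated in the log domain for numerical stability."""
    cand = propose(theta, rng)
    if np.log(rng.random()) < log_post(cand) - log_post(theta):
        return cand, True                  # accepted
    return theta, False                    # rejected: keep current state

# e.g. a Gaussian random-walk proposal (illustrative):
# propose = lambda t, rng: t + 0.1 * rng.standard_normal(t.shape)
```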
Citations: 26
On the Efficacy of Correcting for Refractive Effects in Iris Recognition
Pub Date : 2007-06-17 DOI: 10.1109/CVPR.2007.383380
J. R. Price, T. Gee, V. Paquit, K. Tobin
In this study, we aim to determine whether iris recognition accuracy might be improved by correcting for the refractive effects of the human eye when the optical axes of the eye and camera are misaligned. We undertake this investigation using an anatomically approximated, three-dimensional model of the human eye and ray tracing. We generate synthetic iris imagery from different viewing angles, using first a simple pattern of concentric rings on the iris for analysis, and then synthetic texture maps on the iris for experimentation. We estimate the distortion from the concentric-ring iris images and use the results to guide the sampling of textured iris images that are distorted by refraction. Using the well-known Gabor filter phase quantization approach, our model-based results indicate that the Hamming distances between iris signatures from different viewing angles can be significantly reduced by accounting for refraction. Over our experimental conditions, comprising viewing angles from 0 to 60 degrees, we observe a median reduction in Hamming distance of 27.4% and a maximum reduction of 70.0% when we compensate for refraction. Maximum improvements are observed at viewing angles of 20-25 degrees.
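For context, iris signatures are conventionally compared by fractional Hamming distance with a cyclic-shift search over rotation; a minimal version is sketched below. The shift range and the masking scheme are generic assumptions, not details from the paper.

```python
import numpy as np

def iris_hamming(code_a, code_b, mask_a, mask_b, max_shift=8):
    """Fractional Hamming distance between binary iris codes (2D boolean
    arrays, radius x angle) with a cyclic-shift search over the angular
    axis to compensate for in-plane rotation. Masks flag usable bits."""
    best = 1.0
    for s in range(-max_shift, max_shift + 1):
        b = np.roll(code_b, s, axis=1)
        m = mask_a & np.roll(mask_b, s, axis=1)
        if m.any():
            # fraction of disagreeing bits among the jointly valid ones
            best = min(best, np.count_nonzero(code_a[m] != b[m]) / m.sum())
    return best
```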
Citations: 15
Real-Time License Plate Recognition on an Embedded DSP-Platform
Pub Date : 2007-06-17 DOI: 10.1109/CVPR.2007.383412
Clemens Arth, Florian Limberger, H. Bischof
In this paper we present a full-featured license plate detection and recognition system. The system is implemented on an embedded DSP platform and processes a video stream in real time. It consists of a detection module and a character recognition module. The detector is based on the AdaBoost approach presented by Viola and Jones. Detected license plates are segmented into individual characters using a region-based approach. Character classification is performed with support vector classification. In order to speed up the detection process on the embedded device, a Kalman tracker is integrated into the system. The search area of the detector is limited to locations where the next position of a license plate is predicted. Furthermore, classification results of subsequent frames are combined to improve the classification accuracy. The major advantages of our system are its real-time capability and that it does not require any additional sensor input (e.g., from infrared sensors) beyond a video stream. We evaluate our system on a large number of vehicles and license plates using poor-quality video and show that the low resolution can be partly compensated for by combining classification results of subsequent frames.
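As an illustration of the region-based segmentation stage, a minimal sketch: label connected components of a binarised plate crop and return candidate character boxes left to right. The area threshold and the binarisation itself are assumed inputs, not details from the paper.

```python
import numpy as np
from scipy import ndimage

def segment_characters(plate_bin, min_area=30):
    """Region-based character segmentation of a binarised plate crop:
    label connected components, drop tiny regions (noise), and return
    the remaining bounding-box slices ordered left to right."""
    lab, _ = ndimage.label(plate_bin)
    boxes = []
    for sl in ndimage.find_objects(lab):
        h = sl[0].stop - sl[0].start
        w = sl[1].stop - sl[1].start
        if h * w >= min_area:
            boxes.append(sl)
    return sorted(boxes, key=lambda sl: sl[1].start)   # sort by x position
```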
Citations: 145
Object retrieval with large vocabularies and fast spatial matching
Pub Date : 2007-06-17 DOI: 10.1109/CVPR.2007.383172
James Philbin, Ondřej Chum, M. Isard, Josef Sivic, Andrew Zisserman
In this paper, we present a large-scale object retrieval system. The user supplies a query object by selecting a region of a query image, and the system returns a ranked list of images that contain the same object, retrieved from a large corpus. We demonstrate the scalability and performance of our system on a dataset of over 1 million images crawled from the photo-sharing site Flickr [3], using Oxford landmarks as queries. Building an image-feature vocabulary is a major time and performance bottleneck, due to the size of our dataset. To address this problem we compare different scalable methods for building a vocabulary and introduce a novel quantization method based on randomized trees which we show outperforms the current state of the art on an extensive ground truth. Our experiments show that the quantization has a major effect on retrieval quality. To further improve query performance, we add an efficient spatial verification stage to re-rank the results returned from our bag-of-words model and show that this consistently improves search quality, though by less of a margin when the visual vocabulary is large. We view this work as a promising step towards much larger, "web-scale" image corpora.
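For orientation, the bag-of-words scoring that such systems build on is an inverted file over quantised visual words with tf-idf-weighted, L2-normalised document vectors; a minimal sketch follows. The quantizer itself, the randomized-tree vocabulary, and the spatial verification stage are omitted.

```python
import numpy as np
from collections import defaultdict

def build_index(db_words, vocab_size):
    """Build an inverted file over visual-word ids. db_words is a list
    of integer arrays, one per database image. Stores tf-idf weights of
    L2-normalised document vectors, keyed by word id."""
    df = np.zeros(vocab_size)
    for words in db_words:
        df[np.unique(words)] += 1                 # document frequency
    idf = np.log(len(db_words) / np.maximum(df, 1))
    index = defaultdict(list)                     # word id -> [(doc, weight)]
    for d, words in enumerate(db_words):
        w, c = np.unique(words, return_counts=True)
        vec = (c / c.sum()) * idf[w]
        vec /= np.linalg.norm(vec) + 1e-12
        for wi, vi in zip(w, vec):
            index[int(wi)].append((d, vi))
    return index, idf

def query(index, idf, q_words, n_docs):
    """Rank documents by the cosine of tf-idf vectors via the index."""
    scores = np.zeros(n_docs)
    w, c = np.unique(q_words, return_counts=True)
    q = (c / c.sum()) * idf[w]
    q /= np.linalg.norm(q) + 1e-12
    for wi, qi in zip(w, q):
        for d, vi in index[int(wi)]:
            scores[d] += qi * vi
    return np.argsort(-scores)                    # ranked document ids
```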
Citations: 3111