
2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops: latest publications

A GPU-based implementation of motion detection from a moving platform
Qian Yu, G. Medioni
We describe a GPU-based implementation of motion detection from a moving platform. Motion detection from a moving platform is inherently difficult because the moving camera induces a 2D motion field across the entire image, so a step compensating for camera motion is required before the background model can be estimated. Due to inevitable registration errors, the background model is estimated over a sliding window of frames, so that an erroneous registration cannot degrade detection quality for the whole sequence. However, this approach has several characteristics that place a heavy burden on a real-time CPU implementation. We exploit the GPU to achieve significant acceleration over standard CPU implementations: our implementation builds the background model and detects motion regions at around 18 fps on 320×240 videos captured from a moving camera.
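The sliding-window background model can be sketched as follows. This is a minimal, illustrative stand-in, not the paper's implementation: it assumes frames have already been registration-compensated, uses a per-pixel median over the last N frames as the background, and flags pixels that deviate beyond a fixed threshold. Class and parameter names are ours.

```python
from collections import deque

def median(values):
    s = sorted(values)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2.0

class SlidingBackgroundModel:
    """Toy sliding-window background model over registered frames."""

    def __init__(self, window=5, threshold=20):
        self.window = deque(maxlen=window)  # last N registered frames
        self.threshold = threshold

    def update(self, frame):
        # A frame here is just a flat list of pixel intensities.
        self.window.append(frame)

    def detect(self, frame):
        # A pixel is "motion" if it deviates from the per-pixel median
        # of the window by more than the threshold.
        bg = [median(pix) for pix in zip(*self.window)]
        return [abs(p - b) > self.threshold for p, b in zip(frame, bg)]
```

The median makes the model robust to a single badly registered frame inside the window, which is the motivation the abstract gives for the sliding-window design.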
DOI: 10.1109/CVPRW.2008.4563096
Citations: 40
Efficient scan-window based object detection using GPGPU
Li Zhang, R. Nevatia
We describe an efficient design for scan-window based object detectors using a general-purpose computing on graphics hardware (GPGPU) framework. While the design is applied in particular to build a pedestrian detector that uses histogram of oriented gradients (HOG) features and support vector machine (SVM) classifiers, the methodology is generic and can be applied to other objects using different features and classifiers. The GPGPU paradigm is used for feature extraction and classification so that the scan windows can be processed in parallel. We further propose to precompute and cache all the histograms in advance, instead of using integral images, which greatly lowers the computation cost. A multi-scale reduce strategy is employed to save expensive CPU-GPU data transfers. Experimental results show that our implementation achieves a more-than-tenfold speedup with no loss in detection rate.
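The precompute-and-cache idea can be illustrated with a toy sketch: orientation histograms are computed once per cell, and every scan window then just gathers its cells' cached histograms and applies a linear (SVM-style) score w·x + b, so no histogram is ever recomputed per window. Grid sizes, bin count, and weights here are illustrative, not the paper's.

```python
def cell_histograms(orientations, grid_w, grid_h, bins=4):
    """Precompute one orientation histogram per cell.

    orientations: dict {(cx, cy): list of gradient-orientation bin indices
    observed in that cell}; cells with no entry get an all-zero histogram.
    """
    cache = {}
    for cy in range(grid_h):
        for cx in range(grid_w):
            hist = [0] * bins
            for b in orientations.get((cx, cy), []):
                hist[b % bins] += 1
            cache[(cx, cy)] = hist
    return cache

def window_score(cache, x0, y0, win_w, win_h, w, b):
    """Score one scan window: concatenate its cells' cached histograms
    and apply a linear classifier w.x + b."""
    feat = []
    for cy in range(y0, y0 + win_h):
        for cx in range(x0, x0 + win_w):
            feat.extend(cache[(cx, cy)])
    return sum(wi * fi for wi, fi in zip(w, feat)) + b
```

Because each window only reads from the cache, all window scores are independent, which is what makes the per-window work trivially parallel on a GPU.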
DOI: 10.1109/CVPRW.2008.4563097
Citations: 62
Circular generalized cylinder fitting for 3D reconstruction in endoscopic imaging based on MRF
Jin Zhou, Ananya Das, Feng Li, Baoxin Li
Endoscopy has become an established procedure for the diagnosis and therapy of various gastrointestinal (GI) ailments, and has also emerged as a commonly used technique for minimally invasive surgery. Most existing endoscopes are monocular, and stereo-endoscopy faces practical difficulties, preventing physicians and surgeons from having a desired, realistic 3D view. Traditional monocular 3D reconstruction approaches (e.g., structure from motion) face extraordinary challenges in this application due to noisy data, a lack of textures supporting robust feature matching, nonrigidity of the objects, and glare artifacts from the imaging process. In this paper, we propose a method to automatically reconstruct 3D structure from a monocular endoscopic video. Our approach addresses the above challenges by incorporating a circular generalized cylinder (CGC) model into the 3D reconstruction. The CGC model is decomposed into a series of 3D circles. To reconstruct this model, we formulate the problem as maximum a posteriori estimation within a Markov random field framework, which enforces the smoothness constraints of the CGC model and supports a robust search for the optimal solution via a two-stage heuristic search scheme. Both simulated and real data experiments demonstrate the effectiveness of the proposed approach.
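Since the CGC's circles form a chain, a MAP estimate under a pairwise smoothness prior can be computed exactly with dynamic programming (Viterbi). The sketch below is a toy version of that idea, assuming a single scalar state per circle (its radius) on a discrete grid, a squared-error unary cost to the observed radius, and a squared-jump smoothness cost; the paper's actual model and two-stage heuristic search are richer.

```python
def chain_map(observed, radii, lam=1.0):
    """Exact MAP on a chain MRF via dynamic programming.

    observed: per-circle observed radii; radii: discrete candidate values;
    lam: smoothness weight. Returns the minimum-cost radius sequence.
    """
    n, k = len(observed), len(radii)
    # Forward pass: cost[i][j] = best cost of states 0..i ending in state j.
    cost = [[(radii[j] - observed[0]) ** 2 for j in range(k)]]
    back = []
    for i in range(1, n):
        row, brow = [], []
        for j in range(k):
            best, arg = min(
                (cost[-1][p] + lam * (radii[j] - radii[p]) ** 2, p)
                for p in range(k)
            )
            row.append(best + (radii[j] - observed[i]) ** 2)
            brow.append(arg)
        cost.append(row)
        back.append(brow)
    # Backward pass: recover the argmin path.
    j = min(range(k), key=lambda q: cost[-1][q])
    path = [j]
    for brow in reversed(back):
        j = brow[j]
        path.append(j)
    return [radii[j] for j in reversed(path)]
```

With a large smoothness weight the chain suppresses an outlier observation; with the weight at zero it reduces to independent per-circle fits.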
DOI: 10.1109/CVPRW.2008.4563010
Citations: 25
Exploiting spatio-temporal information for view recognition in cardiac echo videos
D. Beymer, T. Syeda-Mahmood, Fei Wang
2D echocardiography is an important diagnostic aid for morphological and functional assessment of the heart. The transducer position is varied during an echo exam to elicit important information about the heart's function and anatomy. Knowledge of the transducer viewpoint is important in automatic cardiac echo interpretation, both for understanding the regions being depicted and for quantifying their attributes. In this paper, we address the problem of inferring the transducer viewpoint from the spatio-temporal information in cardiac echo videos. Unlike previous approaches, we exploit the motion of the heart within a cardiac cycle, in addition to spatial information, to discriminate between viewpoints. Specifically, we use an active shape model (ASM) to model shape and texture information in an echo frame. The motion information derived by tracking ASMs through a heart cycle is then projected into the eigen-motion feature space of each viewpoint class for matching. We report a comparison with a re-implementation of state-of-the-art view recognition methods on a large database of patients with various cardiac diseases.
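The eigen-motion projection step can be sketched as PCA per view class: training motion trajectories (e.g., stacked ASM landmark displacements over a heart cycle) define a linear subspace, and a test trajectory is assigned to the class whose subspace reconstructs it best. This is an illustrative simplification, assuming trajectories are fixed-length vectors; the data and dimensions below are made up.

```python
import numpy as np

def fit_eigenspace(X, k):
    """Rows of X are training trajectories of one view class.
    Returns the class mean and the top-k eigen-motions (principal axes)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def reconstruction_error(x, mean, basis):
    """Distance from x to the affine subspace (mean + span of basis)."""
    c = basis @ (x - mean)          # projection coefficients
    return np.linalg.norm((x - mean) - basis.T @ c)

def classify(x, models):
    """Pick the view class whose eigenspace best explains trajectory x."""
    return min(models, key=lambda name: reconstruction_error(x, *models[name]))
```

Matching by subspace reconstruction error is one common way to use an eigen-feature space; nearest-neighbour matching of the projection coefficients would be an equally reasonable variant.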
DOI: 10.1109/CVPRW.2008.4563008
Citations: 20
Design and calibration of a multi-view TOF sensor fusion system
Y. Kim, Derek Chan, C. Theobalt, S. Thrun
This paper describes the design and calibration of a system that enables simultaneous recording of dynamic scenes with multiple high-resolution video cameras and low-resolution Swissranger time-of-flight (TOF) depth cameras. The system serves as a testbed for the development of new algorithms for high-quality multi-view dynamic scene reconstruction and 3D video. The paper also provides a detailed analysis of random and systematic depth camera noise, which is important for reliable fusion of video and depth data. Finally, the paper describes how to compensate for systematic depth errors and calibrate all dynamic depth and video data into a common frame.
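One simple form of systematic depth-error compensation is to fit a low-order polynomial mapping raw TOF depths to reference depths (e.g., measured against a calibration target) and apply it to new readings. This is a hedged sketch of the general idea only; the paper's actual noise model and calibration procedure are more involved.

```python
import numpy as np

def fit_depth_correction(raw, reference, degree=2):
    """Least-squares polynomial fit from raw TOF depths to reference depths."""
    return np.polyfit(raw, reference, degree)

def correct(depths, coeffs):
    """Apply the fitted correction polynomial to raw depth measurements."""
    return np.polyval(coeffs, depths)
```

In practice the correction would be fitted per pixel or per region, since TOF bias typically varies across the sensor.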
DOI: 10.1109/CVPRW.2008.4563160
Citations: 142
Mutual information computation and maximization using GPU
Yuping Lin, G. Medioni
We present a GPU implementation that computes both mutual information and its derivatives. Mutual information computation is a highly demanding process due to the enormous number of exponential computations; it is therefore the bottleneck in many image registration applications. However, we show that these computations are fully parallelizable and can be efficiently ported to the GPU architecture. Compared with the same algorithm running on a workstation-class CPU, we achieve a speedup factor of 170 for computing mutual information and a factor of 400 for computing its derivatives.
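For reference, the quantity being accelerated is the mutual information of two images, computed from their joint intensity histogram: MI = Σ p(a,b) log( p(a,b) / (p(a) p(b)) ). The plain-Python version below is the serial baseline; the per-bin histogram accumulation and log terms are exactly the parts that parallelize on a GPU.

```python
import math
from collections import Counter

def mutual_information(img_a, img_b):
    """Mutual information of two equal-length intensity sequences,
    estimated from their joint histogram (natural log, i.e. nats)."""
    n = float(len(img_a))
    joint = Counter(zip(img_a, img_b))   # joint intensity histogram
    pa = Counter(img_a)                  # marginal histograms
    pb = Counter(img_b)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((pa[a] / n) * (pb[b] / n)))
    return mi
```

Identical images give MI equal to their entropy, and independent intensity patterns give MI of zero, which is what makes MI a useful registration score.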
DOI: 10.1109/CVPRW.2008.4563101
Citations: 39
Investigating how and when perceptual organization cues improve boundary detection in natural images
Leandro A. Loss, G. Bebis, M. Nicolescu, A. Skurikhin
Boundary detection in natural images is an important but challenging problem in computer vision. Motivated by studies in psychophysics claiming that humans use multiple cues for segmentation, several promising methods have been proposed that perform boundary detection by optimally combining local image measurements such as color, texture, and brightness. Very interesting results have been reported by applying these methods to challenging datasets such as the Berkeley segmentation benchmark. Although combining different cues for boundary detection has been shown to outperform methods using a single cue, results can be further improved by integrating perceptual organization cues into the boundary detection process. The main goal of this study is to investigate how and when perceptual organization cues improve boundary detection in natural images. In this context, we investigate the idea of integrating segmentation with iterative multi-scale tensor voting (IMSTV), a variant of tensor voting (TV) that performs perceptual grouping by analyzing information at multiple scales and removing background clutter in an iterative fashion, preserving salient, organized structures. The key idea is to use IMSTV to post-process the boundary posterior probability (PB) map produced by segmentation algorithms. Detailed analysis of our experimental results reveals how and when perceptual organization cues are likely to improve or degrade boundary detection. In particular, we show that using perceptual grouping as a post-processing step improves boundary detection in 84% of the grayscale test images in the Berkeley segmentation dataset.
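A much-simplified stand-in for the tensor-voting grouping step is shown below: each edge pixel casts a second-order "stick" tensor t·tᵀ from its tangent direction, the tensors of nearby pixels are summed, and the eigenvalue gap (λ₁ − λ₂) scores how well they agree on a single curve direction. Real IMSTV additionally uses spatially decaying vote fields, multiple scales, and iteration, none of which are modeled here.

```python
import numpy as np

def stick_saliency(tangents):
    """Curvilinear saliency of a pixel neighbourhood.

    tangents: list of unit 2D tangent vectors from neighbouring edge pixels.
    Returns lambda1 - lambda2 of the summed stick tensors: a high gap means
    the neighbours agree on one curve direction (salient structure); a low
    gap means isotropic responses (clutter).
    """
    T = np.zeros((2, 2))
    for t in tangents:
        v = np.asarray(t, dtype=float)
        T += np.outer(v, v)          # accumulate stick tensor t t^T
    eig = np.sort(np.linalg.eigvalsh(T))
    return eig[1] - eig[0]
```

Post-processing a PB map with such a saliency score would keep boundary pixels whose neighbours vote coherently and suppress isolated high-probability clutter.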
DOI: 10.1109/CVPRW.2008.4562974
Citations: 0
Camera localization and building reconstruction from single monocular images
Ruisheng Wang, F. Ferrie
This paper presents a new method for reconstructing rectilinear buildings from single images under the assumption of flat terrain. The intuition behind the method is that, given an image composed of rectilinear buildings, the 3D buildings can be geometrically reconstructed from the image alone. The recovery algorithm is formulated in terms of two objective functions based on the equivalence between the vector normal to the interpretation plane in image space and the vector normal to the rotated interpretation plane in object space. These objective functions are minimized with respect to the camera pose and the buildings' dimensions, locations, and orientations to obtain estimates of the scene structure. The method potentially provides a solution for large-scale urban modelling from aerial images, and can easily be extended to handle piecewise-planar objects in more general settings.
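A toy version of this kind of formulation: the normal measured in image space should match the corresponding object-space normal after rotation by the unknown camera orientation, so pose estimation becomes minimizing the summed squared mismatch. The sketch below reduces the problem to a single 2D rotation angle solved by grid search; the paper's actual objectives optimize full camera pose plus building parameters, and all names here are ours.

```python
import math

def rotate(v, theta):
    """Rotate a 2D vector by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def estimate_rotation(image_normals, object_normals, steps=3600):
    """Grid-search the rotation that best aligns object-space normals
    with their measured image-space counterparts."""
    def cost(theta):
        return sum(
            (n[0] - r[0]) ** 2 + (n[1] - r[1]) ** 2
            for n, m in zip(image_normals, object_normals)
            for r in [rotate(m, theta)]
        )
    return min((cost(2 * math.pi * i / steps), 2 * math.pi * i / steps)
               for i in range(steps))[1]
```

A grid search is used only because the toy problem is one-dimensional; a real implementation would use a gradient-based or closed-form solver over the full pose.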
DOI: 10.1109/CVPRW.2008.4563132
Citations: 6
Binocular dance pose recognition and body orientation estimation via multilinear analysis
Bo Peng, G. Qian
In this paper, we propose a novel approach to dance pose recognition and body orientation estimation using multilinear analysis. By performing tensor decomposition and projection on silhouette images obtained from wide-baseline binocular cameras, low-dimensional pose and body orientation coefficient vectors can be extracted. Unlike traditional tensor-based recognition methods, the proposed approach takes the pose coefficient vector as the feature for training a family of support vector machines as pose classifiers. Using the body orientation coefficient vectors, a one-dimensional orientation manifold is learned and then used to estimate body orientation. Experimental results on both synthetic and real image data show the efficacy of the proposed approach, which outperformed a traditional tensor-based approach in a comparative test.
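The multilinear step rests on mode-n unfolding: a data tensor (e.g., pose × orientation × pixels) is flattened along each mode, and the left singular vectors of each unfolding (as in HOSVD) give one low-dimensional coefficient vector per entity along that mode. The sketch below shows just that decomposition, assuming a dense NumPy tensor; the paper then feeds such coefficient vectors to SVM classifiers, which is omitted here.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_matrices(T, ranks):
    """HOSVD mode matrices: for each mode, the top-r left singular vectors
    of the unfolding. Row i of the mode-n matrix is the coefficient vector
    of the i-th entity (pose, body orientation, ...) along that mode."""
    return [np.linalg.svd(unfold(T, n))[0][:, :r]
            for n, r in enumerate(ranks)]
```

For a rank-1 tensor the first column of each mode matrix recovers (up to sign) the generating factor of that mode, which is a quick sanity check on the decomposition.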
DOI: 10.1109/CVPRW.2008.4562970
Citations: 23
Improving the selection and detection of visual landmarks through object tracking
P. Espinace, A. Soto
The unsupervised selection and subsequent recognition of visual landmarks is a highly valuable perceptual capability for a mobile robot. Recently, we proposed a system that aims to achieve this capability by combining a bottom-up, data-driven approach with top-down feedback provided by high-level semantic representations. The bottom-up approach is based on three main mechanisms: visual attention, area segmentation, and landmark characterization. The top-down feedback is based on two information sources: i) an estimate of the robot's position that reduces the search scope for potential matches with previously selected landmarks, and ii) a set of weights that, according to the results of previous recognitions, controls the influence of the different segmentation algorithms on the recognition of each landmark. In this paper, we explore the benefits of extending our previous work with a visual tracking step for each selected landmark. Our intuition is that a tracking step can improve the model of each landmark by associating and selecting information from its most significant views, and can also help avoid the selection of spurious landmarks. Our results confirm these intuitions, showing that the tracking step produces a significant increase in the recall rate for landmark recognition.
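The top-down weighting mechanism can be sketched as a simple multiplicative update: each segmentation algorithm keeps a per-landmark weight that grows when it contributed to a successful recognition and shrinks otherwise, with the weights renormalized after each round. The exponential update rule and learning rate below are our illustrative choices, not the paper's exact scheme.

```python
def update_weights(weights, succeeded, eta=0.5):
    """Multiplicative weight update for segmentation algorithms.

    weights: dict {algorithm name: current weight}; succeeded: set of
    algorithm names that contributed to a successful recognition this round.
    """
    new = {name: w * (1 + eta if name in succeeded else 1 - eta)
           for name, w in weights.items()}
    total = sum(new.values())
    return {name: w / total for name, w in new.items()}   # renormalize
```

Over repeated recognitions this concentrates influence on whichever segmentation algorithm has historically worked best for a given landmark.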
DOI: 10.1109/CVPRW.2008.4563133
Citations: 1
Journal
2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops