Pub Date: 2011-01-05. DOI: 10.1109/WACV.2011.5711500
Geoffrey Oxholm, K. Nishino
We introduce a novel method for matching and aligning 3D surfaces that do not have any overlapping surface information. When two matching surfaces do not overlap, all that remains in common between them is a thin strip along their borders. Aligning such fragments is challenging but crucial for various applications, such as the reassembly of thin-shell ceramics from their broken pieces. Past work approaches this problem by relying heavily on simplistic assumptions about the shape of the object or its texture. Our method makes no such assumptions; instead, we leverage the geometric and photometric similarity of the matching surfaces along the break-line. We first encode the shape and color of the boundary contour of each fragment at various scales in a novel 2D representation. Reformulating contour matching as 2D image registration based on these scale-space images enables efficient and accurate break-line matching. We then align the fragments by estimating the rotation around the break-line, maximizing the geometric continuity across it with a least-squares minimization. We evaluate our method on real-world colonial artifacts recently excavated in Philadelphia, Pennsylvania. Our system dramatically increases the ease and efficiency with which users reassemble artifacts, as we demonstrate on three different vessels.
Title: Aligning surfaces without aligning surfaces
Venue: 2011 IEEE Workshop on Applications of Computer Vision (WACV)
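The paper's core move, turning break-line matching into 2D registration of scale-space images, can be illustrated with a minimal sketch: smooth a periodic boundary signal (here a synthetic curvature-like signal; the authors also encode color) at several scales, stack the results into a 2D image, and search circular shifts for the best alignment. All names and parameters below are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def contour_scale_space(signal, sigmas=(1, 2, 4, 8)):
    """Stack Gaussian-smoothed copies of a periodic boundary signal
    (e.g. curvature along the break-line) into a 2D image: one row
    per smoothing scale."""
    return np.stack([gaussian_filter1d(signal, s, mode="wrap") for s in sigmas])

def best_circular_shift(img_a, img_b):
    """Treat matching as 2D registration: find the circular shift of
    img_b's columns that maximizes correlation with img_a."""
    scores = [np.sum(img_a * np.roll(img_b, k, axis=1))
              for k in range(img_a.shape[1])]
    return int(np.argmax(scores))

# Toy check: a shifted copy of the same contour signal.
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
a = np.sin(t) + 0.3 * np.sin(3 * t)
b = np.roll(a, 10)
A, B = contour_scale_space(a), contour_scale_space(b)
print(best_circular_shift(A, B))  # 54, i.e. the 10-sample shift modulo 64
```

Because Gaussian smoothing with wrap-around commutes with circular shifts, the scale-space image of the shifted contour is a column-rolled copy of the original, so the registration recovers the offset exactly.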
Pub Date: 2011-01-05. DOI: 10.1109/WACV.2011.5711507
A. Dantcheva, N. Erdogmus, J. Dugelay
This work studies eye color as a soft biometric trait and provides novel insight into the influence of pertinent factors such as color space, illumination, and the presence of glasses. A motivation for the paper is the fact that human iris color is an essential facial trait for Caucasians, which can be employed in iris pattern recognition systems for pruning the search or in soft biometric systems for person re-identification. Towards studying iris color as a soft biometric trait, we consider a system for the automatic detection of eye color based on standard facial images. The system entails automatic iris localization, followed by classification based on Gaussian Mixture Models with Expectation Maximization. We finally provide related detection results on the UBIRIS2 database, employable in a real-time eye color detection system.
Title: On the reliability of eye color as a soft biometric trait
Venue: 2011 IEEE Workshop on Applications of Computer Vision (WACV)
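As a hedged illustration of the classification stage (Gaussian Mixture Models fit with EM), the sketch below fits one mixture per eye-color class on synthetic RGB samples and labels pixels by the higher-scoring model. The color values are invented for the demo; the paper's actual features and color spaces differ.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical RGB samples for two iris color classes (synthetic data).
blue = rng.normal([70, 110, 160], 12, size=(300, 3))
brown = rng.normal([120, 80, 50], 12, size=(300, 3))

# One GMM per class; sklearn's fit() runs Expectation Maximization.
models = {name: GaussianMixture(n_components=2, random_state=0).fit(x)
          for name, x in [("blue", blue), ("brown", brown)]}

def classify(pixels):
    """Label a set of iris pixels with the class whose GMM gives the
    highest mean log-likelihood."""
    scores = {name: m.score(pixels) for name, m in models.items()}
    return max(scores, key=scores.get)

print(classify(np.array([[72.0, 108.0, 158.0]])))  # blue
```

In practice one mixture per nameable eye color would be trained, and the localized iris pixels from a face image fed to `classify`.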
Pub Date: 2011-01-05. DOI: 10.1109/WACV.2011.5711521
Jens Puwein, R. Ziegler, Julia Vogel, M. Pollefeys
Real-world camera networks are often characterized by very wide baselines covering a wide range of viewpoints. We describe a method that not only calibrates each camera sequence added to the system automatically, but also takes advantage of multi-view correspondences to make the entire calibration framework more robust. Novel camera sequences can be seamlessly integrated into the system at any time, adding to the robustness of future computations. One of the challenges consists in establishing correspondences between cameras. Initializing a bag of features from a calibrated frame, correspondences between cameras are established in a two-step procedure. First, affine-invariant features of camera sequences are warped into a common coordinate frame and a coarse matching is obtained between the collected features and the incrementally built and updated bag of features. This allows us to warp images to a common view. Second, scale-invariant features are extracted from the warped images. This leads to both more numerous and more accurate correspondences. Finally, the parameters are optimized in a bundle adjustment. Adding the feature descriptors and the optimized 3D positions to the bag of features, we obtain a feature-based scene abstraction, allowing for the calibration of novel sequences and the correction of drift in single-view calibration tracking. We demonstrate that our approach can deal with wide baselines, and that novel sequences can be seamlessly integrated into the calibration framework.
Title: Robust multi-view camera calibration for wide-baseline camera networks
Venue: 2011 IEEE Workshop on Applications of Computer Vision (WACV)
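A toy sketch of the first matching step described above: warp feature locations into a common coordinate frame with a homography, then match coarsely by nearest neighbor. The homography and threshold are hypothetical; the actual system uses affine-invariant descriptors and an incrementally updated bag of features.

```python
import numpy as np

def warp_points(H, pts):
    """Map 2D feature locations into a common coordinate frame with a
    3x3 homography (the 'warp into a common view' step)."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def coarse_match(pts_a, pts_b, radius=3.0):
    """Greedy coarse matching: pair each warped point in A with the
    nearest point in B if it falls within `radius` pixels."""
    matches = []
    for i, p in enumerate(pts_a):
        d = np.linalg.norm(pts_b - p, axis=1)
        j = int(np.argmin(d))
        if d[j] < radius:
            matches.append((i, j))
    return matches

H = np.diag([2.0, 2.0, 1.0])  # hypothetical similarity warp
pts = np.array([[1.0, 2.0], [5.0, 5.0]])
print(coarse_match(warp_points(H, pts), np.array([[2.1, 4.0], [10.0, 10.1]])))
```

The matches found this way would then feed the fine, scale-invariant matching and the bundle adjustment.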
Pub Date: 2011-01-05. DOI: 10.1109/WACV.2011.5711573
Jonathan Ventura, Tobias Höllerer
In this paper we report an evaluation of keypoint descriptor compression using as little as 16 bits to describe a single keypoint. We use spectral hashing to compress keypoint descriptors, and match them using the Hamming distance. By indexing the keypoints in a binary tree, we can quickly recognize keypoints with a very small database, and efficiently insert new keypoints. Our tests using image datasets with perspective distortion show the method to enable fast keypoint recognition and image retrieval with a small code size, and point towards potential applications for scalable visual SLAM on mobile phones.
Title: Fast and scalable keypoint recognition and image retrieval using binary codes
Venue: 2011 IEEE Workshop on Applications of Computer Vision (WACV)
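The core matching operation, Hamming distance between short binary codes, can be sketched in a few lines. The packed codes below are illustrative; the paper obtains its codes via spectral hashing and indexes them in a binary tree rather than scanning linearly.

```python
import numpy as np

def hamming(codes, query):
    """Hamming distance between a query code and a database of codes,
    all stored as packed uint8 arrays (a 16-bit code is 2 bytes)."""
    xor = np.bitwise_xor(codes, query)
    return np.unpackbits(xor, axis=1).sum(axis=1)

# Three hypothetical 16-bit descriptors packed into 2 bytes each.
db = np.array([[0b10110010, 0b00001111],
               [0b10110010, 0b00001110],
               [0b01001101, 0b11110000]], dtype=np.uint8)
q = np.array([0b10110010, 0b00001111], dtype=np.uint8)

print(hamming(db, q))                 # [ 0  1 16]
best = int(np.argmin(hamming(db, q)))  # index of the matching keypoint
```

With 16-bit codes, XOR plus popcount makes each comparison a couple of machine instructions, which is what makes the mobile-phone SLAM scenario plausible.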
Pub Date: 2011-01-05. DOI: 10.1109/WACV.2011.5711478
Quan Wang, Wei Guan, Suya You
Finding corresponding image points is a challenging computer vision problem, especially for confusing scenes with low-texture surfaces or repeated patterns. Despite the well-known challenges of extracting conceptually meaningful high-level matching primitives, many recent works describe high-level image features such as edge groups, lines, and regions, which are more distinctive than traditional local appearance-based features, to tackle such difficult scenes. In this paper, we propose a different and more general approach, which treats the image matching problem as a recognition problem over spatially related image patch sets. We construct augmented semi-global descriptors (ordinal codes) based on subsets of scale- and orientation-invariant local keypoint descriptors. The tied-ranking problem of ordinal codes is handled by increasing the keypoint sampling around image patch sets. Finally, similarities of augmented features are measured using the Spearman correlation coefficient. Our proposed method is compatible with a large range of existing local image descriptors. Experimental results based on standard benchmark datasets and SURF descriptors demonstrate its distinctiveness and effectiveness.
Title: Augmented distinctive features for efficient image matching
Venue: 2011 IEEE Workshop on Applications of Computer Vision (WACV)
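A minimal sketch of ordinal coding and Spearman-based similarity, assuming a plain rank transform of descriptor vectors (the paper builds semi-global codes over patch sets, which this toy omits): ranks are invariant to any monotonic rescaling of the descriptor, which is what makes the comparison robust.

```python
import numpy as np
from scipy.stats import spearmanr

def ordinal_code(descriptor):
    """Convert a descriptor vector to an ordinal code: the rank of each
    dimension, invariant to monotonic changes in magnitude."""
    return np.argsort(np.argsort(descriptor))

a = np.array([0.1, 0.9, 0.4, 0.7])
b = np.array([0.2, 1.0, 0.5, 0.8])   # same ordering, different values
c = np.array([0.9, 0.1, 0.7, 0.4])   # reversed ordering

rho_ab, _ = spearmanr(ordinal_code(a), ordinal_code(b))
rho_ac, _ = spearmanr(ordinal_code(a), ordinal_code(c))
print(rho_ab, rho_ac)  # perfect agreement vs. perfect disagreement
```

Ties in the ranks are the "tied-ranking problem" the abstract mentions; the paper resolves them by sampling more keypoints rather than by tie-breaking rules.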
Pub Date: 2011-01-05. DOI: 10.1109/WACV.2011.5711547
D. Vaquero, Natasha Gelfand, M. Tico, K. Pulli, M. Turk
All-in-focus imaging is a computational photography technique that produces images free of defocus blur by capturing a stack of images focused at different distances and merging them into a single sharp result. Current approaches assume that images have been captured offline, and that a reasonably powerful computer is available to process them. In contrast, we focus on the problem of how to capture such input stacks in an efficient and scene-adaptive fashion. Inspired by passive autofocus techniques, which select a single best plane of focus in the scene, we propose a method to automatically select a minimal set of images, focused at different depths, such that all objects in a given scene are in focus in at least one image. We aim to minimize both the amount of time spent metering the scene and capturing the images, and the total amount of high-resolution data that is captured. The algorithm first analyzes a set of low-resolution sharpness measurements of the scene while continuously varying the focus distance of the lens. From these measurements, we estimate the final lens positions required to capture all objects in the scene in acceptable focus. We demonstrate the use of our technique in a mobile computational photography scenario, where it is essential to minimize image capture time (as the camera is typically handheld) and processing time (as the computation and energy resources are limited).
Title: Generalized autofocus
Venue: 2011 IEEE Workshop on Applications of Computer Vision (WACV)
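The selection of a minimal set of focus positions can be illustrated as a 1D interval-covering problem, assuming a constant depth-of-field half-width (a simplification: on a real lens the in-focus interval widens with distance, and the paper estimates object depths from low-resolution sharpness sweeps rather than taking them as given).

```python
def minimal_focus_stack(object_depths, dof=0.5):
    """Greedy interval covering: choose focus distances so that every
    object depth lies within +/- dof of some chosen distance. `dof` is a
    hypothetical constant depth-of-field half-width."""
    depths = sorted(object_depths)
    positions = []
    i = 0
    while i < len(depths):
        f = depths[i] + dof  # focus as far as possible while still covering depths[i]
        positions.append(f)
        while i < len(depths) and depths[i] <= f + dof:
            i += 1  # this object is already in acceptable focus
    return positions

# Six detected object depths (meters, hypothetical) need only three shots.
print(minimal_focus_stack([1.0, 1.2, 1.4, 3.0, 3.3, 8.0]))  # [1.5, 3.5, 8.5]
```

This greedy sweep is optimal for interval covering, which matches the paper's goal of minimizing both capture time and the amount of high-resolution data.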
Pub Date: 2011-01-05. DOI: 10.1109/WACV.2011.5711489
Mahmoud Bassiouny, M. El-Saban
Object instance matching is a cornerstone component in many computer vision applications such as image search, augmented reality, and unsupervised tagging. The common flow in these applications is to take an input image and match it against a database of previously enrolled images of objects of interest. This is usually difficult, as one needs to capture an image corresponding to an object view already present in the database, especially in the case of 3D objects with high curvature, where light reflection, viewpoint change, and partial occlusion can significantly alter the appearance of the captured image. Rather than relying on having numerous views of each object in the database, we propose an alternative method of capturing a short video sequence scanning a certain object and utilizing information from multiple frames to improve the chance of a successful match in the database. The matching step combines local features from a number of frames and incrementally forms a point cloud describing the object. We conduct experiments on a database of different object types, showing promising matching results both on a privately collected set of videos and on videos freely available on the Web, such as on YouTube. An increase in accuracy of up to 20% over the single-frame matching baseline is shown to be possible.
Title: Object matching using feature aggregation over a frame sequence
Venue: 2011 IEEE Workshop on Applications of Computer Vision (WACV)
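A simplified sketch of aggregating features over frames: pool descriptors across the sequence while suppressing near-duplicates, a stand-in for the paper's incremental point cloud (the threshold and toy 2D "descriptors" are hypothetical).

```python
import numpy as np

def aggregate_features(frames, dedup_dist=0.2):
    """Accumulate local descriptors over a frame sequence, keeping a new
    descriptor only if it is not a near-duplicate of one already stored."""
    pool = []
    for descs in frames:
        for d in descs:
            if not any(np.linalg.norm(d - p) < dedup_dist for p in pool):
                pool.append(d)
    return np.array(pool)

# Two frames of the same object: one repeated feature, one new feature.
frames = [np.array([[0.0, 0.0], [1.0, 0.0]]),
          np.array([[0.05, 0.0], [2.0, 0.0]])]
pool = aggregate_features(frames)
print(len(pool))  # 3: the near-duplicate of [0, 0] was merged away
```

Matching the query video's pooled set against each enrolled object's pool is what gives the multi-frame method its coverage advantage over single-frame matching.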
Pub Date: 2011-01-05. DOI: 10.1109/WACV.2011.5711482
Bo Li, H. Johan
2D sketch-3D model alignment is important for many applications such as sketch-based 3D model retrieval and sketch-based 3D modeling, as well as model-based vision and recognition. In this paper, we propose a 2D sketch-3D model alignment algorithm using view context and shape context matching. A sketch consists of a set of curves, while a 3D model is typically a 3D triangle mesh. The algorithm includes two main steps: precomputation and actual alignment. In the precomputation, we extract the view context features of a set of sample views of the 3D model to be aligned. To speed up the precomputation, two computationally efficient and rotation-invariant features, Zernike moments and Fourier descriptors, are used to represent a view. In the actual alignment, we first quickly prune most sample views that are dissimilar to the sketch, based on their view context similarities. Finally, to find an approximate pose, we compare the sketch with only a very small portion (e.g. 5% in our experiments) of the sample views, based on shape context matching. Experiments on two types of datasets show that the algorithm can align 2D sketches with 3D models approximately.
Title: View context based 2D sketch-3D model alignment
Venue: 2011 IEEE Workshop on Applications of Computer Vision (WACV)
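One of the two view features named above, Fourier descriptors, can be sketched as FFT magnitudes of the complex boundary signal of a view's silhouette. Taking magnitudes discards the phase introduced by rotation and by shifting the contour's start point, which is what makes the descriptor rotation-invariant; the normalization choices here are illustrative.

```python
import numpy as np

def fourier_descriptor(contour, n_coeffs=8):
    """Rotation-invariant Fourier descriptor of a closed 2D contour:
    FFT magnitudes of the centered complex boundary signal, normalized
    by the first harmonic so the descriptor is also scale-invariant."""
    z = contour[:, 0] + 1j * contour[:, 1]
    spectrum = np.abs(np.fft.fft(z - z.mean()))
    return spectrum[1:n_coeffs + 1] / spectrum[1]

t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)
rotated = np.stack([np.cos(t + 0.7), np.sin(t + 0.7)], axis=1)

# The rotated contour yields the same descriptor.
print(np.allclose(fourier_descriptor(circle), fourier_descriptor(rotated)))  # True
```

Comparing such short descriptor vectors per sample view is cheap, which is why they suit the fast pruning stage before the more expensive shape context matching.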
Pub Date: 2011-01-05. DOI: 10.1109/WACV.2011.5711554
Tao Wang, Rui Li, Zhigang Zhu, Yufu Qu
Laser Doppler Vibrometers (LDVs) have been widely applied to detecting vibrations in applications such as mechanics, bridge inspection, and biometrics, as well as long-range surveillance, in which acoustic signatures can be obtained at a large distance. However, in both industrial and scientific applications, the LDVs are manually controlled in surface selection, laser focusing, and acoustic acquisition. In this paper, we propose an active stereo vision approach to facilitate fast and automated laser pointing and tracking for long-range LDV hearing. The system contains: 1) a mirror on a Pan-Tilt-Unit (PTU) to reflect the laser beam to any location freely and quickly, and 2) two Pan-Tilt-Zoom (PTZ) cameras, one of which is mounted on the PTU and aligned with the laser beam synchronously. Distance measurement using the stereo vision system, as well as triangulation between the camera and the LDV laser beam, allows us to quickly focus the laser beam on selected surfaces and to obtain acoustic signals at up to 200 meters in real time. We present promising results with the collaborative visual and LDV measurements for laser pointing and focusing to achieve long-range audio detection.
Title: Active stereo vision for improving long range hearing using a Laser Doppler Vibrometer
Venue: 2011 IEEE Workshop on Applications of Computer Vision (WACV)
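The range estimate that drives laser focusing can be sketched with the textbook rectified-stereo relation Z = f * B / d. The numbers below are hypothetical, chosen only to land on the paper's stated 200 m working range; the actual system also triangulates between a camera and the laser beam itself.

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Rectified-stereo depth from disparity: Z = f * B / d.
    A stand-in for the range estimate that drives laser focusing."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# A 2-pixel disparity with a 0.5 m baseline and an 800 px focal length
# puts the target at 200 m (all values hypothetical).
print(stereo_depth(800.0, 0.5, 2.0))  # 200.0
```

The relation also shows why long-range operation is hard: at 200 m the disparity is only a few pixels, so small matching errors translate into large depth errors.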
Pub Date: 2011-01-05. DOI: 10.1109/WACV.2011.5711542
L. Ríha, M. Manohar
Connected component labeling is an essential task for detecting and tracking moving objects in video surveillance applications. Since tracking algorithms are designed for real-time applications, the efficiency of the underlying algorithms becomes critical. In this paper we present a new one-pass algorithm for computing the minimal bounding rectangles of all connected components of background/foreground-segmented video frames (binary data) using a GPU accelerator. The given image frame is scanned once in raster-scan mode and the background-foreground transition information is stored in a directed graph where each transition is represented by a node. This data structure contains the locations of object edges in every row, and it is used to detect connected components in the image and extract their main features, e.g. bounding box size and location, location of the centroid, real size, etc. Further, we use GPU acceleration to speed up the extraction of features from the image into a directed graph, from which the minimal bounding rectangles are computed subsequently. We also compare the performance of GPU acceleration (using a Tesla C2050 accelerator card) with multi-core (up to 24 cores) general-purpose CPU implementations of the algorithm.
Title: GPU accelerated one-pass algorithm for computing minimal rectangles of connected components
Venue: 2011 IEEE Workshop on Applications of Computer Vision (WACV)
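A single-pass, run-based variant of the idea can be sketched in pure Python: collect foreground runs row by row, merge runs that overlap the previous row with union-find, and keep per-component bounding boxes. This reproduces neither the paper's directed-graph data structure nor its GPU mapping, only the one-pass bounding-box principle.

```python
import numpy as np

def run_bounding_boxes(img):
    """One raster pass over a binary image: extract foreground runs per
    row, union runs that overlap a run in the previous row, and return
    per-component boxes as (min_row, min_col, max_row, max_col)."""
    parent = {}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)

    boxes, prev, label = {}, [], 0
    for r, row in enumerate(img):
        cur, c, row = [], 0, list(row)
        while c < len(row):
            if row[c]:
                start = c
                while c < len(row) and row[c]:
                    c += 1                      # extend the run
                lab, label = label, label + 1
                parent[lab] = lab
                for ps, pe, pl in prev:         # column overlap with previous row?
                    if ps < c and pe > start:
                        union(lab, pl)
                cur.append((start, c, lab))
                boxes[lab] = (r, start, r, c - 1)
            else:
                c += 1
        prev = cur
    merged = {}
    for lab, (r0, c0, r1, c1) in boxes.items():
        root = find(lab)
        if root in merged:
            mr0, mc0, mr1, mc1 = merged[root]
            merged[root] = (min(mr0, r0), min(mc0, c0), max(mr1, r1), max(mc1, c1))
        else:
            merged[root] = (r0, c0, r1, c1)
    return sorted(merged.values())

img = np.array([[1, 1, 0, 0],
                [0, 1, 0, 1],
                [0, 0, 0, 1]])
print(run_bounding_boxes(img))  # [(0, 0, 1, 1), (1, 3, 2, 3)]
```

Per-row run extraction is embarrassingly parallel, which hints at why the row-wise transition structure maps well to a GPU, with the merge step done afterwards.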