A parallel region based object recognition system
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711487
Bor-Yiing Su, T. Brutch, K. Keutzer
Object recognition is a key problem in the field of computer vision. However, highly accurate object recognition systems are also computationally intensive, which limits their applicability. In this paper, we focus on a state-of-the-art object recognition system. We identify its key computations, examine efficient algorithms for parallelizing them, and develop a parallel object recognition system. The time taken by the training procedure on 127 images, with an average size of 0.15 M pixels, is reduced from 2332 seconds to 20 seconds. Similarly, the classification time for one 0.15 M pixel image is reduced from 331 seconds to 2.78 seconds. This efficient implementation makes it practical to train on hundreds of images within minutes, and to analyze image databases of hundreds or thousands of images in minutes, which was previously not possible.
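The abstract names the strategy (identify the key computations, parallelize them) without detailing the kernels, so the sketch below is only a loose illustration of the coarsest level of parallelism available to such a system: farming independent training images out to worker processes. The extract_features kernel here is a hypothetical stand-in, not the paper's actual feature pipeline.

```python
from multiprocessing import Pool

import numpy as np

def extract_features(image):
    # Hypothetical stand-in for one key kernel (e.g. region or contour
    # feature extraction); the paper's actual kernels are not modeled.
    gy, gx = np.gradient(image.astype(np.float32))
    hist, _ = np.histogram(np.arctan2(gy, gx), bins=36,
                           range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)

def train_parallel(images, workers=8):
    # Data-parallel training pass: independent images go to separate
    # worker processes, so wall-clock time shrinks with `workers` as
    # long as per-image work dominates the serialization overhead.
    with Pool(workers) as pool:
        return pool.map(extract_features, images)
```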
{"title":"A parallel region based object recognition system","authors":"Bor-Yiing Su, T. Brutch, K. Keutzer","doi":"10.1109/WACV.2011.5711487","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711487","url":null,"abstract":"Object recognition is a key problem in the field of computer vision. However, highly accurate object recognition systems are also computationally intensive, which limits their applicability. In this paper, we focus on a state-of-the-art object recognition system. We identify key computations of the system, examine efficient algorithms for parallelizing key computations, and develop a parallel object recognition system. The time taken by the training procedure on 127 images, with an average size of 0.15 M pixels, is reduced from 2332 seconds to 20 seconds. Similarly, the classification time of one 0.15 M pixel image is reduced from 331 seconds to 2.78 seconds. This efficient implementation of the object recognition system now makes it practical to train hundreds of images within minutes, and makes it possible to analyze image databases with hundreds or thousands of images in minutes, which was previously not possible.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124859022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-view human action recognition system employing 2DPCA
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711513
Mohamed A. Naiel, M. Abdelwahab, M. El-Saban
A novel algorithm for view-invariant human action recognition is presented. The approach is based on Two-Dimensional Principal Component Analysis (2DPCA) applied directly to the Motion Energy Image (MEI) or the Motion History Image (MHI), in both the spatial domain and the transform domain. Compared with the most recent reports in the field, the method reduces computational complexity by a factor of at least 66 and achieves the highest recognition accuracy per camera while maintaining minimal storage requirements. Experimental results on the Weizmann action and INRIA IXMAS datasets confirm the excellent properties of the proposed algorithm, showing its robustness and its ability to work with a small number of training sequences. The dramatic reduction in computational complexity makes the method well suited to real-time applications.
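Both building blocks named in the abstract are standard enough to sketch. Below is a minimal NumPy rendering, assuming the classic Bobick-Davis decay rule for the MHI (the paper may parameterize it differently) and textbook 2DPCA, which diagonalizes the image covariance matrix G = E[(A - Ā)ᵀ(A - Ā)] and projects each image onto its leading eigenvectors.

```python
import numpy as np

def motion_history_image(silhouettes, tau=255.0, delta=32.0):
    # Bobick-Davis style MHI: pixels active in the current silhouette
    # are set to tau, inactive pixels decay linearly by delta.
    mhi = np.zeros(silhouettes[0].shape, dtype=np.float32)
    for sil in silhouettes:
        mhi = np.where(sil > 0, tau, np.maximum(mhi - delta, 0.0))
    return mhi

def fit_2dpca(images, num_components=10):
    # 2DPCA works on image matrices directly: diagonalize the small
    # (w x w) image covariance G instead of a huge vectorized covariance.
    A = np.stack(images).astype(np.float64)        # (N, h, w)
    mean_image = A.mean(axis=0)
    centered = A - mean_image
    G = np.einsum('nij,nik->jk', centered, centered) / len(images)
    _, eigvecs = np.linalg.eigh(G)                 # ascending eigenvalues
    projection = eigvecs[:, ::-1][:, :num_components]
    return mean_image, projection

def project(image, mean_image, projection):
    # Each image becomes a compact (h x num_components) feature matrix.
    return (image - mean_image) @ projection
```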
{"title":"Multi-view human action recognition system employing 2DPCA","authors":"Mohamed A. Naiel, M. Abdelwahab, M. El-Saban","doi":"10.1109/WACV.2011.5711513","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711513","url":null,"abstract":"A novel algorithm for view-invariant human action recognition is presented. This approach is based on Two-Dimensional Principal Component Analysis (2DPCA) applied directly on the Motion Energy Image (MEI) or the Motion History Image (MHI) in both the spatial domain and the transform domain. This method reduces the computational complexity by a factor of at least 66, achieving the highest recognition accuracy per camera, while maintaining minimum storage requirements, compared with the most recent reports in the field. Experimental results performed on the Weizmann action and the INIRIA IXMAS datasets confirm the excellent properties of the proposed algorithm, showing its robustness and ability to work with small number of training sequences. The dramatic reduction in computational complexity promotes the use in real time applications.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125854293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian 3D model based human detection in crowded scenes using efficient optimization
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711553
Lu Wang, N. Yung
In this paper, we solve the problem of human detection in crowded scenes using a Bayesian 3D-model-based method. Human candidates are first nominated by a head detector and a foot detector; optimization is then performed to find the best configuration of the candidates and their corresponding shape models. The solution is obtained by decomposing the mutually related candidates into un-occluded and occluded ones in each iteration, and then performing model matching for the un-occluded candidates. To this end, in addition to some obvious cues, we derive a graph that depicts the inter-object relations so that unreasonable decompositions are avoided. The merit of the proposed optimization procedure is that its computational cost is similar to that of greedy optimization methods while its performance is comparable to global optimization approaches. Model matching employs both prior knowledge and image likelihood, where the priors include the distribution of individual shape models and the restriction on inter-object distance in the real world, and the image likelihood is provided by foreground extraction and edge information. After model matching, a validation and rejection strategy based on minimum description length is applied to confirm the candidates with reliable matching results. The proposed method is tested on both the publicly available CAVIAR dataset and a challenging dataset of our own. The experimental results demonstrate the effectiveness of our approach.
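One plausible reading of the iterative decomposition is a sweep over an occlusion graph, resolving in each round the candidates no longer occluded by anything unresolved. The sketch below follows that reading; occludes (mapping each candidate to its occluders) and match_model (standing in for the paper's prior-plus-likelihood matching) are assumed interfaces, not the authors' code.

```python
def iterative_matching(candidates, occludes, match_model):
    # occludes: candidate -> list of candidates occluding it (assumed).
    unresolved = set(candidates)
    resolved = {}
    while unresolved:
        # Front of this round: candidates whose occluders, per the
        # inter-object relation graph, are all resolved already.
        front = [c for c in unresolved
                 if all(o in resolved for o in occludes.get(c, []))]
        if not front:
            # Cycle in the occlusion graph: break it at the candidate
            # with the fewest occluders rather than stalling.
            front = [min(unresolved,
                         key=lambda c: len(occludes.get(c, [])))]
        for c in front:
            # match_model fits a shape model from priors and image
            # likelihood, conditioned on already-resolved neighbours.
            resolved[c] = match_model(c, resolved)
            unresolved.remove(c)
    return resolved
```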
{"title":"Bayesian 3D model based human detection in crowded scenes using efficient optimization","authors":"Lu Wang, N. Yung","doi":"10.1109/WACV.2011.5711553","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711553","url":null,"abstract":"In this paper, we solve the problem of human detection in crowded scenes using a Bayesian 3D model based method. Human candidates are first nominated by a head detector and a foot detector, then optimization is performed to find the best configuration of the candidates and their corresponding shape models. The solution is obtained by decomposing the mutually related candidates into un-occluded ones and occluded ones in each iteration, and then performing model matching for the un-occluded candidates. To this end, in addition to some obvious clues, we also derive a graph that depicts the inter-object relation so that unreasonable decomposition is avoided. The merit of the proposed optimization procedure is that its computational cost is similar to the greedy optimization methods while its performance is comparable to the global optimization approaches. For model matching, it is performed by employing both prior knowledge and image likelihood, where the priors include the distribution of individual shape models and the restriction on the inter-object distance in real world, and image likelihood is provided by foreground extraction and the edge information. After the model matching, a validation and rejection strategy based on minimum description length is applied to confirm the candidates that have reliable matching results. The proposed method is tested on both the publicly available Caviar dataset and a challenging dataset constructed by ourselves. The experimental results demonstrate the effectiveness of our approach.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117248044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vehicle detection from low quality aerial LIDAR data
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711551
Bo Yang, Pramod Sharma, R. Nevatia
In this paper we propose a vehicle detection framework for low-resolution aerial range data. Our system consists of three steps: data mapping, 2D vehicle detection, and post-processing. First, we map the range data into 2D grayscale images using only the depth information. For this purpose we propose a novel local ground plane estimation method, and the estimated ground plane is further refined by a global refinement process. We then compute depth values for missing points (points for which no depth information is available) with an effective interpolation method. In the second step, to train a classifier for vehicles, we describe a method to generate more training examples from very few training annotations, and we adopt the fast cascade AdaBoost approach for detecting vehicles in the 2D grayscale images. Finally, in the post-processing step, we design a novel method to detect vehicles that consist of clusters of missing points. We evaluate our method on real aerial data, and the experiments demonstrate the effectiveness of our approach.
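As a rough sketch of the data-mapping step, under stated assumptions: fit a least-squares plane to candidate ground points, then convert each cell's elevation to height above that plane, scaled into an 8-bit grayscale image for the 2D detector. The paper's local estimation and global refinement are more elaborate than this single global fit, and the grid layout and units here are illustrative.

```python
import numpy as np

def fit_ground_plane(ground_points):
    # Least-squares plane z = a*x + b*y + c through candidate ground
    # points given as an (N, 3) array of (x, y, z) samples.
    A = np.c_[ground_points[:, 0], ground_points[:, 1],
              np.ones(len(ground_points))]
    coeffs, *_ = np.linalg.lstsq(A, ground_points[:, 2], rcond=None)
    return coeffs                                  # (a, b, c)

def height_image(elevation_grid, plane, max_height=3.0):
    # Height of every grid cell above the fitted plane, scaled into an
    # 8-bit grayscale image that the 2D vehicle detector consumes.
    a, b, c = plane
    ys, xs = np.indices(elevation_grid.shape)
    ground = a * xs + b * ys + c
    heights = np.clip(elevation_grid - ground, 0.0, max_height)
    return (255.0 * heights / max_height).astype(np.uint8)
```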
{"title":"Vehicle detection from low quality aerial LIDAR data","authors":"Bo Yang, Pramod Sharma, R. Nevatia","doi":"10.1109/WACV.2011.5711551","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711551","url":null,"abstract":"In this paper we propose a vehicle detection framework on low resolution aerial range data. Our system consists of three steps: data mapping, 2D vehicle detection and postprocessing. First, we map the range data into 2D grayscale images by using the depth information only. For this purpose we propose a novel local ground plane estimation method, and the estimated ground plane is further refined by a global refinement process. Then we compute the depth value of missing points (points for which no depth information is available) by an effective interpolation method. In the second step, to train a classifier for the vehicles, we describe a method to generate more training examples from very few training annotations and adopt the fast cascade Adaboost approach for detecting vehicles in 2D grayscale images. Finally, in post-processing step we design a novel method to detect some vehicles which are comprised of clusters of missing points. We evaluate our method on real aerial data and the experiments demonstrate the effectiveness of our approach.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129778686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Human gait estimation using a wearable camera
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711514
Yoshihiro Watanabe, Tetsuo Hatanaka, T. Komuro, M. Ishikawa
We focus on the growing need for technology that can achieve motion capture in outdoor environments. Conventional approaches have relied mainly on fixed, installed cameras; with such an approach, however, it is difficult to capture motion in everyday surroundings. This paper describes a new method for motion estimation using a single wearable camera, focusing on walking motion. The key question is how the system can estimate the underlying walking state from the limited information a wearable sensor provides. This paper addresses three aspects: the configuration of the sensing system, the gait representation, and the gait estimation method.
{"title":"Human gait estimation using a wearable camera","authors":"Yoshihiro Watanabe, Tetsuo Hatanaka, T. Komuro, M. Ishikawa","doi":"10.1109/WACV.2011.5711514","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711514","url":null,"abstract":"We focus on the growing need for a technology that can achieve motion capture in outdoor environments. The conventional approaches have relied mainly on fixed installed cameras. With this approach, however, it is difficult to capture motion in everyday surroundings. This paper describes a new method for motion estimation using a single wearable camera. We focused on walking motion. The key point is how the system can estimate the original walking state using limited information from a wearable sensor. This paper describes three aspects: the configuration of the sensing system, gait representation, and the gait estimation method.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132538412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
2D Barcode localization and motion deblurring using a flutter shutter camera
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711498
W. Xu, Scott McCloskey
We describe a system for localizing and deblurring motion-blurred 2D barcodes. Previous work on barcode detection and deblurring has focused mainly on 1D barcodes and has employed traditional image acquisition, which is not robust to motion blur. Our solution is based on coded exposure imaging which, as we show, enables well-posed deconvolution and decoding over a wider range of velocities. To support this solution, we develop a simple and effective approach for 2D barcode localization under motion blur, a metric for evaluating the quality of deblurred 2D barcodes, and an approach for motion direction estimation in coded exposure images. We test our system on real camera images of three popular 2D barcode symbologies: Data Matrix, PDF417, and Aztec Code.
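Coded exposure keeps deconvolution well-posed because the fluttering preserves high frequencies: the binary shutter code is chosen so the blur kernel's spectrum has no near-zero magnitudes, unlike an ordinary box blur. A minimal 1D sketch, with an illustrative code sequence (not the paper's) and purely horizontal motion assumed:

```python
import numpy as np

# Illustrative binary shutter code; the paper's actual chop sequence
# is not reproduced here.
CODE = np.array([1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1,
                 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1], dtype=float)

def deblur_row(blurred_row, blur_px):
    # Stretch the exposure code to the motion extent to form the PSF.
    idx = np.linspace(0, len(CODE) - 1, blur_px).round().astype(int)
    kernel = CODE[idx]
    kernel /= kernel.sum()
    # Regularized inverse filtering; stable because |K| stays well
    # above zero across frequencies for a good flutter code.
    n = len(blurred_row)
    K = np.fft.rfft(kernel, n)
    B = np.fft.rfft(blurred_row, n)
    X = B * np.conj(K) / (np.abs(K) ** 2 + 1e-3)
    return np.fft.irfft(X, n)
```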
{"title":"2D Barcode localization and motion deblurring using a flutter shutter camera","authors":"W. Xu, Scott McCloskey","doi":"10.1109/WACV.2011.5711498","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711498","url":null,"abstract":"We describe a system for localizing and deblurring motion-blurred 2D barcodes. Previous work on barcode detection and deblurring has mainly focused on 1D barcodes, and has employed traditional image acquisition which is not robust to motion blur. Our solution is based on coded exposure imaging which, as we show, enables well-posed de-convolution and decoding over a wider range of velocities. To serve this solution, we developed a simple and effective approach for 2D barcode localization under motion blur, a metric for evaluating the quality of the deblurred 2D barcodes, and an approach for motion direction estimation in coded exposure images. We tested our system on real camera images of three popular 2D barcode symbologies: Data Matrix, PDF417 and Aztec Code.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132086969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dense point-to-point correspondences between 3D faces using parametric remeshing for constructing 3D Morphable Models
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711481
M. Kaiser, Gernot Heym, Nicolas H. Lehment, D. Arsic, G. Rigoll
In this contribution, a novel method for computing dense point-to-point correspondences between 3D faces is presented. The correspondences can be employed in various face processing applications, for example in building a 3D Morphable Model (3DMM). Paths connecting landmarks are traced on the 3D facial surface, and the resulting patches are mapped into a uv-space. Triangle quadrisection is then used to build remeshes with high point density for each 3D facial surface. Each vertex of a remesh has exactly one corresponding vertex in every other remesh, and all remeshes share the same connectivity. The quality of the point-to-point correspondences is demonstrated on the basis of two applications, namely morphing and constructing a 3DMM.
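Triangle quadrisection itself is simple: insert a midpoint on every edge and replace each triangle with four smaller ones. Because the same rule is applied to every mapped patch, all remeshes end up with identical connectivity, which is what yields the vertex-to-vertex correspondence. A minimal sketch:

```python
def quadrisect(vertices, faces):
    # One level of quadrisection: every edge gets a midpoint vertex,
    # every triangle (a, b, c) becomes four triangles.
    vertices = list(vertices)
    midpoint_of = {}                    # edge (i, j) -> new vertex index

    def midpoint(i, j):
        key = (min(i, j), max(i, j))    # shared edges reuse one midpoint
        if key not in midpoint_of:
            midpoint_of[key] = len(vertices)
            vertices.append([(a + b) / 2.0
                             for a, b in zip(vertices[i], vertices[j])])
        return midpoint_of[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return vertices, new_faces
```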
{"title":"Dense point-to-point correspondences between 3D faces using parametric remeshing for constructing 3D Morphable Models","authors":"M. Kaiser, Gernot Heym, Nicolas H. Lehment, D. Arsic, G. Rigoll","doi":"10.1109/WACV.2011.5711481","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711481","url":null,"abstract":"In this contribution a novel method to compute dense point-to-point correspondences between 3D faces is presented. The correspondences can be employed for various face processing applications, for example for building up a 3D Morphable Model (3DMM). Paths connecting landmarks are traced on the 3D facial surface and the resulting patches are mapped into a uv-space. Triangle quadrisection is used to build up remeshes with high point density for each 3D facial surface. Each vertex of a remesh has one corresponding vertex in another remesh and all remeshes have the same connectivity. The quality of the point-to-point correspondences is demonstrated on the bases of two applications, namely morphing and constructing a 3DMM.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115790716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-time illumination-invariant motion detection in spatio-temporal image volumes
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711572
Johan Almbladh, K. Netzell
An algorithm for robust motion detection in video is proposed in this work. The algorithm continuously analyses the dense pixel volume formed by the current frame and its nearest neighbours in time. By assuming continuity of motion in space and time, pixels on slanted edges in this space-time pixel volume are considered to be in motion. This is in contrast to prevailing foreground-background models for motion detection, which consider a pixel's history in aggregation. By using an efficient data reduction scheme and leveraging the logical bit-parallel operations of current CPUs, real-time performance is achieved even on resource-scarce embedded devices. Video surveillance applications demand efficient algorithms that robustly detect motion across a wide variety of conditions without on-site parameter adjustments. Experiments with real-world video show robust motion detection results with the proposed method, especially under conditions normally considered difficult, such as continuously changing illumination.
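As a simplified reading of the bit-parallel idea (not the paper's exact slanted-edge test), the last eight binarized frames can be packed into the bit planes of a single uint8 array, so that the temporal test at every pixel reduces to a couple of bitwise operations:

```python
import numpy as np

def push_frame(history, frame, thresh=20):
    # history: uint8 array, one bit per past frame, newest in bit 0.
    binary = (frame > thresh).astype(np.uint8)
    return (history << 1) | binary      # oldest bit drops off the top

def motion_mask(history):
    # A 0->1 or 1->0 transition in a pixel's recent history marks a
    # slanted edge along the time axis of the pixel volume. Bit 7 is
    # masked out to ignore the artificial shift-in transition.
    transitions = (history ^ (history >> 1)) & 0x7F
    return transitions != 0
```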
{"title":"Real-time illumination-invariant motion detection in spatio-temporal image volumes","authors":"Johan Almbladh, K. Netzell","doi":"10.1109/WACV.2011.5711572","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711572","url":null,"abstract":"An algorithm for robust motion detection in video is proposed in this work. The algorithm continuously analyses the dense pixel volume formed by the current frame and its nearest neighbours in time. By assuming continuity of motion in space and time, pixels on slanted edges in this timespace pixel volume are considered to be in motion. This is in contrast to prevailing foreground-background models used for motion detection that consider a pixel's history in aggregation. By using an efficient data reduction scheme and leveraging logical bit-parallel operations of current CPUs, real-time performance is achieved even on resource-scarce embedded devices. Video surveillance applications demand for efficient algorithms which robustly detect motion across a wide variety of conditions without the need for on-site parameter adjustments. Experiments with real-world video show robust motion detection results with the proposed method, especially under conditions normally considered difficult, such as continuously changing illumination.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114580276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Computationally efficient retrieval-based tracking system and augmented reality for large-scale areas
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711533
Wei Guan, Suya You, U. Neumann
We present a retrieval-based tracking system that reduces computational time and cost. The system tracks a user's location from a small portion of an image captured by the camera, and then refines the camera pose by propagating matches to the whole image. Augmented information, such as building names and locations, is then delivered to the user. This progressive way of processing image data not only provides the user with location information at real-time speed but, more importantly, reduces feature matching time by limiting the search range. The proposed system contains two parts: offline database building and online user tracking. The database is composed of image patches with features and location information. The images are captured at different locations of interest from different viewing angles and distances, and these images are then partitioned into smaller patches. The location of a user can be calculated by querying one or more patches of the captured image. Moreover, owing to the patch-based approach, the system can handle large occlusions in images. Experiments show that the proposed tracking system is efficient and robust in many different environments.
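The abstract does not say which features the patches carry, so the sketch below assumes ORB descriptors with brute-force Hamming matching as a stand-in. It shows the two parts named above: building the patch database offline and answering an online query from a small captured patch.

```python
import cv2

orb = cv2.ORB_create(nfeatures=500)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def build_database(patches):
    # patches: iterable of (grayscale patch, location metadata) pairs.
    db = []
    for patch, location in patches:
        _, descriptors = orb.detectAndCompute(patch, None)
        if descriptors is not None:
            db.append((descriptors, location))
    return db

def query_location(db, query_patch, min_matches=25, max_dist=50):
    # Return the location of the best-matching stored patch, if any.
    _, des_q = orb.detectAndCompute(query_patch, None)
    if des_q is None:
        return None
    best_count, best_location = 0, None
    for des_db, location in db:
        good = [m for m in matcher.match(des_q, des_db)
                if m.distance < max_dist]
        if len(good) >= min_matches and len(good) > best_count:
            best_count, best_location = len(good), location
    return best_location
```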
{"title":"Computationally efficient retrieval-based tracking system and augmented reality for large-scale areas","authors":"Wei Guan, Suya You, U. Neumann","doi":"10.1109/WACV.2011.5711533","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711533","url":null,"abstract":"We present a retrieval-based tracking system that requires less computational time and cost. The system tracks a user's location through a small portion of an image captured by the camera, and then refines the camera pose by propagating matchings to the whole image. Augmented information such as building names and locations will be delivered to the user. The progressive way to process image data not only can provide the user with location information at real-time speed, but more importantly, it reduces the feature matching time by limiting the searching ranges. The proposed system contains two parts, offline database building and online user tracking. The database is composed of image patches with features and location information. The images are captured at different locations of interests from different viewing angles and distances, and then these images are partitioned into smaller patches. The location of a user can be calculated by querying one or more patches of the captured image. Moreover, the system is capable to handle large occlusions in images due to the patch approach. Experiments show that the proposed tracking system is efficient and robust in many different environments.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115055675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Soft margin keyframe comparison: Enhancing precision of fraud detection in retail surveillance
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711552
Jiyan Pan, Quanfu Fan, Sharath Pankanti, Hoang Trinh, Prasad Gabbur, S. Miyazawa
We propose a novel approach for enhancing precision in a leading video analytics system that detects cashier fraud in grocery stores for loss prevention. While intelligent video analytics has recently become a promising means of loss prevention for retailers, most real-world systems suffer from a large number of false alarms, resulting in a significant waste of human labor during manual verification. Our approach starts with the candidate fraudulent events detected by a state-of-the-art system; these are visually recognized checkout-related activities of the cashier without barcode associations. Instead of conducting costly video analysis, we extract a few keyframes to represent the essence of each candidate fraudulent event, and compare those keyframes to decide whether the event is a valid checkout process involving consistent appearance changes on the lead-in belt, the scan area, and the take-away belt. Our approach also performs margin-based soft classification so that the user can trade off between saving human labor and preserving high recall. Experiments on days of surveillance video collected from real grocery stores show that our algorithm can save about 50% of human labor while preserving over 90% of true alarms, with small computational overhead.