A selective spatio-temporal interest point detector for human action recognition in complex scenes
Pub Date: 2011-11-06 | DOI: 10.1109/ICCV.2011.6126443 | Pages: 1776-1783
Bhaskar Chakraborty, M. B. Holte, T. Moeslund, Jordi Gonzàlez, F. X. Roca
Recent progress in the field of human action recognition points towards the use of Spatio-Temporal Interest Points (STIPs) for local descriptor-based recognition strategies. In this paper we present a new approach for STIP detection by applying surround suppression combined with local and temporal constraints. Our method differs significantly from existing STIP detectors and improves performance by detecting more repeatable, stable and distinctive STIPs for human actors, while suppressing unwanted background STIPs. For action representation we use a bag-of-visual-words (BoV) model of local N-jet features to build a vocabulary of visual words. To this end, we introduce a novel vocabulary-building strategy that combines spatial pyramid and vocabulary compression techniques, resulting in improved performance and efficiency. Action-class-specific Support Vector Machine (SVM) classifiers are trained to categorize human actions. A comprehensive set of experiments on existing benchmark datasets, and on more challenging datasets of complex scenes, validates our approach and shows state-of-the-art performance.
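A minimal sketch of the surround-suppression idea on a precomputed spatio-temporal response volume (the uniform surround weighting, neighborhood size and threshold below are illustrative assumptions, not the paper's exact detector):

```python
import numpy as np
from scipy.ndimage import maximum_filter, uniform_filter

def suppress_surround(response, radius=5, alpha=1.0):
    """Attenuate responses that are not distinctive within their surround.

    response : 3D array (t, y, x) of raw interest-point scores.
    radius   : half-size of the spatio-temporal surround neighborhood.
    alpha    : suppression strength (illustrative default).
    """
    surround = uniform_filter(response, size=2 * radius + 1)
    # Points weaker than their surround average are zeroed, which removes
    # densely firing background clutter while keeping isolated peaks.
    return np.maximum(response - alpha * surround, 0.0)

def detect_stips(response, radius=5, threshold=1e-3):
    """Keep local maxima of the suppressed response as STIP candidates."""
    r = suppress_surround(response, radius)
    peaks = (r == maximum_filter(r, size=2 * radius + 1)) & (r > threshold)
    return np.argwhere(peaks)  # (t, y, x) coordinates
```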
{"title":"A selective spatio-temporal interest point detector for human action recognition in complex scenes","authors":"Bhaskar Chakraborty, M. B. Holte, T. Moeslund, Jordi Gonzàlez, F. X. Roca","doi":"10.1109/ICCV.2011.6126443","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126443","url":null,"abstract":"Recent progress in the field of human action recognition points towards the use of Spatio-Temporal Interest Points (STIPs) for local descriptor-based recognition strategies. In this paper we present a new approach for STIP detection by applying surround suppression combined with local and temporal constraints. Our method is significantly different from existing STIP detectors and improves the performance by detecting more repeatable, stable and distinctive STIPs for human actors, while suppressing unwanted background STIPs. For action representation we use a bag-of-visual words (BoV) model of local N-jet features to build a vocabulary of visual-words. To this end, we introduce a novel vocabulary building strategy by combining spatial pyramid and vocabulary compression techniques, resulting in improved performance and efficiency. Action class specific Support Vector Machine (SVM) classifiers are trained for categorization of human actions. A comprehensive set of experiments on existing benchmark datasets, and more challenging datasets of complex scenes, validate our approach and show state-of-the-art performance.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":"1 1","pages":"1776-1783"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79874981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Diagonal preconditioning for first order primal-dual algorithms in convex optimization
Pub Date: 2011-11-06 | DOI: 10.1109/ICCV.2011.6126441 | Pages: 1762-1769
T. Pock, A. Chambolle
In this paper we study preconditioning techniques for the first-order primal-dual algorithm proposed in [5]. In particular, we propose simple, easy-to-compute diagonal preconditioners for which convergence of the algorithm is guaranteed without the need to compute any step-size parameters. As a by-product, we show that for a certain instance of the preconditioning, the proposed algorithm is equivalent to the old and largely unknown alternating step method for monotropic programming [7]. We show numerical results on general linear programming problems and on a few standard computer vision problems. In all examples, the preconditioned algorithm significantly outperforms the algorithm of [5].
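The preconditioners themselves are cheap to state. A sketch of computing them for a given matrix K, following the diagonal recipe summarized in the abstract (for α = 1 they reduce to reciprocal row and column sums of |K|; the update in the comments is the standard primal-dual iteration, with the prox operators left problem-specific):

```python
import numpy as np

def diagonal_preconditioners(K, alpha=1.0):
    """Diagonal step sizes Sigma = diag(sigma), T = diag(tau) for PDHG.

    sigma_i = 1 / sum_j |K[i, j]|**(2 - alpha)   (dual, one per row)
    tau_j   = 1 / sum_i |K[i, j]|**alpha         (primal, one per column)
    for alpha in [0, 2]; no global step-size tuning is required.
    """
    A = np.abs(K)
    sigma = 1.0 / np.maximum((A ** (2.0 - alpha)).sum(axis=1), 1e-12)
    tau = 1.0 / np.maximum((A ** alpha).sum(axis=0), 1e-12)
    return sigma, tau

# One preconditioned iteration for min_x g(x) + f(Kx) then looks like:
#   y     = prox_{sigma, f*}(y + sigma * (K @ x_bar))
#   x_new = prox_{tau, g}(x - tau * (K.T @ y))
#   x_bar = 2 * x_new - x
```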
{"title":"Diagonal preconditioning for first order primal-dual algorithms in convex optimization","authors":"T. Pock, A. Chambolle","doi":"10.1109/ICCV.2011.6126441","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126441","url":null,"abstract":"In this paper we study preconditioning techniques for the first-order primal-dual algorithm proposed in [5]. In particular, we propose simple and easy to compute diagonal preconditioners for which convergence of the algorithm is guaranteed without the need to compute any step size parameters. As a by-product, we show that for a certain instance of the preconditioning, the proposed algorithm is equivalent to the old and widely unknown alternating step method for monotropic programming [7]. We show numerical results on general linear programming problems and a few standard computer vision problems. In all examples, the preconditioned algorithm significantly outperforms the algorithm of [5].","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":"68 1","pages":"1762-1769"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84142628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A dimensionality result for multiple homography matrices
Pub Date: 2011-11-06 | DOI: 10.1109/ICCV.2011.6126485 | Pages: 2104-2109
W. Chojnacki, A. Hengel
It is shown that the set of all I-element collections of interdependent homography matrices describing homographies induced by I planes in the 3D scene between two views has dimension 4I + 7. This improves on an earlier result which gave an upper bound for the dimension in question, and solves a long-standing open problem. The significance of the present result lies in that it is critical to the identification of the full set of constraints to which collections of interdependent homography matrices are subject, which in turn is critical to the design of constrained optimisation techniques for estimating such collections from image data.
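The figure 4I + 7 can be made plausible by a back-of-the-envelope count (a sketch under the standard rank-one parameterization of plane-induced homographies, not the paper's argument):

```latex
\[
  H_i \;\simeq\; w_i \bigl( A + b\, v_i^{\top} \bigr), \qquad i = 1, \dots, I,
\]
% with A in R^{3x3} and b, v_i in R^3: 9 + 3 + 3I + I = 4I + 12 raw parameters.
% The transformations
\[
  (A, b) \mapsto \mu (A, b),\; w_i \mapsto \tfrac{w_i}{\mu}; \qquad
  b \mapsto \lambda b,\; v_i \mapsto \tfrac{v_i}{\lambda}; \qquad
  A \mapsto A + b\, u^{\top},\; v_i \mapsto v_i - u
\]
% leave every H_i unchanged, removing 1 + 1 + 3 = 5 degrees of freedom,
% so the dimension of the collection is (4I + 12) - 5 = 4I + 7.
```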
{"title":"A dimensionality result for multiple homography matrices","authors":"W. Chojnacki, A. Hengel","doi":"10.1109/ICCV.2011.6126485","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126485","url":null,"abstract":"It is shown that the set of all I-element collections of interdependent homography matrices describing homographies induced by I planes in the 3D scene between two views has dimension 4I + 7. This improves on an earlier result which gave an upper bound for the dimension in question, and solves a long-standing open problem. The significance of the present result lies in that it is critical to the identification of the full set of constraints to which collections of interdependent homography matrices are subject, which in turn is critical to the design of constrained optimisation techniques for estimating such collections from image data.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":"324 1","pages":"2104-2109"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80322229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Level-set person segmentation and tracking with multi-region appearance models and top-down shape information
Pub Date: 2011-11-06 | DOI: 10.1109/ICCV.2011.6126455 | Pages: 1871-1878
Esther Horbert, Konstantinos Rematas, B. Leibe
In this paper, we address the problem of segmentation-based tracking of multiple articulated persons. We propose two improvements to current level-set tracking formulations. The first is a localized appearance model that uses additional level-sets in order to enforce a hierarchical subdivision of the object shape into multiple connected regions with distinct appearance models. The second is a novel mechanism to include detailed object shape information in the form of a per-pixel figure/ground probability map obtained from an object detection process. Both contributions are seamlessly integrated into the level-set framework. Together, they considerably improve the accuracy of the tracked segmentations. We experimentally evaluate our proposed approach on two challenging sequences and demonstrate its good performance in practice.
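Formulations of this kind typically minimize a region-based energy combining appearance, contour length, and here a detector-driven prior. A generic sketch in my own notation (the paper's localized, hierarchical appearance models refine p_f and p_b; this is not its exact functional):

```latex
\[
  E(\phi) = -\int_\Omega \Bigl[ H(\phi)\,\log p_f\bigl(I(\mathbf{x})\bigr)
          + \bigl(1 - H(\phi)\bigr)\,\log p_b\bigl(I(\mathbf{x})\bigr)
          + \lambda\, H(\phi)\,\log \tfrac{P_{\mathrm{det}}(\mathbf{x})}{1 - P_{\mathrm{det}}(\mathbf{x})} \Bigr] d\mathbf{x}
          \;+\; \nu \int_\Omega \bigl|\nabla H(\phi)\bigr|\, d\mathbf{x},
\]
% H: Heaviside step; p_f, p_b: foreground/background appearance models;
% P_det: per-pixel figure/ground probability from the detector;
% the last integral is the usual length regularizer on the contour.
```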
{"title":"Level-set person segmentation and tracking with multi-region appearance models and top-down shape information","authors":"Esther Horbert, Konstantinos Rematas, B. Leibe","doi":"10.1109/ICCV.2011.6126455","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126455","url":null,"abstract":"In this paper, we address the problem of segmentation-based tracking of multiple articulated persons. We propose two improvements to current level-set tracking formulations. The first is a localized appearance model that uses additional level-sets in order to enforce a hierarchical subdivision of the object shape into multiple connected regions with distinct appearance models. The second is a novel mechanism to include detailed object shape information in the form of a per-pixel figure/ground probability map obtained from an object detection process. Both contributions are seamlessly integrated into the level-set framework. Together, they considerably improve the accuracy of the tracked segmentations. We experimentally evaluate our proposed approach on two challenging sequences and demonstrate its good performance in practice.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":"274 1","pages":"1871-1878"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77831238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Salient object detection by composition
Pub Date: 2011-11-06 | DOI: 10.1109/ICCV.2011.6126348 | Pages: 1028-1035
J. Feng, Yichen Wei, Litian Tao, Chao Zhang, Jian Sun
Conventional saliency analysis methods measure the saliency of individual pixels. The resulting saliency map inevitably loses information from the original image, and finding salient objects in it is difficult. We propose to detect salient objects by directly measuring the saliency of an image window in the original image, adopting the well-established sliding-window object detection paradigm.
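An illustrative scoring rule in the spirit of composition-based window saliency (the segment descriptors, distance measure and area weighting are my stand-ins, not the paper's exact composition cost):

```python
import numpy as np

def composition_saliency(features, areas, inside):
    """Cost of composing a window's content from segments outside it.

    features : (n_segments, d) appearance descriptors of image segments.
    areas    : (n_segments,) segment areas.
    inside   : boolean mask, True for segments inside the window.
    A window that is easy to assemble from outside material (small
    feature distances) scores low; a salient object scores high.
    """
    fin, fout = features[inside], features[~inside]
    if len(fin) == 0 or len(fout) == 0:
        return 0.0
    # Cheapest outside match for every inside segment.
    dists = np.linalg.norm(fin[:, None, :] - fout[None, :, :], axis=-1)
    return float(np.sum(areas[inside] * dists.min(axis=1)))
```

Sliding the window over the image and keeping the high-scoring ones then follows the usual detection pipeline.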
{"title":"Salient object detection by composition","authors":"J. Feng, Yichen Wei, Litian Tao, Chao Zhang, Jian Sun","doi":"10.1109/ICCV.2011.6126348","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126348","url":null,"abstract":"Conventional saliency analysis methods measure the saliency of individual pixels. The resulting saliency map inevitably loses information in the original image and finding salient objects in it is difficult. We propose to detect salient objects by directly measuring the saliency of an image window in the original image and adopt the well established sliding window based object detection paradigm.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":"8 1","pages":"1028-1035"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82225422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The power of comparative reasoning
Pub Date: 2011-11-06 | DOI: 10.1109/ICCV.2011.6126527 | Pages: 2431-2438
J. Yagnik, Dennis W. Strelow, David A. Ross, Ruei-Sung Lin
Rank correlation measures are known for their resilience to perturbations in numeric values and are widely used in many evaluation metrics. Such ordinal measures have rarely been applied to the treatment of numeric features as a representational transformation. We emphasize the benefits of ordinal representations of input features both theoretically and empirically. We present a family of algorithms for computing ordinal embeddings based on partial order statistics. Apart from having the stability benefits of ordinal measures, these embeddings are highly nonlinear, giving rise to sparse feature spaces highly favored by several machine learning methods. These embeddings are deterministic and data-independent, and, by virtue of being based on partial order statistics, add another degree of resilience to noise. When applied to the task of fast similarity search, these machine-learning-free methods outperform state-of-the-art machine learning methods with complex optimization setups. For classification problems, the embeddings provide a nonlinear transformation resulting in sparse binary codes that are well-suited to a large class of machine learning algorithms. These methods show significant improvement on VOC 2010 using simple linear classifiers that can be trained quickly. Our method can be extended to the case of polynomial kernels while permitting very efficient computation. Further, since the popular Min Hash algorithm is a special case of our method, we demonstrate an efficient scheme for computing Min Hash on conjunctions of binary features. The actual method can be implemented in about 10 lines of code in most languages (2 lines in MATLAB), and does not require any data-driven optimization.
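A sketch of one member of this family in the spirit of the winner-take-all construction (the permutations, window size and demo values are mine): each output symbol is the argmax position within a small window of a fixed random permutation of the input, so the code depends only on the relative ordering of feature values.

```python
import numpy as np

def ordinal_code(x, perms, k):
    """Ordinal embedding from partial order statistics.

    x     : 1D feature vector.
    perms : (n_codes, dim) fixed index permutations, shared by all inputs.
    k     : window size; only the first k permuted entries are compared.
    """
    return np.array([int(np.argmax(x[p[:k]])) for p in perms])

rng = np.random.default_rng(0)
dim, n_codes, k = 128, 64, 4
perms = np.stack([rng.permutation(dim) for _ in range(n_codes)])
x = rng.standard_normal(dim)

c1 = ordinal_code(x, perms, k)
c2 = ordinal_code(3.0 * x + 1.0, perms, k)  # monotone transform of x
assert (c1 == c2).all()  # codes unchanged: only the ordering matters
```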
{"title":"The power of comparative reasoning","authors":"J. Yagnik, Dennis W. Strelow, David A. Ross, Ruei-Sung Lin","doi":"10.1109/ICCV.2011.6126527","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126527","url":null,"abstract":"Rank correlation measures are known for their resilience to perturbations in numeric values and are widely used in many evaluation metrics. Such ordinal measures have rarely been applied in treatment of numeric features as a representational transformation. We emphasize the benefits of ordinal representations of input features both theoretically and empirically. We present a family of algorithms for computing ordinal embeddings based on partial order statistics. Apart from having the stability benefits of ordinal measures, these embeddings are highly nonlinear, giving rise to sparse feature spaces highly favored by several machine learning methods. These embeddings are deterministic, data independent and by virtue of being based on partial order statistics, add another degree of resilience to noise. These machine-learning-free methods when applied to the task of fast similarity search outperform state-of-the-art machine learning methods with complex optimization setups. For solving classification problems, the embeddings provide a nonlinear transformation resulting in sparse binary codes that are well-suited for a large class of machine learning algorithms. These methods show significant improvement on VOC 2010 using simple linear classifiers which can be trained quickly. Our method can be extended to the case of polynomial kernels, while permitting very efficient computation. Further, since the popular Min Hash algorithm is a special case of our method, we demonstrate an efficient scheme for computing Min Hash on conjunctions of binary features. The actual method can be implemented in about 10 lines of code in most languages (2 lines in MAT-LAB), and does not require any data-driven optimization.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":"44 1","pages":"2431-2438"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82558505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-view 3D reconstruction for scenes under the refractive plane with known vertical direction
Pub Date: 2011-11-06 | DOI: 10.1109/ICCV.2011.6126262 | Pages: 351-358
Yao-Jen Chang, Tsuhan Chen
Images of scenes under water suffer distortion due to refraction. While refraction causes magnification with only mild distortion in the observed images, severe distortions result in geometry reconstruction if the refractive distortion is not properly handled. Unlike the radial distortion model, the refractive distortion depends on the scene depth seen along each light ray as well as the camera pose relative to the refractive surface. It is therefore crucial to obtain good estimates of scene depth, camera pose and optical center to alleviate the impact of refractive distortion. In this work, we formulate the forward and back projections of light rays involving a refractive plane for the perspective camera model by explicitly modeling refractive distortion as a function of depth. Furthermore, for cameras with an inertial measurement unit (IMU), we show that a linear solution to the relative pose and a closed-form solution to the absolute pose can be derived when the camera's vertical direction is known. We incorporate our formulations into the general structure-from-motion framework, followed by a patch-based multiview stereo algorithm, to obtain a 3D reconstruction of the scene. We show through experiments that explicitly modeling depth-dependent refractive distortion leads to physically more accurate scene reconstructions.
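The geometric primitive behind both projections is Snell's law at the flat interface. A sketch of refracting a single ray (the vector form of Snell's law with textbook indices for air and water; this is only the building block, not the paper's full forward-projection solver):

```python
import numpy as np

def refract(d, n, n1=1.0, n2=1.333):
    """Refract unit direction d at a plane with unit normal n (Snell's law).

    n points toward the incident side (n . d < 0); n1, n2 are the
    refractive indices (air ~ 1.0, water ~ 1.333).
    Returns None on total internal reflection.
    """
    d = d / np.linalg.norm(d)
    n = n / np.linalg.norm(n)
    cos_i = -float(np.dot(n, d))
    r = n1 / n2
    k = 1.0 - r * r * (1.0 - cos_i * cos_i)
    if k < 0.0:
        return None  # total internal reflection
    return r * d + (r * cos_i - np.sqrt(k)) * n

# Forward projection then traces the camera ray to the refractive plane,
# refracts it there, and intersects the refracted ray with the scene.
```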
{"title":"Multi-view 3D reconstruction for scenes under the refractive plane with known vertical direction","authors":"Yao-Jen Chang, Tsuhan Chen","doi":"10.1109/ICCV.2011.6126262","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126262","url":null,"abstract":"Images taken from scenes under water suffer distortion due to refraction. While refraction causes magnification with mild distortion on the observed images, severe distortions in geometry reconstruction would be resulted if the refractive distortion is not properly handled. Different from the radial distortion model, the refractive distortion depends on the scene depth seen from each light ray as well as the camera pose relative to the refractive surface. Therefore, it's crucial to obtain a good estimate of scene depth, camera pose and optical center to alleviate the impact of refractive distortion. In this work, we formulate the forward and back projections of light rays involving a refractive plane for the perspective camera model by explicitly modeling refractive distortion as a function of depth. Furthermore, for cameras with an inertial measurement unit (IMU), we show that a linear solution to the relative pose and a closed-form solution to the absolute pose can be derived with known camera vertical directions. We incorporate our formulations with the general structure from motion framework followed by the patch-based multiview stereo algorithm to obtain a 3D reconstruction of the scene. We show through experiments that the explicit modeling of depth-dependent refractive distortion physically leads to more accurate scene reconstructions.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":"19 1","pages":"351-358"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83012786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
What characterizes a shadow boundary under the sun and sky?
Pub Date: 2011-11-06 | DOI: 10.1109/ICCV.2011.6126331 | Pages: 898-905
Xiang Huang, G. Hua, J. Tumblin, Lance Williams
Despite decades of study, robust shadow detection remains difficult, especially within a single color image. We describe a new approach to detect shadow boundaries in images of outdoor scenes lit only by the sun and sky. The method first extracts visual features of candidate edges that are motivated by physical models of illumination and occluders. We feed these features into a Support Vector Machine (SVM) that was trained to discriminate between most-likely shadow-edge candidates and less-likely ones. Finally, we connect edges to help reject non-shadow edge candidates, and to encourage closed, connected shadow boundaries. On benchmark shadow-edge data sets from Lalonde et al. and Zhu et al., our method showed substantial improvements when compared to other recent shadow-detection methods based on statistical learning.
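A sketch of the feature-plus-SVM pipeline (the cross-edge ratio features and the stand-in training data below are my illustrative assumptions; the paper's features are derived from its physical models of sun/sky illumination and occluders):

```python
import numpy as np
from sklearn.svm import SVC

def edge_features(img, p_dark, p_bright):
    """Illustrative cross-edge features (a stand-in, not the paper's set).

    img               : float RGB image with values in [0, 1].
    p_dark, p_bright  : (row, col) pixel samples on either side of an edge.
    A cast shadow dims and blue-shifts its dark side (lit by sky alone),
    so per-channel ratios across the edge are informative.
    """
    a = img[p_dark] + 1e-6
    b = img[p_bright] + 1e-6
    return np.concatenate([a / b, [a.mean() / b.mean()]])

# Hypothetical training setup: X stacks edge_features(...) per candidate
# edge, y marks annotated shadow edges. Random stand-in data for shape only:
rng = np.random.default_rng(0)
X, y = rng.random((200, 4)), rng.integers(0, 2, 200)
clf = SVC(kernel="rbf", C=1.0).fit(X, y)
print(clf.predict(X[:5]))
```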
{"title":"What characterizes a shadow boundary under the sun and sky?","authors":"Xiang Huang, G. Hua, J. Tumblin, Lance Williams","doi":"10.1109/ICCV.2011.6126331","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126331","url":null,"abstract":"Despite decades of study, robust shadow detection remains difficult, especially within a single color image. We describe a new approach to detect shadow boundaries in images of outdoor scenes lit only by the sun and sky. The method first extracts visual features of candidate edges that are motivated by physical models of illumination and occluders. We feed these features into a Support Vector Machine (SVM) that was trained to discriminate between most-likely shadow-edge candidates and less-likely ones. Finally, we connect edges to help reject non-shadow edge candidates, and to encourage closed, connected shadow boundaries. On benchmark shadow-edge data sets from Lalonde et al. and Zhu et al., our method showed substantial improvements when compared to other recent shadow-detection methods based on statistical learning.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":"42 1","pages":"898-905"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83182610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Video Primal Sketch: A generic middle-level representation of video
Pub Date: 2011-11-06 | DOI: 10.1109/ICCV.2011.6126380 | Pages: 1283-1290
Zhi Han, Zongben Xu, Song-Chun Zhu
This paper presents a middle-level video representation named Video Primal Sketch (VPS), which integrates two regimes of models: i) a sparse coding model using static or moving primitives to explicitly represent moving corners, lines, feature points, etc.; ii) a FRAME/MRF model with spatio-temporal filters to implicitly represent textured motion, such as water and fire, by matching feature statistics, i.e. histograms. This paper makes three contributions: i) learning a dictionary of video primitives as a parametric generative model; ii) studying the Spatio-Temporal FRAME (ST-FRAME) model for modeling and synthesizing textured motion; and iii) developing a parsimonious hybrid model for generic video representation. VPS selects the proper representation automatically and is compatible with high-level action representations. In the experiments, we synthesize a series of dynamic textures, reconstruct real videos, and show how the VPS varies with the changes in density caused by scale transitions in videos.
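A toy version of the automatic explicit/implicit split (the least-squares fit standing in for sparse coding, the residual test, and the raw-intensity histogram fallback are all my assumptions; the actual model matches filter-response statistics rather than intensity histograms):

```python
import numpy as np

def choose_regime(patch, dictionary, tol=0.1):
    """Decide between explicit primitives and implicit texture statistics.

    patch      : flattened video patch, shape (patch_len,).
    dictionary : (n_atoms, patch_len) bank of learned video primitives.
    """
    # Crude stand-in for sparse coding: least-squares fit on the dictionary.
    coef, *_ = np.linalg.lstsq(dictionary.T, patch, rcond=None)
    residual = np.linalg.norm(patch - dictionary.T @ coef)
    if residual < tol * np.linalg.norm(patch):
        return "explicit", coef            # sketchable: code with primitives
    hist, _ = np.histogram(patch, bins=16, density=True)
    return "implicit", hist                # textured motion: keep statistics
```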
{"title":"Video Primal Sketch: A generic middle-level representation of video","authors":"Zhi Han, Zongben Xu, Song-Chun Zhu","doi":"10.1109/ICCV.2011.6126380","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126380","url":null,"abstract":"This paper presents a middle-level video representation named Video Primal Sketch (VPS), which integrates two regimes of models: i) sparse coding model using static or moving primitives to explicitly represent moving corners, lines, feature points, etc., ii) FRAME/MRF model with spatio-temporal filters to implicitly represent textured motion, such as water and fire, by matching feature statistics, i.e. histograms. This paper makes three contributions: i) learning a dictionary of video primitives as parametric generative model; ii) studying the Spatio-Temporal FRAME (ST-FRAME) model for modeling and synthesizing textured motion; and iii) developing a parsimonious hybrid model for generic video representation. VPS selects the proper representation automatically and is compatible with high-level action representations. In the experiments, we synthesize a series of dynamic textures, reconstruct real videos and show varying VPS over the change of densities causing by the scale transition in videos.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":"257 1","pages":"1283-1290"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83431684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Inferring human gaze from appearance via adaptive linear regression
Pub Date: 2011-11-06 | DOI: 10.1109/ICCV.2011.6126237 | Pages: 153-160
Feng Lu, Yusuke Sugano, Takahiro Okabe, Yoichi Sato
The problem of estimating human gaze from eye appearance can be regarded as mapping high-dimensional features to a low-dimensional target space. Conventional methods require densely sampled training examples on the eye appearance manifold, which results in a tedious calibration stage. In this paper, we introduce an adaptive linear regression (ALR) method for accurate mapping from sparsely collected training samples. The key idea is to adaptively find the subset of training samples in which the test sample is most linearly representable. We solve this problem via l1-optimization and thoroughly study the key issues in seeking the best regression solution. The proposed gaze estimation approach based on ALR is naturally sparse and low-dimensional, making it possible to infer human gaze from eye images of varying resolution using far fewer training samples than existing methods. In particular, the optimization procedure in ALR is extended to simultaneously solve the subpixel alignment problem for low-resolution test eye images. The performance of the proposed method is evaluated in extensive experiments against various factors, such as the number of training samples, feature dimensionality and eye image resolution, to verify its effectiveness.
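A sketch of the sparse-recovery step via iterative soft-thresholding (ISTA here is my stand-in for the paper's l1-optimization; the nonnegativity clamp and the normalized blend of training gaze targets are also illustrative):

```python
import numpy as np

def ista(A, x, lam=0.01, n_iter=500):
    """min_w 0.5 * ||A w - x||^2 + lam * ||w||_1 by soft-thresholding.

    Columns of A are training eye-appearance features; x is the test
    appearance. The recovered w is sparse: few training samples are used.
    """
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = w - A.T @ (A @ w - x) / L      # gradient step on the quadratic
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # shrinkage
    return w

def estimate_gaze(A, gaze_targets, x):
    """Blend training gaze targets with the recovered sparse weights."""
    w = np.maximum(ista(A, x), 0.0)        # keep an interpolation flavor
    s = w.sum()
    return gaze_targets.T @ w / s if s > 0 else gaze_targets.mean(axis=0)
```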
{"title":"Inferring human gaze from appearance via adaptive linear regression","authors":"Feng Lu, Yusuke Sugano, Takahiro Okabe, Yoichi Sato","doi":"10.1109/ICCV.2011.6126237","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126237","url":null,"abstract":"The problem of estimating human gaze from eye appearance is regarded as mapping high-dimensional features to low-dimensional target space. Conventional methods require densely obtained training samples on the eye appearance manifold, which results in a tedious calibration stage. In this paper, we introduce an adaptive linear regression (ALR) method for accurate mapping via sparsely collected training samples. The key idea is to adaptively find the subset of training samples where the test sample is most linearly representable. We solve the problem via l1-optimization and thoroughly study the key issues to seek for the best solution for regression. The proposed gaze estimation approach based on ALR is naturally sparse and low-dimensional, giving the ability to infer human gaze from variant resolution eye images using much fewer training samples than existing methods. Especially, the optimization procedure in ALR is extended to solve the subpixel alignment problem simultaneously for low resolution test eye images. Performance of the proposed method is evaluated by extensive experiments against various factors such as number of training samples, feature dimensionality and eye image resolution to verify its effectiveness.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":"1 1","pages":"153-160"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90994607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}