Hanchao Jia, Shigang Li, "Estimating the Structure of Rooms from a Single Fisheye Image," 2013 2nd IAPR Asian Conference on Pattern Recognition (ACPR), doi:10.1109/ACPR.2013.148.

Most existing approaches to indoor scene understanding formulate the problem based on pinhole camera geometry. Unfortunately, these approaches do not transfer well to omnidirectional images. In this paper, we focus on estimating the spatial layout of rooms from a single fisheye image. Exploiting the wide field of view of fisheye cameras, we introduce a structure symmetry rule that describes geometric constraints. We give a method that estimates and recovers a preliminary spatial layout of the room from only a collection of line segments extracted from the fisheye image. An orientation map of the structure is then generated. Finally, we refine the spatial layout to obtain the main structure. The experiments demonstrate that our geometric-reasoning approach can estimate the structure of an indoor scene from a single fisheye image.
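The abstract does not specify the fisheye projection model. As a rough illustration of why pinhole-based layout reasoning breaks down, the sketch below assumes the common equidistant model (r = f·θ, an assumption on our part, as are all names and values) and shows that a straight 3-D edge samples to a curved arc in the fisheye image.

```python
import numpy as np

def project_equidistant(ray, f=300.0, cx=320.0, cy=240.0):
    """Equidistant fisheye model (assumed): r = f * theta, where theta is
    the ray's angle off the optical axis (z, pointing forward)."""
    x, y, z = ray / np.linalg.norm(ray)
    theta = np.arccos(np.clip(z, -1.0, 1.0))   # angle from optical axis
    phi = np.arctan2(y, x)                     # azimuth in the image plane
    r = f * theta
    return cx + r * np.cos(phi), cy + r * np.sin(phi)

# Sampling a straight 3-D edge: its fisheye image is a curved arc, not a line.
p0, p1 = np.array([1.0, 0.2, 1.0]), np.array([-1.0, 0.2, 1.0])
for t in np.linspace(0.0, 1.0, 5):
    u, v = project_equidistant((1 - t) * p0 + t * p1)
    print(round(u, 1), round(v, 1))
```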
{"title":"Estimating the Structure of Rooms from a Single Fisheye Image","authors":"Hanchao Jia, Shigang Li","doi":"10.1109/ACPR.2013.148","DOIUrl":"https://doi.org/10.1109/ACPR.2013.148","url":null,"abstract":"Most existing approaches to indoor scene understanding formulate the problem based on the pinhole camera geometry. Unfortunately, these approaches cannot be utilized well for an omni directional image. In this paper, we focus on the problem of estimating the spatial layout of rooms from a single fisheye image. Considering the wide field of view of fisheye cameras, we introduce a structure symmetrical rule which describes geometric constraints. A method is given to estimate and recover the preliminary spatial layout of room only from a collection of line segments extracted from a fisheye image. Then, an orientation map of structure is generated. Finally, we refine the spatial layout to obtain the main structure. The experiments demonstrate that our approach based on geometric reasoning can be used to estimate the structure of indoor scene from a single fisheye image.","PeriodicalId":365633,"journal":{"name":"2013 2nd IAPR Asian Conference on Pattern Recognition","volume":"36 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120857889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
N. Chiba, "Text Image Classifier Using Image-Wise Annotation," 2013 2nd IAPR Asian Conference on Pattern Recognition (ACPR), doi:10.1109/ACPR.2013.160.

A text image classifier that requires only image-wise annotation is proposed. Although text detection methods using classifiers have been investigated, they require character-wise annotation by human operators, which is the most time-consuming phase of constructing a text detection system. The proposed classifier instead uses an image-wise annotation indicating whether the image contains text, which demands much less operator effort than character-wise annotation. From this annotation, the classifier estimates the likelihood of detecting text-character candidates in an image, as well as the threshold at which the system decides that an image contains text, based on prior probabilities. Experiments on real images showed the effectiveness of the proposed text image classifier.
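As a toy illustration of learning a decision rule from image-wise labels alone (the paper's estimator, built on candidate likelihoods and prior probabilities, is more involved), the sketch below picks the per-image candidate-count threshold that best reproduces the labels. The counts and labels are made up.

```python
import numpy as np

def choose_count_threshold(counts, labels):
    """Pick the candidate-count threshold that best reproduces the
    image-wise labels (1 = contains text, 0 = no text)."""
    best_t, best_acc = 0, 0.0
    for t in range(int(counts.max()) + 2):
        acc = np.mean((counts >= t) == labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

counts = np.array([0, 1, 2, 7, 9, 12])  # text-character candidates per image
labels = np.array([0, 0, 0, 1, 1, 1])   # image-wise annotation
print(choose_count_threshold(counts, labels))  # -> 3 (any t in 3..7 is perfect)
```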
{"title":"Text Image Classifier Using Image-Wise Annotation","authors":"N. Chiba","doi":"10.1109/ACPR.2013.160","DOIUrl":"https://doi.org/10.1109/ACPR.2013.160","url":null,"abstract":"A text image classifier that requires only image-wise annotation is proposed. Although text detection methods using classifiers have been investigated, they require character-wise annotation by human operators, which is the most time-consuming phase when constructing a text detection system. The proposed classifier uses image-wise annotation whether the image contains text or not, which requires much less effort by an operator than that of character-wise annotation. From this annotation, the classifier estimates likelihood of detecting text-character candidates in an image as well as the threshold value for the system to determine if the image contains text based on prior probabilities. Experiments using real images showed the effectiveness of the proposed text image classifier.","PeriodicalId":365633,"journal":{"name":"2013 2nd IAPR Asian Conference on Pattern Recognition","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115182698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Masaki Hayashi, Taiki Yamamoto, Y. Aoki, Kyoko Oshima, Masamoto Tanabiki, "Head and Upper Body Pose Estimation in Team Sport Videos," 2013 2nd IAPR Asian Conference on Pattern Recognition (ACPR), doi:10.1109/ACPR.2013.177.
We propose a head and upper-body pose estimation method for low-resolution team sports videos, such as American football or hockey, where all players wear helmets and often lean forward. Compared to pedestrians in surveillance video, head pose estimation for team sports must handle a wide range of activities (poses) and image scales, depending on the player's position on the field. Using both a pelvis-aligned player tracker and a head tracker, our system tracks the player's pelvis and head positions, yielding an estimate of the player's 2D spine. We then estimate the head and upper-body orientations independently with random decision forest classifiers learned from a dataset of multi-scale images. Integrating the upper-body direction with the 2D spine pose, we also estimate the player's 3D spine pose. Experiments show that, by focusing on the upper-body region, our method estimates head and upper-body pose accurately even for players in intensive motion, without any temporal filtering.
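A schematic of the two-classifier stage, assuming generic fixed-length appearance features and eight orientation bins (both assumptions; the abstract does not give the features or binning). Synthetic stand-in data keep the sketch self-contained.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical setup: fixed-length appearance features from head crops,
# with orientation quantized into 8 bins (0, 45, ..., 315 degrees).
rng = np.random.default_rng(0)
X_head = rng.normal(size=(400, 128))   # stand-in for multi-scale head features
y_head = rng.integers(0, 8, size=400)  # quantized head-orientation labels

head_forest = RandomForestClassifier(n_estimators=100, random_state=0)
head_forest.fit(X_head, y_head)        # a second, independent forest would
print(head_forest.predict(X_head[:3]))  # handle the upper-body orientation
```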
{"title":"Head and Upper Body Pose Estimation in Team Sport Videos","authors":"Masaki Hayashi, Taiki Yamamoto, Y. Aoki, Kyoko Oshima, Masamoto Tanabiki","doi":"10.1109/ACPR.2013.177","DOIUrl":"https://doi.org/10.1109/ACPR.2013.177","url":null,"abstract":"We propose a head and upper body pose estimation method in low-resolution team sports videos such as for American Football or Hockey, where all players wear helmets and often lean forward. Compared to the pedestrian cases in surveillance videos, head pose estimation technique for team sports videos has to deal with various types of activities (poses) and image scales according to the position of the player in the field. Using both the pelvis aligned player tracker and the head tracker, our system tracks the player's pelvis and head positions, which results in estimation of player's 2D spine. Then, we estimate the head and upper body orientations independently with random decision forest classifiers learned from a dataset including multiple-scale images. Integrating upper body direction and 2D spine pose, we also estimate the 3D spine pose of the player. Experiments show our method can estimate head and upper body pose accurately for sports players with intensive movement even without any temporal filtering techniques by focusing on the upper body region.","PeriodicalId":365633,"journal":{"name":"2013 2nd IAPR Asian Conference on Pattern Recognition","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115992019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abhijit Das, U. Pal, M. Blumenstein, M. A. Ferrer-Ballester, "Sclera Recognition - A Survey," 2013 2nd IAPR Asian Conference on Pattern Recognition (ACPR), doi:10.1109/ACPR.2013.168.
This paper presents a survey of sclera-based biometric recognition. Among the various biometric modalities, the sclera is a novel and promising one. The sclera, a white region of connective tissue and blood vessels, surrounds the iris. A survey of the available techniques in sclera biometrics will be of great assistance to researchers, so this article makes a comprehensive effort to discuss the advancements reported over the past few decades. Because only a limited number of publications exist in the literature, this paper also attempts to raise awareness of the area so that the topic gains popularity and interest among researchers. The survey opens with a brief introduction to sclera biometrics, followed by background concepts, pre-processing techniques, feature extraction and, finally, the classification techniques associated with sclera biometrics. Since benchmark databases are essential to any pattern-recognition research, the databases related to this work are also discussed. Finally, we present our observations, the future scope, and the difficulties that remain unsolved in sclera biometrics. We hope this survey will draw more research attention to the emerging sclera biometric.
{"title":"Sclera Recognition - A Survey","authors":"Abhijit Das, U. Pal, M. Blumenstein, M. A. Ferrer-Ballester","doi":"10.1109/ACPR.2013.168","DOIUrl":"https://doi.org/10.1109/ACPR.2013.168","url":null,"abstract":"This paper presents a survey on sclera-based biometric recognition. Among the various biometric methods, sclera is one of the novel and promising biometric techniques. The sclera, a white region of connective tissue and blood vessels, surrounds the iris. A survey of the techniques available in the area of sclera biometrics will be of great assistance to researchers, and hence a comprehensive effort is made in this article to discuss the advancements reported in this regard during the past few decades. As a limited number of publications are found in the literature, an attempt is made in this paper to increase awareness of this area so that the topic gains popularity and interest among researchers. In this survey, a brief introduction is given initially about the sclera biometric, which is subsequently followed by background concepts, various pre-processing techniques, feature extraction and finally classification techniques associated with the sclera biometric. Benchmarking databases are very important for any pattern recognition related research, so the databases related with this work is also discussed. Finally, our observations, future scope and existing difficulties, which are unsolved in sclera biometrics, are discussed. We hope that this survey will serve to focus more researcher attention towards the emerging sclera biometric.","PeriodicalId":365633,"journal":{"name":"2013 2nd IAPR Asian Conference on Pattern Recognition","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114159378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yueyang Wang, X. Hao, Yali Hou, Changqing Guo, "A New Multispectral Method for Face Liveness Detection," 2013 2nd IAPR Asian Conference on Pattern Recognition (ACPR), doi:10.1109/ACPR.2013.169.

A face recognition system can be deceived by photos, mimic masks, mannequins, etc., and with advances in 3D printing technology, a more robust face liveness detection method is needed. In this paper, a gradient-based multispectral method is proposed for face liveness detection. Using two spectral bands, the method is tested on the task of separating genuine faces from common disguised faces, achieving a true positive rate of 96.7% and a true negative rate of 97%. Performance under face rotation is also evaluated. The contributions of this paper are twofold. First, a gradient-based multispectral method is proposed that considers the reflectance not only of skin regions but also of other distinctive facial regions. Second, the method is evaluated on a dataset containing both planar photos and 3D mannequins and masks, and its performance across different face orientations is discussed.
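A minimal sketch of what a gradient-based two-band feature might look like: per-pixel band ratios (where skin and disguise materials reflect differently) plus gradient statistics, pooled into a short vector for a downstream classifier. The pooling and feature choices are assumptions, not the paper's exact design.

```python
import numpy as np

def two_band_feature(band1, band2, eps=1e-6):
    """Pool per-pixel band ratios and gradient magnitudes from two
    co-registered spectral bands into a short feature vector."""
    gy1, gx1 = np.gradient(band1.astype(float))
    gy2, gx2 = np.gradient(band2.astype(float))
    ratio = (band1 + eps) / (band2 + eps)  # skin vs. photo/mask materials differ
    return np.array([ratio.mean(), ratio.std(),
                     np.hypot(gx1, gy1).mean(), np.hypot(gx2, gy2).mean()])

rng = np.random.default_rng(0)
print(two_band_feature(rng.random((64, 64)), rng.random((64, 64))))
```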
{"title":"A New Multispectral Method for Face Liveness Detection","authors":"Yueyang Wang, X. Hao, Yali Hou, Changqing Guo","doi":"10.1109/ACPR.2013.169","DOIUrl":"https://doi.org/10.1109/ACPR.2013.169","url":null,"abstract":"A face recognition system can be deceived by photos, mimic masks, mannequins and etc. And with the advances in the 3D printing technology, a more robust face liveness detection method is needed. In this paper, a gradient-based multispectral method has been proposed for face liveness detection. Based on two spectral bands, the developed method is tested for the classification of genuine faces and common disguised faces. A true positive rate of 96.7% and a true negative rate of 97% have been achieved. The performance of the method is also tested when face rotation occurs. The contributions of this paper are: First, a gradient-based multispectral method has been proposed. Except for the reflectance of the skin regions, the reflectance of other distinctive regions in a face are also considered in the developed method. Second, the method is tested based on a dataset with both planar photos and 3D mannequins and masks. The performance on different face orientations is also discussed.","PeriodicalId":365633,"journal":{"name":"2013 2nd IAPR Asian Conference on Pattern Recognition","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114438674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haoxing Wang, Longquan Dai, Xiaopeng Zhang, "Edge Guided High Order Image Smoothing," 2013 2nd IAPR Asian Conference on Pattern Recognition (ACPR), doi:10.1109/ACPR.2013.47.

Edge-preserving smoothing has recently emerged as a valuable tool for a variety of applications in computer graphics and image processing. Within an optimization framework, edge-preserving smoothing with a first-order smoothness prior in the regularization term tends to bias the result toward a constant image. Although a high-order smoothness prior can alleviate this problem, it tends to over-smooth the result. In this paper, we present an effective and practical image editing method that sharply preserves salient edges while smoothing continuous regions with a high-order smoothness prior, producing results different from those of the first-order prior. Finally, we demonstrate the effectiveness of our method on image denoising, image abstraction and image enhancement.
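To make the high-order idea concrete, here is a 1-D sketch: minimize ||u - f||^2 + lam * ||W D2 u||^2, where D2 takes second differences and the guidance weights W relax the prior across a known edge. The discretization and weighting scheme are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def high_order_smooth(f, lam=50.0, edge_w=None):
    """Minimize ||u - f||^2 + lam * ||W D2 u||^2 in 1-D, where D2 is the
    second-difference operator and W (edge_w) relaxes the prior at edges."""
    n = len(f)
    D2 = np.diff(np.eye(n), n=2, axis=0)          # (n-2) x n second differences
    w = np.ones(n - 2) if edge_w is None else edge_w
    A = np.eye(n) + lam * D2.T @ (w[:, None] * D2)
    return np.linalg.solve(A, f)                  # closed-form minimizer

rng = np.random.default_rng(0)
step = np.r_[np.zeros(50), np.ones(50)] + 0.1 * rng.standard_normal(100)
w = np.ones(98); w[47:51] = 0.0                   # guidance: edge at index 50
print(high_order_smooth(step, edge_w=w)[45:55].round(2))
```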
{"title":"Edge Guided High Order Image Smoothing","authors":"Haoxing Wang, Longquan Dai, Xiaopeng Zhang","doi":"10.1109/ACPR.2013.47","DOIUrl":"https://doi.org/10.1109/ACPR.2013.47","url":null,"abstract":"Edge-preserving smoothing has recently emerged as a valuable tool for a variety of applications in computer graphics and image processing. Edge-preserving smoothing using first order smoothness prior in the regularization term under optimization framework tends to bias the smoothing result forward the constant image. Although using high order smoothness prior can alleviate this problem, it tends to obtain the over-smoothed result. In this paper, we present an effective and practical image editing method which can sharply preserve the salient edges and at the same time smooths the continuous regions using high order smoothness prior to achieve the smoothing results different from the first order smoothness prior. Finally, we demonstrate the effectiveness of our method in the context of image denoising, image abstraction and image enhancement.","PeriodicalId":365633,"journal":{"name":"2013 2nd IAPR Asian Conference on Pattern Recognition","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121983328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shota Ishikawa, J. Tan, Hyoungseop Kim, S. Ishikawa, "3-D Recovery of a Non-rigid Object from a Single Camera View Employing Multiple Coordinates Representation," 2013 2nd IAPR Asian Conference on Pattern Recognition (ACPR), doi:10.1109/ACPR.2013.174.
This paper proposes a novel technique for 3-D recovery of a non-rigid object, such as a human in motion, from a single camera view. The technique segments the deforming object into parts, each of which is treated as a rigid object. For high-accuracy segmentation, multi-stage learning and local subspace affinity are employed. Each part then recovers its 3-D shape via the factorization method. Obviously, deformed portions undergoing twisting or stretching cannot be recovered by this procedure alone; the idea of the present paper is to recover such portions by averaging the 3-D locations of each point as described in the coordinate frames of the respective parts. Experiments with a synthetic non-rigid object and real human motion data show the effectiveness of the proposed technique.
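The per-part rigid step can be illustrated with a Tomasi-Kanade-style factorization sketch: a row-centered 2F x P measurement matrix of tracked points factors through a rank-3 SVD into motion and shape, up to an affine ambiguity. The paper's averaging across part coordinate frames is the extension beyond this standard step.

```python
import numpy as np

def factorize_rigid(W):
    """Rank-3 factorization of a centered 2F x P measurement matrix into
    motion (2F x 3) and shape (3 x P), up to an affine ambiguity."""
    Wc = W - W.mean(axis=1, keepdims=True)        # center each row
    U, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])                 # motion
    S = np.sqrt(s[:3])[:, None] * Vt[:3]          # shape
    return M, S, Wc

# Synthetic check: affine projections of 20 points over 5 frames (rank 3).
rng = np.random.default_rng(1)
X = rng.normal(size=(3, 20))
W = np.vstack([rng.normal(size=(2, 3)) @ X for _ in range(5)])
M, S, Wc = factorize_rigid(W)
print(np.allclose(M @ S, Wc))   # True: the rigid part is recovered exactly
```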
{"title":"3-D Recovery of a Non-rigid Object from a Single Camera View Employing Multiple Coordinates Representation","authors":"Shota Ishikawa, J. Tan, Hyoungseop Kim, S. Ishikawa","doi":"10.1109/ACPR.2013.174","DOIUrl":"https://doi.org/10.1109/ACPR.2013.174","url":null,"abstract":"This paper proposes a novel technique for 3-D recovery of a non-rigid object, such as a human in motion, from a single camera view. To achieve the 3-D recovery, the proposed technique performs segmentation of an object under deformation into respective parts which are all regarded as rigid objects. For high accuracy segmentation, multi-stage learning and local subspace affinity are employed for the segmentation. Each part recovers its 3-D shape by applying the factorization method to it. Obviously the deformed portion containing twist or stretch motion cannot recover the 3-D shape by this procedure. The idea of the present paper is to recover such deformed portion by averaging the 3-D locations of a point on the portion described by the coordinates of respective parts. The experiments employing a synthetic non-rigid object and real human motion data show effectiveness of the proposed technique.","PeriodicalId":365633,"journal":{"name":"2013 2nd IAPR Asian Conference on Pattern Recognition","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122167396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jun Li, Shan Zhou, Junliang Xing, Changyin Sun, Weiming Hu, "An Efficient Approach to Web Near-Duplicate Image Detection," 2013 2nd IAPR Asian Conference on Pattern Recognition (ACPR), doi:10.1109/ACPR.2013.101.
This paper presents an improved bag-of-words (BoW) framework for detecting near-duplicate images on the Web and makes three main contributions. First, Locality-constrained Linear Coding (LLC) with a spatial pyramid is introduced to encode SIFT feature descriptors. Second, a weighted chi-square distance metric is proposed for comparing two histograms, together with an inverted indexing scheme for fast similarity evaluation. Third, a 6K dataset of eight object categories, also applicable to image retrieval and classification, is built and will be made publicly available. We verify our technique on two benchmarks: our 6K dataset and the publicly available University of Kentucky Benchmark (UKB). The promising experimental results demonstrate the effectiveness and efficiency of our approach to Web Near-Duplicate Image Detection (Web-NDID), which outperforms several state-of-the-art methods.
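A minimal sketch of a weighted chi-square distance between two L1-normalized histograms; the per-bin weights here are a placeholder, since the paper's exact weighting is not reproduced in the abstract.

```python
import numpy as np

def weighted_chi_square(h1, h2, w=None, eps=1e-10):
    """Weighted chi-square distance between two L1-normalized histograms;
    w holds per-bin weights (placeholder for the paper's weighting)."""
    h1 = h1 / (h1.sum() + eps)
    h2 = h2 / (h2.sum() + eps)
    w = np.ones_like(h1) if w is None else w
    return 0.5 * np.sum(w * (h1 - h2) ** 2 / (h1 + h2 + eps))

a = np.array([4.0, 2.0, 0.0, 1.0])   # BoW histograms of two images
b = np.array([3.0, 3.0, 1.0, 0.0])
print(weighted_chi_square(a, b))
```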
{"title":"An Efficient Approach to Web Near-Duplicate Image Detection","authors":"Jun Li, Shan Zhou, Junliang Xing, Changyin Sun, Weiming Hu","doi":"10.1109/ACPR.2013.101","DOIUrl":"https://doi.org/10.1109/ACPR.2013.101","url":null,"abstract":"This paper presents an improved bag-of-words (BoW) framework for detecting near-duplicates of images on the Web and makes three main contributions. Firstly, based on the SIFT feature descriptors, Locality-constrained Linear Coding (LLC) with the spatial pyramid is introduced to encode features. Secondly, a weighted Chi-square distance metric is proposed to compare two histograms, with an inverted indexing scheme for fast similarity evaluation. Thirdly, a 6K dataset consisting of eight categories of objects, which can also be applicable to image retrieval and classification, is built and will be made available to the public in the future. We verify our technique on two benchmarks: our 6K dataset and the publicly available University of Kentucky Benchmark (UKB). The promising experimental results demonstrate the effectiveness and efficiency of our approach for Web Near-Duplicate Image Detection (Web-NDID), which outperforms several state-of-the-art methods.","PeriodicalId":365633,"journal":{"name":"2013 2nd IAPR Asian Conference on Pattern Recognition","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116814579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jian-jun Hao, Yusuke Morishita, Toshinori Hosoi, K. Sakurai, Hitoshi Imaoka, Takao Imaizumi, Hideki Irisawa, "Large-Scale Face Recognition on Smart Devices," 2013 2nd IAPR Asian Conference on Pattern Recognition (ACPR), doi:10.1109/ACPR.2013.189.

Most highly accurate face recognition methods are unsuitable for real-time use on smart devices, which have limited computational resources. In this demonstration, we exhibit a face recognition application in which only essential facial features extracted from images are used for personal identification. The underlying algorithm compresses the face feature to just 512 bytes per face while maintaining a high recognition rate: a false rejection rate of 1.6% at a false acceptance rate of 0.1% on identification photos. Consequently, the computational cost of face matching drops dramatically, and the system achieves 1.16 million matches per second on a dual-core 1.5 GHz ARM processor. The demonstration on a smart device shows high recognition performance and feasibility for diverse applications.
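A back-of-the-envelope sketch of why 512-byte templates make matching cheap: if the score is, say, an int8 dot product (an assumption; the paper's matching function is not disclosed), scoring one probe against a large gallery is a single matrix-vector product.

```python
import numpy as np

# Hypothetical template layout: 512 int8 values = 512 bytes per face.
rng = np.random.default_rng(0)
gallery = rng.integers(-128, 128, size=(10_000, 512), dtype=np.int8)
probe = rng.integers(-128, 128, size=512, dtype=np.int8)

# One int32 matrix-vector product scores the probe against the whole gallery.
scores = gallery.astype(np.int32) @ probe.astype(np.int32)
best = int(np.argmax(scores))
print(best, int(scores[best]))
```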
{"title":"Large-Scale Face Recognition on Smart Devices","authors":"Jian-jun Hao, Yusuke Morishita, Toshinori Hosoi, K. Sakurai, Hitoshi Imaoka, Takao Imaizumi, Hideki Irisawa","doi":"10.1109/ACPR.2013.189","DOIUrl":"https://doi.org/10.1109/ACPR.2013.189","url":null,"abstract":"Most of highly accurate face recognition methods are not suitable for real-time requirement in smart devices which have computational limitations. In this demonstration, we exhibit a face recognition application, in which only essential facial features from images are used for personal identification. In the algorithm used in this application, the face feature size is dramatically compressed into 512 bytes per face in spite of high recognition rate, a false rejection rate of 1.6% at false acceptance rate of 0.1% on identification photos. Consequently, computational cost for face matching is reduced dramatically and the system achieves 1.16 million times matching/second in dual-core 1.5GHz ARM processor. The demonstration on the smart device shows a high recognition performance and the feasibility for diverse applications.","PeriodicalId":365633,"journal":{"name":"2013 2nd IAPR Asian Conference on Pattern Recognition","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128443289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Yang, "Image Segmentation by Bilayer Superpixel Grouping," 2013 2nd IAPR Asian Conference on Pattern Recognition (ACPR), doi:10.1109/ACPR.2013.62.

The task of image segmentation is to group image pixels into visually meaningful objects; it has long been a challenging problem in computer vision and image processing. In this paper, we address segmentation as a superpixel grouping problem and propose a novel graph-based framework that integrates different cues from bilayer superpixels simultaneously. The key idea is to formulate segmentation as grouping a subset of superpixels that partitions a bilayer graph over the superpixels, with graph edges encoding superpixel similarity. We first construct a bipartite graph incorporating a superpixel cue and a long-range cue; a mid-range cue is further incorporated in a hybrid graph model. Segmentation is then solved by spectral clustering. Our approach is fully automatic, bottom-up and unsupervised. We evaluate the proposed framework against other generic segmentation approaches on a state-of-the-art benchmark database.
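A normalized-cuts-style sketch of the final step, spectral clustering over a superpixel similarity graph; the toy similarity matrix stands in for the paper's bilayer/hybrid graph construction.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_segment(W, k):
    """Embed superpixels with the bottom-k eigenvectors of the normalized
    Laplacian of a similarity graph W, then cluster the embedding."""
    d = W.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(d + 1e-12))
    L = np.eye(len(d)) - Dinv @ W @ Dinv    # normalized Laplacian
    _, vecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vecs[:, :k])

# Toy graph: two groups of "superpixels", strong ties within, weak across.
W = np.full((6, 6), 0.05)
W[:3, :3] = W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
print(spectral_segment(W, 2))   # e.g. [0 0 0 1 1 1]
```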
{"title":"Image Segmentation by Bilayer Superpixel Grouping","authors":"M. Yang","doi":"10.1109/ACPR.2013.62","DOIUrl":"https://doi.org/10.1109/ACPR.2013.62","url":null,"abstract":"The task of image segmentation is to group image pixels into visually meaningful objects. It has long been a challenging problem in computer vision and image processing. In this paper we address the segmentation as a super pixel grouping problem. We propose a novel graph-based segmentation framework which is able to integrate different cues from bilayer super pixels simultaneously. The key idea is that segmentation is formulated as grouping a subset of super pixels that partitions a bilayer graph over super pixels, with graph edges encoding super pixel similarity. We first construct a bipartite graph incorporating super pixel cue and long-range cue. Furthermore, mid-range cue is also incorporated in a hybrid graph model. Segmentation is solved by spectral clustering. Our approach is fully automatic, bottom-up, and unsupervised. We evaluate our proposed framework by comparing it to other generic segmentation approaches on the state-of-the-art benchmark database.","PeriodicalId":365633,"journal":{"name":"2013 2nd IAPR Asian Conference on Pattern Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129232430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}