This paper presents a robust text detection approach based on generalized color-enhanced contrasting extremal region (CER) and neural networks. Given a color natural scene image, six component-trees are built from its gray scale image, hue and saturation channel images in a perception-based illumination invariant color space, and their inverted images, respectively. From each component-tree, generalized color-enhanced CERs are extracted as character candidates. By using a "divide-and-conquer" strategy, each candidate image patch is labeled reliably by rules as one of five types, namely, Long, Thin, Fill, Square-large and Square-small, and classified as text or non-text by a corresponding neural network, which is trained by an ambiguity-free learning strategy. After pruning non-text components, repeating components in each component-tree are pruned by using color and area information to obtain a component graph, from which candidate text-lines are formed and verified by another set of neural networks. Finally, results from six component-trees are combined, and a post-processing step is used to recover lost characters and split text lines into words as appropriate. Our proposed method achieves 85.72% recall, 87.03% precision, and 86.37% F-score on ICDAR-2013 "Reading Text in Scene Images" test set.
Lei Sun, Qiang Huo, Wei Jia, Kai Chen, "Robust Text Detection in Natural Scene Images by Generalized Color-Enhanced Contrasting Extremal Region and Neural Networks," 2014 22nd International Conference on Pattern Recognition, Aug. 2014. doi:10.1109/ICPR.2014.469
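As an illustration of the component-tree idea behind extremal-region extraction, the sketch below sweeps binarization thresholds over a tiny grayscale raster and records the connected components that appear at each level. This is a simplified stand-in, not the authors' generalized color-enhanced CER algorithm: the plain threshold sweep, 4-connectivity, and dark-on-light polarity are all assumptions made for illustration.

```python
def connected_components(mask):
    """Sizes of 4-connected components of 1-pixels in a binary raster
    (list of lists), found with an explicit-stack flood fill."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    sizes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                stack, size = [(y, x)], 0
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    size += 1
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                sizes.append(size)
    return sizes

def component_tree_sweep(gray, thresholds):
    """For each threshold t, binarise with 'pixel <= t' (dark-on-light
    polarity) and record the component sizes that appear -- a crude
    stand-in for component-tree construction."""
    return {t: connected_components([[1 if p <= t else 0 for p in row] for row in gray])
            for t in thresholds}
```

In a real component-tree, components at a lower threshold nest inside components at a higher one; this sketch only exposes the per-level components, which is the raw material the tree organizes.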
A new geometric framework, called generalized coupled line camera (GCLC), is proposed to derive an analytic solution for reconstructing an unknown scene quadrilateral and the relevant projective structure from one or more image quadrilaterals. We extend the previous approach, developed for rectangles, to handle arbitrary scene quadrilaterals. First, we generalize a single line camera by removing the centering constraint that the principal axis should bisect a scene line. Then, we couple a pair of generalized line cameras to model a frustum with a quadrilateral base. Finally, we show that the scene quadrilateral and the center of projection can be analytically reconstructed from a single view when prior knowledge of the quadrilateral is available. A completely unknown quadrilateral can be reconstructed from four views through non-linear optimization. We also describe an improved method to handle an off-centered case by geometrically inferring a centered proxy quadrilateral, which accelerates the reconstruction process without relying on homography. The proposed method is easy to implement since each step is expressed as a simple analytic equation. We present experimental results on real and synthetic examples.
Joo-Haeng Lee, "New Geometric Interpretation and Analytic Solution for Quadrilateral Reconstruction," 2014 22nd International Conference on Pattern Recognition, Aug. 2014. doi:10.1109/ICPR.2014.688
Nearest Neighbour search (NNS) is a widely used technique in Pattern Recognition, and many indexing techniques have been proposed to speed up the search. The need to work with large dynamic databases, in interactive or online systems, has resulted in increasing interest in adapting or creating fast methods to update these indexes. TLAESA is a fast search algorithm with sublinear overhead that, using a branch-and-bound technique, can find the nearest neighbour with a very low number of distance computations. In this paper, we propose a new fast updating method for the TLAESA index. The behaviour of this index has been analysed theoretically and experimentally. We have obtained a log-square upper bound on the expected rebuilding time. This bound has been verified experimentally on several synthetic and real data sets.
L. Micó, J. Oncina, "Dynamic Insertions in TLAESA Fast NN Search Algorithm," 2014 22nd International Conference on Pattern Recognition, Aug. 2014. doi:10.1109/ICPR.2014.657
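The triangle-inequality pruning at the heart of (T)LAESA-style search can be sketched as follows. This is a flat, pivot-based simplification, closer to LAESA than to the tree-structured TLAESA, with Euclidean distance and 2-D points assumed: precomputed distances to a few pivots give the lower bound |d(q, p) − d(x, p)| ≤ d(q, x), so most candidates are rejected without computing their distance to the query.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def nn_search(query, points, pivots):
    """Pivot-based NN search: for any pivot p, the triangle inequality
    gives |d(q, p) - d(x, p)| <= d(q, x), so a candidate x whose lower
    bound already exceeds the best distance found so far is rejected
    without computing d(q, x)."""
    piv_to_pts = {p: [dist(p, x) for x in points] for p in pivots}  # precomputed offline
    q_to_piv = {p: dist(query, p) for p in pivots}                  # one distance per pivot
    best, best_d, computed = None, float("inf"), 0
    for i, x in enumerate(points):
        lower = max(abs(q_to_piv[p] - piv_to_pts[p][i]) for p in pivots)
        if lower >= best_d:
            continue  # pruned by the lower bound
        d = dist(query, x)
        computed += 1
        if d < best_d:
            best, best_d = x, d
    return best, best_d, computed
```

In the real algorithms the pivot distances count toward the budget too, and the tree structure of TLAESA lets whole subtrees be pruned at once; the counter here only tracks candidate-to-query computations to make the pruning visible.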
In mathematical expression recognition, symbol classification is a crucial step. Numerous approaches for recognizing handwritten math symbols have been published, but most of them are either online or hybrid approaches; no study has focused specifically on offline features for handwritten math symbol recognition. Furthermore, many papers report results that are difficult to compare. In this paper we assess the performance of several well-known offline features for this task. We also test a novel set of features based on polar histograms, and the vertical repositioning method for feature extraction. Finally, we report and analyze the results of several experiments using recurrent neural networks on a large public database of online handwritten math expressions. The combination of online and offline features significantly improved the recognition rate.
Francisco Alvaro, Joan Andreu Sánchez, J. Benedí, "Offline Features for Classifying Handwritten Math Symbols with Recurrent Neural Networks," 2014 22nd International Conference on Pattern Recognition, Aug. 2014. doi:10.1109/ICPR.2014.507
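A minimal sketch of a polar-histogram descriptor of the kind the abstract mentions: stroke points are binned by their angle around the symbol's centroid. The bin count, the angle-only binning (no radial bins), and the normalization are illustrative assumptions; the paper's exact feature may differ.

```python
import math

def polar_histogram(points, n_bins=8):
    """Normalised histogram of point directions around the centroid of a
    set of (x, y) stroke points -- a simple polar shape descriptor."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    hist = [0] * n_bins
    for x, y in points:
        # Angle in [0, 2*pi), mapped to one of n_bins equal sectors.
        ang = math.atan2(y - cy, x - cx) % (2 * math.pi)
        hist[min(int(ang / (2 * math.pi) * n_bins), n_bins - 1)] += 1
    return [h / len(points) for h in hist]
```

Because the descriptor is computed relative to the centroid, it is translation-invariant, which is one reason polar binning suits isolated-symbol classification.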
In this paper we present a method for layout analysis in structured documents. We propose an EM-based algorithm that fits a set of Gaussian mixtures to the different regions according to their logical distribution along the page. After convergence, we estimate the final shape of each region from the parameters computed for each component of the mixture. We evaluated our method on the task of record detection in a collection of historical structured documents and compared it with previous work on this task.
Francisco Cruz, O. R. Terrades, "EM-Based Layout Analysis Method for Structured Documents," 2014 22nd International Conference on Pattern Recognition, Aug. 2014. doi:10.1109/ICPR.2014.63
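The EM fitting step can be illustrated with a basic two-component Gaussian mixture in 1-D; the paper fits mixtures to 2-D page regions, but the E-step/M-step updates have the same shape. Deterministic initialization at the data extremes and a fixed iteration count are simplifications made for this sketch.

```python
import math

def em_gmm_1d(xs, iters=50):
    """Basic EM for a two-component 1-D Gaussian mixture.  Means are
    initialised at the data extremes for determinism."""
    k = 2
    mu = [min(xs), max(xs)]
    var = [1.0] * k
    w = [0.5] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w[j] / math.sqrt(2 * math.pi * var[j])
                 * math.exp(-(x - mu[j]) ** 2 / (2 * var[j])) for j in range(k)]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means and variances from soft counts.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            w[j] = nj / len(xs)
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2
                             for r, x in zip(resp, xs)) / nj, 1e-6)
    return w, mu, var
```

After convergence, the fitted means and variances play the role the paper assigns to the mixture parameters: they determine the estimated extent of each region.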
In traditional vision systems, high-level information is usually inferred from images or videos captured by cameras, or from depth images captured by depth sensors. These images, whether gray-level, RGB, or depth, have a human-readable 2D structure that describes the spatial distribution of the scene. In this paper, we explore the possibility of using distributed color sensors to infer high-level information, such as room occupancy. Unlike a camera, the output of a color sensor has only a few variables. However, if the light in the room is color controllable, we can use the outputs of multiple color sensors under different lighting conditions to recover the light transport model (LTM) of the room. As the room occupancy changes, the LTM changes accordingly, and we can use machine learning to establish the mapping from LTM to room occupancy.
Quan Wang, Xinchi Zhang, M. Wang, K. Boyer, "Learning Room Occupancy Patterns from Sparsely Recovered Light Transport Models," 2014 22nd International Conference on Pattern Recognition, Aug. 2014. doi:10.1109/ICPR.2014.347
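If sensor readings are linear in the controllable light inputs, the light transport model is a matrix T with S = T L, and it can be recovered by least squares from readings taken under several lighting conditions. The sketch below shows only this recovery step; the linearity assumption and the exact formulation are inferred from the abstract, not taken from the authors' code, and the occupancy classifier trained on top of T is omitted.

```python
import numpy as np

def recover_ltm(L, S):
    """Least-squares recovery of the light transport matrix T from
    S = T @ L, where column j of L is the light-control vector of
    condition j and column j of S holds the sensor readings observed
    under that condition."""
    # Solve L^T T^T = S^T in the least-squares sense, then transpose back.
    T_transposed, *_ = np.linalg.lstsq(L.T, S.T, rcond=None)
    return T_transposed.T
```

With more lighting conditions than light channels the system is overdetermined, which is what makes recovery from noisy, sparse sensor readings feasible.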
Yan Yan, Subramanian Ramanathan, E. Ricci, O. Lanz, N. Sebe
Social attention behavior offers vital cues towards inferring one's personality traits from interactive settings such as round-table meetings and cocktail parties. Head orientation is typically employed as a proxy for determining the social attention direction when faces are captured at low-resolution. Recently, multi-task learning has been proposed to robustly compute head pose under perspective and scale-based facial appearance variations when multiple, distant and large field-of-view cameras are employed for visual analysis in smart-room applications. In this paper, we evaluate the effectiveness of an SVM-based MTL (SVM+MTL) framework with various facial descriptors (KL, HOG, LBP, etc.). The KL+HOG feature combination is found to produce the best classification performance, with SVM+MTL outperforming classical SVM irrespective of the feature used.
Yan Yan, Subramanian Ramanathan, E. Ricci, O. Lanz, N. Sebe, "Evaluating Multi-task Learning for Multi-view Head-Pose Classification in Interactive Environments," 2014 22nd International Conference on Pattern Recognition, Aug. 2014. doi:10.1109/ICPR.2014.717
J. D. Martínez-Vargas, Cristian Castro Hoyos, A. Álvarez-Meza, C. Acosta-Medina, G. Castellanos-Domínguez
We propose a filtration approach to discriminate between stationary and non-stationary signals, which consists in recursively updating an enhanced representation of the input time-series so that the decomposition can identify time-varying statistical parameters of the data. The approach is based on the hypothesis that such updating, by providing a time-varying subspace projection under stationarity constraints, allows a better separation to be obtained. The quality of the separation is validated on simulated and real data. In both cases, the obtained separation shows that the proposed approach is able to identify different dynamics in the analyzed data.
J. D. Martínez-Vargas, Cristian Castro Hoyos, A. Álvarez-Meza, C. Acosta-Medina, G. Castellanos-Domínguez, "Recursive Separation of Stationary Components by Subspace Projection and Stochastic Constraints," 2014 22nd International Conference on Pattern Recognition, Aug. 2014. doi:10.1109/ICPR.2014.597
Roman coins play an important role in understanding the Roman empire because they convey rich information about key historical events of the time. Moreover, as large numbers of coins are traded daily over the Internet, it becomes necessary to develop automatic coin recognition systems to prevent illegal trades. In this paper, we propose an automatic recognition method for ancient Roman coins. The proposed method exploits the structure of the coin by using a spatially local coding method. Results show that the proposed method outperforms traditional rigid spatial-structure models such as the spatial pyramid.
Jongpil Kim, V. Pavlovic, "Ancient Coin Recognition Based on Spatial Coding," 2014 22nd International Conference on Pattern Recognition, Aug. 2014. doi:10.1109/ICPR.2014.64
Activity recognition in smart homes plays an important role in healthcare by maintaining the well-being of the elderly and patients through remote monitoring and assistive technologies. In this paper, we propose a two-level classification approach for activity recognition that utilizes the information obtained from the sensors deployed in a smart home. To separate similar activities from dissimilar ones, we first group homogeneous activities using Lloyd's clustering algorithm. For the classification of the non-separated activities within each cluster, we then apply Evidence-Theoretic K-Nearest Neighbor, a computationally inexpensive learning algorithm that performs well under uncertainty and noisy data. The approach achieves improved recognition accuracy, particularly for overlapping activities. A comparison of the proposed approach with existing activity recognition approaches is presented on two publicly available smart-home datasets; the proposed approach demonstrates a better recognition rate than the existing methods.
L. Fahad, Syed Fahad Tahir, M. Rajarajan, "Activity Recognition in Smart Homes Using Clustering Based Classification," 2014 22nd International Conference on Pattern Recognition, Aug. 2014. doi:10.1109/ICPR.2014.241
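The two-level idea (cluster first, then classify only within the query's cluster) can be sketched as follows. Plain 1-NN stands in for the paper's Evidence-Theoretic K-NN, and the 2-D feature points, the labels, and the seeding of Lloyd's algorithm are illustrative assumptions.

```python
import math
from collections import defaultdict

def lloyd(points, k, iters=20):
    """Plain Lloyd's (k-means) iterations; centroids seeded with the
    first k points for determinism."""
    cents = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            groups[min(range(k), key=lambda j: math.dist(p, cents[j]))].append(p)
        for j, g in groups.items():
            cents[j] = [sum(c) / len(g) for c in zip(*g)]
    return cents

def two_level_classify(train, labels, k, query):
    """Level 1: route the query to its nearest cluster.  Level 2: 1-NN
    among the training samples of that cluster only (a plain stand-in
    for the paper's Evidence-Theoretic K-NN)."""
    cents = lloyd(train, k)
    nearest = lambda p: min(range(k), key=lambda j: math.dist(p, cents[j]))
    members = [i for i, p in enumerate(train) if nearest(p) == nearest(query)]
    best = min(members, key=lambda i: math.dist(query, train[i]))
    return labels[best]
```

Restricting the second-level classifier to one cluster is what lets the method spend its discriminative effort on the overlapping activities within that cluster.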