Script recognition is a challenging problem for document recognition in a multilingual country like India, where many different scripts are in use. For optical character recognition (OCR) of such multilingual documents, it is necessary to separate blocks, lines, words and characters of different scripts before feeding them to the OCR engines of the individual scripts. Many approaches to script recognition at different levels (block, line, word and character) have been proposed. Indian documents written in a state language typically contain English words mixed with words in that language. In this paper, we extract three different types of features from isolated English and Gurmukhi words, namely Structural, Gabor and Discrete Cosine Transform (DCT) features, and compare their script recognition performance using three different classifiers: Support Vector Machine (SVM), k-Nearest Neighbor (k-NN) and Parzen Probabilistic Neural Network (PNN).
Rajneesh Rani, R. Dhir, Gurpreet Singh Lehal. "Performance analysis of feature extractors and classifiers for script recognition of English and Gurmukhi words." DAR '12, 2012. doi:10.1145/2432553.2432559
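Of the three feature types compared, the DCT features lend themselves to a compact illustration. The sketch below is a minimal NumPy version of DCT feature extraction from a word image; the normalization size (32x32) and the number of retained low-frequency coefficients (an 8x8 block) are assumptions made for illustration, not the paper's actual parameters.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

def dct_features(word_img, size=32, keep=8):
    """Place (crop/pad) a word image on a size x size canvas, apply a
    separable 2-D DCT, and keep the top-left keep x keep low-frequency
    coefficients as the feature vector."""
    img = np.zeros((size, size))
    h, w = word_img.shape
    img[:min(h, size), :min(w, size)] = word_img[:size, :size]
    d = dct_matrix(size)
    coeffs = d @ img @ d.T  # 2-D DCT-II via two 1-D transforms
    return coeffs[:keep, :keep].ravel()
```

Keeping only the low-frequency block discards fine stroke detail while preserving the coarse energy distribution, which is the kind of global cue that tends to differ between scripts.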
Rim Walha, Fadoua Drira, Frank Lebourgeois, A. Alimi
This paper addresses the problem of generating a super-resolved text image from a single low-resolution image. The proposed Super-Resolution (SR) method is based on sparse coding, which suggests that image patches can be well represented as a sparse linear combination of elements from a suitably chosen learned dictionary. To this end, a High-Resolution/Low-Resolution (HR/LR) patch-pair database is collected from high-quality character images. To our knowledge, it is the first generic database allowing SR of text images that may be contained in documents, signs, labels, bills, etc. This database is used to jointly train two dictionaries, so that the sparse representation of an LR image patch over the first dictionary can be applied to generate an HR image patch from the second dictionary. The performance of this approach is evaluated and compared, visually and quantitatively, to other existing SR methods applied to text images. In addition, we examine the influence of text image resolution on automatic recognition performance and further justify the effectiveness of the proposed SR method compared to the others.
"Super-resolution of single text image by sparse representation." DAR '12, 2012. doi:10.1145/2432553.2432558
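The reconstruction step, coding an LR patch over one dictionary and synthesizing the HR patch from the other using the same coefficients, can be sketched as below. The tiny Orthogonal Matching Pursuit coder, the dictionary shapes and the sparsity level are illustrative assumptions; the paper's dictionary training procedure is not reproduced here.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal Matching Pursuit: greedily select k atoms (columns) of D
    and least-squares fit them to approximate y."""
    residual, idx = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        idx.append(j)
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        residual = y - D[:, idx] @ coef
    x = np.zeros(D.shape[1])
    x[idx] = coef
    return x

def super_resolve_patch(lr_patch, D_lr, D_hr, sparsity=3):
    """Code the LR patch over the LR dictionary, then synthesize the HR
    patch from the HR dictionary with the same sparse coefficients."""
    alpha = omp(D_lr, lr_patch.ravel(), sparsity)
    return D_hr @ alpha
```

The coupling assumption is that corresponding HR/LR patches share the same sparse code, so a code computed from the observable LR patch transfers directly to the HR dictionary.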
Sanjoy Pratihar, Partha Bhowmick, S. Sural, J. Mukhopadhyay
A novel algorithm for the detection and removal of underlines present in a scanned document page is proposed. The underlines treated here are hand-drawn and of various patterns. An important property of these underlines is that they are drawn by hand in an almost horizontal fashion. To locate them, we detect the edges of their covers as a sequence of approximately straight segments, which are grown horizontally. The novelty of the algorithm lies in the detection of these almost-straight segments from the boundary edge map of the underline parts. After obtaining the exact cover of the underlines, an effective strategy is applied for underline removal. Experimental results are given to show the efficiency and robustness of the method.
"Detection and removal of hand-drawn underlines in a document image using approximate digital straightness." DAR '12, 2012. doi:10.1145/2432553.2432576
Soumyadeep Dey, J. Mukhopadhyay, S. Sural, Partha Bhowmick
In this paper, we propose a technique for removing margin noise (both textual and non-textual) from scanned document images. We perform layout analysis to detect words, lines, and paragraphs in the document image. These detected elements are classified into text and non-text components on the basis of their characteristics (size, position, etc.). The geometric properties of the text blocks are then used to detect and remove the margin noise. We evaluate our algorithm on several scanned pages of Bengali literature books.
"Margin noise removal from printed document images." DAR '12, 2012. doi:10.1145/2432553.2432570
Performance evaluation of end-to-end OCR systems for Indic scripts requires matching the Unicode sequences of the OCR output and the ground truth. In the literature, Levenshtein edit distance has been used to compute error rates of OCR systems, but accuracies are not explicitly reported. In the present work, we propose an accuracy measure based on edit distance and use it in conjunction with the error rate to report the performance of an OCR system. We analyze the relationship between accuracy and error rate in a quantitative manner. Our analysis shows that accuracy and error rate are independent of each other, so both are needed to report the complete performance of an OCR system. The proposed approach is applicable to all Indic scripts, and experimental results on several scripts, including Devanagari, Telugu and Kannada, are presented.
P. P. Kumar, C. Bhagvati, A. Agarwal. "On performance analysis of end-to-end OCR systems of Indic scripts." DAR '12, 2012. doi:10.1145/2432553.2432577
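One plausible formulation of the two measures (the paper's exact definitions may differ) is to decompose the Levenshtein alignment into substitutions, insertions and deletions, then report the error rate as total edits over ground-truth length and the accuracy as correctly matched symbols over ground-truth length. Insertions raise the error rate without lowering this accuracy, which illustrates how the two measures can vary independently.

```python
import numpy as np

def edit_counts(gt, out):
    """Edit distance between ground truth and OCR output, decomposed into
    (substitutions, insertions, deletions) by DP backtracking."""
    m, n = len(gt), len(out)
    d = np.zeros((m + 1, n + 1), dtype=int)
    d[:, 0] = np.arange(m + 1)
    d[0, :] = np.arange(n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i, j] = min(d[i-1, j] + 1,    # deletion from ground truth
                          d[i, j-1] + 1,    # insertion in OCR output
                          d[i-1, j-1] + (gt[i-1] != out[j-1]))
    subs = ins = dels = 0
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i, j] == d[i-1, j-1] + (gt[i-1] != out[j-1]):
            subs += gt[i-1] != out[j-1]
            i, j = i - 1, j - 1
        elif i > 0 and d[i, j] == d[i-1, j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, ins, dels

def ocr_scores(gt, out):
    """Error rate = total edits / |gt|; accuracy = matched symbols / |gt|."""
    s, i, d = edit_counts(gt, out)
    return (s + i + d) / len(gt), (len(gt) - s - d) / len(gt)
```

For example, an output with one spurious inserted symbol has a nonzero error rate but perfect accuracy, whereas a substitution lowers both.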
S. Belhe, Chetan Paulzagade, Akash Deshmukh, Saumya Jetley, Kapil Mehrotra
The proposed approach performs recognition of online handwritten isolated Hindi words using a combination of HMMs trained on Devanagari symbols and a tree formed by the multiple possible sequences of recognized symbols. In general, words in Indic languages are composed of a number of aksharas, or syllables, which in turn are formed by groups of consonants and vowel modifiers. Segmentation of aksharas is critical to accurate recognition of both the recognition primitives and the complete word, and recognition is itself an intricate task. Our work targets this holistic task of akshara segmentation, symbol identification and subsequent word recognition, handling it in an integrated segmentation-recognition framework. By using online stroke information to postulate symbol candidates and deriving an HOG feature set from their image counterparts, the recognition becomes independent of stroke-order and stroke-shape variations; the system is thus well suited to unconstrained handwriting. Data for this work was collected from different parts of India where the Hindi language is predominantly in use. Symbols extracted from 60,000 words are used to train and test 140 symbol-HMM models. The system is designed to output one or more candidate words to the user by tracing multiple tree paths (up to leaf nodes) under the condition that the symbol likelihood (confidence score) at every node is above a threshold. Tests performed on 10,000 words yield an accuracy of 89%.
"Hindi handwritten word recognition using HMM and symbol tree." DAR '12, 2012. doi:10.1145/2432553.2432556
Anand Mishra, Naveen Sankaran, Viresh Ranjan, C. V. Jawahar
Text line segmentation is a basic step in any OCR system, and its failure deteriorates the performance of OCR engines. This is especially true for the Indian languages, due to the nature of their scripts. Many segmentation algorithms have been proposed in the literature. Often these algorithms fail to adapt dynamically to a given page and thus tend to yield poor segmentation for some specific regions or pages. In this work we design a text line segmentation post-processor which automatically localizes and corrects segmentation errors. The proposed post-processor, which works in a "learning by examples" framework, is not only independent of the segmentation algorithm but also robust to the diversity of scanned pages. We show over 5% improvement in text line segmentation on a large dataset of scanned pages for multiple Indian languages.
"Automatic localization and correction of line segmentation errors." DAR '12, 2012. doi:10.1145/2432553.2432555
D. Dutta, Aruni Roy Chowdhury, U. Bhattacharya, S. K. Parui
Here, we present our recent attempt to develop a lightweight handwriting recognizer suitable for resource-constrained handheld devices. Such an application requires real-time recognition of handwritten characters produced on their touchscreens. The proposed approach is well suited to achieving minimal user lag on devices having only limited computing power, in sharp contrast to standard laptops or desktop computers. Moreover, the approach is user-adaptive in the sense that it can adapt through user corrections of wrong predictions; with an increasing number of interactive corrections by the user, the recognition accuracy improves significantly. An input stroke is first re-sampled to generate a fixed, small number of sample points such that at most two critical points (points of high curvature) are preserved. We use their x- and y-coordinates as the feature vector and do not compute any other high-level features. The squared Mahalanobis distance is used to identify each stroke of the input sample as one of several stroke categories pre-determined from a large pool of training samples. The inverted covariance matrix and mean vector for each stroke class, required for computing the Mahalanobis distance, are pre-calculated and stored as Serialized Objects on the SD card of the device. A Look-Up Table (LUT) with stroke combinations as keys and corresponding character classes as values is used for the final Unicode character output. In the case of an incorrect character output, user corrections are used to automatically update the LUT, adapting it to the user's particular handwriting style.
"Lightweight user-adaptive handwriting recognizer for resource constrained handheld devices." DAR '12, 2012. doi:10.1145/2432553.2432574
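The classification core of the recognizer, squared Mahalanobis distance against per-class precomputed means and inverted covariance matrices, can be sketched as follows. The feature dimensionality, class labels and the small ridge term are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

class StrokeClassifier:
    """Nearest-class classifier by squared Mahalanobis distance. The mean
    vector and inverted covariance matrix of each stroke class are computed
    once up front, mirroring the paper's precomputed, stored statistics."""

    def __init__(self, class_samples):
        self.means, self.inv_covs = {}, {}
        for label, X in class_samples.items():
            X = np.asarray(X, dtype=float)
            self.means[label] = X.mean(axis=0)
            # a small ridge keeps the covariance invertible for few samples
            cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            self.inv_covs[label] = np.linalg.inv(cov)

    def classify(self, x):
        x = np.asarray(x, dtype=float)

        def sq_mahalanobis(label):
            d = x - self.means[label]
            return d @ self.inv_covs[label] @ d

        return min(self.means, key=sq_mahalanobis)
```

Because only a matrix-vector product and a dot product are needed per class at runtime, the cost per stroke stays low enough for interactive use on limited hardware.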
This paper describes a novel method for the detection of tables and extraction of table cell contents from handwritten document images. Given a model of the table and a document image containing a table, the hand-drawn or pre-printed table is detected and the contents of the table cells are extracted automatically. The algorithms described are designed to handle degraded binary document images. The target images may include a wide variety of noise, ranging from clutter and salt-and-pepper noise to non-text objects such as graphics and logos. The presented algorithm effectively eliminates extraneous noise and identifies potentially matching table layout candidates by detecting horizontal and vertical table line candidates. A table is represented as a matrix based on the locations of intersections of horizontal and vertical table lines, and a matching algorithm searches for the table structure that best matches the given layout model, using the matching score to eliminate spurious table line candidates. The optimally matched table candidate is then used for cell content extraction. The method was tested on a set of document page images containing tables from the challenge set of the DARPA MADCAT Arabic handwritten document image data. Preliminary results indicate that the method is effective and capable of reliably extracting text from table cells.
Zhixin Shi, S. Setlur, V. Govindaraju. "Model based table cell detection and content extraction from degraded document images." DAR '12, 2012. doi:10.1145/2432553.2432565
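The matrix representation of a table and the model-matching score can be sketched as below. The ink-presence predicate and the equality-based score are deliberate simplifications of the paper's matching algorithm, used here only to make the intersection-matrix idea concrete.

```python
import numpy as np

def intersection_matrix(h_lines, v_lines, line_present):
    """Represent a table as a 0/1 matrix over the crossings of detected
    horizontal and vertical line candidates; line_present(y, x) reports
    whether the image actually has ink at that crossing."""
    return np.array([[1 if line_present(y, x) else 0 for x in v_lines]
                     for y in h_lines], dtype=int)

def match_score(candidate, model):
    """Fraction of intersections on which the candidate agrees with the
    layout model; higher scores rank candidates, and low scores flag
    spurious line candidates for elimination."""
    if candidate.shape != model.shape:
        return 0.0
    return float((candidate == model).mean())
```

A candidate built from spurious line detections disagrees with the model at many crossings, so its score drops and it is pruned before cell content extraction.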
Offline handwritten character recognition (OHCR) is the process of converting handwritten text into a machine-processable form. Since the late sixties, efforts have been made toward offline handwritten character recognition throughout the world. Principal Component Analysis (PCA) has also been used to extract representative features for character recognition. In order to assess the prominence of features in offline handwritten Gurmukhi character recognition, we recognize offline handwritten Gurmukhi characters with different combinations of features and classifiers. The recognition system first computes a skeleton of the character so that significant feature information can be extracted. For classification, we use k-NN, linear-SVM, polynomial-SVM and RBF-SVM based approaches. In the present work, we collected 7,000 samples of isolated offline handwritten Gurmukhi characters from 200 different writers, covering the set of 35 basic akhars of Gurmukhi. A partitioning policy for selecting the training and testing patterns has also been investigated. To build the feature set for a given character, we use zoning features; diagonal features; directional features; intersection and open end point features; transition features; and parabola and power curve fitting based features. The proposed system achieves a recognition accuracy of 94.8% without PCA and 97.7% with PCA.
Munish Kumar, R. Sharma, M. Jindal. "Offline handwritten Gurmukhi character recognition: study of different feature-classifier combinations." DAR '12, 2012. doi:10.1145/2432553.2432571
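The PCA-plus-k-NN portion of such a pipeline can be sketched with NumPy alone, as below. The feature dimensionality, number of components and the value of k are illustrative assumptions; the SVM variants and the specific Gurmukhi feature extractors are omitted.

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA on training features: center the data, then keep the top
    eigenvectors of the covariance matrix as the projection basis."""
    mean = X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(X - mean, rowvar=False))
    order = np.argsort(vals)[::-1][:n_components]  # descending eigenvalues
    return mean, vecs[:, order]

def pca_transform(X, mean, components):
    """Project (already fitted) features onto the principal components."""
    return (X - mean) @ components

def knn_predict(train_X, train_y, x, k=3):
    """Plain k-NN majority vote in the (possibly PCA-reduced) space."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(np.asarray(train_y)[nearest],
                               return_counts=True)
    return labels[np.argmax(counts)]
```

Projecting onto a handful of principal components before classification is one common way a PCA step can lift k-NN accuracy: it discards low-variance directions that mostly carry writer-specific noise.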