Primitive printed Arabic Optical Character Recognition using statistical features
Authors: Mohamed Dahi, N. Semary, M. Hadhoud
Published in: 2015 IEEE Seventh International Conference on Intelligent Computing and Information Systems (ICICIS), pp. 567-571
Publication date: 2015-12-01
DOI: 10.1109/INTELCIS.2015.7397278 (https://doi.org/10.1109/INTELCIS.2015.7397278)
Citation count: 9
Abstract
Because Arabic fonts come in many different forms, Arabic character recognition is still a challenge. Most existing work considers only one font per text, which results in low recognition accuracy. This paper aims to improve the accuracy of Arabic Optical Character Recognition (AOCR) by adding an automatic Optical Font Recognition (OFR) stage ahead of the traditional OCR stages. The OFR stage uses SIFT (Scale-Invariant Feature Transform) descriptors. First, a comparative study of the four most recent algorithms for primitive OCR was performed to evaluate the features and classifiers used in those systems. Based on that study, a combination of statistical features is proposed, together with a Random Forest classifier for the classification stage. The combined features are used to train the classifiers, so that text in each recognized font is directed to a font-specific classifier. The proposed system was tested on a generated Primitive Arabic Characters Noise Free dataset (PAC-NF) containing 30,000 samples. Experimental results show a promising character recognition accuracy of 99.8-100%.
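The two-stage idea in the abstract (recognize the font first, then hand the glyph to a classifier trained for that font) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the statistical features, so zone ink density is assumed here, and a dependency-free nearest-centroid classifier stands in for both the SIFT-based font recognizer and the Random Forest character classifiers. All names (`zone_densities`, `NearestCentroid`, `FontRoutedOCR`) and the toy 4x4 glyphs with labels "thin"/"bold" and "A"/"B" are hypothetical.

```python
from statistics import mean


def zone_densities(glyph, zones=2):
    """Assumed statistical feature: fraction of ink pixels in each zone of a
    binary glyph (list of rows of 0/1), scanning a zones x zones grid."""
    h, w = len(glyph), len(glyph[0])
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            rows = range(zr * h // zones, (zr + 1) * h // zones)
            cols = range(zc * w // zones, (zc + 1) * w // zones)
            cell = [glyph[r][c] for r in rows for c in cols]
            feats.append(sum(cell) / len(cell))
    return feats


class NearestCentroid:
    """Stand-in classifier (NOT Random Forest): picks the label whose mean
    feature vector is closest in squared Euclidean distance."""

    def fit(self, X, y):
        groups = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        self.centroids = {
            label: [mean(col) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, x):
        def dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=dist)


class FontRoutedOCR:
    """Stage 1 predicts the font; stage 2 uses that font's own classifier."""

    def fit(self, glyphs, fonts, chars):
        X = [zone_densities(g) for g in glyphs]
        self.font_clf = NearestCentroid().fit(X, fonts)
        self.char_clfs = {}
        for font in set(fonts):
            idx = [i for i, f in enumerate(fonts) if f == font]
            self.char_clfs[font] = NearestCentroid().fit(
                [X[i] for i in idx], [chars[i] for i in idx]
            )
        return self

    def predict(self, glyph):
        x = zone_densities(glyph)
        font = self.font_clf.predict(x)
        return font, self.char_clfs[font].predict(x)


# Toy training set: "A" has ink in the top half, "B" in the bottom half;
# the "bold" font fills the half, the "thin" font uses two pixels.
EMPTY = [0, 0, 0, 0]
thin_a = [[1, 0, 1, 0], EMPTY, EMPTY, EMPTY]
bold_a = [[1, 1, 1, 1], [1, 1, 1, 1], EMPTY, EMPTY]
thin_b = [EMPTY, EMPTY, [1, 0, 1, 0], EMPTY]
bold_b = [EMPTY, EMPTY, [1, 1, 1, 1], [1, 1, 1, 1]]

model = FontRoutedOCR().fit(
    [thin_a, bold_a, thin_b, bold_b],
    ["thin", "bold", "thin", "bold"],
    ["A", "A", "B", "B"],
)

# A bold "A" with one missing pixel is still routed to the bold classifier.
noisy_bold_a = [[1, 1, 1, 1], [1, 1, 1, 0], EMPTY, EMPTY]
print(model.predict(noisy_bold_a))  # ('bold', 'A')
```

The point of the routing design is that each per-font classifier only has to separate characters within one font, so font-to-font shape variation does not blur the character classes. In a fuller version, the stand-in classifiers could be replaced by a font recognizer and Random Forest classifiers as described in the abstract.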