Asmae Lamsaf, M. A. Kerroum, S. Boulaknadel, Y. Fakhri
Title: Lines segmentation and word extraction of Arabic handwritten text
Venue: Proceedings of the 3rd International Conference on Smart City Applications
Published: 2018-10-10
DOI: 10.1145/3286606.3286831 (https://doi.org/10.1145/3286606.3286831)
Citations: 3
Abstract
Words are often a succession of sub-words (characters, connected components) separated by spaces. In Arabic handwriting, these spaces fall into two types: the first separates two connected components of the same word (within-word), and the second separates two connected components belonging to two different words (between-words). Our work is concerned with the second type. Spaces in Arabic handwriting follow no fixed rule because each person has their own writing style, which makes segmentation into words more difficult. Word extraction is based on classifying the detected spaces and using the between-words spaces to segment the text into words. In this paper, we present a method that computes a classification threshold for each line: the threshold is not fixed for the whole document; instead, each line is associated with its own space-classification threshold. Before segmenting the text image into words, it is necessary to segment it into lines so that our method can be applied to each line of text. To extract the lines, preprocessing is applied to the text images before the line segmentation step. Our system is evaluated on the benchmark datasets of the Arabic handwriting database for text recognition (AHDB), and the experimental results are very promising: we achieved a word extraction success rate of 87.9%.
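As a rough illustration of per-line space classification, the following Python sketch measures the runs of empty columns in a binarized text line and splits the line wherever a run is wider than a threshold computed from that line alone. The abstract does not state how the threshold is derived, so the largest-jump rule in `line_threshold`, the function names, and the 0/1 list-of-lists image format are all assumptions made for illustration, not the authors' method. Empty-column runs are also only a simplified stand-in for spaces between connected components, since overlapping strokes would hide a gap from a column projection.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary line image: 1 = ink, 0 = background


def ink_columns(line: Image) -> List[int]:
    """Indices of the columns that contain at least one ink pixel."""
    return [c for c in range(len(line[0])) if any(row[c] for row in line)]


def find_gaps(line: Image) -> List[Tuple[int, int]]:
    """Return (start_column, width) for each empty-column run between ink."""
    cols = ink_columns(line)
    gaps = []
    # Consecutive ink columns that are not adjacent enclose a gap.
    for left, right in zip(cols, cols[1:]):
        if right - left > 1:
            gaps.append((left + 1, right - left - 1))
    return gaps


def line_threshold(widths: List[int]) -> float:
    """Hypothetical per-line rule: cut the sorted widths at their largest jump."""
    ordered = sorted(set(widths))
    if len(ordered) < 2:
        # Zero or one distinct width: treat every gap as within-word.
        return float("inf")
    jumps = [(b - a, (a + b) / 2) for a, b in zip(ordered, ordered[1:])]
    return max(jumps)[1]  # midpoint of the widest jump


def split_words(line: Image) -> List[Tuple[int, int]]:
    """Return (first_column, last_column) spans of the words in one line."""
    cols = ink_columns(line)
    if not cols:
        return []
    gaps = find_gaps(line)
    threshold = line_threshold([width for _, width in gaps])
    words = []
    word_start = cols[0]
    for start, width in gaps:
        if width > threshold:  # classified as a between-words space
            words.append((word_start, start - 1))
            word_start = start + width
    words.append((word_start, cols[-1]))
    return words


# One-row toy line: gaps of width 1, 4 and 1; only the wide gap splits words.
example = [[1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1]]
print(split_words(example))  # [(0, 4), (9, 12)]
```

Because the threshold is recomputed from the gap widths of each line, a tightly written line and a loosely written line in the same document can be split with different cut-offs, which is the property the abstract emphasizes.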