High Accuracy Farsi Language Character Segmentation and Recognition
Pantea Kiaei, Mojan Javaheripi, H. Mohammadzade
2019 27th Iranian Conference on Electrical Engineering (ICEE), pp. 1692-1698, April 2019
DOI: 10.1109/IranianCEE.2019.8786480 (https://doi.org/10.1109/IranianCEE.2019.8786480)
Despite broad advances in optical character recognition, recognizing Farsi text remains seriously challenging. The main reason is the cursive nature of written Farsi: depending on its position within a word, a letter may join to its neighboring letters, which changes its shape. As a result, each letter can take up to four different character shapes. Beyond the difficulty of segmenting the characters, the larger number of character shapes makes recognition even more challenging. This paper introduces a complete framework for character recognition, comprising a method for segmenting the characters and a method for classifying the resulting separated characters. Character segmentation is performed with a new sliding-window algorithm that achieves an accuracy of 98.23%. With 32 Farsi letters yielding 114 character shapes, the proposed Fisher characters method achieves a character recognition rate of 99.94%. The complete system, including the segmentation and recognition modules, achieves a recognition rate of 98.17% and is robust to the scale and rotation of the image and to the font size of the text.
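The figures quoted in the abstract can be checked against each other with a few lines of arithmetic. The sketch below is only a consistency check, not the authors' evaluation procedure: it assumes a character is read correctly only when it is first segmented correctly and then classified correctly, and that the two stages fail independently. Neither assumption is stated in the abstract. The function names `shape_upper_bound` and `end_to_end_estimate` are hypothetical and introduced here for illustration.

```python
# Figures quoted in the abstract.
NUM_LETTERS = 32
MAX_SHAPES_PER_LETTER = 4
NUM_CHARACTER_SHAPES = 114
SEGMENTATION_ACCURACY = 0.9823
RECOGNITION_ACCURACY = 0.9994
REPORTED_END_TO_END = 0.9817


def shape_upper_bound(num_letters: int, max_shapes: int) -> int:
    """Number of classes there would be if every letter had every shape."""
    return num_letters * max_shapes


def end_to_end_estimate(segmentation: float, recognition: float) -> float:
    """Estimate the full-pipeline rate as the product of the stage rates.

    Assumption (not from the abstract): a character counts as correct only
    if both stages succeed, and the stages fail independently.
    """
    return segmentation * recognition


if __name__ == "__main__":
    bound = shape_upper_bound(NUM_LETTERS, MAX_SHAPES_PER_LETTER)
    estimate = end_to_end_estimate(SEGMENTATION_ACCURACY, RECOGNITION_ACCURACY)
    print(f"class upper bound: {bound}, reported classes: {NUM_CHARACTER_SHAPES}")
    print(f"estimated end-to-end rate: {estimate:.2%}")
    print(f"reported end-to-end rate:  {REPORTED_END_TO_END:.2%}")
```

Under these assumptions the product of the two stage rates comes to about 98.17%, which agrees with the reported rate for the complete system. The upper bound of 128 shapes (32 letters times four positions) exceeds the 114 reported classes, consistent with the statement that a letter has up to four shapes rather than exactly four.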