Segmentation algorithm for Arabic handwritten text based on contour analysis

Yusra Osman
2013 International Conference on Computing, Electrical and Electronic Engineering (ICCEEE), 17 October 2013
DOI: 10.1109/ICCEEE.2013.6633980
Citations: 14

Abstract

Segmentation is the process of dividing a binary image into useful regions according to certain conditions. It is the most important phase of any optical character recognition (OCR) system, and its accuracy significantly affects the system's recognition rate. In cursive languages such as Arabic, segmentation is complicated, especially in handwritten documents, because writing styles differ from writer to writer and because of special cases such as overlapping characters and ligatures. Hence, segmentation algorithms must be based on general descriptors that most writers follow. In this paper, a segmentation algorithm for Arabic handwriting is developed. The algorithm first divides the selected image into lines and sub-words. It then traces the contour of each sub-word and detects the exact points where the contour changes its state from a horizontal line to a vertical or curved line. The coordinates of these points are taken as the segmentation points. The algorithm was tested on words from the IFN/ENIT database: on 537 test words containing 3,222 characters, it achieved 89.4% correct character segmentation points.
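The pipeline described in the abstract (lines, then sub-words, then contour tracing, then horizontal-to-vertical/curved transitions) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: text lines are taken from the horizontal projection profile, sub-words are approximated by 8-connected ink components (so dots and diacritics come out as separate components), the outer contour is traced with Moore-neighbour tracing, and a "horizontal state" is assumed to be a run of at least `min_run` contour steps with no change of row. All function names and the `min_run` threshold are hypothetical.

```python
# Sketch of contour-based segmentation for a binary image (1 = ink, 0 = background).
# All function names and the min_run threshold are hypothetical, not from the paper.

# 8-neighbour offsets (row, col), clockwise on screen starting from west.
DIRS = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]


def split_lines(img):
    """Text lines as (top, bottom) row bands of the horizontal projection profile."""
    bands, top = [], None
    for r, row in enumerate(img):
        if any(row) and top is None:
            top = r
        elif not any(row) and top is not None:
            bands.append((top, r - 1))
            top = None
    if top is not None:
        bands.append((top, len(img) - 1))
    return bands


def sub_words(img, top, bottom):
    """8-connected ink components inside one line band, ordered right to left."""
    ink = {(r, c) for r in range(top, bottom + 1) for c, v in enumerate(img[r]) if v}
    comps = []
    while ink:
        seed = ink.pop()
        comp, stack = {seed}, [seed]
        while stack:
            r, c = stack.pop()
            for dr, dc in DIRS:
                q = (r + dr, c + dc)
                if q in ink:
                    ink.remove(q)
                    comp.add(q)
                    stack.append(q)
        comps.append(comp)
    return sorted(comps, key=lambda s: -max(c for _, c in s))


def trace_contour(pixels):
    """Outer boundary of one component via Moore-neighbour tracing (clockwise)."""
    start = min(pixels)                      # topmost row, then leftmost column
    p, b = start, (start[0], start[1] - 1)   # b: background pixel we "came from"
    contour, first_move = [start], None
    while True:
        d = DIRS.index((b[0] - p[0], b[1] - p[1]))
        for k in range(1, 9):                # scan clockwise, starting after b
            nd = (d + k) % 8
            q = (p[0] + DIRS[nd][0], p[1] + DIRS[nd][1])
            if q in pixels:
                break
        else:
            return contour                   # isolated pixel: no ink neighbour
        if first_move is None:
            first_move = (p, q)
        elif (p, q) == first_move:           # loop closed: same move as the first one
            return contour[:-1]              # drop the repeated start pixel
        pd = DIRS[(nd - 1) % 8]              # last background pixel scanned before q
        b = (p[0] + pd[0], p[1] + pd[1])
        p = q
        contour.append(p)


def segmentation_points(contour, min_run=2):
    """Points where a horizontal run of >= min_run contour steps turns non-horizontal."""
    n = len(contour)
    # Step i goes from contour[i] to contour[i + 1], wrapping around the closed contour.
    horiz = [contour[i][0] == contour[(i + 1) % n][0] for i in range(n)]
    if all(horiz):
        return []                            # no vertical or curved part at all
    k0 = horiz.index(False)                  # begin at a non-horizontal step so no run wraps
    points, run = [], 0
    for j in range(1, n + 1):
        i = (k0 + j) % n
        if horiz[i]:
            run += 1
        else:
            if run >= min_run:
                points.append(contour[i])    # the horizontal run ends at contour[i]
            run = 0
    return sorted(points, key=lambda rc: -rc[1])  # right to left, Arabic reading order


def segment(img, min_run=2):
    """Candidate points per sub-word: lines top to bottom, sub-words right to left."""
    return [segmentation_points(trace_contour(comp), min_run)
            for top, bottom in split_lines(img)
            for comp in sub_words(img, top, bottom)]


if __name__ == "__main__":
    rows = ["....#....",
            "....#....",
            "#########"]                     # a baseline stroke with one rising stroke
    img = [[1 if ch == "#" else 0 for ch in row] for row in rows]
    print(segment(img))                      # [[(2, 3)]]
```

On the three-row example, the only point reported is (2, 3): the baseline pixel where the traced contour, running along the horizontal stroke, turns upward into the vertical stroke. A practical version would at least need thresholds tuned to stroke width and extra handling for dots, overlapping characters and ligatures, which the abstract identifies as the difficult cases.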