Towards automatic transcription of Syriac handwriting

12th International Conference on Image Analysis and Processing, 2003. Proceedings. Pub Date: 2003-09-17. DOI: 10.1109/ICIAP.2003.1234126

W. Clocksin, P. P. Fernando


Citations: 35

Abstract



We describe a method for recognising Syriac handwriting in historical manuscripts. Syriac has been a neglected area for handwriting recognition research, yet it is interesting because the preponderance of scribe-written manuscripts offers a challenging yet tractable medium for OCR research, lying between the extremes of typewritten text and free handwriting. Like Arabic, Syriac is written in a cursive form from right to left, and letter shape depends on position within the word. The method described does not need to find character strokes or contours. Both whole words and character shapes were used in recognition experiments. After segmentation using a novel probabilistic method, features of these shapes are found that tolerate variation in formation and image quality. Each shape is recognised individually using a discriminative support vector machine with 10-fold cross-validation. We describe experiments using a variety of segmentation methods and combinations of features on characters and words. Images from scribe-written historical manuscripts are used, and the recognition results are compared with those for images taken from clearer 19th-century typeset documents. Recognition rates vary from 61% to 100%, depending on the algorithms used and the size and source of the data set.
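The evaluation protocol mentioned in the abstract, a support vector machine scored with 10-fold cross-validation, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the kernel, the features, or the multi-class scheme, so the example below assumes a binary linear SVM trained by stochastic subgradient descent on the regularised hinge loss, and it replaces the manuscript shape features with synthetic two-dimensional points. All function names (`train_linear_svm`, `k_fold_accuracy`, `make_toy_features`) and parameter values are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=20, rng=None):
    """Fit (w, b) by stochastic subgradient descent on the L2-regularised hinge loss.

    Labels in y must be +1 or -1. Hyperparameters are illustrative assumptions.
    """
    rng = rng or random.Random(0)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            margin = y[i] * score
            # The regulariser shrinks w on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # The hinge term only contributes when the margin is violated.
            if margin < 1.0:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def predict(model, x):
    """Return +1 or -1 according to the side of the separating hyperplane."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1


def k_fold_accuracy(X, y, k=10, seed=0):
    """Return one held-out accuracy per fold for k-fold cross-validation."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all samples
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = train_linear_svm([X[i] for i in train],
                                 [y[i] for i in train], rng=rng)
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


def make_toy_features(n_per_class=100, seed=1):
    """Synthetic stand-in for shape features: two Gaussian clusters in 2-D."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((1, 2.0), (-1, -2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    return X, y


X, y = make_toy_features()
scores = k_fold_accuracy(X, y, k=10)
mean_accuracy = sum(scores) / len(scores)
print("per-fold accuracy:", [round(s, 2) for s in scores])
print("mean accuracy:", round(mean_accuracy, 3))
```

Each sample is held out exactly once, so the mean of the ten fold accuracies estimates performance on unseen shapes. A real character recogniser would need a multi-class extension (for example one-vs-rest) and features extracted from segmented manuscript images, neither of which is shown here.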
