Recognition of facsimile documents using a database of robust features

Proceedings of the Fourth International Conference on Document Analysis and Recognition. Pub Date: 1997-08-18. DOI: 10.1109/ICDAR.1997.619886

G. Raza, A. Hennig, N. Sherkat, R. Whitrow


Citations: 5

Abstract

A method for recognizing poor-quality documents that contain touching characters is presented. The method extracts independent, robust features from each object of a sample word, where an object is either a single letter or several touching letters. By avoiding letter segmentation, the method eliminates errors frequently introduced by segmentation-based approaches. Each feature is annotated with its position and extent to help discriminate between different classes of objects. A method for automatically constructing a comprehensive database is also presented: every possible letter combination is derived from a given dictionary, images of the artificially touching letters are created, noise is added to these images, and their features are extracted. For recognition, alternatives for each object are found from the database. Object alternatives are then combined into valid word alternatives using lexicon lookup. The method has been observed to be effective for recognizing poor-quality documents.
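The database-construction and lexicon-lookup steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `max_len` limit, and the reading of "every possible letter combination" as runs of consecutive letters within dictionary words are all assumptions made for illustration. Image synthesis, noise, and feature extraction are omitted, and the per-object alternatives are supplied by hand.

```python
from itertools import product


def letter_combinations(dictionary, max_len=3):
    """Collect every run of 1..max_len consecutive letters in the dictionary words.

    Assumption: each run stands for one possible object (a single letter or
    several touching letters) that a database entry would be built for.
    """
    combos = set()
    for word in dictionary:
        for start in range(len(word)):
            for length in range(1, max_len + 1):
                if start + length <= len(word):
                    combos.add(word[start:start + length])
    return combos


def word_alternatives(object_alternatives, lexicon):
    """Join one candidate per object and keep only the joins found in the lexicon."""
    candidates = ("".join(choice) for choice in product(*object_alternatives))
    return sorted({word for word in candidates if word in lexicon})


lexicon = {"corn", "com", "cord"}
combos = letter_combinations(lexicon, max_len=2)

# Hypothetical alternatives for a word image made of three objects; the third
# object may be a single letter or two touching letters.
objects = [["c", "e"], ["o", "a"], ["m", "rn", "rd"]]
print(word_alternatives(objects, lexicon))  # ['com', 'cord', 'corn']
```

The lexicon acts as the filter here: combinations such as "eam" or "card" are generated from the object alternatives but discarded because they are not dictionary words.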

