在噪声索引卡图像中快速基于词典的单词识别

Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings. Pub Date : 2003-08-03 DOI:10.1109/ICDAR.2003.1227708

S. Lucas, Gregory Patoulas, A. Downton

{"title":"在噪声索引卡图像中快速基于词典的单词识别","authors":"S. Lucas, Gregory Patoulas, A. Downton","doi":"10.1109/ICDAR.2003.1227708","DOIUrl":null,"url":null,"abstract":"This paper describes a complete system for reading type-written lexicon words in noisy images - in this case museum index cards. The system is conceptually simple, and straightforward to implement. It involves three stages of processing. The first stage extracts row-regions from the image, where each row is a hypothesized line of text. The next stage scans an OCR classifier over each row image, creating a character hypothesis graph in the process. This graph is then searched using a priority-queue based algorithm for the best matches with a set of words (lexicon). Performance evaluation on a set of museum archive cards indicates competitive accuracy and also reasonable throughput. The priority queue algorithm is over two hundred times faster than using flat dynamic programming on these graphs.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"94 17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":"{\"title\":\"Fast lexicon-based word recognition in noisy index card images\",\"authors\":\"S. Lucas, Gregory Patoulas, A. Downton\",\"doi\":\"10.1109/ICDAR.2003.1227708\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper describes a complete system for reading type-written lexicon words in noisy images - in this case museum index cards. The system is conceptually simple, and straightforward to implement. It involves three stages of processing. The first stage extracts row-regions from the image, where each row is a hypothesized line of text. The next stage scans an OCR classifier over each row image, creating a character hypothesis graph in the process. This graph is then searched using a priority-queue based algorithm for the best matches with a set of words (lexicon). Performance evaluation on a set of museum archive cards indicates competitive accuracy and also reasonable throughput. The priority queue algorithm is over two hundred times faster than using flat dynamic programming on these graphs.\",\"PeriodicalId\":249193,\"journal\":{\"name\":\"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.\",\"volume\":\"94 17 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2003-08-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"16\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDAR.2003.1227708\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDAR.2003.1227708","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 16

摘要

本文描述了一个完整的系统，用于在嘈杂的图像中阅读打字的词汇-在本例中是博物馆索引卡。该系统在概念上很简单，并且易于实现。它包括三个处理阶段。第一阶段从图像中提取行区域，其中每一行都是假设的文本行。下一阶段扫描每个行图像的OCR分类器，在此过程中创建一个字符假设图。然后使用基于优先级队列的算法搜索此图，以寻找与一组单词(词典)的最佳匹配。对一套博物馆档案卡的性能评估表明具有竞争力的准确性和合理的吞吐量。优先级队列算法比在这些图上使用平面动态规划快200多倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Fast lexicon-based word recognition in noisy index card images

This paper describes a complete system for reading type-written lexicon words in noisy images - in this case museum index cards. The system is conceptually simple, and straightforward to implement. It involves three stages of processing. The first stage extracts row-regions from the image, where each row is a hypothesized line of text. The next stage scans an OCR classifier over each row image, creating a character hypothesis graph in the process. This graph is then searched using a priority-queue based algorithm for the best matches with a set of words (lexicon). Performance evaluation on a set of museum archive cards indicates competitive accuracy and also reasonable throughput. The priority queue algorithm is over two hundred times faster than using flat dynamic programming on these graphs.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.

自引率

0.00%

发文量

期刊最新文献

Impact of imperfect OCR on part-of-speech tagging Writer identification using innovative binarised features of handwritten numerals Word searching in CCITT group 4 compressed document images Exploiting reliability for dynamic selection of classi .ers by means of genetic algorithms Investigation of off-line Japanese signature verification using a pattern matching