Text identification in noisy document images using Markov random model

Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings. Pub date: 2003-08-03. DOI: 10.1109/ICDAR.2003.1227734

Yefeng Zheng, Huiping Li, D. Doermann

Citations: 27

Abstract

In this paper we address the problem of identifying text in noisy documents. We segment and distinguish handwriting from machine-printed text because 1) handwriting in a document often indicates corrections, additions or other supplemental information that should be treated differently from the main body content, and 2) the segmentation and recognition techniques for machine-printed text and handwriting are significantly different. Our novelty is that we treat noise as a separate class and model it based on selected features. Trained Fisher classifiers are used to distinguish machine-printed text and handwriting from noise. We further exploit context to refine the classification: a Markov random field (MRF) based approach models the geometrical structure of the printed text, handwriting and noise to correct misclassifications. Experimental results show that our approach is promising and robust, and can significantly improve page segmentation results in noisy documents.
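The contextual refinement step can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each connected component is a node in a neighbourhood graph, that the classifier output has already been converted into a per-class cost (lower means more likely), and it uses a simple Potts pairwise penalty `beta` minimised with iterated conditional modes (ICM), a standard local MRF optimiser chosen here for illustration. The function `icm_refine`, the labels and all numeric values are hypothetical.

```python
"""Hypothetical sketch: MRF-style label refinement with iterated conditional modes.

Assumptions (not taken from the paper): each component is a graph node, the
classifier output is given as a per-class cost (lower = more likely), and the
pairwise MRF term is a Potts penalty `beta` per disagreeing neighbour.
"""
from typing import Dict, List, Sequence

LABELS = ("printed", "handwriting", "noise")


def icm_refine(
    unary: Sequence[Dict[str, float]],
    neighbours: Sequence[Sequence[int]],
    beta: float = 1.0,
    max_iters: int = 10,
) -> List[str]:
    """Greedily lower unary cost + beta * (number of disagreeing neighbours)."""
    # Initialise with the context-free decision: the cheapest label per node.
    labels = [min(LABELS, key=costs.__getitem__) for costs in unary]
    for _ in range(max_iters):
        changed = False
        for i, costs in enumerate(unary):
            # Local energy of each candidate label given the current neighbours.
            energies = {
                lab: costs[lab]
                + beta * sum(1 for j in neighbours[i] if labels[j] != lab)
                for lab in LABELS
            }
            best = min(LABELS, key=energies.__getitem__)
            if best != labels[i]:
                labels[i] = best  # in-place (asynchronous) update, as in ICM
                changed = True
        if not changed:  # converged to a local minimum
            break
    return labels


# Toy text line: nodes 0-1-2-3 in a chain; node 2 is weakly "handwriting".
STRONG_PRINT = {"printed": 0.1, "handwriting": 2.0, "noise": 2.0}
EXAMPLE_UNARY = [
    STRONG_PRINT,
    STRONG_PRINT,
    {"printed": 0.9, "handwriting": 0.7, "noise": 2.0},
    STRONG_PRINT,
]
EXAMPLE_NEIGHBOURS = [[1], [0, 2], [1, 3], [2]]

if __name__ == "__main__":
    print(icm_refine(EXAMPLE_UNARY, EXAMPLE_NEIGHBOURS, beta=0.0))
    # ['printed', 'printed', 'handwriting', 'printed']
    print(icm_refine(EXAMPLE_UNARY, EXAMPLE_NEIGHBOURS, beta=0.5))
    # ['printed', 'printed', 'printed', 'printed']
```

With `beta=0` the sketch reduces to the independent per-component decision; a positive `beta` lets the confidently printed neighbours on the same line override the weak handwriting label on node 2. The neighbourhood definition and the value of `beta` would have to be chosen for real data; the values above are illustrative only.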
