Text - image separation in Devanagari documents

Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings. Pub Date: 2003-08-03. DOI: 10.1109/ICDAR.2003.1227861

Swapnil Khedekar, V. Ramanaprasad, S. Setlur, V. Govindaraju

Citations: 54

Abstract

In this paper we present a top-down, projection-profile-based algorithm to separate text blocks from image blocks in a Devanagari document. We use a distinctive feature of Devanagari text, called Shirorekha (Header Line), to analyze the pattern produced by Devanagari text in the horizontal profile. The horizontal profile corresponding to a text block possesses certain regularity in frequency and orientation, and shows spatial cohesion. The algorithm uses these features to identify text blocks in a document image containing both text and graphics.
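The idea of reading the Shirorekha pattern off a horizontal profile can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify in detail. It only assumes a binary block (1 = ink, 0 = background), treats rows with a high ink count as candidate header lines, and calls the block text when those rows recur at regular spacing. The function names and the thresholds `ratio`, `max_spacing_cv`, and `min_lines` are hypothetical choices made for illustration, and the orientation and spatial-cohesion cues mentioned in the abstract are not modelled.

```python
import numpy as np


def horizontal_profile(binary):
    """Count ink pixels in each row of a binary block (1 = ink, 0 = background)."""
    return np.asarray(binary).sum(axis=1)


def header_line_rows(profile, ratio=0.6):
    """Rows whose ink count is at least `ratio` times the profile maximum.

    These are candidate Shirorekha rows; `ratio` is an assumed threshold.
    """
    profile = np.asarray(profile)
    peak = profile.max()
    if peak == 0:
        return np.array([], dtype=int)
    return np.flatnonzero(profile >= ratio * peak)


def group_runs(rows):
    """Collapse each run of consecutive row indices to its first row."""
    rows = np.asarray(rows)
    if rows.size == 0:
        return rows
    starts = np.concatenate(([True], np.diff(rows) > 1))
    return rows[starts]


def looks_like_text(binary, ratio=0.6, max_spacing_cv=0.2, min_lines=3):
    """Heuristic: a block is text if header-line peaks recur at regular spacing.

    Regularity is measured as the coefficient of variation of the gaps
    between successive peaks; all thresholds are illustrative assumptions.
    """
    lines = group_runs(header_line_rows(horizontal_profile(binary), ratio))
    if lines.size < min_lines:
        return False
    gaps = np.diff(lines)
    return bool(gaps.std() / gaps.mean() <= max_spacing_cv)
```

Under these assumptions, a block with dense rows repeating every few pixels is classified as text, while a uniformly filled region yields a single long run of dense rows and is rejected. Real scans would need binarization, skew handling, and thresholds tuned on data.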


