On Using Classical Poetry Structure for Indian Language Post-Processing

Ninth International Conference on Document Analysis and Recognition (ICDAR 2007). Pub Date: 2007-09-23. DOI: 10.1109/ICDAR.2007.199

A. Namboodiri, P. J. Narayanan, C. V. Jawahar

Citations: 13

Abstract

Post-processors are critical to the performance of language recognizers such as OCR systems and speech recognizers. Dictionary-based post-processing commonly employs either an algorithmic or a statistical approach; other linguistic features are not exploited for this purpose, and language analysis is largely limited to prose. This paper proposes a framework that uses the rich metrical and formal structure of classical poetic forms in Indian languages to post-process the output of a recognizer such as an OCR engine. We show that the structure present in the form of the vrtta and prasa can be used efficiently to disambiguate some cases that may be difficult for an OCR. The approach is efficient, is complementary to other post-processing approaches, and can be used in conjunction with them.
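The idea of using metrical structure to disambiguate recognizer output can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `weight` and `matching_readings`, the romanized syllable input, and the simplified light/heavy rule are all assumptions made for illustration. Given a list of alternative syllables per position and the expected pattern of light (L) and heavy (G) syllables for the line, it keeps only the readings that fit the pattern.

```python
from itertools import product

# Hypothetical romanization: long vowels and diphthongs are written as below.
LONG_VOWELS = ("aa", "ii", "uu", "e", "o", "ai", "au")


def weight(syllable: str) -> str:
    """Return 'G' (heavy) or 'L' (light) for a romanized syllable.

    Simplified assumption: a syllable is heavy if it ends in a long vowel
    or diphthong, or if it ends in a consonant; otherwise it is light.
    Real prosody also depends on the following syllable, ignored here.
    """
    if syllable.endswith(LONG_VOWELS):
        return "G"
    if syllable[-1] not in "aiu":
        return "G"
    return "L"


def matching_readings(candidates, meter):
    """Return every reading whose weight sequence equals `meter`.

    `candidates` holds one list of alternative syllables per position,
    as a recognizer might emit for ambiguous glyphs. `meter` is a string
    of 'L' and 'G'. All combinations are enumerated, so this is only
    practical for short ambiguous spans.
    """
    readings = []
    for reading in product(*candidates):
        pattern = "".join(weight(s) for s in reading)
        if pattern == meter:
            readings.append(reading)
    return readings


# The recognizer is unsure whether the second syllable has a short or long vowel.
candidates = [["ka"], ["vi", "vii"], ["taa"]]
print(matching_readings(candidates, "LLG"))
```

In this toy case the expected pattern rules out the long-vowel alternative. Such a filter would sit alongside dictionary-based post-processing rather than replace it, since the meter can only separate alternatives that differ in syllable weight.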


