A bilingual Gurmukhi-English OCR based on multiple script identifiers and language models

MOCR '13, Pub Date: 2013-08-24, DOI: 10.1145/2505377.2505381
Gurpreet Singh Lehal
Citations: 3

Abstract

English words occur frequently in Gurmukhi texts, and a monolingual Gurmukhi OCR recognizes such words as garbage. It is therefore necessary to add bilingual capability to the Gurmukhi OCR so that it can recognize English text as well. However, adding bilingual capability reduces recognition accuracy on monolingual texts because of errors in script identification: even a system with 99% script identification accuracy loses about 1% recognition accuracy on monolingual text. In this paper, we present a bilingual OCR that recognizes both English and Gurmukhi scripts without any significant reduction in accuracy on monolingual Gurmukhi text, compared with the monolingual Gurmukhi OCR. This is achieved by using multiple script identification engines and language models for both English and Gurmukhi scripts. For the first time, a system has been developed that recognizes, with high accuracy, document images containing mixed Gurmukhi and English text, only Gurmukhi text, or only English text.
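The arithmetic behind the 99% figure can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not say how the script identifiers are combined, so the majority vote, the independence assumption, and the function names `word_accuracy_bound` and `majority_vote_accuracy` are all hypothetical. The sketch assumes that a word sent to the wrong recognizer is lost entirely.

```python
from math import comb


def word_accuracy_bound(script_id_accuracy: float, recognizer_accuracy: float) -> float:
    """Word accuracy when a misrouted word is assumed to be lost entirely.

    A word is counted as correct only if the script identifier routes it to
    the right recognizer AND that recognizer reads it correctly.
    """
    return script_id_accuracy * recognizer_accuracy


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n identifiers is correct.

    Assumption (not from the paper): the n identifiers err independently,
    each being correct with probability p.
    """
    k_min = n // 2 + 1  # smallest number of correct votes forming a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


if __name__ == "__main__":
    single = 0.99  # script identification accuracy quoted in the abstract
    # With a perfect recognizer, accuracy is capped at the identifier's accuracy.
    print(word_accuracy_bound(single, 1.0))
    # Three hypothetical independent identifiers combined by majority vote.
    print(majority_vote_accuracy(single, 3))
```

Under these assumptions, a single 99% identifier caps word accuracy at 99% of the monolingual recognizer's accuracy, while a majority vote over three independent identifiers would be correct roughly 99.97% of the time. Real identifiers tend to make correlated errors, so the actual gain is likely smaller.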