Multi-lingual City Name Recognition for Indian Postal Automation

U. Pal, Rami Kumar Roy, F. Kimura
2012 International Conference on Frontiers in Handwriting Recognition
Published: 2012-09-18
DOI: 10.1109/ICFHR.2012.238 (https://doi.org/10.1109/ICFHR.2012.238)
Citations: 32

Abstract

Under the three-language formula, the destination address block of a postal document in an Indian state is generally written in three languages: English, Hindi, and the official language of the state. From a statistical analysis we found that 12.37%, 76.32%, and 10.21% of postal documents are written in Bangla, English, and Devanagari script, respectively. Because these scripts are intermixed in postal addresses, it is very difficult to identify the script in which a city name is written. To avoid this script identification problem, we propose a lexicon-driven method for multi-lingual (English, Hindi, and Bangla) city name recognition for Indian postal automation. In the proposed scheme, slant correction is first applied to handle the slanted handwriting of different individuals. Next, a water reservoir concept is applied to pre-segment the slant-corrected city names into possible primitive components (characters or parts of characters). The pre-segmented components of a city name are then merged into possible characters, and the best city name is selected using the lexicon information. To merge these primitive components into characters and find the optimum character segmentation, dynamic programming (DP) is applied, using the total likelihood of the characters of a city name as the objective function. We tested our system on 16132 Indian trilingual city names and obtained an overall recognition accuracy of 92.25%.
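The lexicon-driven DP step can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the character classifier, the features, or any score normalization. The names `match_word`, `recognize`, `score`, and `max_merge` are hypothetical, and `score(ch, i, j)` is assumed to return a log-likelihood that primitive components `i..j-1`, merged together, form the character `ch`.

```python
import math
from typing import Callable, Sequence

# Hypothetical scorer: log-likelihood that components i..j-1, merged,
# form character `ch`. A real system would call a character classifier here.
Scorer = Callable[[str, int, int], float]


def match_word(word: str, n: int, score: Scorer, max_merge: int = 3) -> float:
    """Best total log-likelihood of aligning `word` to n primitive components.

    Each character must consume between 1 and `max_merge` consecutive
    components (an assumed limit), and all n components must be used.
    Returns -inf if no such alignment exists.
    """
    neg_inf = -math.inf
    # best[k][j]: best score with the first k characters covering
    # the first j components.
    best = [[neg_inf] * (n + 1) for _ in range(len(word) + 1)]
    best[0][0] = 0.0
    for k, ch in enumerate(word, start=1):
        for j in range(1, n + 1):
            for i in range(max(0, j - max_merge), j):
                if best[k - 1][i] == neg_inf:
                    continue  # prefix not reachable
                cand = best[k - 1][i] + score(ch, i, j)
                if cand > best[k][j]:
                    best[k][j] = cand
    return best[len(word)][n]


def recognize(lexicon: Sequence[str], n: int, score: Scorer,
              max_merge: int = 3) -> str:
    """Return the lexicon entry with the highest total likelihood.

    Assumption: raw totals are compared directly; no length
    normalization is applied in this sketch.
    """
    return max(lexicon, key=lambda w: match_word(w, n, score, max_merge))
```

Because the lexicon can hold city names in all three scripts, each entry is matched against the same component sequence and the best-scoring entry wins, which is how a lexicon-driven scheme of this kind can sidestep an explicit script identification step.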