Multi-lingual City Name Recognition for Indian Postal Automation

2012 International Conference on Frontiers in Handwriting Recognition. Pub Date: 2012-09-18. DOI: 10.1109/ICFHR.2012.238

U. Pal, Rami Kumar Roy, F. Kimura


Citations: 32

Abstract

Under the three-language formula, the destination address block of a postal document in an Indian state is generally written in three languages: English, Hindi, and the state's official language. From a statistical analysis, we found that 12.37%, 76.32%, and 10.21% of postal documents are written in Bangla, English, and Devanagari script, respectively. Because these scripts are intermixed in postal addresses, it is very difficult to identify the script in which a city name is written. To avoid this script-identification difficulty, this paper proposes a lexicon-driven method for multi-lingual (English, Hindi, and Bangla) city name recognition for Indian postal automation. In the proposed scheme, a slant correction technique is first applied to handle the slanted handwriting of different individuals. Next, a water reservoir concept is applied to pre-segment the slant-corrected city names into possible primitive components (characters or their parts). The pre-segmented components of a city name are then merged into possible characters to find the best city name using the lexicon information. To merge these primitive components into characters and find the optimum character segmentation, dynamic programming (DP) is applied, using the total likelihood of the characters of a city name as the objective function. We tested our system on 16132 Indian trilingual city names and obtained an overall recognition accuracy of 92.25%.
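The lexicon-driven DP step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `match_word`, `recognize`, and `toy_log_likelihood`, the `max_merge` limit, and the toy stroke table are all assumptions made for illustration, and a real system would replace the toy scorer with a trained character classifier. For each lexicon word, the DP finds the grouping of consecutive primitive components into characters that maximizes the total log-likelihood, and the word with the highest total is returned.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer signature: log-likelihood that primitives[start:end],
# merged into one shape, depict the character `ch`.
LogLik = Callable[[int, int, str], float]


def match_word(num_primitives: int, word: str, log_likelihood: LogLik,
               max_merge: int = 4) -> Tuple[float, List[Tuple[int, int]]]:
    """Best total log-likelihood of aligning `word` to all primitives.

    best[j][i] is the best score for the first j characters of the word
    covering the first i primitives; each character takes 1..max_merge
    consecutive primitives (max_merge is an assumed limit).
    """
    n, m = num_primitives, len(word)
    neg_inf = float("-inf")
    best = [[neg_inf] * (n + 1) for _ in range(m + 1)]
    back = [[0] * (n + 1) for _ in range(m + 1)]
    best[0][0] = 0.0
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            for k in range(1, min(max_merge, i) + 1):
                prev = best[j - 1][i - k]
                if prev == neg_inf:
                    continue
                score = prev + log_likelihood(i - k, i, word[j - 1])
                if score > best[j][i]:
                    best[j][i] = score
                    back[j][i] = k
    if best[m][n] == neg_inf:
        return neg_inf, []  # word cannot be aligned to this many primitives
    # Trace back the character boundaries of the best segmentation.
    cuts: List[Tuple[int, int]] = []
    i = n
    for j in range(m, 0, -1):
        k = back[j][i]
        cuts.append((i - k, i))
        i -= k
    cuts.reverse()
    return best[m][n], cuts


def recognize(num_primitives: int, lexicon: Sequence[str],
              log_likelihood: LogLik, max_merge: int = 4):
    """Return (score, word, cuts) for the best-scoring lexicon entry."""
    results = []
    for word in lexicon:
        score, cuts = match_word(num_primitives, word, log_likelihood, max_merge)
        results.append((score, word, cuts))
    return max(results, key=lambda r: r[0])


# Toy data (assumed): the letter "d" was over-segmented into "c" and "l".
primitives = ["c", "l", "e", "l", "h", "i"]
merged_shapes = {("c", "l"): "d"}  # hypothetical stroke-merging table


def toy_log_likelihood(start: int, end: int, ch: str) -> float:
    """Stand-in for a character classifier: high score on an exact match."""
    group = tuple(primitives[start:end])
    guess = merged_shapes.get(group, group[0] if len(group) == 1 else None)
    return math.log(0.9) if guess == ch else math.log(0.01)


best_score, best_word, best_cuts = recognize(
    len(primitives), ["delhi", "patna", "pune"], toy_log_likelihood)
print(best_word, best_cuts)
```

Because every lexicon entry is scored against the same primitive sequence, no separate script-identification step is needed in this sketch: a trilingual lexicon would simply contain entries from all three scripts. Comparing raw totals across words of different lengths is a simplification; a practical system may normalize the score.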


