A Self Organizing Map based Urdu Nasakh character recognition

2009 International Conference on Emerging Technologies | Pub Date: 2009-12-11 | DOI: 10.1109/ICET.2009.5353161

S. A. Hussain, S. Zaman, M. Ayub


Citations: 14

Abstract

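Character recognition for Urdu script is challenging mainly because of the script's characteristics: its cursive nature, multiple fonts, context-dependent character shapes, and the position of characters with respect to the baseline. This paper addresses the problem of recognizing the Nasakh script of the Urdu language. The proposed system takes segmented characters as input and recognizes them in two steps. In the first step, the different shapes of each character are classified into 33 categories using a Kohonen Self-Organizing Map (SOM), which automatically clusters similar ligatures for initial classification. During the feature extraction phase, more than twenty-five different features are extracted from each character and are further processed for final character recognition.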
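The first step described in the abstract, clustering character shapes with a Kohonen SOM, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the map geometry, input representation, or training schedule, so the 3 x 11 grid (chosen only because it has 33 units), the flattened 8 x 8 input vectors, the linear decay schedules, and the names `train_som` and `assign_cluster` are all assumptions made for illustration.

```python
import numpy as np


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a small Kohonen SOM; returns weights of shape (rows*cols, dim).

    Hypothetical helper for illustration, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n_samples, dim = data.shape
    # Grid coordinates of each map unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise unit weights from randomly chosen training samples.
    weights = data[rng.integers(0, n_samples, size=rows * cols)].astype(float)
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2.0
    total_steps = epochs * n_samples
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(n_samples):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3  # shrinking neighbourhood radius
            x = data[i]
            # Best-matching unit: the unit whose weight vector is closest to x.
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            # Gaussian neighbourhood centred on the BMU's grid position.
            grid_dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull every unit toward x, weighted by its neighbourhood value.
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def assign_cluster(weights, x):
    """Return the index of the best-matching unit for feature vector x."""
    return int(np.argmin(np.sum((weights - x) ** 2, axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Random vectors standing in for flattened 8x8 segmented-character bitmaps.
    fake_chars = rng.random((200, 64))
    som = train_som(fake_chars, rows=3, cols=11)  # 33 units (assumed layout)
    print(assign_cluster(som, fake_chars[0]))
```

In a two-step pipeline like the one the abstract outlines, the unit index returned by `assign_cluster` would serve only as a coarse category; the second stage would then compare extracted features within that category to decide the final character.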


