An SVM-Based High-accurate Recognition Approach for Handwritten Numerals by Using Difference Features

Kaizhu Huang, Jun Sun, Y. Hotta, K. Fujimoto, S. Naoi
Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), published 2007-09-23
DOI: 10.1109/ICDAR.2007.57 (https://doi.org/10.1109/ICDAR.2007.57)
Citations: 3

Abstract

Handwritten numeral recognition is an important pattern recognition task with applications in many domains, e.g., recognizing monetary amounts in banking, which demands a very high recognition rate. The support vector machine (SVM), a state-of-the-art classifier, has been used extensively in this area. Typically, an SVM is trained in batch mode, i.e., all data points are input simultaneously to train the classification boundary. However, some slightly exceptional data, which account for only a small proportion of the samples, are critical to the recognition rate. A classifier trained on all the data may treat such legitimate but slightly exceptional samples as "noise". In this paper, we propose a novel approach to this problem based on a two-stage framework that uses difference features. In the first stage, a regular SVM is trained on all the training data; in the second stage, only the samples misclassified in the first stage are considered, so that performance can be improved. Because the SVM performs well, the number of misclassified samples is usually small, which makes it difficult to train an accurate SVM on these samples alone. We therefore further propose a multi-way-to-binary approach using difference features. This approach transforms multi-category classification into binary classification and greatly expands the set of training samples. To evaluate the proposed method, experiments are performed on 10,000 handwritten numeral samples extracted from real bank forms. The new algorithm achieves 99.0% accuracy, compared with 98.4% for the traditional SVM.
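The abstract does not define the difference features, so the following is only an illustrative sketch of how a multi-way-to-binary transformation could expand a small set of hard samples. It assumes, hypothetically, that a difference feature is the sample's feature vector minus a per-class prototype (here the class mean), and that the binary label records whether the candidate class is the true class. The names `class_means`, `to_binary_pairs`, and the toy data are all assumptions and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def class_means(samples: Sequence[Vector], labels: Sequence[int]) -> Dict[int, Vector]:
    """Mean feature vector per class (an assumed choice of class prototype)."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def to_binary_pairs(
    samples: Sequence[Vector],
    labels: Sequence[int],
    prototypes: Dict[int, Vector],
) -> List[Tuple[Vector, int]]:
    """Expand each multi-class sample into one binary example per candidate class.

    The feature is the difference x - prototypes[c] (assumed definition);
    the binary label is 1 when candidate c is the true class, else 0.
    """
    pairs: List[Tuple[Vector, int]] = []
    for x, y in zip(samples, labels):
        for c in sorted(prototypes):
            diff = [xi - pi for xi, pi in zip(x, prototypes[c])]
            pairs.append((diff, 1 if c == y else 0))
    return pairs


# Toy stand-in for the samples passed to the second stage (two classes, 2-D features).
hard_samples = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
hard_labels = [0, 0, 1, 1]

prototypes = class_means(hard_samples, hard_labels)
binary_set = to_binary_pairs(hard_samples, hard_labels, prototypes)

# Each original sample yields one binary example per class.
print(len(hard_samples), "->", len(binary_set))
```

Under these assumptions, N samples over K classes become N x K binary examples, which is one way to read the claim that the transformation expands the training set. A binary classifier such as an SVM could then be trained on `binary_set`, though the paper's actual feature definition and training procedure may differ.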