An SVM-Based High-accurate Recognition Approach for Handwritten Numerals by Using Difference Features

Ninth International Conference on Document Analysis and Recognition (ICDAR 2007). Publication date: 2007-09-23. DOI: 10.1109/ICDAR.2007.57

Kaizhu Huang, Jun Sun, Y. Hotta, K. Fujimoto, S. Naoi

Citations: 3

Abstract

Handwritten numeral recognition is an important pattern recognition task with wide applications, e.g., recognizing money amounts in banking, which demands a very high recognition rate. The support vector machine (SVM), a state-of-the-art classifier, has been used extensively in this area. An SVM is typically trained in batch mode, i.e., all data points are input simultaneously to train the classification boundary. However, some slightly exceptional samples, although they account for only a small proportion of the data, are critical to the recognition rate, and a classifier trained on all the data may treat such legitimate but slightly exceptional samples as "noise". In this paper, we propose a novel approach to address this problem. The approach uses a two-stage framework based on difference features. In the first stage, a regular SVM is trained on all the training data; in the second stage, only the samples misclassified in the first stage are given special consideration, which improves overall performance. Because the SVM performs well, the number of misclassified samples is usually small, which makes it difficult to train an accurate SVM on these samples alone. We therefore further propose a multi-way-to-binary approach using difference features. This approach transforms the multi-category classification problem into a binary one and greatly expands the number of training samples. To evaluate the proposed method, experiments are performed on 10,000 handwritten numeral samples extracted from real bank forms. The new algorithm achieves 99.0% accuracy, compared with 98.4% for the traditional SVM.
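The abstract does not define the difference features or the second-stage classifier, so the following Python sketch only illustrates one plausible reading of the multi-way-to-binary idea, under stated assumptions. It assumes (hypothetically) that each class is represented by its mean training vector, that a difference feature is the element-wise absolute difference between a sample and a class prototype, and that the first-stage classifier is any function mapping a feature vector to a class index. Each misclassified sample then yields one difference vector per class, labelled +1 for its true class and -1 otherwise, so with ten digit classes every sample contributes ten binary training examples (one positive, nine negative); this is how such a transform can enlarge a small training set. A Pegasos-style linear SVM stands in for the authors' SVM, and all function names (class_means, difference_feature, misclassified, to_binary, train_linear_svm, predict) are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def class_means(xs: Sequence[Vector], ys: Sequence[int], num_classes: int) -> List[Vector]:
    """Per-class mean vectors, used here as (hypothetical) class prototypes."""
    dim = len(xs[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for x, y in zip(xs, ys):
        counts[y] += 1
        for j, v in enumerate(x):
            sums[y][j] += v
    # Classes with no samples keep an all-zero prototype.
    return [[s / c for s in row] if c else row for row, c in zip(sums, counts)]


def difference_feature(x: Vector, prototype: Vector) -> Vector:
    """Assumed difference feature: element-wise |x - prototype|."""
    return [abs(a - b) for a, b in zip(x, prototype)]


def misclassified(xs: Sequence[Vector], ys: Sequence[int],
                  stage_one: Callable[[Vector], int]) -> Tuple[List[Vector], List[int]]:
    """Keep only the samples that the first-stage classifier gets wrong."""
    kept = [(x, y) for x, y in zip(xs, ys) if stage_one(x) != y]
    return [x for x, _ in kept], [y for _, y in kept]


def to_binary(xs: Sequence[Vector], ys: Sequence[int],
              prototypes: Sequence[Vector]) -> Tuple[List[Vector], List[int]]:
    """Multi-way -> binary: each sample yields one difference vector per class,
    labelled +1 for its true class and -1 for every other class."""
    bx: List[Vector] = []
    by: List[int] = []
    for x, y in zip(xs, ys):
        for k, p in enumerate(prototypes):
            bx.append(difference_feature(x, p))
            by.append(1 if k == y else -1)
    return bx, by


def train_linear_svm(xs: Sequence[Vector], ys: Sequence[int],
                     lam: float = 0.01, epochs: int = 50, seed: int = 0) -> Vector:
    """Pegasos-style SGD on the L2-regularized hinge loss.
    A constant 1.0 feature is appended as a (regularized) bias term."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in xs]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = ys[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the regularizer
            if margin < 1.0:  # hinge loss is active: step toward the correct side
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(x: Vector, prototypes: Sequence[Vector], w: Vector) -> int:
    """Score every candidate class's difference vector; return the best-scoring class."""
    scores = [sum(wj * xj for wj, xj in zip(w, difference_feature(x, p) + [1.0]))
              for p in prototypes]
    return max(range(len(scores)), key=scores.__getitem__)
```

In this reading, the second stage would fit prototypes with class_means on the full training set, pass the first-stage predictor to misclassified, expand the result with to_binary, and train the binary scorer with train_linear_svm; at test time, predict picks the class whose difference vector scores highest. How the two stages are combined at test time is not specified in the abstract.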
