Confused Distance Maximization for Large Category Dimensionality Reduction

2012 International Conference on Frontiers in Handwriting Recognition. Pub Date: 2012-09-18. DOI: 10.1109/ICFHR.2012.196

Xu-Yao Zhang, Cheng-Lin Liu


Citations: 5

Abstract

Fisher linear discriminant analysis (FDA) is the best-known supervised dimensionality reduction model. However, when the number of classes is much larger than the reduced dimensionality, FDA suffers from the class separation problem: it preserves the distances between already well-separated classes and causes a large overlap of neighboring classes. To cope with this problem, we propose a new model called confused distance maximization (CDM). The objective of CDM is to maximize the distance between the most confusable classes, according to the confusion matrix estimated from the training data with a pre-learned classifier. Compared with FDA, which maximizes the sum of the distances over all class pairs, CDM is more relevant to classification accuracy because it weights each pairwise distance according to the confusion matrix. Furthermore, CDM is computationally inexpensive, which makes it efficient and effective for large-category problems. Experiments on two large-scale 3,755-class Chinese handwriting databases (offline and online) demonstrate that CDM achieves the best performance compared with FDA and other competitive weighting-based criteria.
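The weighting idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' formulation, which the abstract does not spell out. It assumes that the pre-learned classifier is a nearest-class-mean classifier, that the weight for a class pair is the symmetrized confusion count, that the features are already whitened (so no within-class scatter term appears), and that the projection is given by the top eigenvectors of the weighted pairwise scatter matrix. All function names are hypothetical.

```python
import numpy as np

def class_means(X, y, n_classes):
    """Mean feature vector of each class, shape (n_classes, n_features)."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def nearest_mean_predict(X, means):
    """Assumed pre-learned classifier: assign each sample to the nearest class mean."""
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def confusion_weights(y_true, y_pred, n_classes):
    """Symmetrized confusion counts with a zero diagonal (assumed weighting)."""
    C = np.zeros((n_classes, n_classes))
    np.add.at(C, (y_true, y_pred), 1.0)  # C[i, j] counts class i predicted as j
    np.fill_diagonal(C, 0.0)             # correct predictions carry no weight
    return C + C.T

def weighted_projection(X, y, n_classes, dim, weights=None):
    """Top-`dim` directions maximizing sum_{i<j} w_ij * ||P^T (m_i - m_j)||^2.

    With weights=None every class pair gets weight 1, which corresponds to an
    unweighted sum of pairwise distances between class means.
    """
    means = class_means(X, y, n_classes)
    if weights is None:
        weights = np.ones((n_classes, n_classes)) - np.eye(n_classes)
    # For symmetric weights, sum_{i<j} w_ij (m_i - m_j)(m_i - m_j)^T = M^T L M,
    # where L = D - W is the graph Laplacian of the weight matrix.
    laplacian = np.diag(weights.sum(axis=1)) - weights
    scatter = means.T @ laplacian @ means
    _, eigvecs = np.linalg.eigh(scatter)   # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :dim]       # keep the largest-eigenvalue directions
```

A typical call would estimate `W = confusion_weights(y, nearest_mean_predict(X, class_means(X, y, k)), k)` on the training data and then use `X @ weighted_projection(X, y, k, d, W)` as the reduced features. The Laplacian form avoids an explicit loop over all class pairs, which matters when the number of classes is in the thousands.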
