{"title":"Why error measures are sub-optimal for training neural network pattern classifiers","authors":"J. Hampshire, B. V. Vijaya Kumar","doi":"10.1109/IJCNN.1992.227338","DOIUrl":null,"url":null,"abstract":"Pattern classifiers that are trained in a supervised fashion are typically trained with an error measure objective function such as mean-squared error (MSE) or cross-entropy (CE). These classifiers can in theory yield Bayesian discrimination, but in practice they often fail to do so. The authors explain why this happens and identify a number of characteristics that the optimal objective function for training classifiers must have. They show that classification figures of merit (CFM/sub mono/) possess these optimal characteristics, whereas error measures such as MSE and CE do not. The arguments are illustrated with a simple example in which a CFM/sub mono/-trained low-order polynomial neural network approximates Bayesian discrimination on a random scalar with the fewest number of training samples and the minimum functional complexity necessary for the task. A comparable MSE-trained net yields significantly worse discrimination on the same task.<<ETX>>","PeriodicalId":286849,"journal":{"name":"[Proceedings 1992] IJCNN International Joint Conference on Neural Networks","volume":"2014 12","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1992-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"[Proceedings 1992] IJCNN International Joint Conference on Neural Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN.1992.227338","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 9
Abstract
Pattern classifiers that are trained in a supervised fashion are typically trained with an error-measure objective function such as mean-squared error (MSE) or cross-entropy (CE). These classifiers can in theory yield Bayesian discrimination, but in practice they often fail to do so. The authors explain why this happens and identify a number of characteristics that the optimal objective function for training classifiers must have. They show that classification figures of merit (CFM/sub mono/) possess these optimal characteristics, whereas error measures such as MSE and CE do not. The arguments are illustrated with a simple example in which a CFM/sub mono/-trained low-order polynomial neural network approximates Bayesian discrimination on a random scalar with the fewest training samples and the minimum functional complexity necessary for the task. A comparable MSE-trained net yields significantly worse discrimination on the same task.
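To make the contrast concrete, below is a minimal sketch (NumPy, not the authors' code) of the three kinds of objective mentioned in the abstract. The `cfm_like` function is only a simplified monotonic figure of merit built from sigmoid-squashed margins between the correct-class output and each rival output; it does not reproduce the exact constants or form of the published CFM/sub mono/, and the example values are hypothetical.

```python
import numpy as np

def mse(outputs, target_idx):
    """Mean-squared error against a one-hot target (to be minimized)."""
    target = np.zeros_like(outputs)
    target[target_idx] = 1.0
    return np.mean((outputs - target) ** 2)

def cross_entropy(outputs, target_idx, eps=1e-12):
    """Cross-entropy against a one-hot target; outputs assumed in (0, 1)."""
    target = np.zeros_like(outputs)
    target[target_idx] = 1.0
    return -np.mean(target * np.log(outputs + eps)
                    + (1 - target) * np.log(1 - outputs + eps))

def cfm_like(outputs, target_idx, alpha=10.0):
    """Simplified CFM-style figure of merit (to be maximized): rewards the
    margin between the correct output and each incorrect output, squashed
    by a sigmoid so only the classification decision matters, not how
    closely the outputs match the one-hot targets."""
    rivals = np.delete(outputs, target_idx)
    margins = outputs[target_idx] - rivals
    return np.mean(1.0 / (1.0 + np.exp(-alpha * margins)))

# Two hypothetical output vectors for a 3-class sample whose true class is 0.
# Both classify the sample correctly, but the error measures penalize the
# second one heavily while the CFM-style score stays near its maximum.
well_calibrated = np.array([0.90, 0.05, 0.05])
barely_correct  = np.array([0.40, 0.35, 0.25])
for y in (well_calibrated, barely_correct):
    print(f"MSE={mse(y, 0):.3f}  CE={cross_entropy(y, 0):.3f}  "
          f"CFM-like={cfm_like(y, 0):.3f}")
```

Running this shows that MSE and CE treat the "barely correct" outputs as much worse than the well-calibrated ones even though both produce the same classification decision, which is the kind of mismatch between error measures and discrimination that the paper analyzes.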