{"title":"Improving generalization performance by information minimization","authors":"R. Kamimura, T. Takagi, S. Nakanishi","doi":"10.1109/ICNN.1994.374153","DOIUrl":null,"url":null,"abstract":"In this paper, we attempt to show that the information stored in networks must be as small as possible for the improvement of the generalization performance under the condition that the networks can produce targets with appropriate accuracy. The information is defined by the difference between maximum entropy or uncertainty and observed entropy. Borrowing a definition of fuzzy entropy, the uncertainty function is defined for the internal representation and represented by the equation: -/spl upsi//sub i/ log /spl upsi//sub i/-(1-/spl upsi//sub i/) log (1-/spl upsi//sub i/), where /spl upsi//sub i/ is a hidden unit activity. After having formulated an update rule for the minimization of the information, we applied the method to a problem of language acquisition: the inference of the past tense forms of regular verbs. Experimental results confirmed that by our method, the information was significantly decreased and the generalization performance was greatly improved.<<ETX>>","PeriodicalId":209128,"journal":{"name":"Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1995-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"42","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICNN.1994.374153","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 42
Abstract
In this paper, we attempt to show that the information stored in a network must be kept as small as possible to improve generalization performance, under the condition that the network can still produce targets with appropriate accuracy. The information is defined as the difference between the maximum entropy, or uncertainty, and the observed entropy. Borrowing a definition of fuzzy entropy, the uncertainty function is defined for the internal representation as $-v_i \log v_i - (1 - v_i) \log(1 - v_i)$, where $v_i$ is a hidden unit activity. After formulating an update rule for minimizing this information, we applied the method to a problem of language acquisition: inferring the past tense forms of regular verbs. Experimental results confirmed that our method significantly decreased the information and greatly improved generalization performance.
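The information measure described in the abstract can be sketched directly from its definition: the uncertainty of each hidden unit is the fuzzy entropy $-v_i \log v_i - (1 - v_i) \log(1 - v_i)$, which is maximal when $v_i = 0.5$, and the information is the maximum total entropy minus the observed total entropy. The following is a minimal illustrative sketch of that computation (not the authors' code; function names and the clipping constant are our own assumptions):

```python
import numpy as np

def fuzzy_entropy(v, eps=1e-12):
    """Total uncertainty of hidden unit activities v in (0, 1):
    sum over units of -v_i log v_i - (1 - v_i) log(1 - v_i)."""
    v = np.clip(np.asarray(v, dtype=float), eps, 1 - eps)  # guard log(0)
    return float(np.sum(-v * np.log(v) - (1 - v) * np.log(1 - v)))

def information(v):
    """Information = maximum entropy minus observed entropy.
    The maximum is attained when every activity equals 0.5,
    giving log(2) per hidden unit (natural log assumed)."""
    v = np.asarray(v, dtype=float)
    max_entropy = len(v) * np.log(2.0)
    return max_entropy - fuzzy_entropy(v)
```

Under this definition, uniformly ambiguous activities (all 0.5) carry zero information, while near-binary activities approach the maximum of log(2) per unit, which is the quantity the paper's update rule drives downward during training.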