Data selection based on Bayesian error bar
S. Cho, S. Choi, P. Wong
ICONIP'99. ANZIIS'99 & ANNES'99 & ACNN'99. 6th International Conference on Neural Information Processing. Proceedings (Cat. No.99EX378), 1999-11-16
DOI: 10.1109/ICONIP.1999.844025 (https://doi.org/10.1109/ICONIP.1999.844025)

Outliers, noise, and data density imbalance, which are present in most real-world data, make it difficult to train neural networks properly. Conventionally, residual analysis has been used to detect outliers. When used with neural networks, however, the procedure is computationally costly. The authors propose an efficient heuristic data selection method based on Bayesian error bars. After a neural network is trained, the residual and the error bar are computed for each data point. The data points with large residuals or large error bars are removed from the training set, and the remaining data are then used to train the network further. The approach was applied to two real-world problems in reservoir engineering, rock porosity prediction and permeability prediction, and improved generalization performance significantly, by 30-55%. This preliminary result suggests that the approach deserves further investigation.
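
The train, score, prune, retrain loop described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not give the network architecture, the error-bar formula, or the meaning of "large", so the sketch swaps in a Bayesian linear model on fixed Gaussian basis functions in place of a neural network. It uses the standard predictive variance 1/beta + phi^T A^{-1} phi as the error bar, and it treats a value as "large" when it exceeds the mean by `k` standard deviations. The names `features`, `fit_bayes`, `predict`, `select_data` and the parameters `alpha`, `beta`, `width`, `k` are all hypothetical.

```python
import numpy as np


def features(x, centers, width):
    """Gaussian basis functions plus a bias column.

    Assumption: a fixed basis stands in for a trained network's hidden layer.
    """
    phi = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)
    return np.hstack([np.ones((x.shape[0], 1)), phi])


def fit_bayes(Phi, y, alpha, beta):
    """Posterior mean and covariance of the weights.

    alpha is the prior precision and beta the noise precision (both assumed known).
    """
    A = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
    cov = np.linalg.inv(A)
    mean = beta * cov @ Phi.T @ y
    return mean, cov


def predict(Phi, mean, cov, beta):
    """Predictive mean and error bar (predictive standard deviation) per point."""
    mu = Phi @ mean
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", Phi, cov, Phi)
    return mu, np.sqrt(var)


def select_data(x, y, centers, width=0.5, alpha=1e-2, beta=25.0, k=2.0):
    """Fit, drop points with a large residual or a large error bar, then refit."""
    Phi = features(x, centers, width)
    mean, cov = fit_bayes(Phi, y, alpha, beta)
    mu, bar = predict(Phi, mean, cov, beta)
    resid = np.abs(y - mu)
    # Assumed rule for "large": more than k standard deviations above the mean.
    keep = (resid <= resid.mean() + k * resid.std()) & (
        bar <= bar.mean() + k * bar.std()
    )
    # The linear model has a closed-form fit, so "further training" is a refit here.
    refit = fit_bayes(Phi[keep], y[keep], alpha, beta)
    return keep, refit


# Synthetic demonstration data (not from the paper): a noisy sine with three
# injected outliers.
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, 200)
y = np.sin(x) + 0.2 * rng.standard_normal(200)
outliers = np.array([5, 50, 120])
y[outliers] += 5.0

centers = np.linspace(-3.0, 3.0, 10)
keep, (w_mean, w_cov) = select_data(x, y, centers)
print("points kept:", int(keep.sum()), "of", x.size)
```

The residual test catches points the model cannot explain, while the error-bar test catches points in sparsely sampled regions where the model is uncertain. Applying the two tests jointly is one reading of "large residual or large error bars"; a real application would need to tune both thresholds.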