Active learning for imbalance problem using L-GEM of RBFNN

2012 International Conference on Machine Learning and Cybernetics, published 2012-07-15, DOI: 10.1109/ICMLC.2012.6358972

Junjie Hu


Citations: 7

Abstract




In many important applications, such as malignant cell detection, network intrusion detection, and error-signal detection in power systems, the positive and negative classes are usually imbalanced. Many classifiers perform poorly on imbalanced data. The main problem is that classifiers tend to neglect the minority-class samples and the accuracy on that class, disregarding the higher cost of misclassifying minority-class samples. Pattern classification on imbalanced data has therefore become a prominent challenge for both academia and industry. In this paper, we propose an active learning method for imbalanced data that uses the stochastic sensitivity measure (ST-SM) of a Radial Basis Function Neural Network (RBFNN). A large ST-SM indicates that the RBFNN is uncertain about a particular sample, i.e., its output fluctuates strongly around that sample. In each round, the samples with large ST-SM values are selected and added to the training set. Empirically, samples with large output perturbations (i.e., large ST-SM) tend to lie near the classification boundary and are therefore highly informative for training the classifier. Given the imbalance of the data set, ST-SM-based selection is expected to reduce the number of redundant majority-class samples that are selected, rebalance the class distribution of the training set, and ultimately improve the performance of the classifier.
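The selection loop described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. Assumptions: the RBFNN places one Gaussian hidden unit on every labelled sample and fits its output weights by solving the regularised system (K + ridge*I) w = y; the ST-SM, which in the localized generalization error model (L-GEM) is the expected squared output change E[(Δy)^2] under input perturbations, is estimated here by Monte Carlo with perturbations drawn uniformly from [-q, q]^d rather than by a closed-form expression. The names solve, RBFNN, st_sm, active_learning and oracle, and the parameters width, ridge, q, n_perturb, batch and rounds, are hypothetical choices for illustration.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w


class RBFNN:
    """Hypothetical minimal RBFNN: one Gaussian unit per labelled sample,
    linear output weights from the regularised system (K + ridge*I) w = y."""

    def __init__(self, width=1.0, ridge=1e-3):
        self.width, self.ridge = width, ridge
        self.centers, self.weights = [], []

    def _phi(self, x):
        # Gaussian activation of every hidden unit for input x
        return [math.exp(-sum((a - c) ** 2 for a, c in zip(x, ctr)) / (2 * self.width ** 2))
                for ctr in self.centers]

    def fit(self, xs, ys):
        self.centers = [list(x) for x in xs]
        k = [self._phi(x) for x in xs]  # kernel matrix k[i][j] = phi_j(x_i)
        for i in range(len(k)):
            k[i][i] += self.ridge  # ridge term keeps the system positive definite
        self.weights = solve(k, list(ys))
        return self

    def predict(self, x):
        return sum(w * p for w, p in zip(self.weights, self._phi(x)))


def st_sm(model, x, perturbations):
    """Monte Carlo estimate (assumption) of the stochastic sensitivity
    E[(f(x + dx) - f(x))^2], averaged over the given perturbation vectors dx."""
    fx = model.predict(x)
    return sum((model.predict([xi + di for xi, di in zip(x, dx)]) - fx) ** 2
               for dx in perturbations) / len(perturbations)


def active_learning(labelled, pool, oracle, rounds=5, batch=2, q=0.1, n_perturb=50, seed=0):
    """Each round: fit the RBFNN on the labelled set, rank the pool by estimated
    ST-SM, and query labels for the top `batch` samples."""
    rng = random.Random(seed)
    xs = [list(x) for x, _ in labelled]
    ys = [y for _, y in labelled]
    pool = [list(x) for x in pool]
    dim = len(xs[0])
    for _ in range(rounds):
        if not pool:
            break
        model = RBFNN().fit(xs, ys)
        # Assumed perturbation model: dx ~ Uniform[-q, q]^d. The same draws are reused
        # for every candidate so the ranking is not dominated by sampling noise.
        perts = [[rng.uniform(-q, q) for _ in range(dim)] for _ in range(n_perturb)]
        pool.sort(key=lambda x: st_sm(model, x, perts), reverse=True)
        picked, pool = pool[:batch], pool[batch:]
        for x in picked:
            xs.append(x)
            ys.append(oracle(x))  # ask the oracle for the label of each selected sample
    return RBFNN().fit(xs, ys), xs, ys
```

For example, with labels +1 for x > 8 and -1 otherwise, initial labelled points x = 1, 5 and 9, and a pool on a 0.5 grid over [0, 10], the first round queries x = 6 and x = 8, the pool points where this toy model's output changes fastest. Reusing one set of perturbations for all candidates (common random numbers) is a design choice of this sketch, not something the abstract specifies.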
