Rotation Forest and Random Oracles: Two Classifier Ensemble Methods

Juan José Rodríguez Diez
Published in: Proceedings. IEEE International Symposium on Computer-Based Medical Systems, 2007-06-20
DOI: 10.1109/CBMS.2007.94 (https://doi.org/10.1109/CBMS.2007.94)
Citations: 11

Abstract

Classification methods are widely used in computer-based medical systems. Often, the accuracy of a classifier can be improved by using a classifier ensemble, a combination of several classifiers. Two classifier ensembles and their results on several medical data sets will be presented: Rotation Forest (Rodriguez, Kuncheva and Alonso) and Random Oracles (Kuncheva and Rodriguez).

Rotation Forest is a method for generating classifier ensembles based on feature extraction. To create the training data for a base classifier, the feature set is randomly split into K subsets (K is a parameter of the algorithm) and Principal Component Analysis (PCA) is applied to each subset. All principal components are retained in order to preserve the variability information in the data. Thus, K axis rotations take place to form the new features for a base classifier. The idea of the rotation approach is to encourage individual accuracy and diversity within the ensemble simultaneously. Diversity is promoted through the feature extraction for each base classifier. Decision trees were chosen here because they are sensitive to rotation of the feature axes, hence the name "forest." Accuracy is sought by keeping all principal components and by using the whole data set to train each base classifier. Comparisons with various standard ensemble methods (Bagging, AdaBoost, and Random Forest) will be reported. Diversity-error diagrams reveal that Rotation Forest ensembles construct individual classifiers that are more accurate than those in AdaBoost and Random Forest, and more diverse than those in Bagging, sometimes more accurate as well.

A random oracle classifier is a mini-ensemble formed by a pair of classifiers and a fixed, randomly created oracle that selects between them. The random oracle can be thought of as a random discriminant function that splits the data into two subsets with no regard for any class labels or cluster structure.
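To make the idea of a random oracle classifier concrete, here is a minimal sketch of one such mini-ensemble. Several details are assumptions for illustration and are not stated in the abstract. The oracle is taken to be a linear split, built as the perpendicular bisector hyperplane between two randomly chosen training points. The base classifier `NearestMean` is a hypothetical stand-in, and the fallback for an empty side is a convenience. All class and function names are invented for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestMean:
    """Stand-in base classifier (assumption): predicts the class with the closest mean."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
            counts[label] = counts.get(label, 0) + 1
        self.means = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_one(self, row):
        return min(self.means, key=lambda c: sq_dist(row, self.means[c]))


class LinearOracleClassifier:
    """A pair of classifiers plus a fixed random oracle that selects between them."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def _side(self, row):
        # Points at least as close to anchor a as to anchor b fall on side 0.
        return 0 if sq_dist(row, self.a) <= sq_dist(row, self.b) else 1

    def fit(self, X, y):
        # The oracle is created once, at random, without looking at the labels.
        self.a, self.b = self.rng.sample(list(X), 2)
        self.members = []
        for side in (0, 1):
            idx = [i for i, row in enumerate(X) if self._side(row) == side]
            if not idx:  # degenerate split: fall back to the full training set
                idx = list(range(len(X)))
            member = NearestMean().fit([X[i] for i in idx], [y[i] for i in idx])
            self.members.append(member)
        return self

    def predict(self, X):
        # The oracle routes each point to exactly one of the two classifiers.
        return [self.members[self._side(row)].predict_one(row) for row in X]


# Toy data: two well-separated clusters, one per class.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = LinearOracleClassifier(seed=0).fit(X, y)
preds = model.predict(X)
```

Because the object exposes the usual fit and predict steps, it could in principle be plugged in as the base classifier of another ensemble method, with a different seed for each ensemble member.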
Two random oracles have been considered: linear and spherical. A random oracle classifier can be used as the base classifier of any ensemble method. It is argued that this approach encourages extra diversity in the ensemble while allowing for high accuracy of the individual ensemble members. Experiments with several data sets from UCI and 11 ensemble models will be reported. Each ensemble model will be examined with and without the oracle. The results will show that all ensemble methods benefited from the new approach, most markedly random subspace and bagging. A further experiment with seven real medical data sets will demonstrate the validity of these findings outside the UCI data collection. When using Naive Bayes classifiers as base classifiers, the experiments show that ensembles based solely upon the spherical oracle (and no other ensemble heuristic) outrank Bagging, Wagging, Random Subspaces, AdaBoost.M1, MultiBoost and Decorate. Moreover, all these ensemble methods are better with either of the two random oracles than their standard versions without the oracles.
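The Rotation Forest feature construction described in the abstract (random split of the features into K subsets, PCA on each subset, all components kept) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. It uses NumPy, computes PCA through an SVD of the centered columns, and the function name `rotation_matrix` is invented here. The published algorithm may include further randomization that the abstract does not describe, and none is added here.

```python
import numpy as np


def rotation_matrix(X, n_subsets, rng):
    """Build one rotation from per-subset PCA, keeping every principal component."""
    n_features = X.shape[1]
    if not 1 <= n_subsets <= n_features:
        raise ValueError("n_subsets must be between 1 and the number of features")
    shuffled = rng.permutation(n_features)  # random split of the feature set
    R = np.zeros((n_features, n_features))
    for cols in np.array_split(shuffled, n_subsets):
        block = X[:, cols] - X[:, cols].mean(axis=0)  # center before PCA
        # Rows of vt are the principal axes; full_matrices=True keeps all of them.
        _, _, vt = np.linalg.svd(block, full_matrices=True)
        R[np.ix_(cols, cols)] = vt.T  # one axis rotation per feature subset
    return R


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))  # synthetic data, for illustration only

# One rotation per base classifier; each classifier would see all rows of X @ R.
rotations = [rotation_matrix(X, n_subsets=3, rng=rng) for _ in range(4)]
rotated_views = [X @ R for R in rotations]
```

Because every component is retained, each matrix is orthogonal, so the rotated data carries the same information as the original in a different coordinate system. In this sketch the differences between ensemble members come only from the random feature split. Each base classifier, for example a decision tree, would then be trained on its own rotated view.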