How boosting the margin can also boost classifier complexity

Proceedings of the 23rd International Conference on Machine Learning. Pub Date: 2006-06-25. DOI: 10.1145/1143844.1143939

L. Reyzin, R. Schapire


Citations: 238


How boosting the margin can also boost classifier complexity

Boosting methods are known not to usually overfit training data even as the size of the generated classifiers becomes large. Schapire et al. attempted to explain this phenomenon in terms of the margins the classifier achieves on training examples. Later, however, Breiman cast serious doubt on this explanation by introducing a boosting algorithm, arc-gv, that can generate a higher margins distribution than AdaBoost and yet performs worse. In this paper, we take a close look at Breiman's compelling but puzzling results. Although we can reproduce his main finding, we find that the poorer performance of arc-gv can be explained by the increased complexity of the base classifiers it uses, an explanation supported by our experiments and entirely consistent with the margins theory. Thus, we find maximizing the margins is desirable, but not necessarily at the expense of other factors, especially base-classifier complexity.
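To make the notion of a "margins distribution" concrete, here is a minimal sketch of how the normalized margin of each training example could be computed for a weighted-vote classifier, using the usual definition from the margins theory: the label times the weighted vote of the base classifiers, divided by the total vote weight. The function name `normalized_margins` and the toy weights, predictions, and labels are assumptions made up for illustration; they are not taken from the paper or its experiments.

```python
from typing import List, Sequence


def normalized_margins(
    alphas: Sequence[float],
    base_predictions: Sequence[Sequence[int]],
    labels: Sequence[int],
) -> List[float]:
    """Return the normalized margin of every training example.

    margin_i = y_i * sum_t(alpha_t * h_t(x_i)) / sum_t(|alpha_t|)

    alphas[t]               -- vote weight of base classifier t
    base_predictions[t][i]  -- prediction (+1 or -1) of classifier t on example i
    labels[i]               -- true label (+1 or -1) of example i
    """
    total_weight = sum(abs(a) for a in alphas)
    if total_weight == 0:
        raise ValueError("at least one base classifier needs a nonzero weight")

    margins = []
    for i, y in enumerate(labels):
        # Weighted vote of all base classifiers on example i.
        vote = sum(a * preds[i] for a, preds in zip(alphas, base_predictions))
        margins.append(y * vote / total_weight)
    return margins


# Hypothetical toy ensemble: three base classifiers, four training examples.
alphas = [0.5, 0.3, 0.2]
base_predictions = [
    [+1, +1, -1, -1],
    [+1, -1, -1, +1],
    [+1, +1, +1, -1],
]
labels = [+1, +1, -1, -1]

margins = normalized_margins(alphas, base_predictions, labels)
minimum_margin = min(margins)  # the quantity arc-gv is designed to maximize
print(sorted(margins), minimum_margin)
```

Each margin lies in [-1, 1] and is positive exactly when the example is classified correctly. Comparing the sorted margins of two ensembles is one way to compare their margins distributions, but, as the abstract argues, such a comparison is only meaningful when the base classifiers are of similar complexity.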

