Super Learning with Repeated Cross Validation
Krzysztof Mnich, A. Polewko-Klim, A. Golinska, W. Lesiński, W. Rudnicki
2020 International Conference on Data Mining Workshops (ICDMW), November 2020
DOI: 10.1109/ICDMW51313.2020.00089
Citation count: 2
Abstract
The super learner algorithm was created to combine the results of multiple base learners using cross validation. However, in many cases it does not significantly outperform a simple average of the base results. We propose applying multiple repeats of cross validation to improve the performance of super learning. Two approaches to applying repeated cross validation were tested on artificial data sets and on real-life biomedical data sets. One of them, the MEAN OUTPUT strategy, proved to significantly improve the results. To reduce the computational complexity of the algorithm, we suggest using 3-fold rather than the previously recommended 10-fold cross validation. The tests showed that this simplification does not affect the super learning results.
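The abstract does not spell out the MEAN OUTPUT strategy, so the following is only a minimal sketch under a stated assumption: the out-of-fold outputs of each base learner are averaged over several reshuffled repeats of k-fold cross validation, and the combination weights are then fitted on those averaged outputs. The two toy base learners, the grid-searched convex weight (a stand-in for the super learner's usual meta-learner), and all function names are hypothetical and chosen only to keep the example dependency-free.

```python
import random

def fit_mean(xs, ys):
    """Toy base learner 1: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Toy base learner 2: univariate least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def out_of_fold(xs, ys, learners, k, rng):
    """One k-fold pass: every sample is predicted by models that never saw it."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    preds = [[0.0] * len(learners) for _ in xs]
    for f in range(k):
        test = idx[f::k]
        held = set(test)
        train = [i for i in idx if i not in held]
        models = [fit([xs[i] for i in train], [ys[i] for i in train])
                  for fit in learners]
        for i in test:
            preds[i] = [m(xs[i]) for m in models]
    return preds

def mean_output_oof(xs, ys, learners, k=3, repeats=5, seed=0):
    """ASSUMED reading of MEAN OUTPUT: average out-of-fold outputs over repeats."""
    rng = random.Random(seed)
    total = [[0.0] * len(learners) for _ in xs]
    for _ in range(repeats):
        for row, p in zip(total, out_of_fold(xs, ys, learners, k, rng)):
            for j, v in enumerate(p):
                row[j] += v
    return [[v / repeats for v in row] for row in total]

def fit_convex_weight(oof, ys, steps=100):
    """Grid-search w in [0, 1] for the blend w * learner_0 + (1 - w) * learner_1."""
    def mse(w):
        return sum((w * a + (1 - w) * b - y) ** 2
                   for (a, b), y in zip(oof, ys)) / len(ys)
    return min((s / steps for s in range(steps + 1)), key=mse)

def super_learner(xs, ys, learners, k=3, repeats=5, seed=0):
    """Fit the weight on averaged out-of-fold outputs, then refit on all data."""
    w = fit_convex_weight(mean_output_oof(xs, ys, learners, k, repeats, seed), ys)
    a, b = (fit(xs, ys) for fit in learners)
    return w, (lambda x: w * a(x) + (1 - w) * b(x))

# Synthetic data (illustrative only): y = 2x plus unit Gaussian noise.
data_rng = random.Random(1)
xs = [float(x) for x in range(30)]
ys = [2.0 * x + data_rng.gauss(0.0, 1.0) for x in xs]

w, predict = super_learner(xs, ys, [fit_mean, fit_line], k=3, repeats=5)
print("weight on mean learner:", w)
print("prediction at x = 10:", predict(10.0))
```

On the cost side, each repeat of k-fold cross validation trains every base learner k times, so with 5 repeats the sketch needs 15 fits per learner at k=3 against 50 at k=10, plus one final refit on the full data in either case.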