Exploiting residual errors in nonlinear online prediction
Authors: Emirhan Ilhan, Ahmet B. Koc, Suleyman S. Kozat
Journal: Machine Learning (impact factor 4.3; JCR Q2, Computer Science, Artificial Intelligence)
Published: 2024-05-29
DOI: https://doi.org/10.1007/s10994-024-06554-7
Citations: 0
Abstract
We introduce a novel online (or sequential) nonlinear prediction approach that incorporates the residuals, i.e., the prediction errors on past observations, as additional features for the current data. Including past error terms in an online prediction algorithm can significantly improve prediction performance, since this information lets the algorithm adjust itself based on its past errors. These terms are well exploited in many linear statistical models such as ARMA, SES, and Holt-Winters models. However, past error terms are rarely exploited, or in a certain sense not optimally exploited, in nonlinear prediction models, since training with them requires complex nonlinear state-space modeling. To this end, for the first time in the literature, we introduce a nonlinear prediction framework that uses not only the current features but also the past error terms as additional features, thereby exploiting the residual state information in the error terms, i.e., the model’s performance on past samples. Since the new feature vectors contain error terms that change with every update, our algorithm jointly optimizes the model parameters and the feature vectors. We achieve this by introducing new update equations that handle, in an online manner, the effects of the changes in the feature vectors. We use soft decision trees and neural networks as the nonlinear prediction algorithms since these are the most widely used methods in highly publicized competitions. However, as we show, our methods are generic, and any algorithm supporting gradient calculations can be used in a straightforward manner. Our experiments on well-known real-life competition datasets show that our method significantly outperforms the state of the art. We also provide an implementation of our approach, including the source code, to facilitate reproducibility (https://github.com/ahmetberkerkoc/SDT-ARMA).
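The core idea of feeding past residuals back in as features can be illustrated with a minimal sketch. This is not the authors' algorithm: the class name `ResidualAugmentedPredictor`, its parameters, and the tiny one-hidden-layer network are all hypothetical choices made for illustration. In particular, the sketch treats past residuals as fixed inputs when computing gradients, whereas the abstract describes additional update equations that account for how the residual features change with the model parameters.

```python
import math
import random
from collections import deque


class ResidualAugmentedPredictor:
    """Hypothetical sketch: a small tanh network trained online with SGD.

    The input at time t is z_t = [x_t, e_{t-1}, ..., e_{t-q}], where
    e_k = y_k - y_hat_k is the residual at step k.
    Simplifying assumption: past residuals are treated as constants
    in the gradient (the paper's method does NOT make this simplification).
    """

    def __init__(self, n_features, n_residuals=2, n_hidden=4, lr=0.01, seed=0):
        rng = random.Random(seed)
        d = n_features + n_residuals
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.v = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.c = 0.0
        self.lr = lr
        # Most recent residual first; starts at zero before any observation.
        self.residuals = deque([0.0] * n_residuals, maxlen=n_residuals)

    def _forward(self, z):
        h = [math.tanh(sum(w * zi for w, zi in zip(row, z)) + bj)
             for row, bj in zip(self.W, self.b)]
        y_hat = sum(vj * hj for vj, hj in zip(self.v, h)) + self.c
        return h, y_hat

    def predict(self, x):
        """Predict y for features x using the stored past residuals."""
        return self._forward(list(x) + list(self.residuals))[1]

    def update(self, x, y):
        """Predict, take one SGD step on 0.5 * err**2, then store the residual."""
        z = list(x) + list(self.residuals)
        h, y_hat = self._forward(z)
        err = y - y_hat
        for j, hj in enumerate(h):
            # Gradient w.r.t. the hidden pre-activation, using the old v[j].
            grad_pre = -err * self.v[j] * (1.0 - hj * hj)
            self.v[j] += self.lr * err * hj
            self.b[j] -= self.lr * grad_pre
            for i, zi in enumerate(z):
                self.W[j][i] -= self.lr * grad_pre * zi
        self.c += self.lr * err
        self.residuals.appendleft(err)  # the oldest residual drops off the right
        return y_hat, err


# Usage on a synthetic series (made-up data, for illustration only).
model = ResidualAugmentedPredictor(n_features=1, n_residuals=2)
rng = random.Random(1)
for t in range(200):
    x = [math.sin(0.1 * t)]
    y = 0.8 * x[0] + 0.1 * rng.gauss(0.0, 1.0)
    y_hat, err = model.update(x, y)
```

Each call to `update` first predicts with the residuals available so far and only then stores the new residual, so the model never sees the error of the sample it is currently predicting.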
About the journal:
Machine Learning serves as a global platform dedicated to computational approaches in learning. The journal reports substantial findings on diverse learning methods applied to various problems, offering support through empirical studies, theoretical analysis, or connections to psychological phenomena. It demonstrates the application of learning methods to solve significant problems and aims to enhance the conduct of machine learning research with a focus on verifiable and replicable evidence in published papers.