Exploiting residual errors in nonlinear online prediction

Machine Learning (IF 2.9; Zone 3, Computer Science; JCR Q2, Computer Science, Artificial Intelligence). Pub Date: 2024-05-29. DOI: 10.1007/s10994-024-06554-7

Emirhan Ilhan, Ahmet B. Koc, Suleyman S. Kozat

Citations: 0

Abstract

We introduce a novel online (or sequential) nonlinear prediction approach that incorporates the residuals, i.e., the prediction errors on past observations, as additional features for the current data. Including past error terms in an online prediction algorithm can significantly improve prediction performance, since this information lets the algorithm adjust itself based on its past errors. These terms are well exploited in many linear statistical models such as ARMA, simple exponential smoothing (SES), and Holt-Winters models. However, past error terms are rarely exploited, or in a certain sense not optimally exploited, in nonlinear prediction models, since training such models requires complex nonlinear state-space modeling. To this end, for the first time in the literature, we introduce a nonlinear prediction framework that uses not only the current features but also the past error terms as additional features, thereby exploiting the residual state information in the error terms, i.e., the model's performance on past samples. Since the new feature vectors contain error terms that change with every update, our algorithm jointly optimizes the model parameters and the feature vectors. We achieve this by introducing new update equations that handle, in an online manner, the effects of the changes in the feature vectors. We use soft decision trees and neural networks as the nonlinear prediction algorithms since these are the most widely used methods in highly publicized competitions. However, as we show, our methods are generic, and any algorithm that supports gradient calculations can be used straightforwardly. We show through experiments on well-known real-life competition datasets that our method significantly outperforms the state of the art. We also provide the implementation of our approach, including the source code, to facilitate reproducibility (https://github.com/ahmetberkerkoc/SDT-ARMA).
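The core idea of the abstract, feeding past residuals back in as extra features of an online nonlinear predictor, can be illustrated with a minimal sketch. This is not the authors' implementation and does not reproduce their update equations: it assumes a small one-hidden-layer tanh network trained by plain online SGD, and it treats the residual features as fixed inputs, ignoring their dependence on the model parameters (the effect the paper's new update equations are designed to handle). The function name `online_residual_predict` and the parameters `q`, `hidden`, and `lr` are hypothetical, chosen only for illustration.

```python
import numpy as np


def online_residual_predict(X, y, q=2, hidden=8, lr=0.05, seed=0):
    """Online one-step prediction with the last q residuals as extra features.

    Assumption (not from the paper): the residual features are treated as
    constants when taking gradients, so this is ordinary online SGD on an
    augmented input, not the joint optimization described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # One-hidden-layer tanh network acting on [x_t, e_{t-1}, ..., e_{t-q}].
    W1 = rng.normal(scale=0.1, size=(hidden, d + q))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.1, size=hidden)
    b2 = 0.0

    residuals = np.zeros(q)  # most recent residual first; zeros before any data
    preds = np.empty(n)

    for t in range(n):
        # Augment the current features with the stored past residuals.
        z = np.concatenate([X[t], residuals])
        h = np.tanh(W1 @ z + b1)
        y_hat = w2 @ h + b2
        preds[t] = y_hat

        # The target is revealed only after the prediction is made.
        e = y[t] - y_hat

        # SGD step on the squared loss 0.5 * e**2.
        grad_pre = -e * w2 * (1.0 - h**2)  # gradient w.r.t. hidden pre-activation
        w2 += lr * e * h
        b2 += lr * e
        W1 -= lr * np.outer(grad_pre, z)
        b1 -= lr * grad_pre

        # Shift the residual buffer: newest error in front, oldest dropped.
        residuals = np.concatenate([[e], residuals])[:q]

    return preds
```

Calling `online_residual_predict(X, y, q=0)` disables the residual features and gives a plain online network, which is a convenient baseline for checking on a given dataset whether the extra features help at all.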

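Source journal

Machine Learning (Engineering and Technology; Computer Science: Artificial Intelligence)

CiteScore: 11.00

Self-citation rate: 2.70%

Articles published: 162

Review time: 3 months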

Journal description: Machine Learning serves as a global platform dedicated to computational approaches in learning. The journal reports substantial findings on diverse learning methods applied to various problems, offering support through empirical studies, theoretical analysis, or connections to psychological phenomena. It demonstrates the application of learning methods to solve significant problems and aims to enhance the conduct of machine learning research with a focus on verifiable and replicable evidence in published papers.

Latest articles from the journal

Linear Causal Discovery with Interventional Constraints.
Interpretable optimisation-based approach for hyper-box classification.
Deep latent force models: ODE-based process convolutions for Bayesian deep learning.
Offline reinforcement learning for learning to dispatch for job shop scheduling.
Computing the distance between unbalanced distributions: the flat metric.