Fernando Ferreira Lima dos Santos, Farzaneh Khorsandi
All-Terrain Vehicles (ATVs) are popular off-road vehicles in the United States, with 10.5 million households reported to own at least one ATV. Despite their popularity, ATVs pose a significant risk of severe injury, leading to substantial healthcare expenses and raising public health concerns. Understanding the patterns of ATV-related hospitalizations and accurately predicting these injuries can guide the development of effective prevention strategies, ultimately reducing ATV-related injuries and the associated healthcare costs. We therefore analyzed ATV-related hospitalizations from 2010 to 2021 and developed and assessed three forecasting models (Neural Prophet, SARIMA, and LSTM) to predict ATV-related injuries. Model performance was evaluated using the Root Mean Square Error (RMSE). The LSTM model outperformed the others and could be used to provide insights that aid strategic planning and resource allocation within healthcare systems. In addition, our findings highlight the urgent need for prevention programs that specifically target youth and are timed for the summer season.
"Riding into Danger: Predictive Modeling for ATV-Related Injuries and Seasonal Patterns." Fernando Ferreira Lima dos Santos, Farzaneh Khorsandi. Forecasting, published 2024-04-02. DOI: 10.3390/forecast6020015 (https://doi.org/10.3390/forecast6020015)
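The RMSE metric named in the abstract above can be illustrated with a minimal sketch. The function name `rmse` and the monthly counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math

def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical monthly injury counts and one model's forecasts (illustrative only)
observed = [120, 150, 210, 260]
forecast = [110, 160, 200, 250]
print(rmse(observed, forecast))  # every error is 10, so this prints 10.0
```

Because RMSE squares each error before averaging, a model that misses a seasonal peak badly is penalized more than one that is slightly off every month, which is one reason it is a common choice when comparing forecasters on peaked series.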
A. Lekidis, Angelos Georgakis, Christos Dalamagkas, Elpiniki I. Papageorgiou
Scheduled maintenance of industrial equipment is usually performed at a low frequency, as it typically leads to unpredicted downtime in business operations. This, however, carries a risk of failure in individual modules of the equipment, which may diminish its performance or even lead to its breakdown, rendering it non-operational. Lately, predictive maintenance methods have been considered for industrial systems, such as power generation stations, as a proactive measure for preventing failures. Such methods use data gathered from industrial equipment and Machine Learning (ML) algorithms to identify data patterns that indicate anomalies and may lead to failures. However, industrial equipment exhibits specific behavior and interactions that originate from its manufacturer configuration and from the system in which it is installed, which poses a great challenge for the effectiveness of ML-based maintenance and failure predictions. In this article, we propose a novel method for tackling this challenge based on the development of a digital twin for a type of industrial equipment known as a Remote Terminal Unit (RTU). RTUs are used in electrical systems to provide remote monitoring and control of critical equipment, such as power generators. The method is applied to an RTU that is connected to a real power generator within a Public Power Corporation (PPC) facility, where operational anomalies are forecasted based on measurements of its processing power, operating temperature, voltage, and storage memory.
"Predictive Maintenance Framework for Fault Detection in Remote Terminal Units." A. Lekidis, Angelos Georgakis, Christos Dalamagkas, Elpiniki I. Papageorgiou. Forecasting, published 2024-03-25. DOI: 10.3390/forecast6020014 (https://doi.org/10.3390/forecast6020014)
Early identification of acute gout is crucial, enabling healthcare professionals to implement targeted interventions for rapid pain relief and preventing disease progression, ensuring improved long-term joint function. In this study, we comprehensively explored the potential early detection of gout flares (GFs) based on nurses’ chief complaint notes in the Emergency Department (ED). Addressing the challenge of identifying GFs prospectively during an ED visit, where documentation is typically minimal, our research focused on employing alternative Natural Language Processing (NLP) techniques to enhance detection accuracy. We investigated GF detection algorithms using both sparse representations by traditional NLP methods and dense encodings by medical domain-specific Large Language Models (LLMs), distinguishing between generative and discriminative models. Three methods were used to alleviate the issue of severe data imbalances, including oversampling, class weights, and focal loss. Extensive empirical studies were performed on the Gout Emergency Department Chief Complaint Corpora. Sparse text representations like tf-idf proved to produce strong performances, achieving F1 scores higher than 0.75. The best deep learning models were RoBERTa-large-PM-M3-Voc and BioGPT, which had the best F1 scores for each dataset, with a 0.8 on the 2019 dataset and a 0.85 F1 score on the 2020 dataset, respectively. We concluded that although discriminative LLMs performed better for this classification task when compared to generative LLMs, a combination of using generative models as feature extractors and employing a support vector machine for classification yielded promising results comparable to those obtained with discriminative models.
"Effective Natural Language Processing Algorithms for Early Alerts of Gout Flares from Chief Complaints." Lucas Lopes Oliveira, Xiaorui Jiang, Aryalakshmi Nellippillipathil Babu, Poonam Karajagi, Alireza Daneshkhah. Forecasting, published 2024-03-10. DOI: 10.3390/forecast6010013 (https://doi.org/10.3390/forecast6010013)
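Of the three imbalance remedies listed in the abstract above, focal loss is the least self-explanatory, so a small sketch may help. It uses the standard binary focal loss, FL = -alpha_t (1 - p_t)^gamma log(p_t); the defaults `gamma=2.0` and `alpha=0.25` are common choices assumed here, not values reported by the authors.

```python
import math

def focal_loss(prob_positive, label, gamma=2.0, alpha=0.25):
    """Binary focal loss for one example.

    prob_positive: predicted probability of the positive class
    label:         1 for the positive class, 0 for the negative class
    gamma, alpha:  assumed defaults, not taken from the paper
    """
    eps = 1e-12  # keeps log() away from zero
    p = min(max(prob_positive, eps), 1.0 - eps)
    p_t = p if label == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if label == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct example contributes far less than a badly missed one
easy = focal_loss(0.9, 1)
hard = focal_loss(0.1, 1)
print(easy < hard)  # True
```

Setting `gamma=0` removes the down-weighting term, leaving an alpha-weighted cross-entropy, which is the sense in which focal loss generalizes plain class weighting.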
In today’s evolving world, the pharmaceutical sector faces an emerging challenge: the rapid growth of the global population and the consequent growth in drug production demands. Recognizing this, our study explores the urgent need to strengthen pharmaceutical production capacities, ensuring drugs are allocated and stored strategically to meet diverse regional and demographic needs. Our research focuses on drug demand forecasting using artificial intelligence (AI) and machine learning (ML) techniques to enhance predictions in the pharmaceutical field. Using a Kaggle dataset of 600,000 sales records from a single pharmacy, we carry out a thorough univariate time series analysis, pairing conventional tools such as ARIMA with advanced methods such as LSTM neural networks, all with a single aim: refining the precision of our sales forecasts. The data were categorised and segmented into eight clusters based on the Anatomical Therapeutic Chemical (ATC) Classification System. This segmentation reveals the evident influence of seasonality on drug sales. The analysis highlights the effectiveness of machine learning models, and of XGBoost in particular. This algorithm outperformed traditional models, achieving the lowest MAPE values: 17.89% for M01AB (anti-inflammatory and antirheumatic products, non-steroids, acetic acid derivatives, and related substances), 16.92% for M01AE (anti-inflammatory and antirheumatic products, non-steroids, and propionic acid derivatives), 17.98% for N02BA (analgesics, antipyretics, and anilides), and 16.05% for N02BE (analgesics, antipyretics, pyrazolones, and anilides). XGBoost also achieved the lowest MSE scores: 28.8 for M01AB, 1518.56 for N02BE, and 350.84 for N05C (hypnotics and sedatives). Additionally, the Seasonal Naïve model recorded an MSE of 49.19 for M01AE, while the Single Exponential Smoothing model showed an MSE of 7.19 for N05B. These findings underscore the value of employing a diverse range of forecasting approaches. In summary, our research highlights the significance of leveraging machine learning techniques to derive valuable insights for pharmaceutical companies. By applying these methods, companies can optimize their production, storage, distribution, and marketing practices.
"Applying Machine Learning and Statistical Forecasting Methods for Enhancing Pharmaceutical Sales Predictions." K. P. Fourkiotis, Athanasios Tsadiras. Forecasting, published 2024-02-16. DOI: 10.3390/forecast6010010 (https://doi.org/10.3390/forecast6010010)
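The abstract above compares models by MAPE and MSE and uses a Seasonal Naïve baseline. The following sketch shows how these three pieces fit together; the function names and the toy sales figures are hypothetical and are not drawn from the Kaggle dataset.

```python
def seasonal_naive(history, season_length, horizon):
    """Forecast by repeating the last observed season."""
    if len(history) < season_length:
        raise ValueError("history must cover at least one full season")
    last_season = history[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]

def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Toy weekly sales with a period of 4 (illustrative values only)
history = [10, 20, 30, 40, 12, 22, 32, 42]
forecast = seasonal_naive(history, season_length=4, horizon=6)
print(forecast)  # [12, 22, 32, 42, 12, 22]
```

MAPE is scale-free, which makes it comparable across drug categories with very different sales volumes, while MSE is expressed in squared sales units and so differs by orders of magnitude between categories, as the reported figures show.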
Standard time-series modeling requires model parameters to be stable over time. Parameter instability is often caused by structural breaks, which lead to nonlinear models. A state-dependent model (SDM) offers a more general and flexible scheme for nonlinear modeling. In addition, time-series data often exhibit multiple frequency components, such as trends, seasonality, cycles, and noise. These components can be exploited in forecasting using Singular Spectrum Analysis (SSA). The two most widely used SSA forecasting approaches are the Linear Recurrent Formula (SSAR) and the Vector approach (SSAV). SSAV has better accuracy and robustness than SSAR, especially in handling structural breaks. Therefore, this research proposes modeling the SSAV coefficients with an SDM approach to account for structural breaks, a method called SDM-SSAV. The SDM recursively updates the SSAV coefficients using an Extended Kalman Filter (EKF) so that they adapt over time and between states. Empirical results with Indonesian export data and simulation studies show that SDM-SSAV is more accurate than SSAR, SSAV, SDM-SSAR, hybrid ARIMA-LSTM, and VARI.
"State-Dependent Model Based on Singular Spectrum Analysis Vector for Modeling Structural Breaks: Forecasting Indonesian Export." Yoga Sasmita, Heri Kuswanto, D. Prastyo. Forecasting, published 2024-02-12. DOI: 10.3390/forecast6010009 (https://doi.org/10.3390/forecast6010009)
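To make the SSA step in the abstract above more concrete, here is a minimal sketch of basic SSA decomposition and reconstruction with NumPy. It covers only the embedding, SVD, and diagonal-averaging stages; it does not implement SSAV forecasting, the state-dependent coefficients, or the EKF update proposed by the authors. The names `window` and `n_components` and the synthetic series are assumptions made for illustration.

```python
import numpy as np

def ssa_reconstruct(series, window, n_components):
    """Reconstruct a series from its leading SSA components."""
    x = np.asarray(series, dtype=float)
    n = x.size
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window]
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    r = min(n_components, s.size)
    approx = (u[:, :r] * s[:r]) @ vt[:r, :]  # rank-r approximation
    # Diagonal averaging maps the matrix back to a series
    recon = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        for j in range(k):
            recon[i + j] += approx[i, j]
            counts[i + j] += 1
    return recon / counts

# Synthetic series: linear trend plus a 12-period seasonal cycle
t = np.arange(60)
series = 0.05 * t + np.sin(2 * np.pi * t / 12)
smooth = ssa_reconstruct(series, window=24, n_components=3)
full = ssa_reconstruct(series, window=24, n_components=24)
print(np.allclose(full, series))  # True: keeping every component recovers the input
```

Keeping only a few leading components separates trend and dominant cycles from noise; SSA forecasting methods such as SSAR and SSAV then extend the retained components beyond the sample.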
A key summary statistic in a stationary functional time series is the long-run covariance function that measures serial dependence. It can be consistently estimated via a kernel sandwich estimator, which is the core of dynamic functional principal component regression for forecasting functional time series. To measure the uncertainty of the long-run covariance estimation, we consider sieve and functional autoregressive (FAR) bootstrap methods to generate pseudo-functional time series and study variability associated with the long-run covariance. The sieve bootstrap method is nonparametric (i.e., model-free), while the FAR bootstrap method is semi-parametric. The sieve bootstrap method relies on functional principal component analysis to decompose a functional time series into a set of estimated functional principal components and their associated scores. The scores can be bootstrapped via a vector autoregressive representation. The bootstrapped functional time series are obtained by multiplying the bootstrapped scores by the estimated functional principal components. The FAR bootstrap method relies on the FAR of order 1 to model the conditional mean of a functional time series, while residual functions can be bootstrapped via independent and identically distributed resampling. Through a series of Monte Carlo simulations, we evaluate and compare the finite-sample accuracy between the sieve and FAR bootstrap methods for quantifying the estimation uncertainty of the long-run covariance of a stationary functional time series.
"Bootstrapping Long-Run Covariance of Stationary Functional Time Series." Han Lin Shang. Forecasting, published 2024-02-05. DOI: 10.3390/forecast6010008 (https://doi.org/10.3390/forecast6010008)
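The kernel estimator of the long-run covariance mentioned in the abstract above is a weighted sum of lagged autocovariance functions. The sketch below evaluates it for curves observed on a common grid. The Bartlett kernel and the fixed `bandwidth` argument are assumptions made here for simplicity; the abstract does not state which kernel or bandwidth rule the author uses.

```python
import numpy as np

def long_run_covariance(curves, bandwidth):
    """Kernel estimate of the long-run covariance on a discretization grid.

    curves:    array of shape (n_curves, n_grid_points), one row per time point
    bandwidth: positive kernel bandwidth (assumed fixed, not data-driven)
    """
    x = np.asarray(curves, dtype=float)
    n = x.shape[0]
    centered = x - x.mean(axis=0)
    lrc = centered.T @ centered / n  # lag-0 autocovariance
    for h in range(1, n):
        weight = max(0.0, 1.0 - h / bandwidth)  # Bartlett kernel weight
        if weight == 0.0:
            break  # Bartlett weights stay at zero for all larger lags
        gamma_h = centered[:-h].T @ centered[h:] / n  # lag-h autocovariance
        lrc += weight * (gamma_h + gamma_h.T)         # add lags +h and -h
    return lrc

# Simulated independent curves on a 5-point grid (illustrative only)
rng = np.random.default_rng(0)
curves = rng.normal(size=(50, 5))
lrc = long_run_covariance(curves, bandwidth=5.0)
print(lrc.shape)  # (5, 5)
```

With `bandwidth=1.0` only the lag-0 term survives and the estimate reduces to the ordinary sample covariance. A bootstrap procedure such as those compared in the abstract would recompute this estimate on each pseudo-series to gauge its variability.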
This research proposes an investigative experiment employing binary classification for short-term electricity price spike forecasting. Numerical definitions for price spikes are derived from economic and statistical thresholds. The predictive task employs two tree-based machine learning classifiers and a deterministic point forecaster, namely a statistical regression model. Hyperparameters for the tree-based classifiers are optimized for statistical performance based on recall, precision, and F1-score. The deterministic forecaster is adapted from the literature on electricity price forecasting for the classification task. Additionally, one tree-based model prioritizes interpretability, generating decision rules that are subsequently utilized to produce price spike forecasts. For all models, we evaluate the final statistical and economic predictive performance. The interpretable model is analyzed for the trade-off between performance and interpretability. Numerical results highlight the significance of complementing statistical performance with economic assessment in electricity price spike forecasting. All experiments utilize data from Alberta’s electricity market.
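To make the classification setup concrete, the sketch below labels spikes with a statistical threshold and scores a set of predictions with precision, recall, and F1. The threshold rule (mean plus `k` standard deviations), the toy prices, and the predicted labels are all assumptions for illustration; the abstract does not state the thresholds the authors used.

```python
import numpy as np

def label_spikes(prices, k=2.0):
    """Flag prices above mean + k * std (an assumed statistical threshold)."""
    threshold = prices.mean() + k * prices.std()
    return prices > threshold, threshold

def precision_recall_f1(y_true, y_pred):
    """Binary classification metrics computed from boolean arrays."""
    tp = int(np.sum(y_true & y_pred))      # spikes correctly predicted
    fp = int(np.sum(~y_true & y_pred))     # false alarms
    fn = int(np.sum(y_true & ~y_pred))     # missed spikes
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical hourly prices with two spikes.
prices = np.array([30, 32, 31, 29, 400, 33, 30, 28, 350, 31], dtype=float)
y_true, threshold = label_spikes(prices, k=1.0)

# Hypothetical classifier output: one hit, one false alarm, one miss.
y_pred = np.array([0, 0, 0, 1, 1, 0, 0, 0, 0, 0], dtype=bool)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

These statistical scores treat every error alike; an economic assessment of the kind the abstract calls for would additionally weight each hit or miss by its monetary consequence.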
{"title":"Forecasting the Occurrence of Electricity Price Spikes: A Statistical-Economic Investigation Study","authors":"Manuel Zamudio López, H. Zareipour, Mike Quashie","doi":"10.3390/forecast6010007","DOIUrl":"https://doi.org/10.3390/forecast6010007","journal":{"name":"Forecasting"},"publicationDate":"2024-02-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"139686651"}
Sabrina De Nardi, C. Carnevale, Sara Raccagni, L. Sangiorgi
Models are a core element in performing local estimation of the climate change input. In this work, a novel approach to perform a fast downscaling of global temperature anomalies on a regional level is presented. The approach is based on a set of data-driven models linking global temperature anomalies and regional and global emissions to regional temperature anomalies. In particular, due to the limited amount of available data, a linear autoregressive structure with exogenous input (ARX) has been considered. To demonstrate their relevance to the existing literature and context, the proposed ARX models have been employed to evaluate the impact of temperature anomalies on rice production in a socially, economically, and climatologically fragile area like Southeast Asia. The results show a significant impact on this region, with estimations strongly in accordance with information presented in the literature from different sources and scientific fields. The work represents a first step towards the development of a fast, data-driven, holistic approach to the climate change impact evaluation problem. The proposed ARX data-driven models reveal a novel and feasible way to downscale global temperature anomalies to regional levels, showing their importance in comprehending global temperature anomalies, emissions, and regional climatic conditions.
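A linear ARX structure of the kind mentioned above can be fitted by ordinary least squares. The sketch below is a generic illustration, not the authors' model: the model orders, the helper name `fit_arx`, and the synthetic series standing in for regional anomalies (`y`) and exogenous inputs such as a global anomaly and emissions (`U`) are all assumptions.

```python
import numpy as np

def fit_arx(y, U, na=1, nb=1):
    """Least-squares ARX fit.

    Model: y[t] = sum_i a_i * y[t-i] + sum_j b_j . U[t-j] + e[t],
    with i = 1..na and j = 1..nb. Returns [a_1..a_na, b_1, ..., b_nb].
    """
    y = np.asarray(y, dtype=float)
    U = np.asarray(U, dtype=float).reshape(len(y), -1)
    start = max(na, nb)
    rows = []
    for t in range(start, len(y)):
        past_y = [y[t - i] for i in range(1, na + 1)]
        past_u = [U[t - j] for j in range(1, nb + 1)]
        rows.append(np.concatenate([past_y, *past_u]))
    Phi = np.array(rows)                   # regressor matrix
    theta, *_ = np.linalg.lstsq(Phi, y[start:], rcond=None)
    return theta

# Synthetic, noise-free data generated from known coefficients (hypothetical).
rng = np.random.default_rng(1)
n = 60
U = rng.normal(size=(n, 2))                # two exogenous inputs
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + 0.3 * U[t - 1, 0] + 0.1 * U[t - 1, 1]

theta = fit_arx(y, U)                      # estimates of [a_1, b_1 for each input]
y_next = theta[0] * y[-1] + theta[1:] @ U[-1]   # one-step-ahead prediction
```

With only a handful of parameters, a structure like this can be estimated from short records, which is consistent with the abstract's motivation for choosing a linear model when data are limited.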
{"title":"Data-Driven Models to Forecast the Impact of Temperature Anomalies on Rice Production in Southeast Asia","authors":"Sabrina De Nardi, C. Carnevale, Sara Raccagni, L. Sangiorgi","doi":"10.3390/forecast6010006","DOIUrl":"https://doi.org/10.3390/forecast6010006","journal":{"name":"Forecasting"},"publicationDate":"2024-01-31","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"140471262"}