Early Prediction of Monkeypox Virus Outbreak Using Machine Learning
S. Akinola, Qing-Guo Wang, Peter O. Olukanmi, T. Marwala
IETI Transactions on Data Analysis and Forecasting (iTDAF), published 2023-07-06
DOI: https://doi.org/10.3991/itdaf.v1i2.40175
Abstract
At the onset of an infectious disease outbreak, such as that of the monkeypox virus (MPXV), surveillance data are crucial for tracking the outbreak's progression. Surveillance data for MPXV received considerable attention after multiple European countries recorded cases. Historical data from May 9, 2022, to August 10, 2022, were used to model the cumulative case trajectories of MPXV in five countries. Our study employed autoregressive integrated moving average (ARIMA), neural network autoregression (NNETAR), exponential smoothing (ETS), and seasonal naïve (SNAÏVE) models for training and evaluation. The paper makes the following contributions: (1) enhanced model stability with the Box-Cox transformation as a preprocessing step, (2) experimentation with both linear and nonlinear models, and (3) simulation of the top five countries during the rapid rise in MPXV cases. The results were evaluated using three metrics: root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The ARIMA(0,1,3)(1,0,0)[7] model achieved the lowest holdout-set error for France, with a MAPE of 5.16. The ETS(A,A,A) model achieved the lowest holdout-set error for Germany, with an MAE of 7.35. The NNETAR(1,1,2)[7] model achieved the lowest holdout-set error, measured by RMSE, for Spain (8.33), the United Kingdom (UK) (2.75), and the United States of America (USA) (8.05). Based on these findings, we conclude that although the transformation proved crucial for model performance, it was not necessary in every experiment: ARIMA remained dominant for France and ETS for Germany, while NNETAR performed best on cumulative case counts for Spain, the UK, and the USA. Our experimentation allows for early identification and contributes to a better understanding of forecasting MPXV cases using combinations of linear and nonlinear models.
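The preprocessing step, the simplest baseline, and the three evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not describe the authors' implementation, so everything here is an assumption made for illustration: the function names, the Box-Cox parameter `lam = 0.5`, the weekly period of 7 (suggested by the `[7]` in the model labels), and the case counts, which are synthetic and not taken from the study.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of strictly positive values (log when lam == 0)."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def inv_box_cox(z, lam):
    """Inverse of box_cox, mapping transformed values back to the original scale."""
    if lam == 0:
        return [math.exp(v) for v in z]
    return [(lam * v + 1.0) ** (1.0 / lam) for v in z]

def seasonal_naive(train, horizon, period=7):
    """Forecast by repeating the last observed seasonal cycle."""
    last_cycle = train[-period:]
    return [last_cycle[h % period] for h in range(horizon)]

def rmse(actual, pred):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mae(actual, pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def mape(actual, pred):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)

# Synthetic cumulative case counts (hypothetical, NOT data from the study).
train = [3, 5, 8, 12, 17, 23, 30, 38, 47, 57, 68, 80, 93, 107]
holdout = [122, 138, 155, 173, 192, 212, 233]

# Box-Cox with an arbitrary lam; a real analysis would estimate it from the data.
lam = 0.5
transformed = box_cox(train, lam)
restored = inv_box_cox(transformed, lam)

# Score the seasonal naive baseline on the holdout window.
forecast = seasonal_naive(train, horizon=len(holdout), period=7)
print("RMSE:", rmse(holdout, forecast))
print("MAE:", mae(holdout, forecast))
print("MAPE:", mape(holdout, forecast))
```

Because the seasonal naive forecast only repeats past values, it lags badly on a steadily rising cumulative series, which is one reason it serves as a baseline against which ARIMA, ETS, and NNETAR are compared.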