Multistep Forecasting of New COVID-19 Cases Based on LSTMs Using Bayesian Optimization

2021 International Symposium on Electrical, Electronics and Information Engineering. Pub Date: 2021-02-19. DOI: 10.1145/3459104.3459116

Tianqian Chen, Shuyu Chen, Shan Mei, Shuqi An, Xiaohan Yuan, Yuwen Lu


Abstract

Multistep prediction of new coronavirus disease (COVID-19) cases plays a vital role during the epidemic control period, and the Long Short-Term Memory (LSTM) based time series model is the most frequently used among the many prediction methods. However, both the cumulative error of multistep prediction and the instability of new COVID-19 case data degrade the performance of LSTMs on this task. In this paper, we select three countries with severe COVID-19 epidemics (India, Russia, and Chile), predict new cases over the next 15 days with different multistep LSTM network models, and use Bayesian Optimization to search the hyperparameter space for optimal settings. The results show that: a) the Recursive Prediction LSTM performs best (Mean Absolute Percentage Error, MAPE, was reduced to 14.88%, 6.46%, and 16.31% for the three countries respectively), the Encoder-Decoder LSTM is second (15.52%, 19.61%, 19.87%), and the vector output LSTM performs worst (23.55%, 26.82%, 19.57%); b) the hyperparameter space contains clearly identifiable regions of extremely poor performance, and the Bayesian Optimizer can concentrate on the good regions, avoiding the cost of tuning parameters around bad hyperparameters; c) the new-case data of different countries differ greatly in the hyperparameters they favor for the model. The poor regions of the hyperparameter space and these differing preferences are likely among the reasons why COVID-19 data from different countries are hard to train on jointly.
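To make the recursive strategy and the MAPE metric named in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`mape`, `recursive_forecast`, `moving_average_model`) and the moving-average predictor are hypothetical, and the predictor only stands in for a trained one-step LSTM. Bayesian Optimization is not shown.

```python
from typing import Callable, List, Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error in percent (assumes no actual value is zero)."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def recursive_forecast(
    history: Sequence[float],
    one_step_model: Callable[[Sequence[float]], float],
    window: int,
    horizon: int,
) -> List[float]:
    """Forecast `horizon` steps by feeding each prediction back as model input."""
    buffer = list(history[-window:])  # most recent observations
    forecasts: List[float] = []
    for _ in range(horizon):
        next_value = one_step_model(buffer)
        forecasts.append(next_value)
        # Slide the window: drop the oldest value, append the prediction.
        # Errors in early predictions therefore propagate to later steps.
        buffer = buffer[1:] + [next_value]
    return forecasts


def moving_average_model(window_values: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained one-step LSTM (assumption)."""
    return sum(window_values) / len(window_values)
```

Calling `recursive_forecast(daily_cases, moving_average_model, window=7, horizon=15)` on a list of daily counts would produce a 15-day forecast in the recursive style. A vector output model, by contrast, would emit all 15 values in a single call instead of looping.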


