Quoc-Thang Phan, Yuan-Kang Wu, Q. Phan, Hsin-Yen Lo
A Study on Missing Data Imputation Methods for Improving Hourly Solar Dataset
2022 8th International Conference on Applied System Innovation (ICASI), published 2022-04-22
DOI: 10.1109/ICASI55125.2022.9774453 (https://doi.org/10.1109/ICASI55125.2022.9774453)
Citations: 4
Abstract
In the era of big data, long periods of missing data are a common problem that degrades data quality and, if not handled properly, the final forecasting results. Filling missing data is therefore important, since most real-time datasets contain a large number of missing values. This paper first gives a comprehensive overview of imputation methods for filling missing data. It then proposes a technique based on the popular Multivariate Imputation by Chained Equations (MICE) to fill numeric data in a PV dataset. Finally, it analyses the impact of this technique and compares its performance with that of other imputation algorithms. As a case study, it uses historical PV generation measurements from the North PV site in Taiwan, together with Numerical Weather Prediction (NWP) data consisting of solar irradiance, temperature, sea level pressure, humidity, rainfall, and wind speed. The NWP dataset, called Deterministic Weather Research and Forecasting (WRFD), is provided by the Taiwan Central Weather Bureau (CWB). Experimental results show that the proposed imputation algorithm can improve short-term PV generation forecasting accuracy, as measured by RMSE.
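The chained-equation idea behind MICE, and the RMSE metric used for evaluation, can be illustrated with a minimal sketch. This is not the authors' implementation: it is a deterministic, single-imputation simplification restricted to two variables (full MICE draws imputations from a predictive distribution and produces several completed datasets). The function names `fit_line`, `chained_impute`, and `rmse`, and the irradiance/PV numbers, are hypothetical and chosen only for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def chained_impute(rows, n_iter=10):
    """Fill None entries in a list of [irradiance, pv] pairs.

    Each column is regressed on the other in turn, and only the
    originally missing cells are overwritten with predictions.
    """
    missing = [[v is None for v in row] for row in rows]
    data = [list(row) for row in rows]
    # Step 1: initialise every missing cell with its column mean.
    for j in range(2):
        observed = [r[j] for r in rows if r[j] is not None]
        mean = sum(observed) / len(observed)
        for i, r in enumerate(data):
            if missing[i][j]:
                r[j] = mean
    # Step 2: cycle through the columns, re-predicting the missing cells.
    for _ in range(n_iter):
        for j in range(2):
            k = 1 - j  # index of the predictor column
            train = [(r[k], r[j]) for i, r in enumerate(data)
                     if not missing[i][j]]
            a, b = fit_line([t[0] for t in train], [t[1] for t in train])
            for i, r in enumerate(data):
                if missing[i][j]:
                    r[j] = a + b * r[k]
    return data


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))


# Synthetic toy data (assumption): pv is exactly 0.5 * irradiance.
rows = [[100.0, 50.0], [200.0, 100.0], [300.0, None],
        [400.0, 200.0], [None, 250.0], [600.0, 300.0]]
filled = chained_impute(rows)
print(filled[2], filled[4])
print(rmse([150.0, 500.0], [filled[2][1], filled[4][0]]))
```

On this noise-free toy data the two filled cells should approach the values implied by the linear relation (150 for the missing PV reading, 500 for the missing irradiance), so the printed RMSE should be close to zero. On real measurements, the regressors would typically include all NWP variables rather than a single predictor.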