Analysing the Determinants of Surface Solar Radiation with Tree-Based Machine Learning Methods: Case of Istanbul
Author: Denizhan Guven
Journal: Pure and Applied Geophysics
Published: 2024-04-15
DOI: 10.1007/s00024-024-03472-6
URL: https://link.springer.com/article/10.1007/s00024-024-03472-6
Abstract
This study estimates hourly and daily Downward Surface Solar Radiation (SSR) in Istanbul and determines the importance of the variables affecting SSR using three tree-based machine learning methods: Decision Tree (DT), Random Forest (RF), and Gradient Boosted Regression Tree (GBRT). Hourly and daily climatic data for the period from January 2016 to December 2020 are gathered from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis data sets. In addition to the meteorological data, hourly data for selected aerosols are obtained from the Ministry of Environment, Urbanization and Climate Change. Temperature, cloud coverage, ozone level, precipitation, pressure, the two wind-speed components, PM10, PM2.5, and SO2 are used to train and test the models. Model performance is assessed on the out-of-bag errors by calculating R-squared, MSE, RMSE, and MBE. The GBRT model is found to be the most accurate, with the lowest error rates. The study also reports the importance of each variable in determining SSR. Although the models assign different importance values, temperature, ozone level, cloud coverage, and precipitation are found to be the most important variables for estimating daily SSR. For hourly estimation, the time of day (hour) becomes the most important factor, in addition to temperature, ozone level, and cloud coverage. Finally, the study shows that tree-based machine learning methods using these variables estimate hourly and daily SSR very accurately when SSR cannot be measured directly.
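The four performance measures named in the abstract can be sketched in a few lines of plain Python. This is only an illustration, not the author's code: the function name `regression_metrics` is hypothetical, the sample values are made up, and the sign convention for MBE (predicted minus observed, so a positive value means overestimation) is an assumption, since the abstract does not state one.

```python
import math


def regression_metrics(observed, predicted):
    """Return R-squared, MSE, RMSE and MBE for paired observed/predicted values.

    Hypothetical helper for illustration; MBE is taken as mean(predicted - observed).
    """
    if len(observed) == 0 or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]

    mse = sum(e * e for e in errors) / n   # mean squared error
    rmse = math.sqrt(mse)                  # root mean squared error
    mbe = sum(errors) / n                  # mean bias error (assumed sign convention)

    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares
    # R-squared is undefined when the observations have no variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"r2": r2, "mse": mse, "rmse": rmse, "mbe": mbe}


if __name__ == "__main__":
    # Made-up SSR values, for demonstration only.
    obs = [100.0, 200.0, 300.0, 400.0]
    pred = [110.0, 190.0, 310.0, 390.0]
    print(regression_metrics(obs, pred))
```

In this toy case the errors alternate in sign, so MBE is zero even though RMSE is not. This is why a bias measure is usually reported together with a magnitude measure.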
About the journal:
pure and applied geophysics (pageoph), a continuation of the journal "Geofisica pura e applicata", publishes original scientific contributions in the fields of solid Earth, atmospheric and oceanic sciences. Regular and special issues feature thought-provoking reports on active areas of current research and state-of-the-art surveys.
Long-running journal, founded in 1939 as Geofisica pura e applicata
Publishes peer-reviewed original scientific contributions and state-of-the-art surveys in solid earth and atmospheric sciences
Features thought-provoking reports on active areas of current research and is a major source for publications on tsunami research
Coverage extends to research topics in oceanic sciences