Analysing the Determinants of Surface Solar Radiation with Tree-Based Machine Learning Methods: Case of Istanbul

IF 1.9 | Zone 4, Earth Sciences | Q2, GEOCHEMISTRY & GEOPHYSICS | pure and applied geophysics | Pub Date: 2024-04-15 | DOI: 10.1007/s00024-024-03472-6
Denizhan Guven

Abstract

This study estimates hourly and daily Downward Surface Solar Radiation (SSR) in Istanbul and determines the importance of the predictor variables using three tree-based machine learning methods: Decision Tree (DT), Random Forest (RF), and Gradient Boosted Regression Tree (GBRT). Hourly and daily data on climatic factors for January 2016 to December 2020 are taken from the ERA5 reanalysis data sets of the European Centre for Medium-Range Weather Forecasts (ECMWF). In addition to the meteorological data, hourly data on selected aerosols are obtained from the Ministry of Environment, Urbanization and Climate Change. Temperature, cloud coverage, ozone level, precipitation, pressure, two wind speed components, PM10, PM2.5, and SO2 are used to train and test the models. Model performance is assessed on the out-of-bag errors using R-squared, mean squared error (MSE), root mean squared error (RMSE), and mean bias error (MBE). The GBRT model is the most accurate, with the lowest error rates. The study also reports the importance of each variable in determining SSR. Although the models assign different importance values, temperature, ozone level, cloud coverage, and precipitation are the most important variables for estimating daily SSR. For the hourly estimation, the time of day (hour) becomes the most important factor, alongside temperature, ozone level, and cloud coverage. Finally, the study shows that tree-based machine learning methods using these variables estimate hourly and daily SSR very accurately when the SSR values cannot be measured directly.
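
The four scores named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions and made-up numbers, not data or code from the study. The function name `regression_metrics` and the sign convention for MBE (prediction minus observation) are assumptions, and the sketch scores plain paired values rather than out-of-bag predictions.

```python
import math


def regression_metrics(observed, predicted):
    """Return R-squared, MSE, RMSE and MBE for paired observed/predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    n = len(observed)

    # Assumed convention: error = prediction minus observation.
    errors = [p - o for o, p in zip(observed, predicted)]

    ss_res = sum(e * e for e in errors)   # residual sum of squares
    mse = ss_res / n                      # mean squared error
    rmse = math.sqrt(mse)                 # root mean squared error
    mbe = sum(errors) / n                 # mean bias error; positive = over-estimation

    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    # R-squared is undefined when the observations have no variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"r2": r2, "mse": mse, "rmse": rmse, "mbe": mbe}


# Made-up illustrative values (hypothetical, not taken from the study).
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 420.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

MSE and RMSE penalise large errors regardless of sign, while MBE shows whether a model tends to over-estimate or under-estimate on average, so the scores are best read together.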
