Improved machine learning estimation of surface turbulent flux using interpretable model selection and adaptive ensemble algorithms over the Horqin Sandy Land area
Authors: Jing Zhao, Yiyi Guo, Hongsheng Zhang, Yihua Lin, Feng Liu, Zhenhai Guo
Journal: Atmospheric Research (impact factor 4.5; JCR Q1, Meteorology & Atmospheric Sciences)
Published: 2025-01-26
DOI: 10.1016/j.atmosres.2025.107952 (https://doi.org/10.1016/j.atmosres.2025.107952)
Abstract
Turbulent exchanges between the land surface and the atmosphere are crucial for global climate change and atmospheric circulation. They are typically represented by bulk formulae based on Monin-Obukhov similarity theory (MOST), in which simple regressions on the non-dimensional stability parameter are fitted to data from a limited number of field experiments, leaving large uncertainties. Machine learning has recently been proposed as an alternative or complement to bulk algorithms because it can detect nonlinear relationships in large datasets without the constraints of the similarity relationships and self-correlations prescribed in MOST. However, unresolved problems and gaps remain, even though common models such as random forests and neural networks can be applied directly. This study proposes a hybrid approach for improved estimation of surface turbulent flux that consists of three stages: meta-learner estimation, interpretable model selection, and adaptive model integration. The work is motivated by two questions: how different machine learning algorithms perform as surface-layer flux estimators, and how the results of multiple meta-learners can be combined for better estimates. The method starts with eight different machine learning algorithms. A combination of Elastic Net and Shapley Additive Explanations then serves as an interpretable model selection module, followed by adaptive model integration using AdaBoost and an extreme learning machine. Experiments at the continuous observation station in the Horqin Sandy Land area, Inner Mongolia, China, demonstrate that the proposed system delivers reliable and stable performance. It significantly reduces the estimation bias of three scaling parameters, with root mean square error reductions of 43.16%–56.97% compared to MOST, and it outperforms the best single machine learning model with additional error reductions of 4.24%–7.90%.
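To make concrete what "simple regression as a function of the non-dimensional stability parameter" looks like in a MOST bulk scheme, the sketch below implements one widely used textbook form, the Businger-Dyer stability functions with the Dyer (1974) constants. This is an illustrative assumption: the abstract does not say which stability functions the study uses as its MOST baseline, and the function names `phi_m` and `phi_h` are hypothetical.

```python
def phi_m(zeta: float) -> float:
    """Non-dimensional wind shear for stability parameter zeta = z / L.

    Businger-Dyer form (assumed here for illustration, not taken from
    the paper): an empirical fit with constants 16 and 5.
    """
    if zeta < 0.0:
        # Unstable stratification: shear is reduced relative to neutral.
        return (1.0 - 16.0 * zeta) ** -0.25
    # Neutral (zeta == 0) and stable stratification: linear in zeta.
    return 1.0 + 5.0 * zeta


def phi_h(zeta: float) -> float:
    """Non-dimensional temperature gradient for zeta = z / L."""
    if zeta < 0.0:
        # Unstable branch uses exponent -1/2 instead of -1/4.
        return (1.0 - 16.0 * zeta) ** -0.5
    return 1.0 + 5.0 * zeta


if __name__ == "__main__":
    # Print both functions over a small range of stabilities.
    for zeta in (-1.0, -0.1, 0.0, 0.1, 0.5):
        print(f"zeta={zeta:+.1f}  phi_m={phi_m(zeta):.3f}  phi_h={phi_h(zeta):.3f}")
```

Each function depends on a single input, zeta, through a fixed curve whose constants were fitted to field data. That restriction is what the machine learning estimators described in the abstract are meant to relax.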
About the journal:
The journal publishes scientific papers (research papers, review articles, letters and notes) dealing with the part of the atmosphere where meteorological events occur. Attention is given to all processes extending from the Earth's surface to the tropopause, but special emphasis continues to be devoted to the physics of clouds, mesoscale meteorology and air pollution, i.e., atmospheric aerosols; microphysical processes; cloud dynamics and thermodynamics; numerical simulation, climatology, climate change and weather modification.