Ground visibility prediction using tree-based and random-forest machine learning algorithm: Comparative study based on atmospheric pollution and atmospheric boundary layer data
Fuzeng Wang, Ruolan Liu, Hao Yan, Duanyang Liu, Lin Han, Shujie Yuan
Atmospheric Pollution Research, Volume 15, Issue 11, Article 102270. Published 2024-07-29. DOI: 10.1016/j.apr.2024.102270. URL: https://www.sciencedirect.com/science/article/pii/S1309104224002356
Abstract
To mitigate the impacts of haze, three visibility simulation schemes were designed using decision tree and random forest algorithms, drawing on atmospheric boundary layer meteorological data, pollutant concentrations, and ground observations. The best-performing approach was then used to investigate how boundary layer data affect the simulations. The random forest algorithm simulated both haze processes better than the decision tree algorithm. In the first haze process, the random forest algorithm gave lower errors than the decision tree algorithm in the same visibility range, with the larger reduction in root mean square error (Scheme 3, visibility < 200 m: mean absolute error reduced by 5.9%, root mean square error reduced by 19.1%). Adding atmospheric boundary layer observation data significantly improved the accuracy of the visibility simulations for both fog-haze processes. However, adding boundary layer meteorological data gave the greater improvement in the first haze process (random forest, visibility < 200 m: mean absolute errors of 25.0 m (relative error < 12.5%) and 25.5 m (relative error < 12.8%) in Schemes 2 and 3, respectively), whereas adding boundary layer pollutant concentration data was more effective in the second haze process (random forest, visibility < 200 m: mean absolute errors of 25.6 m (relative error < 12.8%) and 11.1 m (relative error < 5.6%) in Schemes 2 and 3, respectively). The influence of boundary layer meteorological data and pollutant data on the simulation of the two fog processes depends on the cause of each fog process.
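The error measures quoted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the sample visibility values are hypothetical. The relative error bound is computed as the mean absolute error divided by the 200 m upper limit of the visibility range; this is an assumption, chosen because it is consistent with the reported figures (for example, 25.0 m / 200 m = 12.5%), and the abstract does not state the formula.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted visibility (m)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def root_mean_square_error(observed, predicted):
    """Square root of the mean squared difference; penalizes large misses more."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def percent_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


def relative_error_bound(mae_m, range_upper_m=200.0):
    """ASSUMPTION: relative error = MAE / upper bound of the visibility range."""
    return 100.0 * mae_m / range_upper_m


if __name__ == "__main__":
    # Hypothetical visibility values in metres, not taken from the paper.
    observed = [150.0, 120.0, 180.0, 90.0]
    tree_pred = [190.0, 100.0, 170.0, 140.0]    # stand-in for decision tree output
    forest_pred = [170.0, 105.0, 165.0, 115.0]  # stand-in for random forest output

    tree_mae = mean_absolute_error(observed, tree_pred)
    forest_mae = mean_absolute_error(observed, forest_pred)
    tree_rmse = root_mean_square_error(observed, tree_pred)
    forest_rmse = root_mean_square_error(observed, forest_pred)

    print("MAE reduction (%):", round(percent_reduction(tree_mae, forest_mae), 1))
    print("RMSE reduction (%):", round(percent_reduction(tree_rmse, forest_rmse), 1))

    # MAE values reported in the abstract for the visibility < 200 m range.
    for mae in (25.0, 25.5, 25.6, 11.1):
        print(mae, "m ->", round(relative_error_bound(mae), 2), "% of 200 m")
```

Because root mean square error weights large misses more heavily than mean absolute error, a model that mainly removes the worst outliers can show a much larger drop in root mean square error than in mean absolute error, which is the pattern reported for Scheme 3 in the first haze process.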
About the journal:
Atmospheric Pollution Research (APR) is an international journal designed for the publication of articles on air pollution. Papers should present novel experimental results, theory and modeling of air pollution on local, regional, or global scales. Areas covered are research on inorganic, organic, and persistent organic air pollutants, air quality monitoring, air quality management, atmospheric dispersion and transport, air-surface (soil, water, and vegetation) exchange of pollutants, dry and wet deposition, indoor air quality, exposure assessment, health effects, satellite measurements, natural emissions, atmospheric chemistry, greenhouse gases, and effects on climate change.