A novel ensemble machine learning exposure model system for ground-level ozone at the national scale: A case of mainland China from 2013 to 2020

IF 11.2 | Zone 1 (Sociology) | Q1 ENVIRONMENTAL STUDIES | Environmental Impact Assessment Review | Pub Date: 2024-08-20 | DOI: 10.1016/j.eiar.2024.107630

Jiawei Wang

Environmental Impact Assessment Review, volume 109, Article 107630. https://www.sciencedirect.com/science/article/pii/S0195925524002178

Citations: 0

Abstract

In epidemiological research, accurate estimation of historical ground-level ozone (O₃) concentrations with enhanced spatiotemporal resolution is crucial for effective exposure assessment. The current state of the art for estimating air pollutant concentrations is a two-stage ensemble method that integrates outputs from multiple machine learning algorithms. Despite its effectiveness, opportunities exist to refine this approach for more precise O₃ estimation. In this study, we propose an enhanced ensemble method that incorporates four key strategies. First, we employ high-resolution spatiotemporal predictors derived from prior machine learning studies for refined secondary learning. Second, we use sophisticated algorithms, including categorical gradient boosting, deep neural network, random forest, stochastic variable Gaussian process, transformer, and a combination of convolutional neural network and long short-term memory neural network, as sublearners to enhance learning capabilities. Third, we spatiotemporally split the sample set and then train submodels separately on each subset to eliminate the unobserved spatiotemporal heterogeneity. Finally, we apply a complex machine learning algorithm, rather than the generalized additive model, to integrate sublearner predictions, enabling the capture of intricate nonlinear relationships beyond basic spatiotemporal linear weights. To validate these improvements, we estimated daily maximum 8-h moving average O₃ concentrations ([O₃]MDA8) across mainland China from 2013 to 2020 at a 1 km spatial resolution. The proposed method demonstrated notable accuracy, achieving an out-of-station coefficient of determination (R²) of 0.943 and a root-mean-square error (RMSE) of 10.197 μg/m³. This performance marks a nearly 15% improvement over the best existing Chinese O₃ exposure model based on a single algorithm and also surpasses previous studies utilizing traditional ensemble methods for other air pollutants. Our enhanced ensemble approach significantly bolsters the reliability and robustness of future environmental epidemiological studies by further mitigating “misclassification” errors.
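
Two quantities named in the abstract, the [O₃]MDA8 metric and the out-of-station R² and RMSE, can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the `min_valid` completeness threshold, and the fold count are assumptions made for illustration, and regulatory MDA8 definitions differ in their details.

```python
import math
import random


def mda8(hourly, min_valid=6):
    """Daily maximum 8-hour moving average from one day of hourly values.

    `hourly` holds 24 readings (None marks a missing hour). Assumption:
    windows stay within the same day and need at least `min_valid` valid
    hours; this threshold is illustrative, not taken from the paper.
    """
    best = None
    for start in range(len(hourly) - 7):
        window = [v for v in hourly[start:start + 8] if v is not None]
        if len(window) >= min_valid:
            mean = sum(window) / len(window)
            best = mean if best is None else max(best, mean)
    return best


def station_folds(station_ids, k=10, seed=0):
    """Assign whole stations to folds ("out-of-station" validation).

    Every record from a given station gets the same fold index, so a
    held-out station is never seen during training.
    """
    stations = sorted(set(station_ids))
    random.Random(seed).shuffle(stations)
    fold_of = {s: i % k for i, s in enumerate(stations)}
    return [fold_of[s] for s in station_ids]


def rmse(obs, pred):
    """Root-mean-square error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot
```

In an out-of-station evaluation, the models would be fitted on all folds but one, predictions made for the held-out stations, and `r2` and `rmse` computed on those pooled held-out predictions.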

Source journal

Environmental Impact Assessment Review (ENVIRONMENTAL STUDIES)

CiteScore: 12.60

Self-citation rate: 10.10%

Articles published: 200

Review time: 33 days

About the journal: Environmental Impact Assessment Review is an interdisciplinary journal that serves a global audience of practitioners, policymakers, and academics involved in assessing the environmental impact of policies, projects, processes, and products. The journal focuses on innovative theory and practice in environmental impact assessment (EIA). Papers are expected to present innovative ideas and to be topical and coherent. The journal emphasizes concepts, methods, techniques, approaches, and systems related to EIA theory and practice.