A novel ensemble machine learning exposure model system for ground-level ozone at the national scale: A case of mainland China from 2013 to 2020

Environmental Impact Assessment Review (IF 9.8; Zone 1, Sociology; JCR Q1, Environmental Studies) Pub Date: 2024-08-20 DOI: 10.1016/j.eiar.2024.107630
Jiawei Wang
Environmental Impact Assessment Review, volume 109, Article 107630. https://www.sciencedirect.com/science/article/pii/S0195925524002178

Abstract

In epidemiological research, accurate estimation of historical ground-level ozone (O₃) concentrations with enhanced spatiotemporal resolution is crucial for effective exposure assessment. The current state of the art for estimating air pollutant concentrations is a two-stage ensemble method that integrates the outputs of multiple machine learning algorithms. Despite its effectiveness, this approach can be refined for more precise O₃ estimation. In this study, we propose an enhanced ensemble method that incorporates four key strategies. First, we employ high-resolution spatiotemporal predictors derived from prior machine learning studies for refined secondary learning. Second, we use sophisticated algorithms as sublearners to enhance learning capability: categorical gradient boosting, a deep neural network, random forest, a stochastic variational Gaussian process, a transformer, and a combination of a convolutional neural network and a long short-term memory neural network. Third, we split the sample set spatiotemporally and train submodels separately on each subset to eliminate unobserved spatiotemporal heterogeneity. Finally, we integrate the sublearner predictions with a complex machine learning algorithm rather than a generalized additive model, which allows the ensemble to capture intricate nonlinear relationships beyond basic spatiotemporal linear weights. To validate these improvements, we estimated daily maximum 8-h moving average O₃ concentrations ([O₃]MDA8) across mainland China from 2013 to 2020 at a 1 km spatial resolution. The proposed method achieved an out-of-station coefficient of determination (R²) of 0.943 and a root-mean-square error (RMSE) of 10.197 μg/m³. This performance marks a nearly 15% improvement over the best existing Chinese O₃ exposure model based on a single algorithm and also surpasses previous studies that used traditional ensemble methods for other air pollutants. Our enhanced ensemble approach significantly bolsters the reliability and robustness of future environmental epidemiological studies by further mitigating “misclassification” errors.
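
The out-of-station R² and RMSE quoted in the abstract can be illustrated with a minimal sketch. It assumes that "out-of-station" means every sample from a held-out monitoring station is excluded from training, which the abstract does not spell out. All function names here (`station_folds`, `out_of_station_cv`, `fit_line`, `make_toy_samples`) are hypothetical, the one-predictor linear model is only a placeholder for the ensemble, and the data are synthetic rather than taken from the study.

```python
import math
import random


def station_folds(station_ids, n_folds=5, seed=0):
    """Assign each station (not each sample) to exactly one fold."""
    stations = sorted(set(station_ids))
    random.Random(seed).shuffle(stations)
    return {station: i % n_folds for i, station in enumerate(stations)}


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the target."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def fit_line(train):
    """Placeholder model (assumption): least squares on one predictor."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def out_of_station_cv(samples, fit, n_folds=5, seed=0):
    """samples: (station_id, x, y) tuples. Returns pooled (R2, RMSE)."""
    fold_of = station_folds([s for s, _, _ in samples], n_folds, seed)
    y_true, y_pred = [], []
    for k in range(n_folds):
        # Train only on stations outside fold k, then predict fold k.
        predict = fit([(x, y) for s, x, y in samples if fold_of[s] != k])
        for s, x, y in samples:
            if fold_of[s] == k:
                y_true.append(y)
                y_pred.append(predict(x))
    return r2_score(y_true, y_pred), rmse(y_true, y_pred)


def make_toy_samples(n_stations=20, n_days=10, seed=1):
    """Synthetic data (not from the paper): y = 2x + 5 + Gaussian noise."""
    rng = random.Random(seed)
    samples = []
    for station in range(n_stations):
        for _ in range(n_days):
            x = rng.uniform(0.0, 100.0)
            samples.append((station, x, 2.0 * x + 5.0 + rng.gauss(0.0, 1.0)))
    return samples


r2, err = out_of_station_cv(make_toy_samples(), fit_line)
print(f"out-of-station R2 = {r2:.3f}, RMSE = {err:.3f}")
```

Grouping folds by station rather than by sample keeps observations from one monitor out of both the training and test sets at once, so the score reflects prediction at locations the model has never seen, which is the situation an exposure model faces away from monitors.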

Source journal
CiteScore: 12.60
Self-citation rate: 10.10%
Articles published: 200
Review time: 33 days
Journal description: Environmental Impact Assessment Review is an interdisciplinary journal that serves a global audience of practitioners, policymakers, and academics involved in assessing the environmental impact of policies, projects, processes, and products. The journal focuses on innovative theory and practice in environmental impact assessment (EIA). Papers are expected to present innovative ideas and to be topical and coherent. The journal emphasizes concepts, methods, techniques, approaches, and systems related to EIA theory and practice.