High resolution mapping of nitrogen dioxide and particulate matter in Great Britain (2003–2021) with multi-stage data reconstruction and ensemble machine learning methods
Arturo de la Cruz Libardi, Pierre Masselot, Rochelle Schneider, Emily Nightingale, Ai Milojevic, Jacopo Vanoli, Malcolm N. Mistry, Antonio Gasparrini
Atmospheric Pollution Research, vol. 15, no. 11, Article 102284. Published 2024-08-08. DOI: 10.1016/j.apr.2024.102284
Abstract
In this contribution, we applied a multi-stage machine learning (ML) framework to map daily values of nitrogen dioxide (NO₂) and particulate matter (PM₁₀ and PM₂.₅) at a 1 km² resolution over Great Britain for the period 2003–2021. The process combined ground monitoring observations, satellite-derived products, climate reanalyses and chemical transport model datasets, and traffic and land-use data. Each feature was harmonized to 1 km resolution and extracted at monitoring sites. Models used single and ensemble-based algorithms, including random forests (RF), extreme gradient boosting (XGB), light gradient boosting machine (LGBM), and lasso and ridge regression. The stages focused on augmenting PM₂.₅ records using co-occurring PM₁₀ values, gap-filling aerosol optical depth and columnar NO₂ data obtained from satellite instruments, and finally training an ensemble model and predicting daily values across the whole geographical domain (2003–2021). Ensemble performance, assessed through a ten-fold monitor-based cross-validation procedure, was good, with an average R² of 0.690 (range 0.611–0.792) for NO₂, 0.704 (0.609–0.786) for PM₁₀, and 0.802 (0.746–0.888) for PM₂.₅. Reconstructed pollution levels decreased markedly within the study period, with a stronger reduction in the latter eight years. The pollutants exhibited different spatial patterns: NO₂ levels rose in close proximity to high-traffic areas, whereas PM varied at a larger scale. The resulting 1 km² spatially resolved daily datasets allow linkage with health data across Great Britain over nearly two decades, thus supporting extensive, extended, and detailed research on the long- and short-term health effects of air pollution.
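The ensemble is evaluated with a ten-fold monitor-based cross-validation, meaning whole monitoring sites, rather than individual daily records, are held out in each fold. The Python sketch below illustrates that idea under stated assumptions. It is not the authors' code: each record is a hypothetical `(monitor_id, x, y)` tuple with a single predictor, and the paper's RF, XGB, LGBM, lasso, and ridge learners are replaced by two toy stdlib-only learners (an ordinary least-squares line and a ridge-shrunk line) whose predictions are simply averaged. All function names are illustrative. Per-fold R² values are summarized as a mean and a min-max range; the abstract does not state what its reported ranges span, so this is only one plausible reading.

```python
"""Minimal sketch of monitor-based (grouped) k-fold cross-validation.

Assumptions (not from the paper): records are (monitor_id, x, y) tuples with a
single predictor x, and the "ensemble" is the mean of two toy base learners.
The paper's actual learners (RF, XGB, LGBM, lasso, ridge) are not reproduced.
"""
import random
import statistics


def assign_monitor_folds(monitor_ids, k=10, seed=0):
    """Map each monitor to one of k folds so all of its records share a fold."""
    unique = sorted(set(monitor_ids))
    random.Random(seed).shuffle(unique)
    return {m: i % k for i, m in enumerate(unique)}


def fit_line(xs, ys, alpha=0.0):
    """Fit y = a + b*x by least squares; alpha > 0 adds a ridge penalty on b."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / (sxx + alpha)  # unpenalized intercept, shrunken slope
    return lambda x: my + b * (x - mx)


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def grouped_cv(records, k=10, seed=0):
    """Return per-fold R^2 of a toy averaging ensemble; needs >= k monitors."""
    folds = assign_monitor_folds([m for m, _, _ in records], k, seed)
    scores = []
    for fold in range(k):
        # Train only on monitors outside the held-out fold.
        train = [(x, y) for m, x, y in records if folds[m] != fold]
        test = [(x, y) for m, x, y in records if folds[m] == fold]
        xs, ys = zip(*train)
        learners = [fit_line(xs, ys), fit_line(xs, ys, alpha=10.0)]
        # Ensemble prediction = unweighted mean of the base learners.
        preds = [statistics.fmean(f(x) for f in learners) for x, _ in test]
        scores.append(r_squared([y for _, y in test], preds))
    return scores


if __name__ == "__main__":
    rng = random.Random(42)
    records = []
    for monitor in range(30):
        offset = rng.gauss(0, 2)  # synthetic monitor-specific bias
        for _ in range(50):
            x = rng.uniform(0, 10)
            y = 3 + 2 * x + offset + rng.gauss(0, 1)
            records.append((f"site{monitor}", x, y))
    scores = grouped_cv(records)
    print(f"mean R^2 = {statistics.fmean(scores):.3f} "
          f"(range {min(scores):.3f}-{max(scores):.3f})")
```

Keeping every record of a monitor in the same fold means each held-out score reflects prediction at sites the model never saw during training, which is the usual motivation for grouping by location when the goal is to map unmonitored areas. The sketch assumes at least k distinct monitors, so that no test fold is empty.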
About the journal:
Atmospheric Pollution Research (APR) is an international journal designed for the publication of articles on air pollution. Papers should present novel experimental results, theory and modeling of air pollution on local, regional, or global scales. Areas covered are research on inorganic, organic, and persistent organic air pollutants, air quality monitoring, air quality management, atmospheric dispersion and transport, air-surface (soil, water, and vegetation) exchange of pollutants, dry and wet deposition, indoor air quality, exposure assessment, health effects, satellite measurements, natural emissions, atmospheric chemistry, greenhouse gases, and effects on climate change.