Calibration of CAMS PM2.5 data over Hungary: A machine learning approach
Achraf Qor-el-aine, A. Béres, Gábor Géczi
Environmental Research Communications, published 2024-07-11
DOI: 10.1088/2515-7620/ad6239 (https://doi.org/10.1088/2515-7620/ad6239)
Abstract
Air pollution is a major environmental problem, and reliable monitoring of particulate matter (PM) concentrations is critical for assessing its impact on human health and the environment. The Copernicus Atmosphere Monitoring Service (CAMS) provides PM2.5 concentration data from a global modelling system. This study compares in-situ PM2.5 measurements with raw CAMS data at 0.1° × 0.1° resolution over Hungary for 2019 and 2020, and proposes a calibration method to improve the accuracy of CAMS PM2.5 data at the scale of individual air monitoring stations. The accuracy of the raw CAMS PM2.5 data is first assessed against the selected air quality stations. Machine learning algorithms (LightGBM, Random Forest (RF), and Multiple Linear Regression (MLR)) are then used for calibration. The initial assessment of the raw CAMS PM2.5 data gave hourly Spearman correlation coefficients (SR) between 0.64 and 0.87 across the 14 air quality stations used, indicating a positive relationship between the two datasets but a systematic underestimation by CAMS. LightGBM was consistently the most effective method, with SR and R² values reaching up to 0.95 and 0.93, respectively, and very good RSR and NSE values (RSR below 0.5 and NSE above 0.75). In contrast, RF yields mixed results, and MLR exhibits variable performance. By correcting the underestimation and reducing modelling biases, calibration brings the PM2.5 data closer to ground-based observations, which makes the resulting model promising for accurate predictions at individual air monitoring stations.