Evaluation of machine learning and deep learning models for daily air quality index prediction in Delhi city, India

Environmental Monitoring and Assessment, Vol. 196 (12). IF 2.9; Zone 4 (Environmental Science and Ecology); JCR Q3 (Environmental Sciences). Pub Date: 2024-11-19. DOI: 10.1007/s10661-024-13351-1

Chaitanya Baliram Pande, Latha Radhadevi, Murthy Bandaru Satyanarayana


Citations: 0

Abstract

The air quality index (AQI), based on criteria for air contaminants, provides a shared measure of air quality. As air pollution continues to rise in cities worldwide due to urbanization and climate change, models that monitor and forecast pollutant concentrations are essential in every city. Air quality predictions have become increasingly useful for management, and the use of machine learning (ML) and artificial intelligence (AI) has recently improved air quality forecasting for urban areas in India. This paper treats air pollution as a significant ecological problem that directly affects human health and the distribution of environmental systems in urban areas. We therefore developed models for daily AQI forecasting to anticipate air pollution levels in the coming days. Six data-driven models were developed and applied to air quality datasets for the study area; such forecasts are crucial for planning and controlling air pollution across the entire city. A comparison of the models shows that, in the testing phase, the extreme gradient boosting (XGBoost) algorithm achieves the highest coefficient of determination (R²), 0.99, and the lowest root-mean-square deviation (RMSE), 4.65. The artificial neural network (ANN) performs slightly worse than XGBoost; its reported R², mean squared error (MSE), and RMSE values are 0.99, 13.99, and 198.88, respectively. All models were also subjected to ten-fold cross-validation, under which the random forest (RF) model outperforms the others, with reported R², RMSE, and MSE values of 0.99, 3.64, and 4.12, respectively. The study also employed two interpretability methods, feature importance analysis and Shapley additive explanations (SHAP), to provide both global and local explanations in a manner that is independent of the specific ML model. The feature importance analysis shows that particulate matter (PM2.5 and PM10), carbon monoxide (CO), and nitrogen oxides (NOx) were the most influential variables. The results suggest that such deep learning (DL) and ML models may improve the accuracy of AQI forecasts and the understanding of air pollution, particularly in metropolitan cities.
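The abstract reports R², MSE, and RMSE under ten-fold cross-validation. As a minimal sketch of how those quantities relate, the Python below computes all three on each of ten held-out folds. Everything in it is an assumption made for illustration: the data are synthetic (a made-up PM2.5-to-AQI relationship, not the Delhi dataset), a one-variable least-squares line stands in for the paper's XGBoost, RF, and ANN models, and the function names are hypothetical. Note that RMSE is by definition the square root of MSE, which the sketch makes explicit.

```python
import math
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Least-squares fit of y = a*x + b (a stand-in, not the paper's models)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(x, y, k=10):
    """Train on k-1 folds, score on the held-out fold, repeat for every fold."""
    scores = []
    for fold in k_fold_indices(len(x), k):
        held_out = set(fold)
        train = [i for i in range(len(x)) if i not in held_out]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        y_true = [y[i] for i in fold]
        y_pred = [a * x[i] + b for i in fold]
        scores.append({
            "r2": r2(y_true, y_pred),
            "mse": mse(y_true, y_pred),
            "rmse": rmse(y_true, y_pred),
        })
    return scores


# Synthetic data: a hypothetical linear PM2.5 -> AQI relationship plus noise.
rng = random.Random(42)
pm25 = [rng.uniform(20, 300) for _ in range(200)]
aqi = [1.2 * v + 30 + rng.gauss(0, 5) for v in pm25]

scores = cross_validate(pm25, aqi, k=10)
mean_rmse = sum(s["rmse"] for s in scores) / len(scores)
mean_r2 = sum(s["r2"] for s in scores) / len(scores)
print(f"10-fold mean RMSE = {mean_rmse:.2f}, mean R2 = {mean_r2:.3f}")
```

Averaging the per-fold scores, as in the last lines, is one common way to summarize cross-validation; the abstract does not state how the authors aggregated theirs.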




Source journal

Environmental Monitoring and Assessment (Environmental Sciences)

CiteScore: 4.70
Self-citation rate: 6.70%
Articles published: 1000
Review time: 7.3 months

Journal description: Environmental Monitoring and Assessment emphasizes technical developments and data arising from environmental monitoring and assessment, the use of scientific principles in the design of monitoring systems at the local, regional and global scales, and the use of monitoring data in assessing the consequences of natural resource management actions and pollution risks to man and the environment.