Evaluation of machine learning and deep learning models for daily air quality index prediction in Delhi city, India

IF 2.9; Zone 4 (Environmental Science and Ecology); JCR Q3 (Environmental Sciences); Environmental Monitoring and Assessment, volume 196, issue 12; Pub Date: 2024-11-19; DOI: 10.1007/s10661-024-13351-1
Chaitanya Baliram Pande, Latha Radhadevi, Murthy Bandaru Satyanarayana
Citations: 0

Abstract

The air quality index (AQI), based on criteria for air pollutants, provides a common measure of air quality. As air pollution continues to rise in cities worldwide because of urbanization and climate change, monitoring and forecasting models that gather and predict information about pollutant concentrations are essential in every city. Air quality prediction has become increasingly useful for management, and the use of machine learning (ML) and artificial intelligence (AI) has recently improved the performance of air quality forecasting in Indian cities. This paper treats air pollution as a significant ecological problem that directly affects human health and the environmental system in urban areas. We therefore developed models for daily AQI forecasting to anticipate air pollution levels in the coming days. Six data-driven models were developed and implemented for daily AQI forecasting in the study area; such forecasts are crucial for planning and controlling air pollution across the city. The models were applied to air quality datasets. A comparison of the models shows that, in the testing phase, the XGBoost algorithm achieved the highest coefficient of determination (R²) of 0.99 and the lowest root-mean-square error (RMSE) of 4.65. The artificial neural network (ANN) performed slightly below the extreme gradient boosting (XGBoost) model, with reported R², mean squared error (MSE), and RMSE values of 0.99, 13.99, and 198.88, respectively. All models were subjected to ten-fold cross-validation. Under cross-validation, however, the random forest (RF) model outperformed the others, with reported R², RMSE, and MSE values of 0.99, 3.64, and 4.12, respectively. The study also employed two interpretability methods, feature importance analysis and Shapley additive explanations (SHAP), to provide both global and local explanations in a model-agnostic manner. Feature importance shows that particulate matter (PM2.5 and PM10), carbon monoxide (CO), and nitrogen oxides (NOx) were the most influential variables. The results suggest that such DL and ML models may improve the accuracy of AQI forecasts and the understanding of air pollution, particularly in metropolitan cities.
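The evaluation protocol named in the abstract (R², MSE, and RMSE scored under ten-fold cross-validation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the data are synthetic, the single PM2.5-like predictor and the linear relationship are invented, and an ordinary least-squares line stands in for the paper's six models, whose configurations the abstract does not give.

```python
import math
import random

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root-mean-square error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x (stand-in model, not the paper's)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, k=10, seed=0):
    """Train on k-1 folds, score on the held-out fold, repeat for each fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        y_true = [ys[i] for i in held_out]
        y_pred = [a + b * xs[i] for i in held_out]
        scores.append({
            "r2": r2(y_true, y_pred),
            "mse": mse(y_true, y_pred),
            "rmse": rmse(y_true, y_pred),
        })
    return scores

# Synthetic, hypothetical data: one PM2.5-like predictor and a noisy AQI target.
rng = random.Random(42)
pm25 = [rng.uniform(20, 300) for _ in range(200)]
aqi = [1.2 * x + 15 + rng.gauss(0, 10) for x in pm25]

scores = cross_validate(pm25, aqi, k=10)
for name in ("r2", "mse", "rmse"):
    mean_value = sum(s[name] for s in scores) / len(scores)
    print(f"mean {name} over {len(scores)} folds: {mean_value:.3f}")
```

Because RMSE is by definition the square root of MSE, the two values reported for any one model and data split can be checked against each other. Averaging per-fold scores, as done here, is one common way to summarize cross-validation; the abstract does not say how the authors aggregated theirs.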

Source journal: Environmental Monitoring and Assessment (Environmental Sciences)
CiteScore: 4.70
Self-citation rate: 6.70%
Articles published: 1000
Review time: 7.3 months
About the journal: Environmental Monitoring and Assessment emphasizes technical developments and data arising from environmental monitoring and assessment, the use of scientific principles in the design of monitoring systems at the local, regional and global scales, and the use of monitoring data in assessing the consequences of natural resource management actions and pollution risks to man and the environment.