An evolutionary deep learning model based on XGBoost feature selection and Gaussian data augmentation for AQI prediction

Journal: Process Safety and Environmental Protection
Impact factor: 6.9 · Zone 2 (Environmental Science and Ecology) · JCR Q1 (Engineering, Chemical)
Publication date: 2024-09-03
DOI: 10.1016/j.psep.2024.08.119
URL: https://www.sciencedirect.com/science/article/pii/S0957582024010929
Citation count: 0

Abstract

Accurate air quality prediction is crucial for ensuring that air pollution control measures are scientifically sound and effective. This study proposes a combined deep learning (DL) model, XGBoost-GDA-TCN-IMRFO-GRU, for predicting hourly air quality index (AQI) data in four cities. The model integrates extreme gradient boosting (XGBoost) for feature selection, Gaussian data augmentation (GDA), an improved manta ray foraging optimization (IMRFO) algorithm, a temporal convolutional network (TCN), and a gated recurrent unit (GRU). XGBoost scores the pollutants that affect AQI, and the four highest-ranked pollutants (PM2.5, PM10, NO2, O3) are selected. GDA improves the robustness of the DL models and mitigates the problems of insufficient training data and overfitting. The IMRFO algorithm, which incorporates two improvement strategies, is applied to enhance the GRU model. TCN extracts spatiotemporal features of AQI, while GRU builds a computationally efficient temporal model. Compared with eleven benchmark models, the proposed model performs best in terms of MAE, RMSE, MAPE, and NSE. Specifically, the XGBoost-GDA-TCN-IMRFO-GRU model reduces RMSE, MAE, and MAPE by 33–60%, 39–68%, and 39–66%, respectively, compared with the TCN model. The XGBoost-GDA-TCN-IMRFO-GRU model can therefore provide reliable early warnings of poor air quality, which is of great significance for air pollution prevention and the sustainable development of society.
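
As a rough illustration of two ideas mentioned in the abstract, the sketch below shows one common way to perform Gaussian data augmentation and to compute the four reported error metrics. It is not the authors' implementation. The abstract does not specify how the noise is scaled, how many augmented copies are generated, or how NSE is defined. The function names, the `noise_scale` and `n_copies` parameters, the per-feature scaling of the noise, and the reading of NSE as the Nash-Sutcliffe efficiency are all assumptions made for this example.

```python
import numpy as np


def gaussian_augment(X, y, n_copies=1, noise_scale=0.05, seed=0):
    """Append noisy copies of the training inputs (hypothetical helper).

    Assumption: zero-mean Gaussian noise whose standard deviation is
    noise_scale times the standard deviation of each feature column.
    Targets are copied unchanged.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sigma = noise_scale * X.std(axis=0, keepdims=True)
    X_parts, y_parts = [X], [y]
    for _ in range(n_copies):
        X_parts.append(X + rng.normal(0.0, 1.0, size=X.shape) * sigma)
        y_parts.append(y)
    return np.concatenate(X_parts, axis=0), np.concatenate(y_parts, axis=0)


def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


def nse(y_true, y_pred):
    """Nash-Sutcliffe efficiency (assumed meaning of NSE); 1.0 is a perfect fit."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    residual = np.sum((y_true - y_pred) ** 2)
    variance = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - residual / variance)


if __name__ == "__main__":
    # Toy values only; these are not data from the paper.
    X = np.array([[35.0, 60.0], [80.0, 120.0], [150.0, 210.0]])
    y = np.array([100.0, 200.0, 300.0])
    X_aug, y_aug = gaussian_augment(X, y, n_copies=2)
    print(X_aug.shape, y_aug.shape)
    y_pred = np.array([110.0, 190.0, 330.0])
    print(mae(y, y_pred), rmse(y, y_pred), mape(y, y_pred), nse(y, y_pred))
```

Lower MAE, RMSE, and MAPE indicate better predictions, while NSE is better the closer it is to 1. Scaling the noise per feature keeps the perturbation proportionate for pollutants measured on different ranges, but other choices (a fixed standard deviation, or noise added after normalization) would be equally plausible readings of the abstract.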

Source journal

Process Safety and Environmental Protection (Environmental Sciences; Engineering, Chemical)
CiteScore: 11.40
Self-citation rate: 15.40%
Articles published: 929
Review time: 8.0 months

Journal description: Process Safety and Environmental Protection (PSEP) is a leading international journal that publishes high-quality, original research papers in engineering, specifically those related to the safety of industrial processes and environmental protection. The journal encourages submissions that present new developments in safety and environmental aspects, particularly those that show how research findings can be applied in process engineering design and practice. PSEP is particularly interested in research that brings fresh perspectives to established engineering principles, identifies unsolved problems, or suggests directions for future research. The journal also values contributions that push the boundaries of traditional engineering and welcomes multidisciplinary papers. PSEP's articles are abstracted and indexed by a range of databases and services, including ANTE, Chemical Abstracts, Chemical Hazards in Industry, Current Contents, Elsevier Engineering Information database, Pascal Francis, Web of Science, Scopus, Engineering Information Database EnCompass LIT (Elsevier), and INSPEC. This coverage helps the journal's content reach a global audience interested in process safety and environmental engineering.