Predicting Personal Exposure to PM2.5 Using Different Determinants and Machine Learning Algorithms in Two Megacities, China

Indoor Air | Pub Date: 2024-03-08 | DOI: 10.1155/2024/5589891 | IF 4.3 | Zone 2, Environmental Science and Ecology | JCR Q1, CONSTRUCTION & BUILDING TECHNOLOGY

Na Li, Yunpu Li, Dongqun Xu, Zhe Liu, Ning Li, Ryan Chartier, Junrui Chang, Qin Wang, Chunyu Xu


Citations: 0

Abstract


The primary aim of this study is to explore the utility of machine learning algorithms for predicting the personal PM2.5 exposures of elderly participants and to evaluate the effect of individual variables on model performance. Personal PM2.5 was measured on five consecutive days across seasons in 66 retired adults in Beijing (BJ) and Nanjing (NJ), China. Potential predictors were extracted from routine monitoring data (ambient PM2.5 concentrations and meteorological factors), basic questionnaires (personal and household characteristics), and time-activity diaries (TAD). Prediction models were developed using either traditional multiple linear regression (MLR) or five advanced machine learning methods. Personal PM2.5 exposures were well predicted by both MLR and machine learning models using predictors extracted from routine monitoring data, as indicated by high nested cross-validation (CV) R² values ranging from 0.76 to 0.88. Adding predictors from either the questionnaire or the TAD did not improve predictive accuracy for all algorithms. Ambient PM2.5 concentration was the most important predictor. Overall, the random forest (RF), support vector machine, and extreme gradient boosting algorithms outperformed the reference MLR method. Compared with the traditional MLR approach, the CV R² of the RF model increased by up to 7% (from 0.82 ± 0.13 to 0.88 ± 0.10), while the RMSE decreased by up to 18% (from 19.8 ± 5.4 to 16.3 ± 4.5) in BJ.
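The percentage changes quoted for Beijing can be checked directly from the reported means. The snippet below is a small illustrative sketch, not the authors' code: the helper name `relative_change` is hypothetical, and only the mean values from the abstract are used (the ± spreads are ignored).

```python
def relative_change(baseline: float, new: float) -> float:
    """Return the change from baseline to new as a percentage of baseline."""
    return (new - baseline) / baseline * 100.0


# Mean values reported in the abstract for Beijing (MLR -> RF).
# The +/- spreads reported alongside them are not used here.
r2_change = relative_change(0.82, 0.88)    # nested CV R^2
rmse_change = relative_change(19.8, 16.3)  # RMSE

print(f"CV R2 change: {r2_change:+.1f}%")  # about +7.3%
print(f"RMSE change:  {rmse_change:+.1f}%")  # about -17.7%
```

Both results agree with the "up to 7%" and "up to 18%" figures once rounded to whole percentages.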


Source journal

Indoor Air (Environmental Sciences; Engineering, Environmental)

CiteScore: 10.80

Self-citation rate: 10.30%

Articles published: 175

Review time: 3 months

Journal description: The quality of the environment within buildings is a topic of major importance for public health. Indoor Air provides a location for reporting original research results in the broad area defined by the indoor environment of non-industrial buildings. An international journal with multidisciplinary content, Indoor Air publishes papers reflecting the broad categories of interest in this field: health effects; thermal comfort; monitoring and modelling; source characterization; ventilation and other environmental control techniques. The research results present the basic information to allow designers, building owners, and operators to provide a healthy and comfortable environment for building occupants, as well as giving medical practitioners information on how to deal with illnesses related to the indoor environment.