Application of imputation methods for missing values of PM10 and O3 data: Interpolation, moving average and K-nearest neighbor methods

Environmental Health Engineering and Management Journal (impact factor 1.3; JCR Q4, Environmental Sciences). Published 2021-09-16. DOI: 10.34172/ehem.2021.25
Parisa Saeipourdizaj, P. Sarbakhsh, Akbar Gholampour
Citations: 11

Abstract

Background: In air quality studies, missing data are very common, owing to causes such as instrument failure or human error. The approach used to handle such missing data can affect the results of the analysis. The main aim of this study was to review the types of missing-data mechanisms and imputation methods, to apply several of these methods to impute missing PM10 and O3 values in Tabriz, and to compare their efficiency.

Methods: Mean imputation, the expectation-maximization (EM) algorithm, regression, classification and regression trees (CART), predictive mean matching (PMM), interpolation, moving average, and K-nearest neighbor (KNN) imputation were applied. PMM was also investigated with spatial and temporal dependencies included in the model. Missingness was simulated by randomly removing 10%, 20%, and 30% of the values. The efficiency of the methods was compared using the coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE).

Results: Across all indicators, interpolation, moving average, and KNN performed best, in that order. PMM performed poorly both with and without spatio-temporal information.

Conclusion: Given that pollution data by nature depend on the preceding and following observations, methods that compute imputed values from the information before and after a gap performed better than the others. These methods are therefore recommended for pollutant data.
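The evaluation protocol described in the abstract (hide a random 10, 20 or 30% of observations, impute them, then score the imputed values against the hidden truth with R², MAE and RMSE) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the series is synthetic rather than the Tabriz PM10/O3 data; `mask_mcar`, `linear_interpolate`, `moving_average`, `knn_impute` and `evaluate` are hypothetical helpers; the moving average is an unweighted centred window, KNN uses only the time index as its feature, and R² is computed as 1 - SS_res/SS_tot (some studies instead report the squared correlation). The abstract does not state the window sizes, the value of k, or the features the study used.

```python
import math
import random
from bisect import bisect_left


def mask_mcar(series, fraction, seed=0):
    """Hide a random `fraction` of values, i.e. simulate data missing completely at random."""
    rng = random.Random(seed)
    n_missing = round(fraction * len(series))
    missing_idx = sorted(rng.sample(range(len(series)), n_missing))
    masked = list(series)
    for i in missing_idx:
        masked[i] = None
    return masked, missing_idx


def linear_interpolate(values):
    """Fill each gap on the straight line between its nearest observed neighbours."""
    observed = [i for i, v in enumerate(values) if v is not None]
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        pos = bisect_left(observed, i)
        prev = observed[pos - 1] if pos > 0 else None
        nxt = observed[pos] if pos < len(observed) else None
        if prev is None:   # leading gap: carry the first observation backwards
            filled[i] = values[nxt]
        elif nxt is None:  # trailing gap: carry the last observation forwards
            filled[i] = values[prev]
        else:
            w = (i - prev) / (nxt - prev)
            filled[i] = values[prev] + w * (values[nxt] - values[prev])
    return filled


def moving_average(values, half_window=3):
    """Fill each gap with the unweighted mean of observed values within +/- half_window steps."""
    observed = [v for v in values if v is not None]
    overall_mean = sum(observed) / len(observed)
    filled = list(values)
    for i, v in enumerate(values):
        if v is None:
            lo, hi = max(0, i - half_window), i + half_window + 1
            # Use the original (unfilled) values so imputations do not feed into each other.
            window = [x for x in values[lo:hi] if x is not None]
            # Assumed fallback for long gaps: the overall mean of the observed data.
            filled[i] = sum(window) / len(window) if window else overall_mean
    return filled


def knn_impute(values, k=4):
    """Fill each gap with the mean of the k observed points nearest in time (feature = time index)."""
    observed = [i for i, v in enumerate(values) if v is not None]
    filled = list(values)
    for i, v in enumerate(values):
        if v is None:
            nearest = sorted(observed, key=lambda j: abs(j - i))[:k]
            filled[i] = sum(values[j] for j in nearest) / len(nearest)
    return filled


def evaluate(truth, filled, missing_idx):
    """R2, MAE and RMSE computed only over the positions that were hidden."""
    y = [truth[i] for i in missing_idx]
    errors = [truth[i] - filled[i] for i in missing_idx]
    ss_res = sum(e * e for e in errors)
    mean_y = sum(y) / len(y)
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return {
        "R2": 1 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "MAE": sum(abs(e) for e in errors) / len(errors),
        "RMSE": math.sqrt(ss_res / len(errors)),
    }


if __name__ == "__main__":
    # Synthetic hourly series with a daily cycle and slight trend (NOT the Tabriz data).
    truth = [50 + 20 * math.sin(2 * math.pi * t / 24) + 0.05 * t for t in range(24 * 14)]
    methods = {"interpolation": linear_interpolate,
               "moving average": moving_average,
               "KNN": knn_impute}
    for fraction in (0.10, 0.20, 0.30):
        masked, idx = mask_mcar(truth, fraction, seed=42)
        for name, impute in methods.items():
            scores = evaluate(truth, impute(masked), idx)
            print(f"{fraction:.0%}  {name:<15}"
                  + "  ".join(f"{m}={s:.3f}" for m, s in scores.items()))
```

Running the script prints one line of scores per missingness level and method; because the series is synthetic, the numbers say nothing about the study's findings. All three methods build each imputed value from observations on either side of the gap, which is the property the abstract credits for their better performance.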
Source journal metrics: CiteScore 2.40; self-citation rate 37.50%; articles published: 17; review time: 12 weeks.