Multiple PM Low-Cost Sensors, Multiple Seasons’ Data, and Multiple Calibration Models
Srishti Singh, Pratyush Agrawal, P. Kulkarni, H. Gautam, Meenakshi Kushwaha, V. Sreekanth
Aerosol and Air Quality Research, published 2023-01-01
DOI: 10.4209/aaqr.220428 (https://doi.org/10.4209/aaqr.220428)
Abstract
In this study, we combined state-of-the-art data modelling techniques (machine learning [ML] methods) and data from state-of-the-art low-cost particulate matter (PM) sensors (LCSs) to improve the accuracy of LCS-measured PM2.5 (PM with aerodynamic diameter less than 2.5 microns) mass concentrations. We collocated nine LCSs and a reference PM2.5 instrument for 9 months, covering all local seasons, in Bengaluru, India. Using the collocation data, we evaluated the performance of the LCSs and trained around 170 ML models to reduce the observed bias in the LCS-measured PM2.5. The ML models included (i) Decision Tree, (ii) Random Forest (RF), (iii) eXtreme Gradient Boosting, and (iv) Support Vector Regression (SVR). A hold-out validation was performed to assess the model performance. Model performance metrics included (i) coefficient of determination (R²), (ii) root mean square error (RMSE), (iii) normalised RMSE, and (iv) mean absolute error. We found that the bias in the LCS PM2.5 measurements varied across different LCS types (RMSE = 8–29 µg m⁻³) and that SVR models performed best in correcting the LCS PM2.5 measurements. Hyperparameter tuning improved the performance of the ML models (except for RF). The performance of ML models trained with significant predictors (fewer in number than the full set of predictors, chosen using a recursive feature elimination algorithm) was comparable to that of the ‘all predictors’ trained models (except for RF). The performance of most ML models was better than that of the linear models. Finally, as a research objective, we introduced the collocated black carbon mass concentration measurements into the ML models but found no significant improvement in the model performance.
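The hold-out validation and the four performance metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the calibration is a simple linear baseline rather than one of the ML models, and the function names (`holdout_split`, `fit_linear`, `metrics`, `make_synthetic_pairs`) are hypothetical. The abstract does not state how RMSE was normalised, so dividing by the mean reference concentration is an assumption.

```python
import math
import random


def holdout_split(pairs, test_fraction=0.2, seed=0):
    """Shuffle (sensor, reference) pairs and split them into train and test sets."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


def fit_linear(train):
    """Ordinary least squares fit of reference = slope * sensor + intercept."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def metrics(y_true, y_pred):
    """Return R2, RMSE, normalised RMSE and MAE for paired values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": rmse,
        # Assumption: RMSE normalised by the mean reference concentration.
        "nrmse": rmse / mean_true,
        "mae": sum(abs(e) for e in errors) / n,
    }


def make_synthetic_pairs(n=200, seed=1):
    """Synthetic collocation data: a sensor that over-reads, with Gaussian noise."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        reference = rng.uniform(5.0, 80.0)
        sensor = 1.6 * reference + 4.0 + rng.gauss(0.0, 3.0)
        pairs.append((sensor, reference))
    return pairs


if __name__ == "__main__":
    train, test = holdout_split(make_synthetic_pairs())
    slope, intercept = fit_linear(train)
    reference = [y for _, y in test]
    raw = [x for x, _ in test]
    corrected = [slope * x + intercept for x in raw]
    print("raw sensor:", metrics(reference, raw))
    print("calibrated:", metrics(reference, corrected))
```

Scoring the model only on the held-out pairs, as above, keeps the reported error from being flattered by data the model has already seen. Swapping the linear fit for a tuned regressor would leave the split and the metric code unchanged.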
About the journal:
The international journal of Aerosol and Air Quality Research (AAQR) covers all aspects of aerosol science and technology, atmospheric science and air quality related issues. It encompasses a multi-disciplinary field, including:
- Aerosol, air quality, atmospheric chemistry and global change;
- Air toxics (hazardous air pollutants (HAPs), persistent organic pollutants (POPs)) - Sources, control, transport and fate, human exposure;
- Nanoparticle and nanotechnology;
- Sources, combustion, thermal decomposition, emission, properties, behavior, formation, transport, deposition, measurement and analysis;
- Effects on the environment;
- Air quality and human health;
- Bioaerosols;
- Indoor air quality;
- Energy and air pollution;
- Pollution control technologies;
- Invention and improvement of sampling instruments and technologies;
- Optical/radiative properties and remote sensing;
- Carbon dioxide emission, capture, storage and utilization; novel methods for the reduction of carbon dioxide emission;
- Other topics related to aerosol and air quality.