A Dynamic Spatial Filtering Approach to Mitigate Underestimation Bias in Field Calibrated Low-Cost Sensor Air Pollution Data
Claire Heffernan, Roger Peng, Drew R Gentner, Kirsten Koehler, Abhirup Datta
Annals of Applied Statistics, 17(4), 3056-3087. Published 2023-12-01 (Epub 2023-10-30). DOI: https://doi.org/10.1214/23-aoas1751
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11031266/pdf/
Abstract
Low-cost air pollution sensors, offering hyper-local characterization of pollutant concentrations, are becoming increasingly prevalent in environmental and public health research. However, low-cost air pollution data can be noisy, biased by environmental conditions, and usually need to be field-calibrated by collocating low-cost sensors with reference-grade instruments. We show, theoretically and empirically, that the common procedure of regression-based calibration using collocated data systematically underestimates high air pollution concentrations, which are critical to diagnose from a health perspective. Current calibration practices also often fail to utilize the spatial correlation in pollutant concentrations. We propose a novel spatial filtering approach to collocation-based calibration of low-cost networks that mitigates the underestimation issue by using an inverse regression. The inverse-regression also allows for incorporating spatial correlations by a second-stage model for the true pollutant concentrations using a conditional Gaussian Process. Our approach works with one or more collocated sites in the network and is dynamic, leveraging spatial correlation with the latest available reference data. Through extensive simulations, we demonstrate how the spatial filtering substantially improves estimation of pollutant concentrations, and measures peak concentrations with greater accuracy. We apply the methodology for calibration of a low-cost PM2.5 network in Baltimore, Maryland, and diagnose air pollution peaks that are missed by the regression-calibration.
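The underestimation claim in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation and omits the spatial (Gaussian Process) stage entirely. It only contrasts the two calibration directions on synthetic collocation data: regressing the reference value on the sensor reading (regression calibration) versus regressing the sensor reading on the reference value and inverting the fit (inverse regression). The data-generating model, its parameters, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical collocation data (all values are illustrative assumptions):
# x = reference-grade ("true") concentration, y = raw low-cost sensor reading.
n = 5000
x = rng.gamma(shape=4.0, scale=3.0, size=n)        # right-skewed, mean 12
y = 2.0 + 1.5 * x + rng.normal(0.0, 6.0, size=n)   # biased and noisy sensor

# Regression calibration: fit x ~ y, then predict x from the sensor reading.
# np.polyfit returns coefficients from highest degree down: [slope, intercept].
g1, g0 = np.polyfit(y, x, deg=1)
x_reg = g0 + g1 * y

# Inverse regression: fit y ~ x, then invert the fitted line to recover x.
b1, b0 = np.polyfit(x, y, deg=1)
x_inv = (y - b0) / b1

# Mean error restricted to the highest 5% of true concentrations.
peak = x >= np.quantile(x, 0.95)
bias_reg = float(np.mean(x_reg[peak] - x[peak]))
bias_inv = float(np.mean(x_inv[peak] - x[peak]))

print(f"peak bias, regression calibration: {bias_reg:.2f}")
print(f"peak bias, inverse regression:     {bias_inv:.2f}")
```

In this toy setting the regression-calibrated estimates shrink toward the overall mean, so they sit well below the truth at peak concentrations, while the inverted fit is roughly unbiased there. The inverted estimates are noisier point by point, which is one motivation for a second-stage model of the true concentrations such as the one the abstract describes.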
About the journal:
Statistical research spans an enormous range from direct subject-matter collaborations to pure mathematical theory. The Annals of Applied Statistics, the newest journal from the IMS, is aimed at papers in the applied half of this range. Published quarterly in both print and electronic form, the journal aims to provide a timely and unified forum for all areas of applied statistics.