Unsupervised anomaly detection in spatio-temporal stream network sensor data
Edgar Santos-Fernandez, Jay M. Ver Hoef, Erin E. Peterson, James McGree, Cesar A. Villa, Catherine Leigh, Ryan Turner, Cameron Roberts, Kerrie Mengersen
arXiv - STAT - Applications, 2024-09-11. DOI: arxiv-2409.07667 (https://doi.org/arxiv-2409.07667)
Abstract
The use of in-situ digital sensors for water quality monitoring is becoming
increasingly common worldwide. While these sensors provide near real-time data
for science, the data are prone to technical anomalies that can undermine
their trustworthiness and the accuracy of statistical inferences,
particularly in spatial and temporal analyses. Here we propose a framework for
detecting anomalies in sensor data recorded in stream networks, which takes
advantage of spatial and temporal autocorrelation to improve detection rates.
The proposed framework involves imputing missing data, aligning time series
to address temporal disparities, and identifying water quality events. We
explore the
effectiveness of a suite of state-of-the-art statistical methods including
posterior predictive distributions, finite mixtures, and hidden Markov models
(HMMs). We showcase the practical implementation of automated anomaly
detection in near real time by employing a Bayesian recursive approach. We
demonstrate the framework through a comprehensive simulation study and a case
study of the Herbert River in Queensland, Australia, which flows into the
Great Barrier Reef. We found that methods such as posterior predictive
distributions and HMMs perform best at detecting multiple types of anomalies.
Utilizing data from multiple sensors deployed relatively near one another
enhances the ability to distinguish between water quality events and technical
anomalies, thereby significantly improving the accuracy of anomaly detection.
Thus, uncertainty and biases in water quality reporting, interpretation, and
modelling are reduced, and the effectiveness of subsequent management actions
is improved.
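
To make the idea of flagging anomalies with a posterior predictive distribution under a recursive Bayesian update more concrete, the following is a minimal sketch for a single sensor. It is not the authors' spatio-temporal stream-network model. It assumes a normal likelihood with known observation variance and a conjugate normal prior on a constant mean, and the function name `flag_anomalies`, its parameters, and the example readings are all hypothetical.

```python
import math


def flag_anomalies(readings, prior_mean, prior_var, obs_var, z=3.0):
    """Return one boolean per reading; True marks a suspected anomaly.

    Assumption: readings are Normal(mean, obs_var) with obs_var known and
    a Normal(prior_mean, prior_var) prior on the unknown constant mean.
    """
    mean, var = prior_mean, prior_var
    flags = []
    for y in readings:
        # Posterior predictive for the next reading: Normal(mean, var + obs_var).
        pred_sd = math.sqrt(var + obs_var)
        is_anomaly = abs(y - mean) > z * pred_sd
        flags.append(is_anomaly)
        if not is_anomaly:
            # Recursive conjugate update: the current posterior becomes the
            # prior for the next reading. Flagged readings are skipped so
            # they do not contaminate the posterior.
            precision = 1.0 / var + 1.0 / obs_var
            mean = (mean / var + y / obs_var) / precision
            var = 1.0 / precision
    return flags


# Hypothetical readings with one obvious spike at index 3.
readings = [10.1, 9.9, 10.2, 25.0, 10.0, 9.8]
flags = flag_anomalies(readings, prior_mean=10.0, prior_var=1.0, obs_var=0.04)
print(flags)
```

In this toy setting only the spike falls outside the predictive band. A single-sensor rule like this cannot tell a genuine water quality event from a technical anomaly, which is the gap the abstract says nearby sensors help to close.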