Unsupervised anomaly detection in spatio-temporal stream network sensor data

arXiv - STAT - Applications, Pub Date: 2024-09-11, DOI: arxiv-2409.07667

Edgar Santos-Fernandez, Jay M. Ver Hoef, Erin E. Peterson, James McGree, Cesar A. Villa, Catherine Leigh, Ryan Turner, Cameron Roberts, Kerrie Mengersen



Abstract

The use of in-situ digital sensors for water quality monitoring is becoming increasingly common worldwide. While these sensors provide near real-time data for science, the data are prone to technical anomalies that can undermine the trustworthiness of the data and the accuracy of statistical inferences, particularly in spatial and temporal analyses. Here we propose a framework for detecting anomalies in sensor data recorded in stream networks, which takes advantage of spatial and temporal autocorrelation to improve detection rates. The proposed framework involves the implementation of effective data imputation to handle missing data, alignment of time-series to address temporal disparities, and the identification of water quality events. We explore the effectiveness of a suite of state-of-the-art statistical methods including posterior predictive distributions, finite mixtures, and Hidden Markov Models (HMM). We showcase the practical implementation of automated anomaly detection in near real time by employing a Bayesian recursive approach. This demonstration is conducted through a comprehensive simulation study and a practical application to a substantive case study situated in the Herbert River, located in Queensland, Australia, which flows into the Great Barrier Reef. We found that methods such as posterior predictive distributions and HMM produce the best performance in detecting multiple types of anomalies. Utilizing data from multiple sensors deployed relatively near one another enhances the ability to distinguish between water quality events and technical anomalies, thereby significantly improving the accuracy of anomaly detection. Thus, uncertainty and biases in water quality reporting, interpretation, and modelling are reduced, and the effectiveness of subsequent management actions is improved.
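
To make the idea of recursive Bayesian detection with a posterior predictive distribution concrete, the following is a minimal sketch and not the authors' model. It assumes a single sensor, a constant unknown mean with a Normal prior, and a known Gaussian noise variance. It ignores the spatial and temporal autocorrelation that the framework exploits. The function name `detect_anomalies` and all parameter names are hypothetical.

```python
import math


def detect_anomalies(readings, prior_mean, prior_var, noise_var, z=1.96):
    """Flag readings that fall outside the posterior predictive interval.

    Illustrative assumptions (not taken from the paper): readings are
    Normal(mu, noise_var) with known noise_var, and mu has a
    Normal(prior_mean, prior_var) prior.
    """
    mean, var = prior_mean, prior_var
    flags = []
    for y in readings:
        # Posterior predictive for the next reading: Normal(mean, var + noise_var).
        pred_sd = math.sqrt(var + noise_var)
        is_anomaly = abs(y - mean) > z * pred_sd
        flags.append(is_anomaly)
        if not is_anomaly:
            # Conjugate Normal-Normal update; the posterior becomes the next prior.
            precision = 1.0 / var + 1.0 / noise_var
            mean = (mean / var + y / noise_var) / precision
            var = 1.0 / precision
        # Flagged readings are skipped so they do not contaminate the posterior.
    return flags


if __name__ == "__main__":
    # Hypothetical readings with one spike.
    print(detect_anomalies([10.1, 9.9, 10.2, 25.0, 10.0], 10.0, 1.0, 0.04))
```

In this sketch, skipping the update for flagged readings is a design choice that keeps a spike from widening later predictive intervals. A real water quality event would also be flagged by this single-sensor rule, which is one reason the abstract stresses using data from nearby sensors.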


