Unsupervised anomaly detection in spatio-temporal stream network sensor data

Edgar Santos-Fernandez, Jay M. Ver Hoef, Erin E. Peterson, James McGree, Cesar A. Villa, Catherine Leigh, Ryan Turner, Cameron Roberts, Kerrie Mengersen
arXiv - STAT - Applications, published 2024-09-11. DOI: arxiv-2409.07667 (https://doi.org/arxiv-2409.07667)

Abstract

In-situ digital sensors are increasingly used for water quality monitoring worldwide. While these sensors provide near-real-time data for science, the data are prone to technical anomalies that can undermine their trustworthiness and the accuracy of statistical inferences, particularly in spatial and temporal analyses. Here we propose a framework for detecting anomalies in sensor data recorded in stream networks, which takes advantage of spatial and temporal autocorrelation to improve detection rates. The framework involves data imputation to handle missing data, alignment of time series to address temporal disparities, and the identification of water quality events. We explore the effectiveness of a suite of state-of-the-art statistical methods, including posterior predictive distributions, finite mixtures, and hidden Markov models (HMMs). We showcase the practical implementation of automated anomaly detection in near real time using a Bayesian recursive approach, through a comprehensive simulation study and a case study of the Herbert River in Queensland, Australia, which flows into the Great Barrier Reef. We found that methods such as posterior predictive distributions and HMMs perform best in detecting multiple types of anomalies. Using data from multiple sensors deployed relatively near one another enhances the ability to distinguish between water quality events and technical anomalies, thereby significantly improving the accuracy of anomaly detection. Thus, uncertainty and biases in water quality reporting, interpretation, and modelling are reduced, and the effectiveness of subsequent management actions is improved.
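
To make the idea of recursive, posterior-predictive anomaly flagging concrete, the sketch below uses a deliberately simple stand-in model, not the framework described in the abstract. It assumes a single sensor, a constant unknown mean with a normal prior, and a known observation variance; it ignores the spatial and temporal autocorrelation that the paper exploits. The function name `recursive_flags`, its parameters, the threshold `z`, and the example readings are all hypothetical and chosen only for illustration.

```python
from math import sqrt


def recursive_flags(readings, prior_mean, prior_var, obs_var, z=3.0):
    """Flag readings that fall outside a posterior predictive interval.

    Illustrative assumption: y ~ Normal(mu, obs_var) with mu ~ Normal(mean, var).
    The posterior predictive for the next reading is then
    Normal(mean, var + obs_var), and the posterior for mu is updated
    one reading at a time (the "recursive" step).
    """
    mean, var = prior_mean, prior_var
    flags = []
    for y in readings:
        # Spread of the predictive distribution for the incoming reading.
        pred_sd = sqrt(var + obs_var)
        is_anomaly = abs(y - mean) > z * pred_sd
        flags.append(is_anomaly)
        if not is_anomaly:
            # Conjugate normal update; flagged readings are skipped so
            # they do not contaminate later predictions.
            post_var = 1.0 / (1.0 / var + 1.0 / obs_var)
            mean = post_var * (mean / var + y / obs_var)
            var = post_var
    return flags


# Made-up readings with one obvious spike at index 3.
readings = [10.1, 9.9, 10.2, 25.0, 10.0, 9.8]
flags = recursive_flags(readings, prior_mean=10.0, prior_var=1.0, obs_var=0.04)
print(flags)
```

In this toy setting only the spike is flagged. A single-sensor rule like this cannot tell a genuine water quality event from a technical anomaly, which is the gap the abstract says nearby sensors help close.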