基于大量缺失数据的交通流异常检测

Qing He, Charles W. Harrison, Hsin-Hsiung Huang
{"title":"基于大量缺失数据的交通流异常检测","authors":"Qing He, Charles W. Harrison, Hsin-Hsiung Huang","doi":"10.51387/23-nejsds20","DOIUrl":null,"url":null,"abstract":"Anomaly detection plays an important role in traffic operations and control. Missingness in spatial-temporal datasets prohibits anomaly detection algorithms from learning characteristic rules and patterns due to the lack of large amounts of data. This paper proposes an anomaly detection scheme for the 2021 Algorithms for Threat Detection (ATD) challenge based on Gaussian process models that generate features used in a logistic regression model which leads to high prediction accuracy for sparse traffic flow data with a large proportion of missingness. The dataset is provided by the National Science Foundation (NSF) in conjunction with the National Geospatial-Intelligence Agency (NGA), and it consists of thousands of labeled traffic flow records for 400 sensors from 2011 to 2020. Each sensor is purposely downsampled by NSF and NGA in order to simulate missing completely at random, and the missing rates are 99%, 98%, 95%, and 90%. Hence, it is challenging to detect anomalies from the sparse traffic flow data. The proposed scheme makes use of traffic patterns at different times of day and on different days of week to recover the complete data. The proposed anomaly detection scheme is computationally efficient by allowing parallel computation on different sensors. The proposed method is one of the two top performing algorithms in the 2021 ATD challenge.","PeriodicalId":94360,"journal":{"name":"The New England Journal of Statistics in Data Science","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Detection of Anomalies in Traffic Flows with Large Amounts of Missing Data\",\"authors\":\"Qing He, Charles W. Harrison, Hsin-Hsiung Huang\",\"doi\":\"10.51387/23-nejsds20\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Anomaly detection plays an important role in traffic operations and control. Missingness in spatial-temporal datasets prohibits anomaly detection algorithms from learning characteristic rules and patterns due to the lack of large amounts of data. This paper proposes an anomaly detection scheme for the 2021 Algorithms for Threat Detection (ATD) challenge based on Gaussian process models that generate features used in a logistic regression model which leads to high prediction accuracy for sparse traffic flow data with a large proportion of missingness. The dataset is provided by the National Science Foundation (NSF) in conjunction with the National Geospatial-Intelligence Agency (NGA), and it consists of thousands of labeled traffic flow records for 400 sensors from 2011 to 2020. Each sensor is purposely downsampled by NSF and NGA in order to simulate missing completely at random, and the missing rates are 99%, 98%, 95%, and 90%. Hence, it is challenging to detect anomalies from the sparse traffic flow data. The proposed scheme makes use of traffic patterns at different times of day and on different days of week to recover the complete data. The proposed anomaly detection scheme is computationally efficient by allowing parallel computation on different sensors. The proposed method is one of the two top performing algorithms in the 2021 ATD challenge.\",\"PeriodicalId\":94360,\"journal\":{\"name\":\"The New England Journal of Statistics in Data Science\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The New England Journal of Statistics in Data Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.51387/23-nejsds20\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The New England Journal of Statistics in Data Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.51387/23-nejsds20","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

异常检测在交通运行和控制中起着重要的作用。由于缺乏大量的数据,时空数据集的缺失阻碍了异常检测算法学习特征规则和模式。本文提出了一种基于高斯过程模型的2021威胁检测算法(ATD)挑战的异常检测方案,该方案生成的特征用于逻辑回归模型,从而对具有较大缺失比例的稀疏交通流数据具有较高的预测精度。该数据集由美国国家科学基金会(NSF)与国家地理空间情报局(NGA)联合提供,由400个传感器从2011年到2020年的数千条标记交通流量记录组成。每个传感器通过NSF和NGA故意下采样,以完全随机模拟缺失,缺失率分别为99%,98%,95%和90%。因此,从稀疏的交通流数据中检测异常是一个挑战。该方案利用一天中不同时间和一周中不同日子的交通模式来恢复完整的数据。该异常检测方案允许在不同传感器上进行并行计算,从而提高了计算效率。该方法是2021年ATD挑战赛中表现最好的两种算法之一。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Detection of Anomalies in Traffic Flows with Large Amounts of Missing Data
Anomaly detection plays an important role in traffic operations and control. Missingness in spatial-temporal datasets prohibits anomaly detection algorithms from learning characteristic rules and patterns due to the lack of large amounts of data. This paper proposes an anomaly detection scheme for the 2021 Algorithms for Threat Detection (ATD) challenge based on Gaussian process models that generate features used in a logistic regression model which leads to high prediction accuracy for sparse traffic flow data with a large proportion of missingness. The dataset is provided by the National Science Foundation (NSF) in conjunction with the National Geospatial-Intelligence Agency (NGA), and it consists of thousands of labeled traffic flow records for 400 sensors from 2011 to 2020. Each sensor is purposely downsampled by NSF and NGA in order to simulate missing completely at random, and the missing rates are 99%, 98%, 95%, and 90%. Hence, it is challenging to detect anomalies from the sparse traffic flow data. The proposed scheme makes use of traffic patterns at different times of day and on different days of week to recover the complete data. The proposed anomaly detection scheme is computationally efficient by allowing parallel computation on different sensors. The proposed method is one of the two top performing algorithms in the 2021 ATD challenge.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Modeling Multivariate Spatial Dependencies Using Graphical Models. Effect of model space priors on statistical inference with model uncertainty. Bayesian Variable Selection in Double Generalized Linear Tweedie Spatial Process Models Bayesian D-Optimal Design of Experiments with Quantitative and Qualitative Responses Construction of Supersaturated Designs with Small Coherence for Variable Selection
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1