Kai Ming Ting, Zongyou Liu, Lei Gong, Hang Zhang, Ye Zhu
A new distributional treatment for time series anomaly detection
The VLDB Journal, published 2024-02-15
DOI: 10.1007/s00778-023-00832-x (https://doi.org/10.1007/s00778-023-00832-x)
Abstract
Time series are traditionally treated with two main approaches: the time domain approach and the frequency domain approach. Both must rely on a sliding window so that time-shifted versions of a sequence can be measured as similar. Coupled with the use of an underlying point-to-point measure, existing methods often have quadratic time complexity. We offer a third approach, the \(\mathbb{R}\) domain approach. It begins with the insight that sequences in a stationary time series can be treated as sets of independent and identically distributed (iid) points generated from an unknown distribution in \(\mathbb{R}\). This \(\mathbb{R}\) domain treatment enables two new possibilities: (a) the similarity between two sequences can be computed using a distributional measure such as Wasserstein distance (WD), kernel mean embedding, or the isolation distributional kernel (\(\mathcal{K}_I\)), and (b) these distributional measures no longer depend on a sliding window. Together, they offer an alternative that measures similarity more effectively and runs significantly faster than point-to-point and sliding-window-based measures. Our empirical evaluation shows that \(\mathcal{K}_I\) is an effective and efficient distributional measure for time series, and that \(\mathcal{K}_I\)-based detectors have better detection accuracy than existing detectors in two tasks: (i) anomalous sequence detection in a stationary time series and (ii) anomalous time series detection in a dataset of non-stationary time series. The insight makes underutilized "old things new again," giving existing distributional measures and anomaly detectors a new life in time series anomaly detection that would otherwise be impossible.
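To make possibility (a) concrete, the following minimal sketch treats each sequence as an unordered set of values and compares the sets with the one-dimensional Wasserstein distance, one of the distributional measures named in the abstract. It is an illustration only, not the authors' \(\mathcal{K}_I\)-based detector. The function names `wasserstein_1d` and `pointwise_l1`, the sine-wave toy data, and the shift and damping values are all assumptions introduced here. For two equal-length samples, the order-1 Wasserstein distance reduces to the mean absolute difference between the sorted values, which is what the sketch computes.

```python
import math


def wasserstein_1d(xs, ys):
    """Order-1 Wasserstein distance between two equal-length 1-D samples.

    Each sequence is treated as a set of points in R (temporal order is
    discarded), so no sliding window or alignment is needed.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal-length sequences")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs_sorted, ys_sorted)) / len(xs)


def pointwise_l1(xs, ys):
    """Mean absolute point-to-point difference with no alignment step."""
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)


# Toy data (hypothetical): four whole periods of a sine wave.
PERIOD = 50
N = 4 * PERIOD

base = [math.sin(2 * math.pi * t / PERIOD) for t in range(N)]
# Same signal shifted in time by 13 steps.
shifted = [math.sin(2 * math.pi * (t + 13) / PERIOD) for t in range(N)]
# Same signal with reduced amplitude, standing in for an anomalous sequence.
damped = [0.3 * v for v in base]

print(f"WD(base, shifted)        = {wasserstein_1d(base, shifted):.6f}")
print(f"pointwise(base, shifted) = {pointwise_l1(base, shifted):.6f}")
print(f"WD(base, damped)         = {wasserstein_1d(base, damped):.6f}")
```

In this toy setting, the time-shifted copy contains the same set of values as the original, so its Wasserstein distance to the original is essentially zero, while the unaligned point-to-point distance is large. The damped sequence has a different value distribution and is separated by the distributional measure. A deterministic sine wave is only a stand-in for a stationary series here; the iid assumption in the abstract concerns the sequences of the series being analyzed.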