Kai Ming Ting, Zongyou Liu, Lei Gong, Hang Zhang, Ye Zhu
The VLDB Journal, published 2024-02-15. DOI: 10.1007/s00778-023-00832-x (https://doi.org/10.1007/s00778-023-00832-x)
A new distributional treatment for time series anomaly detection
Time series are traditionally treated with two main approaches: the time domain approach and the frequency domain approach. These approaches must rely on a sliding window so that time-shifted versions of a sequence can be measured as similar. Coupled with the use of a root point-to-point measure, existing methods often have quadratic time complexity. We offer a third approach, the \(\mathbb {R}\) domain approach. It begins with the insight that sequences in a stationary time series can be treated as sets of independent and identically distributed (iid) points generated from an unknown distribution in \(\mathbb {R}\). This \(\mathbb {R}\) domain treatment enables two new possibilities: (a) the similarity between two sequences can be computed using a distributional measure such as Wasserstein distance (WD), kernel mean embedding or isolation distributional kernel (\(\mathcal {K}_I\)), and (b) these distributional measures no longer require a sliding window. Together, they offer an alternative that measures similarity more effectively and runs significantly faster than point-to-point and sliding-window-based measures. Our empirical evaluation shows that \(\mathcal {K}_I\) is an effective and efficient distributional measure for time series, and that \(\mathcal {K}_I\)-based detectors have better detection accuracy than existing detectors in two tasks: (i) anomalous sequence detection in a stationary time series and (ii) anomalous time series detection in a dataset of non-stationary time series. The insight makes underutilized “old things new again”, giving existing distributional measures and anomaly detectors a new life in time series anomaly detection that would otherwise be impossible.
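To make possibility (a) concrete, the following is a minimal sketch, not the authors' implementation. It treats each sequence as an unordered set of points in \(\mathbb {R}\) and compares two sequences with the 1-D Wasserstein distance, which for equal-size samples reduces to the mean absolute difference of the sorted values. The function names `wasserstein_1d` and `pointwise_distance` and the synthetic sine data are assumptions made for illustration. A periodic sine wave is also not literally iid; it is used only to show the effect of a time shift.

```python
import math


def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D samples.

    Each sequence is treated as a set of points in R, so the time
    order is discarded. For equal-size samples the optimal transport
    plan matches sorted values, so W1 is the mean absolute difference
    of the sorted sequences.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal-length sequences")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def pointwise_distance(xs, ys):
    """Euclidean point-to-point distance, which depends on time alignment."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)))


# Hypothetical data: four full periods of a sine wave, a time-shifted
# copy of it, and a copy with a level shift standing in for an anomaly.
period = 50
length = 4 * period
base = [math.sin(2 * math.pi * t / period) for t in range(length)]
shifted = [math.sin(2 * math.pi * (t + 12) / period) for t in range(length)]
anomalous = [v + 1.5 for v in base]

if __name__ == "__main__":
    print("W1(base, shifted)        =", wasserstein_1d(base, shifted))
    print("pointwise(base, shifted) =", pointwise_distance(base, shifted))
    print("W1(base, anomalous)      =", wasserstein_1d(base, anomalous))
```

In this sketch the time-shifted copy is almost identical to the original under the distributional measure, with no sliding window or alignment search, while the point-to-point distance between the same two sequences is large. The level-shifted sequence remains clearly separated under the distributional measure. Sorting makes the cost O(n log n) per pair of sequences in this simplified setting.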