Konstantinos Bountrogiannis, G. Tzagkarakis, P. Tsakalides
2020 28th European Signal Processing Conference (EUSIPCO), pp. 2343-2347. DOI: 10.23919/Eusipco47968.2020.9287311
Data-driven Kernel-based Probabilistic SAX for Time Series Dimensionality Reduction
The ever-increasing volume and complexity of time series data emerging in various application domains necessitate efficient dimensionality reduction to facilitate data mining tasks. Symbolic representations, among them symbolic aggregate approximation (SAX), have proven very effective at compacting the information content of time series while exploiting the wealth of search algorithms used in the bioinformatics and text mining communities. However, typical SAX-based techniques rely on a Gaussian assumption for the underlying data statistics, which often degrades their performance in practical scenarios. To overcome this limitation, this work introduces a method that makes no assumption about the probability distribution of the time series. Specifically, a data-driven kernel density estimator is first applied to the data, followed by Lloyd-Max quantization to determine the optimal horizontal segmentation breakpoints. Experimental evaluation on distinct datasets demonstrates the superiority of our method in terms of reconstruction accuracy and tightness of the lower bound, compared with conventional SAX and a modified SAX method.
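The pipeline summarized above (z-normalization, piecewise aggregate approximation, breakpoint selection, symbol mapping) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The kernel bandwidth uses the normal-reference rule 1.06·σ·n^(-1/5) in place of the paper's data-driven estimator. The density is fit on the z-normalized samples. All function names (`kernel_sax`, `lloyd_max`, `kde_on_grid`, etc.) are hypothetical. For contrast, `gaussian_breakpoints` returns the conventional SAX cuts under the Gaussian assumption.

```python
import bisect
import math
from statistics import NormalDist, mean, pstdev


def znormalize(x):
    """Zero-mean, unit-variance scaling, as in standard SAX."""
    mu, sd = mean(x), pstdev(x)
    return [(v - mu) / sd for v in x] if sd > 0 else [0.0] * len(x)


def paa(x, w):
    """Piecewise Aggregate Approximation: mean of w near-equal segments (w <= len(x))."""
    n = len(x)
    return [mean(x[i * n // w:(i + 1) * n // w]) for i in range(w)]


def gaussian_breakpoints(a):
    """Conventional SAX: a-1 cuts giving equiprobable regions under N(0, 1)."""
    nd = NormalDist()
    return [nd.inv_cdf(k / a) for k in range(1, a)]


def kde_on_grid(data, grid):
    """Gaussian-kernel density estimate evaluated at each grid point.

    ASSUMPTION: bandwidth from the normal-reference rule 1.06 * sd * n^(-1/5),
    standing in for the paper's data-driven bandwidth selection."""
    n, sd = len(data), pstdev(data)
    h = 1.06 * sd * n ** -0.2 if sd > 0 else 1.0
    c = 1.0 / (n * h * math.sqrt(2 * math.pi))
    return [c * sum(math.exp(-0.5 * ((g - d) / h) ** 2) for d in data) for g in grid]


def lloyd_max(grid, pdf, a, iters=200, tol=1e-9):
    """Lloyd-Max quantizer for a density sampled on a uniform grid (requires a >= 2).

    Alternates (1) levels = conditional mean of the density in each cell and
    (2) breakpoints = midpoints between adjacent levels."""
    lo, hi = grid[0], grid[-1]
    b = [min(max(v, lo), hi) for v in gaussian_breakpoints(a)]  # start from SAX cuts
    levels = []
    for _ in range(iters):
        num, den = [0.0] * a, [0.0] * a
        for g, p in zip(grid, pdf):
            k = bisect.bisect_right(b, g)  # index of the cell containing g
            num[k] += g * p
            den[k] += p
        edges = [lo] + b + [hi]
        # Uniform grid spacing cancels in the ratio, so plain sums suffice
        levels = [num[k] / den[k] if den[k] > 0 else 0.5 * (edges[k] + edges[k + 1])
                  for k in range(a)]
        new_b = [0.5 * (levels[k] + levels[k + 1]) for k in range(a - 1)]
        done = max(abs(u - v) for u, v in zip(new_b, b)) < tol
        b = new_b
        if done:
            break
    return b, levels


def symbolize(coeffs, breakpoints):
    """Map each PAA coefficient to a letter: 'a' below the first cut, and so on."""
    return "".join(chr(ord("a") + bisect.bisect_right(breakpoints, v)) for v in coeffs)


def kernel_sax(x, w, a, grid_size=400):
    """Hypothetical end-to-end sketch: KDE + Lloyd-Max breakpoints, then SAX symbols."""
    z = znormalize(x)
    lo, hi = min(z) - 1.0, max(z) + 1.0  # pad the grid beyond the data range
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    pdf = kde_on_grid(z, grid)  # ASSUMPTION: density fit on the z-normalized samples
    breakpoints, _ = lloyd_max(grid, pdf, a)
    return symbolize(paa(z, w), breakpoints), breakpoints
```

For example, `word, cuts = kernel_sax(series, w=8, a=4)` returns an 8-letter word over `abcd` together with three breakpoints, which can be compared against `gaussian_breakpoints(4)`. Lloyd-Max minimizes the mean squared quantization error rather than equalizing region probabilities, so its cuts differ from the SAX ones even for Gaussian data (roughly ±0.98 versus ±0.67 for four symbols).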