Spatially-correlated time series clustering using location-dependent Dirichlet process mixture model
Authors: Junsub Jung, Sungil Kim, Heeyoung Kim
Journal: Statistical Analysis and Data Mining (published 2023-11-22)
DOI: 10.1002/sam.11649 (https://doi.org/10.1002/sam.11649)
Citation count: 0
Abstract
The Dirichlet process mixture (DPM) model has been widely used as a Bayesian nonparametric model for clustering. However, the exchangeability assumption of the Dirichlet process is not valid for clustering spatially correlated time series, because these data are indexed both spatially and temporally. When analyzing spatially correlated time series, correlations between observations at proximal times and locations must be appropriately considered. In this study, we propose a location-dependent DPM model that extends the traditional DPM model to the clustering of spatially correlated time series. We model the temporal pattern as an infinite mixture of Gaussian processes while accounting for spatial dependency through a location-dependent Dirichlet process prior over the mixture components. This encourages the assignment of observations from proximal locations to the same cluster. At the same time, because the mixture atoms that model temporal patterns are shared across space, observations with similar temporal patterns can still be grouped together even if they are located far apart. The proposed model also allows the number of clusters to be determined automatically during the clustering procedure. We validate the proposed model using simulated examples. Moreover, in a real case study, we cluster adjacent roads based on their traffic speed patterns, which changed as a result of a traffic accident that occurred in Seoul, South Korea.
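The abstract does not specify how the location-dependent prior is constructed, so the following is only a minimal sketch of one well-known way to make mixture weights depend on location: a truncated kernel stick-breaking scheme. It is not the authors' model. The function name, the kernel centers, the Gaussian kernel, and the bandwidth are all assumptions introduced for illustration. The sketch shows the two properties the abstract describes: nearby sites receive similar weights over components, while the components themselves (indexed by column) are shared by every site.

```python
import numpy as np


def kernel_stick_breaking_weights(locations, centers, sticks, bandwidth):
    """Truncated, location-dependent mixture weights (illustrative only).

    locations: (n, 2) site coordinates
    centers:   (K, 2) hypothetical kernel centers, one per component
    sticks:    (K,) stick proportions in (0, 1)
    bandwidth: kernel length scale (assumed Gaussian kernel)
    Returns an (n, K + 1) array; the last column is the leftover mass
    that a non-truncated model would spread over further components.
    """
    # Squared distance between every site and every kernel center.
    diff = locations[:, None, :] - centers[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    # Gaussian kernel: 1 at the center, decaying toward 0 far away.
    kern = np.exp(-0.5 * d2 / bandwidth ** 2)
    # Site-specific break proportions for each shared component.
    breaks = sticks[None, :] * kern
    # Mass still unassigned after each break.
    remaining = np.cumprod(1.0 - breaks, axis=1)
    # Mass available before each break (1 before the first one).
    before = np.hstack([np.ones((len(locations), 1)), remaining[:, :-1]])
    weights = breaks * before
    return np.hstack([weights, remaining[:, -1:]])


# Two hypothetical components centered at opposite corners of a region.
centers = np.array([[1.0, 1.0], [9.0, 9.0]])
sticks = np.array([0.5, 0.5])
# Two neighboring sites and one distant site.
sites = np.array([[1.0, 1.0], [1.1, 1.0], [9.0, 9.0]])

w = kernel_stick_breaking_weights(sites, centers, sticks, bandwidth=2.0)
near_gap = np.abs(w[0] - w[1]).sum()  # difference between neighbors
far_gap = np.abs(w[0] - w[2]).sum()   # difference between distant sites
print(w.round(4))
print(near_gap < far_gap)
```

In a full model of the kind the abstract describes, each column would correspond to a Gaussian process describing one temporal pattern, and cluster labels would be drawn from each site's weight row. Because the columns are common to all sites, two distant sites can still select the same temporal pattern.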
About the journal:
Statistical Analysis and Data Mining addresses the broad area of data analysis, including statistical approaches, machine learning, data mining, and applications. Topics include statistical and computational approaches for analyzing massive and complex datasets, novel statistical and/or machine learning methods and theory, and state-of-the-art applications with high impact. Of special interest are articles that describe innovative analytical techniques, and discuss their application to real problems, in such a way that they are accessible and beneficial to domain experts across science, engineering, and commerce.
The focus of the journal is on papers which satisfy one or more of the following criteria:
Solve data analysis problems associated with massive, complex datasets.
Develop innovative statistical approaches, machine learning algorithms, or methods integrating ideas across disciplines, e.g., statistics, computer science, electrical engineering, operations research.
Formulate and solve high-impact real-world problems which challenge existing paradigms via new statistical and/or computational models.
Provide surveys of prominent research topics.