Tree-based methods for clustering time series using domain-relevant attributes
Mahsa Ashouri, G. Shmueli, Chor-yiu Sin
Journal of Business Analytics, pp. 1-23. Published 2019-01-02.
DOI: 10.1080/2573234X.2019.1645574 (https://doi.org/10.1080/2573234X.2019.1645574)
Citations: 11
Abstract
We propose two methods for time-series clustering that capture temporal information (trend, seasonality, autocorrelation) and domain-relevant cross-sectional attributes. Both are based on model-based partitioning (MOB) trees and can serve as automated yet transparent tools for clustering large collections of time series. Because common time-series models are difficult to use within MOB, we use least squares regression instead. The single-step method clusters series using trend, seasonality, lags and domain-relevant cross-sectional attributes. The two-step method first clusters by trend, seasonality and cross-sectional attributes, and then clusters the residuals by autocorrelation and domain-relevant attributes. Both methods produce clusters that domain experts can interpret. We illustrate the approach with one-step-ahead forecasting of a large collection of Wikipedia pageview time series, comparing against autoregressive integrated moving average (ARIMA) models. The tree-based approach produces forecasts on par with ARIMA yet is significantly faster and more efficient, making it suitable for large collections of time series. The simple parametric forecasting models allow for interpretable time-series clusters.
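The abstract describes replacing common time-series models with least squares regression on trend, seasonality and lag terms. The sketch below illustrates only that regression step for a single series; it is not the MOB tree itself and not the authors' exact specification. The weekly period of 7, the single lag, the synthetic data, and the function names `design_matrix`, `fit_ols` and `forecast_next` are all assumptions made for illustration.

```python
import numpy as np

def design_matrix(y, period=7):
    """Build regressors for targets y[1:]: intercept, linear trend,
    seasonal dummies (first level dropped) and the lag-1 value."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(1, n)                          # time index of each target
    season = np.eye(period)[t % period][:, 1:]   # drop one level: intercept covers it
    X = np.column_stack([np.ones(n - 1), t, season, y[:-1]])
    return X, y[1:]

def fit_ols(y, period=7):
    """Ordinary least squares fit of the trend/seasonality/lag model."""
    X, target = design_matrix(y, period)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta

def forecast_next(y, beta, period=7):
    """One-step-ahead forecast for time index len(y)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    season = np.eye(period)[n % period][1:]
    x_new = np.concatenate([[1.0, n], season, [y[-1]]])
    return float(x_new @ beta)

# Synthetic daily series (assumed data): linear trend + weekly pattern + noise.
rng = np.random.default_rng(0)
time = np.arange(140)
weekly = np.tile([0, 3, -2, 1, 4, -1, 2], 20)
series = 10 + 0.5 * time + weekly + rng.normal(0, 1.0, size=140)

coefficients = fit_ols(series)
next_value = forecast_next(series, coefficients)
```

Because each series is summarised by a small set of regression coefficients, a partitioning tree can in principle compare and split on them, which is the kind of parametric simplicity the abstract credits for interpretable clusters.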