Tree-based methods for clustering time series using domain-relevant attributes

Journal of Business Analytics. IF 1.7, Q3 (Computer Science, Information Systems). Pub Date: 2019-01-02. DOI: 10.1080/2573234X.2019.1645574
Mahsa Ashouri, G. Shmueli, Chor-yiu Sin
Citations: 11

Abstract

We propose two methods for time-series clustering that capture temporal information (trend, seasonality, autocorrelation) and domain-relevant cross-sectional attributes. The methods are based on model-based partitioning (MOB) trees and can be used as automated yet transparent tools for clustering large collections of time series. We address the challenge of using common time-series models in MOB by instead utilising least squares regression. The single-step method clusters series using trend, seasonality, lags and domain-relevant cross-sectional attributes. The two-step method first clusters by trend, seasonality and cross-sectional attributes, and then clusters the residuals by autocorrelation and domain-relevant attributes. Both methods produce clusters interpretable by domain experts. We illustrate our approach by considering one-step-ahead forecasting and compare it to autoregressive integrated moving average (ARIMA) models for forecasting many Wikipedia pageviews time series. The tree-based approach produces forecasts on par with ARIMA, yet is significantly faster and more efficient, and is therefore suitable for large collections of time series. The simple parametric forecasting models allow for interpretable time-series clusters.
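To make the regression component concrete, the following is a minimal Python sketch of a least squares model with a linear trend, seasonal dummies and a lag-1 term, used for a one-step-ahead forecast. It is an illustration under stated assumptions, not the authors' implementation: it fits a single series and does not perform the MOB partitioning on cross-sectional attributes. The function names, the period of 7 and the simulated series are all hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (no handling of zero pivots).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, targets):
    """Least squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def features(t, prev_value, period):
    """Intercept, linear trend, seasonal dummies (season 0 is the baseline), lag-1."""
    season = t % period
    dummies = [1.0 if season == s else 0.0 for s in range(1, period)]
    return [1.0, float(t)] + dummies + [prev_value]


def fit_and_forecast(series, period=7):
    """Fit the regression on one series and forecast the next observation."""
    rows = [features(t, series[t - 1], period) for t in range(1, len(series))]
    coefs = ols(rows, series[1:])
    next_row = features(len(series), series[-1], period)
    forecast = sum(c * x for c, x in zip(coefs, next_row))
    return coefs, forecast


def simulate_series(n, period=7, phi=0.8, start=100.0):
    """Hypothetical noise-free series with trend, seasonality and lag-1 dependence."""
    seasonal = [2.0 * s for s in range(period)]
    series = [start]
    for t in range(1, n):
        series.append(5.0 + 0.2 * t + seasonal[t % period] + phi * series[-1])
    return series


series = simulate_series(60)
coefs, forecast = fit_and_forecast(series, period=7)
print("lag-1 coefficient:", coefs[-1])
print("one-step-ahead forecast:", forecast)
```

In a tree-based setting as described in the abstract, a regression of this kind would be fitted within each node, with splits chosen on the cross-sectional attributes. That partitioning step is deliberately left out of this sketch.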
Source journal: Journal of Business Analytics (Business, Management and Accounting: Management Information Systems)
CiteScore: 2.50
Self-citation rate: 0.00%
Articles published: 13