Odyssey: An Engine Enabling the Time-Series Clustering Journey

IF 2.6 3区 计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI:10.14778/3611540.3611622
John Paparrizos, Sai Prasanna Teja Reddy
{"title":"Odyssey: An Engine Enabling the Time-Series Clustering Journey","authors":"John Paparrizos, Sai Prasanna Teja Reddy","doi":"10.14778/3611540.3611622","DOIUrl":null,"url":null,"abstract":"Clustering is one of the most popular time-series tasks because it enables unsupervised data exploration and often serves as a subroutine or preprocessing step for other tasks. Despite being the subject of active research across disciplines for decades, only limited efforts focused on benchmarking clustering methods for time series. Unfortunately, these studies have (i) omitted popular methods and entire classes of methods; (ii) considered limited choices for underlying distance measures; (iii) performed evaluations on a small number of datasets; or (iv) avoided rigorous statistical validation of the findings. In addition, the sudden enthusiasm and recent slew of proposed deep learning methods underscore the vital need for a comprehensive study. Motivated by the aforementioned limitations, we present Odyssey, a modular and extensible web engine to comprehensively evaluate 80 time-series clustering methods spanning 9 different classes from the data mining, machine learning, and deep learning literature. Odyssey enables rigorous statistical analysis across 128 diverse time-series datasets. Through its interactive interface, Odyssey (i) reveals the best-performing method per class; (ii) identifies classes performing exceptionally well that were previously omitted; (iii) challenges claims about the use of elastic measures in clustering; (iv) highlights the effects of parameter tuning; and (v) debunks claims of superiority of deep learning methods. Odyssey does not only facilitate the most extensive study ever performed in this area but, importantly, reveals an illusion of progress while, in reality, none of the evaluated methods could outperform a traditional method, namely, k -Shape, with a statistically significant difference. Overall, Odyssey lays the foundations for advancing the state of the art in time-series clustering.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"18 1","pages":"0"},"PeriodicalIF":2.6000,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Vldb Endowment","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14778/3611540.3611622","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 1

Abstract

Clustering is one of the most popular time-series tasks because it enables unsupervised data exploration and often serves as a subroutine or preprocessing step for other tasks. Despite being the subject of active research across disciplines for decades, only limited efforts focused on benchmarking clustering methods for time series. Unfortunately, these studies have (i) omitted popular methods and entire classes of methods; (ii) considered limited choices for underlying distance measures; (iii) performed evaluations on a small number of datasets; or (iv) avoided rigorous statistical validation of the findings. In addition, the sudden enthusiasm and recent slew of proposed deep learning methods underscore the vital need for a comprehensive study. Motivated by the aforementioned limitations, we present Odyssey, a modular and extensible web engine to comprehensively evaluate 80 time-series clustering methods spanning 9 different classes from the data mining, machine learning, and deep learning literature. Odyssey enables rigorous statistical analysis across 128 diverse time-series datasets. Through its interactive interface, Odyssey (i) reveals the best-performing method per class; (ii) identifies classes performing exceptionally well that were previously omitted; (iii) challenges claims about the use of elastic measures in clustering; (iv) highlights the effects of parameter tuning; and (v) debunks claims of superiority of deep learning methods. Odyssey does not only facilitate the most extensive study ever performed in this area but, importantly, reveals an illusion of progress while, in reality, none of the evaluated methods could outperform a traditional method, namely, k -Shape, with a statistically significant difference. Overall, Odyssey lays the foundations for advancing the state of the art in time-series clustering.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Odyssey:一个支持时间序列集群旅程的引擎
聚类是最流行的时间序列任务之一,因为它支持无监督的数据探索,并且通常用作其他任务的子例程或预处理步骤。尽管几十年来一直是跨学科研究的活跃主题,但只有有限的努力集中在时间序列的基准聚类方法上。不幸的是,这些研究(i)忽略了流行的方法和整个类别的方法;(ii)考虑了基础距离测量的有限选择;(iii)对少数数据集进行评估;或者(iv)避免对研究结果进行严格的统计验证。此外,突然出现的热情和最近提出的深度学习方法强调了对全面研究的迫切需要。基于上述限制,我们提出了Odyssey,一个模块化和可扩展的web引擎,以全面评估80种时间序列聚类方法,涵盖数据挖掘,机器学习和深度学习文献中的9个不同类别。Odyssey能够对128个不同的时间序列数据集进行严格的统计分析。通过它的交互界面,Odyssey (i)揭示了每个类的最佳执行方法;(ii)确定以前被省略的表现特别好的班级;(iii)质疑关于在聚类中使用弹性度量的主张;(iv)突出参数调整的影响;(v)揭穿深度学习方法优越性的说法。奥德赛不仅促进了在这一领域进行的最广泛的研究,而且重要的是,它揭示了一种进步的错觉,而实际上,所评估的方法都无法超越传统方法,即k -Shape,并具有统计学上的显著差异。总的来说,Odyssey为时间序列集群技术的发展奠定了基础。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Proceedings of the Vldb Endowment
Proceedings of the Vldb Endowment Computer Science-General Computer Science
CiteScore
7.70
自引率
0.00%
发文量
95
期刊介绍: The Proceedings of the VLDB (PVLDB) welcomes original research papers on a broad range of research topics related to all aspects of data management, where systems issues play a significant role, such as data management system technology and information management infrastructures, including their very large scale of experimentation, novel architectures, and demanding applications as well as their underpinning theory. The scope of a submission for PVLDB is also described by the subject areas given below. Moreover, the scope of PVLDB is restricted to scientific areas that are covered by the combined expertise on the submission’s topic of the journal’s editorial board. Finally, the submission’s contributions should build on work already published in data management outlets, e.g., PVLDB, VLDBJ, ACM SIGMOD, IEEE ICDE, EDBT, ACM TODS, IEEE TKDE, and go beyond a syntactic citation.
期刊最新文献
Auditory Brainstem Response in a Child with Mitochondrial Disorder-Leigh Syndrome. Breathing New Life into an Old Tree: Resolving Logging Dilemma of B + -tree on Modern Computational Storage Drives QO-Insight: Inspecting Steered Query Optimizers A Learned Query Rewrite System Demonstrating ADOPT: Adaptively Optimizing Attribute Orders for Worst-Case Optimal Joins via Reinforcement Learning
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1