A Distributed Information Granulation Method for Time Series Clustering
Authors: Yashuang Mu, Tian Liu, Wenqiang Zhang, Hongyue Guo, Lidong Wang, Xiaodong Liu
Journal: Concurrency and Computation: Practice and Experience, vol. 37, no. 4-5
Publication date: 2025-02-10
DOI: 10.1002/cpe.8395
URL: https://onlinelibrary.wiley.com/doi/10.1002/cpe.8395
Impact factor: 1.5; JCR: Q3 (Computer Science, Software Engineering)
Citations: 0
Abstract
Time series clustering is an important research problem in machine learning and data mining. As the volume of time series data grows rapidly, many traditional clustering algorithms cannot handle large-scale time series directly because of limits on memory capacity and execution time. In this study, we propose a distributed information granulation method for the large-scale time series clustering problem. First, a distributed time series partitioning method randomly divides the original time series dataset into data blocks. Then, a distributed time series granulation method is developed in the map-reduce framework using the principle of reasonable granularity, so that each time series is described by a few representative data points that capture its trend state information. Finally, we introduce a large-scale time series clustering method based on the fuzzy C-means clustering algorithm. Experimental studies on several publicly available UCR benchmark time series datasets demonstrate the feasibility and effectiveness of the method. Compared with classical clustering methods, the proposed method achieves a 4.86–9.65% improvement in average clustering accuracy. The proposed method also shows advantages in clustering unequal-length time series and in execution time.
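The three stages named in the abstract (random partitioning, map-reduce granulation, fuzzy C-means clustering) can be illustrated with a minimal single-process Python sketch. This is not the authors' implementation. All function names here (`partition`, `granulate`, `map_block`, `reduce_blocks`, `fuzzy_c_means`, `cluster_series`) are hypothetical. Segment means are assumed as a simple stand-in for the paper's granulation rule, which the abstract does not spell out. A sequential loop stands in for the distributed map tasks.

```python
import random
from statistics import mean


def partition(dataset, n_blocks, seed=0):
    """Randomly divide (index, series) pairs into n_blocks data blocks."""
    rng = random.Random(seed)
    items = list(enumerate(dataset))
    rng.shuffle(items)
    return [items[i::n_blocks] for i in range(n_blocks)]


def granulate(series, n_points):
    """Describe one series by n_points segment means.

    Assumption: segment means are a stand-in for the paper's
    granulation rule, which the abstract does not describe.
    """
    n = len(series)
    if n < n_points:
        raise ValueError("series is shorter than n_points")
    bounds = [k * n // n_points for k in range(n_points + 1)]
    return [mean(series[a:b]) for a, b in zip(bounds, bounds[1:])]


def map_block(block, n_points):
    """Map step: granulate every series in one block independently."""
    return [(idx, granulate(series, n_points)) for idx, series in block]


def reduce_blocks(mapped_blocks):
    """Reduce step: merge block outputs and restore the original order."""
    merged = sorted((pair for block in mapped_blocks for pair in block),
                    key=lambda pair: pair[0])
    return [granules for _, granules in merged]


def fuzzy_c_means(points, c, m=2.0, iters=50, seed=0):
    """Standard fuzzy C-means; returns (centers, membership matrix)."""
    rng = random.Random(seed)
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    dim = len(points[0])
    centers = []
    for _ in range(iters):
        # Update centers as membership-weighted means (weights u^m).
        centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            total = sum(w)
            centers.append([sum(wi * p[d] for wi, p in zip(w, points)) / total
                            for d in range(dim)])
        # Update memberships from Euclidean distances to the centers.
        for i, p in enumerate(points):
            dist = [max(sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5, 1e-12)
                    for ctr in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            total = sum(inv)
            u[i] = [v / total for v in inv]
    return centers, u


def cluster_series(dataset, c, n_blocks, n_points):
    """Full sketch: partition, granulate block by block, then cluster."""
    blocks = partition(dataset, n_blocks)
    # Sequential here; a real deployment would run one map task per block.
    mapped = [map_block(block, n_points) for block in blocks]
    points = reduce_blocks(mapped)
    _, memberships = fuzzy_c_means(points, c)
    labels = [row.index(max(row)) for row in memberships]
    return labels, memberships
```

Because every series is reduced to the same number of representative points, series of different lengths can be clustered together in this sketch; calling `cluster_series(dataset, c=2, n_blocks=3, n_points=4)` returns one hard label and one membership row per input series.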
About the journal:
Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality original research papers and authoritative research review papers in the overlapping fields of:
Parallel and distributed computing;
High-performance computing;
Computational and data science;
Artificial intelligence and machine learning;
Big data applications, algorithms, and systems;
Network science;
Ontologies and semantics;
Security and privacy;
Cloud/edge/fog computing;
Green computing; and
Quantum computing.