Feature Weighting-Based Deep Fuzzy C-Means for Clustering Incomplete Time Series

IF 11.9 | Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | IEEE Transactions on Fuzzy Systems | Pub Date: 2024-09-23 | DOI: 10.1109/TFUZZ.2024.3466175

Yurui Li, Mingjing Du, Wenbin Zhang, Xiang Jiang, Yongquan Dong

IEEE Transactions on Fuzzy Systems, vol. 32, no. 12, pp. 6835-6847, 2024. Journal article. IEEE Xplore: https://ieeexplore.ieee.org/document/10689343/

Citations: 0

Abstract

Time-series clustering is a crucial unsupervised technique for data analysis, widely used in fields such as medicine and stock analysis. In real-world scenarios, however, time-series data inevitably contain missing values, which degrade the performance of traditional clustering methods. For incomplete time series, existing clustering methods typically adopt a two-stage strategy: missing values are imputed first, and the completed data are then clustered. Separating imputation from clustering in this way may lead to inconsistent optimization objectives and increase the complexity of parameter tuning, potentially yielding unsatisfactory clustering results. This article proposes an end-to-end deep fuzzy clustering (EEDFC) model for incomplete time series, which jointly optimizes imputation and clustering within a unified framework by integrating multiple losses. In the imputation part, an attention mechanism is integrated to capture dependencies across long sequences. In addition, an adversarial strategy is introduced to strengthen the encoder's imputation and feature representation learning capability, thus reducing error propagation from imputation to clustering. In the clustering part, EEDFC incorporates feature weighting-based fuzzy clustering that accounts for both intracluster compactness and intercluster separateness. Furthermore, an exponential distance is adopted, and feature and cluster weights are integrated into the Kullback–Leibler divergence loss to improve clustering performance. We conduct extensive experiments comparing the proposed model with eleven other methods on ten benchmark datasets, and the results show that the proposed model outperforms all eleven.
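The abstract names the clustering ingredients but not their exact formulas. The NumPy sketch below shows one plausible way they could fit together, under loudly stated assumptions: the exponential distance is taken to be D_ik = Σ_j w_j (1 − exp(−β (x_ij − v_kj)²)); missing entries are skipped with a partial-distance rule instead of the paper's learned, adversarially trained imputation; memberships follow the standard fuzzy c-means update; and the target distribution is the DEC-style sharpening used with a KL loss. It operates on raw features rather than encoder embeddings, omits cluster weighting, attention, and the adversarial component, and is not the authors' EEDFC implementation. All function names are hypothetical.

```python
import numpy as np


def weighted_exp_distance(X, V, w, beta=1.0):
    """Feature-weighted exponential distance with missing values skipped.

    Assumed form (hypothetical, not taken from the paper):
        D_ik = sum_j w_j * (1 - exp(-beta * (x_ij - v_kj) ** 2))
    NaN entries of X are excluded and the sum is rescaled by d / (#observed),
    i.e. a partial-distance rule standing in for learned imputation.
    """
    mask = ~np.isnan(X)                                  # (n, d), True where observed
    X_filled = np.where(mask, X, 0.0)                    # placeholder values, masked out below
    diff2 = (X_filled[:, None, :] - V[None, :, :]) ** 2  # (n, c, d)
    terms = w * (1.0 - np.exp(-beta * diff2))            # bounded per-feature distance
    terms = terms * mask[:, None, :]                     # zero out missing features
    n_obs = mask.sum(axis=1, keepdims=True)              # (n, 1) observed count per sample
    return terms.sum(axis=2) * X.shape[1] / np.maximum(n_obs, 1)


def fuzzy_memberships(D, m=2.0, eps=1e-12):
    """Standard fuzzy c-means update, treating D as a squared-type distance."""
    inv = (D + eps) ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)          # each row sums to 1


def target_distribution(U):
    """DEC-style sharpened target: p_ik proportional to u_ik**2 / sum_i u_ik."""
    weight = U ** 2 / U.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)


def kl_loss(P, U, eps=1e-12):
    """KL(P || U), averaged over samples."""
    return float(np.sum(P * np.log((P + eps) / (U + eps))) / P.shape[0])


# Toy usage: two well-separated groups of 4-feature samples with two gaps.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (5, 4)), rng.normal(1.0, 0.1, (5, 4))])
X[0, 1] = np.nan
X[7, 3] = np.nan
V = np.array([[0.0] * 4, [1.0] * 4])                    # hypothetical centroids
w = np.full(4, 0.25)                                     # uniform feature weights
U = fuzzy_memberships(weighted_exp_distance(X, V, w, beta=2.0))
P = target_distribution(U)
print(U.argmax(axis=1))
print(kl_loss(P, U))
```

Because each per-feature term saturates below w_j, a single extreme value cannot dominate the distance, which is a common motivation for exponential distances. In a DEC-style training loop, the KL term would be computed on learned features and minimized by gradient descent alongside the other losses, with P recomputed periodically and held fixed in between; here it is only evaluated once.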




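Source journal

IEEE Transactions on Fuzzy Systems (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 20.50
Self-citation rate: 13.40%
Articles published: 517
Review time: 3.0 months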

Journal description: The IEEE Transactions on Fuzzy Systems is a scholarly journal that focuses on the theory, design, and application of fuzzy systems. It aims to publish high-quality technical papers that contribute significant technical knowledge and exploratory developments in the field of fuzzy systems. The journal particularly emphasizes engineering systems and scientific applications. In addition to research articles, the Transactions includes a letters section featuring current information, comments, and rebuttals related to published papers.