Tuning-free sparse clustering via alternating hard-thresholding

IF 1.4 · Zone 3 (Mathematics) · Q2 Statistics & Probability · Journal of Multivariate Analysis · Pub Date: 2024-05-15 · DOI: 10.1016/j.jmva.2024.105330
Wei Dong, Chen Xu, Jinhan Xie, Niansheng Tang
Original article: https://www.sciencedirect.com/science/article/pii/S0047259X2400037X
Citation count: 0

Abstract

Model-based clustering is a commonly used technique for partitioning heterogeneous data into homogeneous groups. When the analysis involves a large number of features, analysts face simultaneous challenges in model interpretability, clustering accuracy, and computational efficiency. Several Bayesian and penalization methods have been proposed to select important features for model-based clustering. However, the performance of those methods relies on careful algorithmic tuning, which can be time-consuming in high-dimensional cases. In this paper, we propose a new sparse clustering method based on alternating hard-thresholding. The new method is conceptually simple and tuning-free. With a user-specified sparsity level, it efficiently detects a set of key features by eliminating a large number of features that are less useful for clustering. Based on the selected key features, one can readily obtain an effective clustering of the original high-dimensional data under a general sparse covariance structure. Under mild conditions, we show that the new method leads to clusters with a misclassification rate consistent with the optimal rate, as if the underlying true model were used. The promising performance of the new method is supported by both simulated and real data examples.
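
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of alternating between clustering and hard-thresholding. It assumes a k-means-style criterion with spherical clusters, not the paper's sparse-covariance mixture model. The function names (`top_s`, `farthest_point_init`, `sparse_cluster_sketch`), the between-cluster scoring rule, and the deterministic initialization are all illustrative assumptions, not the authors' procedure.

```python
import numpy as np


def top_s(scores, s):
    """Hard-thresholding step: keep the indices of the s largest scores."""
    return np.sort(np.argsort(scores)[-s:])


def farthest_point_init(Z, k):
    """Deterministic start (an assumption): pick k rows of Z that are far apart."""
    chosen = [int(np.argmax((Z ** 2).sum(axis=1)))]
    dist = ((Z - Z[chosen[0]]) ** 2).sum(axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        dist = np.minimum(dist, ((Z - Z[nxt]) ** 2).sum(axis=1))
    return chosen


def sparse_cluster_sketch(X, k, s, max_iter=50):
    """Alternate between (a) clustering on s features and (b) re-selecting them."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                 # centre columns: grand mean is 0
    features = top_s(X.var(axis=0), s)     # initial guess: highest-variance features
    centers = X[farthest_point_init(X[:, features], k)].copy()
    labels = None
    for _ in range(max_iter):
        # (a) assign each point to the nearest centre, using selected features only
        Xs, Cs = X[:, features], centers[:, features]
        d2 = ((Xs[:, None, :] - Cs[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        # update centres on all features; an empty cluster keeps its old centre
        for j in range(k):
            members = X[new_labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
        # (b) score every feature by its between-cluster sum of squares,
        #     then keep only the s best (hard-thresholding)
        counts = np.bincount(new_labels, minlength=k)
        scores = (counts[:, None] * centers ** 2).sum(axis=0)
        new_features = top_s(scores, s)
        converged = (labels is not None
                     and np.array_equal(new_labels, labels)
                     and np.array_equal(new_features, features))
        labels, features = new_labels, new_features
        if converged:
            break
    return labels, features
```

Calling `sparse_cluster_sketch(X, k=2, s=3)` returns the cluster labels and the sorted indices of the retained features. The only inputs besides the data are the number of clusters `k` and the sparsity level `s`, which mirrors the abstract's "user-specified sparsity level"; no penalty parameter is searched over.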

Source journal
Journal of Multivariate Analysis (Mathematics: Statistics & Probability)
CiteScore: 2.40
Self-citation rate: 25.00%
Articles published: 108
Review time: 74 days
About the journal: Founded in 1971, the Journal of Multivariate Analysis (JMVA) is the central venue for the publication of new, relevant methodology and particularly innovative applications pertaining to the analysis and interpretation of multidimensional data. The journal welcomes contributions to all aspects of multivariate data analysis and modeling, including cluster analysis, discriminant analysis, factor analysis, and multidimensional continuous or discrete distribution theory. Topics of current interest include, but are not limited to, inferential aspects of copula modeling, functional data analysis, graphical modeling, high-dimensional data analysis, image analysis, multivariate extreme-value theory, sparse modeling, and spatial statistics.