A fresh look at mean-shift based modal clustering

Advances in Data Analysis and Classification (IF 1.4; Zone 4, Computer Science; JCR Q2, Statistics & Probability), vol. 18, no. 4, pp. 1067-1095. Pub Date: 2023-12-14. DOI: 10.1007/s11634-023-00575-1
Jose Ameijeiras-Alonso, Jochen Einbeck

A fresh look at mean-shift based modal clustering

Modal clustering is an unsupervised learning technique in which cluster centers are identified as the local maxima of a nonparametric probability density estimate. A natural algorithmic engine for computing these maxima is the mean shift procedure, which is essentially an iteratively computed chain of local means. We revisit this technique, focusing on its link to kernel density gradient estimation, and in doing so propose a novel approach to bandwidth selection based on the concept of a critical bandwidth. Furthermore, in the one-dimensional case, an inverse version of the mean shift is developed to provide a novel approach for estimating antimodes, which are then used to identify cluster boundaries. A simulation study assesses, in the univariate case, the classification accuracy of the mean-shift based clustering approach. Three examples (univariate and multivariate) from the fields of philately, engineering, and imaging illustrate how modal clusterings identified through mean-shift based methods relate directly and naturally to physical properties of the data-generating system. Solutions are proposed for handling large data sets in a computationally efficient way.
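
To make the "chain of local means" concrete, the following is a minimal univariate sketch, assuming a Gaussian kernel and a fixed, hand-picked bandwidth. The function names (mean_shift_1d, modal_clusters), the bandwidth h=0.8, the merge tolerance, and the synthetic two-component data are all illustrative assumptions and are not taken from the paper. The critical-bandwidth selection and the inverse mean shift for antimodes described above are not reproduced here.

```python
import numpy as np

def mean_shift_1d(x0, data, h, tol=1e-8, max_iter=1000):
    """Follow the chain of kernel-weighted local means starting at x0."""
    x = float(x0)
    for _ in range(max_iter):
        # Gaussian kernel weights of all observations relative to x
        w = np.exp(-0.5 * ((data - x) / h) ** 2)
        # Local mean: the next element of the mean shift chain
        x_new = float(np.sum(w * data) / np.sum(w))
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def modal_clusters(data, h, merge_tol=1e-3):
    """Label each point by the mode at which its mean shift chain ends."""
    endpoints = [mean_shift_1d(x, data, h) for x in data]
    modes = []
    labels = np.empty(len(data), dtype=int)
    for i, m in enumerate(endpoints):
        for k, known in enumerate(modes):
            if abs(m - known) < merge_tol:  # same mode up to tolerance
                labels[i] = k
                break
        else:
            modes.append(m)  # a mode not seen before
            labels[i] = len(modes) - 1
    return np.array(modes), labels

# Synthetic example (assumed data): two well-separated normal components
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-3.0, 0.5, 100), rng.normal(3.0, 0.5, 100)])
modes, labels = modal_clusters(data, h=0.8)
print("estimated modes:", np.sort(modes))
print("cluster sizes:", np.bincount(labels))
```

In this sketch the bandwidth h controls how many modes, and hence clusters, are found: larger values smooth the density estimate and merge modes, smaller values split them, which is why a principled bandwidth choice matters.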

Source journal
CiteScore: 3.40
Self-citation rate: 6.20%
Articles published: 45
Review time: >12 weeks
About the journal: The international journal Advances in Data Analysis and Classification (ADAC) is designed as a forum for high standard publications on research and applications concerning the extraction of knowable aspects from many types of data. It publishes articles on such topics as structural, quantitative, or statistical approaches for the analysis of data; advances in classification, clustering, and pattern recognition methods; strategies for modeling complex data and mining large data sets; methods for the extraction of knowledge from data; and applications of advanced methods in specific domains of practice. Articles illustrate how new domain-specific knowledge can be made available from data by skillful use of data analysis methods. The journal also publishes survey papers that outline and illuminate the basic ideas and techniques of special approaches.