Quantitative evaluation of internal cluster validation indices using binary data sets

IF 2.2 | Zone 3, Environmental Science and Ecology | JCR Q2, ECOLOGY | Journal of Vegetation Science, Volume 35, Issue 5 | Published: 2024-10-12 | DOI: 10.1111/jvs.13310
Naghmeh Pakgohar, Attila Lengyel, Zoltán Botta-Dukát
Citations: 0

Abstract

Aims

Different clustering methods often classify the same data set differently. Cluster validation indices make it possible to select the “best” clustering solution from the alternatives. Because of the large variety of cluster validation indices (CVIs), choosing the most suitable index for a given data set and clustering algorithm is challenging. We aim to assess different internal cluster validation indices.

Methods

Artificial binary data sets with equal- and unequal-sized, well-separated a priori clusters were simulated, and three levels of noise were then added. Twenty replicates of each of the six data set types (two group-size settings × three noise levels) were created and analyzed with three clustering algorithms using Jaccard dissimilarity. Twenty-seven cluster validation indices, both geometric and non-geometric, were evaluated.
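The simulation design described above can be illustrated with a small sketch. The abstract does not say how the a priori clusters or the noise were generated, so the scheme below is a hypothetical one, and every name in it (`simulate_binary_clusters`, `add_noise`, `p_in`, `p_out`, `flip_prob`) is an illustrative assumption rather than the authors' method: each group owns a block of diagnostic species that are frequent inside the group and rare elsewhere, noise is added by randomly flipping presences and absences, and Jaccard dissimilarity, d(A, B) = 1 - |A ∩ B| / |A ∪ B|, is computed between every pair of plots.

```python
import random


def simulate_binary_clusters(group_sizes, species_per_group, p_in=0.8, p_out=0.05, seed=0):
    """Hypothetical generator of a plots x species presence/absence table.

    Each a priori group g owns a block of `species_per_group` diagnostic species
    that occur with probability p_in in plots of group g and p_out elsewhere.
    Returns the table (list of 0/1 rows) and the a priori group labels.
    """
    rng = random.Random(seed)
    n_species = len(group_sizes) * species_per_group
    data, labels = [], []
    for g, size in enumerate(group_sizes):
        for _ in range(size):
            row = [
                1 if rng.random() < (p_in if s // species_per_group == g else p_out) else 0
                for s in range(n_species)
            ]
            data.append(row)
            labels.append(g)
    return data, labels


def add_noise(data, flip_prob, seed=1):
    """Flip each presence/absence value independently with probability flip_prob."""
    rng = random.Random(seed)
    return [[1 - x if rng.random() < flip_prob else x for x in row] for row in data]


def jaccard_dissimilarity(a, b):
    """1 - shared / union, where shared counts species present in both plots and
    union counts species present in either; 0.0 if both plots are empty."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if union == 0 else 1.0 - shared / union


def dissimilarity_matrix(data):
    """Full symmetric matrix of pairwise Jaccard dissimilarities between plots."""
    return [[jaccard_dissimilarity(r1, r2) for r2 in data] for r1 in data]


# Example: three equal-sized groups with a hypothetical 10% flip rate.
data, labels = simulate_binary_clusters([10, 10, 10], species_per_group=8)
noisy = add_noise(data, flip_prob=0.1)
D = dissimilarity_matrix(noisy)
```

An unequal-sized setting could be produced by passing something like `[4, 8, 18]` as `group_sizes`, and three noise levels by calling `add_noise` with three different `flip_prob` values; the specific numbers here are placeholders, not the values used in the study.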

Results

Although in theory all CVIs could differentiate between good and poor classifications, only a few performed as expected with noisy data. Tau and silhouette width proved to be the best geometric CVIs for both equal and unequal cluster sizes. Among the non-geometric indices, crispness and OptimClass performed best.
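Silhouette width, one of the two best geometric CVIs reported above, can be computed directly from any dissimilarity matrix, such as a Jaccard matrix. The following is a minimal sketch of the standard definition s(i) = (b(i) - a(i)) / max(a(i), b(i)); it is not the authors' code, and the function names and the toy matrix are hypothetical. It assumes at least two clusters and gives objects in singleton clusters a width of 0.

```python
def silhouette_widths(D, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) for every object.

    a: mean dissimilarity of i to the other members of its own cluster.
    b: smallest mean dissimilarity of i to the members of any other cluster.
    Objects in singleton clusters get s(i) = 0. Needs at least two clusters.
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")
    widths = []
    for i, ci in enumerate(labels):
        own = [D[i][j] for j, cj in enumerate(labels) if cj == ci and j != i]
        if not own:
            widths.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(D[i][j] for j, cj in enumerate(labels) if cj == c) / labels.count(c)
            for c in clusters
            if c != ci
        )
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return widths


def mean_silhouette(D, labels):
    """Average silhouette width over all objects (higher is better)."""
    widths = silhouette_widths(D, labels)
    return sum(widths) / len(widths)


# Toy dissimilarity matrix: objects 0 and 1 are similar, as are 2 and 3.
D = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.1, 0.0],
]
good = mean_silhouette(D, [0, 0, 1, 1])   # partition matching the structure
wrong = mean_silhouette(D, [0, 1, 0, 1])  # partition splitting both natural pairs
```

On this toy matrix the matching partition scores positive and the wrong one negative, which is the behaviour the abstract says all CVIs should show in principle; the study's point is that many indices lose this clarity once noise is added.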

Conclusion

We recommend using these best-performing CVIs. We also suggest plotting the CVI value against the number of clusters, because the absence of a sharp peak indicates that the position of the maximum is uncertain.
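The recommended diagnostic, plotting a CVI against the number of clusters, might look like the sketch below. Several assumptions are involved: average-linkage (UPGMA) agglomerative clustering stands in for whichever algorithm a user applies (the abstract does not name the three algorithms used), mean silhouette width is the CVI, the "plot" is a plain-text bar chart, and `peak_margin` is a hypothetical helper that reports how far the maximum stands above its neighbouring k values, so that a small margin flags a flat, uncertain optimum. The 9-object toy matrix is likewise invented for illustration.

```python
def average_linkage_partitions(D, k_values):
    """Naive average-linkage (UPGMA) clustering of a dissimilarity matrix.

    Starts from singletons, repeatedly merges the two clusters with the smallest
    mean between-cluster dissimilarity, and records a label vector whenever the
    number of clusters is in k_values.
    """
    n = len(D)
    k_values = set(k_values)
    clusters = [[i] for i in range(n)]
    partitions = {}
    while len(clusters) > 1 and len(clusters) >= min(k_values):
        if len(clusters) in k_values:
            labels = [0] * n
            for c, members in enumerate(clusters):
                for i in members:
                    labels[i] = c
            partitions[len(clusters)] = labels
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = sum(D[i][j] for i in clusters[x] for j in clusters[y])
                d /= len(clusters[x]) * len(clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] += clusters[y]
        del clusters[y]
    return partitions


def mean_silhouette(D, labels):
    """Average of s(i) = (b - a) / max(a, b); singleton clusters score 0."""
    groups = {}
    for i, c in enumerate(labels):
        groups.setdefault(c, []).append(i)
    total = 0.0
    for i, c in enumerate(labels):
        own = [j for j in groups[c] if j != i]
        if not own:
            continue  # singleton: s(i) = 0
        a = sum(D[i][j] for j in own) / len(own)
        b = min(sum(D[i][j] for j in m) / len(m) for g, m in groups.items() if g != c)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


def cvi_profile(D, k_values):
    """Mean silhouette width for each number of clusters k (k >= 2)."""
    parts = average_linkage_partitions(D, k_values)
    return {k: mean_silhouette(D, parts[k]) for k in sorted(parts)}


def peak_margin(profile):
    """Hypothetical sharpness check: the best k and how far its CVI value exceeds
    the larger of its neighbours (k - 1, k + 1). A small margin signals a flat,
    uncertain maximum."""
    best = max(profile, key=profile.get)
    neighbours = [profile[k] for k in (best - 1, best + 1) if k in profile]
    margin = profile[best] - max(neighbours) if neighbours else float("nan")
    return best, margin


def print_profile(profile, width=40):
    """Plain-text stand-in for a CVI-versus-k plot."""
    for k, value in profile.items():
        print(f"k={k:2d} {value:6.3f} " + "#" * max(0, round(value * width)))


# Toy matrix: three groups of three objects (within 0.1, between 0.9).
D = [[0.0 if i == j else (0.1 if i // 3 == j // 3 else 0.9) for j in range(9)]
     for i in range(9)]
profile = cvi_profile(D, range(2, 7))
print_profile(profile)
best_k, margin = peak_margin(profile)
```

UPGMA is used here only because it is short to write; any method that yields one partition per k could be substituted. The numeric margin is a convenience for this sketch, whereas the abstract recommends inspecting the plot itself rather than applying a fixed threshold.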

Source journal
Journal of Vegetation Science (Environmental Science: Forestry)
CiteScore: 6.00
Self-citation rate: 3.60%
Articles published: 60
Review time: 2 months
About the journal: The Journal of Vegetation Science publishes papers on all aspects of plant community ecology, with particular emphasis on papers that develop new concepts or methods, test theory, identify general patterns, or that are otherwise likely to interest a broad international readership. Papers may focus on any aspect of vegetation science, e.g. community structure (including community assembly and plant functional types), biodiversity (including species richness and composition), spatial patterns (including plant geography and landscape ecology), temporal changes (including demography, community dynamics and palaeoecology) and processes (including ecophysiology), provided the focus is on increasing our understanding of plant communities. The Journal publishes papers on the ecology of a single species only if it plays a key role in structuring plant communities. Papers that apply ecological concepts, theories and methods to vegetation management, conservation and restoration, and papers on vegetation survey, should be directed to our associate journal, Applied Vegetation Science.
Latest articles in this journal
Role of Plant Specialists in Fine-Scale Diversity–Area Relationships (DARs) in Southern European Atlantic Coastal Dunes
Willow above, changes below: Seedless tree invader impacts riparian seed bank in the Patagonian ecotone
Repeat photography reveals long-term climate change impacts on sub-Antarctic tundra vegetation
Mammalian herbivory alters structure, composition and edaphic conditions of a grey-dune community
Short-term vegetation shifts in an alpine grassland under current and simulated climate change