ML-aVAT: A Novel 2-Stage Machine-Learning Approach for Automatic Clustering Tendency Assessment

IF 4.3 · Zone 3, Materials Science · Q1 ENGINEERING, ELECTRICAL & ELECTRONIC · ACS Applied Electronic Materials · Pub Date: 2023-10-31 · DOI: 10.1016/j.bdr.2023.100413
Harshal Mittal, Jagarlamudi Sai Laxman, Dheeraj Kumar
Original article: https://www.sciencedirect.com/science/article/pii/S2214579623000461
Citations: 0

Abstract

Clustering tendency assessment, which aims to deduce if a dataset contains any cluster structure, and, if it does, how many clusters it has, is a critical problem in exploratory data analysis. The VAT family of algorithms provides a “visual” means to assess the clustering tendency for various datasets. The VAT algorithm operates by reordering the pairwise distance matrix of the input data. When viewed as a monochrome image, this reordered dissimilarity matrix is called a reordered dissimilarity image (RDI), showing possible data clusters by dark blocks along the diagonal. This process, however, requires human intervention to interpret an RDI. Moreover, for datasets having complex cluster structure or noise, dark blocks along the diagonal of the RDI are not easily distinguishable, making it difficult to count them accurately, and different individuals can report different numbers of dark blocks. Only a handful of approaches have been proposed in the literature to automatically (algorithmically) infer the cluster structure from a VAT-type RDI without requiring human input. However, these approaches do not perform well for several data types and have impractically high run-time. This paper proposes and develops ML-aVAT: a novel two-stage machine-learning-based approach for automatic clustering tendency assessment from VAT-type RDI. Besides estimating the number of clusters, ML-aVAT can also infer the clustering hierarchy, i.e., sub-clusters within each group, something none of the previously proposed algorithms could do. Numerical experiments performed on various synthetic and real-life labeled and unlabeled datasets prove the effectiveness of ML-aVAT in estimating clustering tendency and cluster hierarchy.
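
The reordering step that the abstract attributes to VAT can be illustrated with a short Python sketch. It assumes the classic VAT ordering (start at one endpoint of the largest dissimilarity, then repeatedly append the unordered point closest to the points already ordered) and Euclidean distances. The function names `pairwise_distances`, `vat_order`, and `vat` are chosen here for illustration only. This is not ML-aVAT itself: the two-stage machine-learning model that reads the resulting RDI is not described in the abstract and is not attempted here.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X (an assumed metric)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def vat_order(D):
    """Return a VAT-style ordering of the objects described by D.

    D is a symmetric (n, n) dissimilarity matrix with a zero diagonal.
    """
    n = D.shape[0]
    # Start from one endpoint of the largest pairwise dissimilarity.
    i, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(i)]
    remaining = np.ones(n, dtype=bool)
    remaining[i] = False
    # min_dist[q]: distance from q to its nearest already-ordered point.
    min_dist = D[i].astype(float)
    for _ in range(n - 1):
        # Pick the unordered point closest to the ordered set.
        candidates = np.where(remaining, min_dist, np.inf)
        j = int(np.argmin(candidates))
        order.append(j)
        remaining[j] = False
        min_dist = np.minimum(min_dist, D[j])
    return np.array(order)


def vat(D):
    """Return the reordered dissimilarity matrix and the ordering used."""
    order = vat_order(D)
    return D[np.ix_(order, order)], order


if __name__ == "__main__":
    # Two well-separated synthetic blobs, shuffled so the input order
    # carries no cluster information.
    rng = np.random.default_rng(0)
    X = np.vstack([
        rng.normal(0.0, 1.0, size=(5, 2)),
        rng.normal(100.0, 1.0, size=(7, 2)),
    ])
    X = X[rng.permutation(len(X))]
    rdi, order = vat(pairwise_distances(X))
    print(order)
    print(np.round(rdi, 1))
```

Rendering `rdi` as a grayscale image (small values dark) gives the RDI; for data like the two blobs above, each blob should appear as one dark block along the diagonal, which is the structure a human, or an automatic method such as the one proposed, would then have to count.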
