Improving Clustering Method Performance Using K-Means, Mini Batch K-Means, BIRCH and Spectral

Tenia Wahyuningrum, S. Khomsah, S. Suyanto, Selly Meliana, Prasti Eko Yunanto, W. A. Al Maki
{"title":"Improving Clustering Method Performance Using K-Means, Mini Batch K-Means, BIRCH and Spectral","authors":"Tenia Wahyuningrum, S. Khomsah, S. Suyanto, Selly Meliana, Prasti Eko Yunanto, W. A. Al Maki","doi":"10.1109/ISRITI54043.2021.9702823","DOIUrl":null,"url":null,"abstract":"The most pressing problem of the $k$-Nearest Neighbor (KNN) classification method is voting technology, which will lead to poor accuracy of some randomly distributed complex data sets. To overcome the weakness of KNN, we added a step before the KNN classification phase. We developed a new schema for grouping data sets, making the number of clusters greater than the number of data classes. In addition, the committee selects each cluster so that it does not use voting techniques such as standard KNN methods. This study uses two sequential methods, namely the clustering method and the KNN method. Clustering methods can be used to group records into multiple clusters to select commissions from these clusters. Five clustering methods were tested: K-Means, K-Means with Principal Component Analysis (PCA), Mini Batch K-Means, Spectral and Balanced Iterative Reduction and Clustering using Hierarchies (BIRCH). All tested clustering methods are based on the cluster type of the center of gravity. According to the result, the BIRCH method has the lowest error rate among the five clustering methods (2.13), and K-Means has the largest clusters (156.63).","PeriodicalId":156265,"journal":{"name":"2021 4th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 4th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISRITI54043.2021.9702823","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

The most pressing problem of the $k$-Nearest Neighbor (KNN) classification method is its voting technique, which leads to poor accuracy on some randomly distributed, complex data sets. To overcome this weakness of KNN, we add a step before the KNN classification phase. We develop a new schema for grouping the data set that makes the number of clusters greater than the number of data classes. In addition, a committee is selected from each cluster, so the method does not rely on the voting technique of the standard KNN method. This study uses two methods in sequence, namely a clustering method followed by the KNN method. The clustering method groups records into multiple clusters, from which the committees are then selected. Five clustering methods were tested: K-Means, K-Means with Principal Component Analysis (PCA), Mini Batch K-Means, Spectral, and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH). All tested clustering methods are centroid-based. According to the results, BIRCH has the lowest error rate among the five clustering methods (2.13), and K-Means produces the largest clusters (156.63).
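The abstract compares five centroid-based clustering methods as a preprocessing step ahead of KNN. The sketch below shows, under stated assumptions, how such a comparison could be set up with scikit-learn: the Iris data, the choice of six clusters, and the adjusted Rand index are illustrative placeholders and do not reproduce the paper's actual data set, committee-selection step, or reported error rates.

```python
# Minimal sketch (assumptions noted above): instantiate the five clustering
# methods named in the abstract and compare their partitions on a toy data set.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, MiniBatchKMeans, SpectralClustering, Birch
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)
n_clusters = 6  # deliberately larger than the 3 data classes, as the proposed schema requires

methods = {
    "K-Means": KMeans(n_clusters=n_clusters, n_init=10, random_state=0),
    "K-Means + PCA": KMeans(n_clusters=n_clusters, n_init=10, random_state=0),
    "Mini Batch K-Means": MiniBatchKMeans(n_clusters=n_clusters, n_init=10, random_state=0),
    "Spectral": SpectralClustering(n_clusters=n_clusters, random_state=0),
    "BIRCH": Birch(n_clusters=n_clusters),
}

# Reduced feature space used only by the K-Means + PCA variant.
X_pca = PCA(n_components=2).fit_transform(X)

for name, model in methods.items():
    features = X_pca if name == "K-Means + PCA" else X
    labels = model.fit_predict(features)
    # Agreement with the true class labels, used here only as a rough proxy
    # for cluster quality (not the error rate reported in the paper).
    print(f"{name:>20}: ARI = {adjusted_rand_score(y, labels):.3f}")
```

The committee selection inside each cluster, which the paper uses in place of standard KNN voting, is not detailed in the abstract and is therefore left out of this sketch.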