Comparison of Distance Metrics for Generating Cluster-based Ensemble Learning

L. P. Yulianti, A. Trisetyarso, J. Santoso, K. Surendro
{"title":"Comparison of Distance Metrics for Generating Cluster-based Ensemble Learning","authors":"L. P. Yulianti, A. Trisetyarso, J. Santoso, K. Surendro","doi":"10.1145/3587828.3587833","DOIUrl":null,"url":null,"abstract":"The basis of ensemble learning is using multiple learning algorithms to improve predictive performance compared to individual learners. Behind the various advantages of ensemble learning, there are several issues that need attention, one of which is related to finding a set of diverse base learners. Recently, clustering has been used to generate diverse base learners as opposed to bagging. The main advantages of cluster-based ensemble learners are their robustness and versatility. The key parameters for implementing a clustering algorithm are the cluster size and distance metrics. The contribution of this study is to compare four distance metrics, including the Euclidean, Manhattan, Chebyshev, and Canberra distances, in the clustering method for ensemble generation and evaluate them based on accuracy, purity, and diversity. The methodology is tested on 10 benchmark UCI datasets. The results show that the use of the Chebyshev and Canberra distances achieved superior accuracy to both the Euclidean and Manhattan distances, while the purity and diversity values of the use of the Chebyshev distance outperformed the other three.","PeriodicalId":340917,"journal":{"name":"Proceedings of the 2023 12th International Conference on Software and Computer Applications","volume":"54 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 12th International Conference on Software and Computer Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3587828.3587833","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

The basis of ensemble learning is using multiple learning algorithms to improve predictive performance compared to individual learners. Behind the various advantages of ensemble learning, there are several issues that need attention, one of which is related to finding a set of diverse base learners. Recently, clustering has been used to generate diverse base learners as opposed to bagging. The main advantages of cluster-based ensemble learners are their robustness and versatility. The key parameters for implementing a clustering algorithm are the cluster size and distance metrics. The contribution of this study is to compare four distance metrics, including the Euclidean, Manhattan, Chebyshev, and Canberra distances, in the clustering method for ensemble generation and evaluate them based on accuracy, purity, and diversity. The methodology is tested on 10 benchmark UCI datasets. The results show that the use of the Chebyshev and Canberra distances achieved superior accuracy to both the Euclidean and Manhattan distances, while the purity and diversity values of the use of the Chebyshev distance outperformed the other three.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
生成基于聚类的集成学习的距离度量比较
与单个学习器相比,集成学习的基础是使用多种学习算法来提高预测性能。在集成学习的各种优势背后,有几个问题需要注意,其中一个问题与寻找一组不同的基础学习器有关。最近,聚类被用来生成不同的基础学习器,而不是bagging。基于聚类的集成学习器的主要优点是鲁棒性和通用性。实现聚类算法的关键参数是聚类大小和距离度量。本研究的贡献在于比较了欧几里得距离、曼哈顿距离、切比雪夫距离和堪培拉距离这四种距离度量,并基于准确性、纯度和多样性对它们进行了评价。该方法在10个基准UCI数据集上进行了测试。结果表明,切比雪夫距离和堪培拉距离的精度均优于欧几里得距离和曼哈顿距离,而切比雪夫距离的纯度和多样性值均优于其他三种距离。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
The Use of Dynamic n-Gram to Enhance TF-IDF Features Extraction for Bahasa Indonesia Cyberbullying Classification Development of IT Equipment Management Methodology based on Carbon Emission and End-of-Life Period with A Design Thinking Approach: Case Study: Bandung Institute of Technology Formal Specification and Model Checking of Raft Leader Election in Maude* An Ontology-based Modeling for Classifying Risk of Suicidal Behavior String Figure Simulation with Multiresolution Wire Model
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1