L. P. Yulianti, A. Trisetyarso, J. Santoso, K. Surendro
{"title":"Comparison of Distance Metrics for Generating Cluster-based Ensemble Learning","authors":"L. P. Yulianti, A. Trisetyarso, J. Santoso, K. Surendro","doi":"10.1145/3587828.3587833","DOIUrl":null,"url":null,"abstract":"The basis of ensemble learning is using multiple learning algorithms to improve predictive performance compared to individual learners. Behind the various advantages of ensemble learning, there are several issues that need attention, one of which is related to finding a set of diverse base learners. Recently, clustering has been used to generate diverse base learners as opposed to bagging. The main advantages of cluster-based ensemble learners are their robustness and versatility. The key parameters for implementing a clustering algorithm are the cluster size and distance metrics. The contribution of this study is to compare four distance metrics, including the Euclidean, Manhattan, Chebyshev, and Canberra distances, in the clustering method for ensemble generation and evaluate them based on accuracy, purity, and diversity. The methodology is tested on 10 benchmark UCI datasets. 
The results show that the use of the Chebyshev and Canberra distances achieved superior accuracy to both the Euclidean and Manhattan distances, while the purity and diversity values of the use of the Chebyshev distance outperformed the other three.","PeriodicalId":340917,"journal":{"name":"Proceedings of the 2023 12th International Conference on Software and Computer Applications","volume":"54 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 12th International Conference on Software and Computer Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3587828.3587833","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
The basis of ensemble learning is the use of multiple learning algorithms to achieve better predictive performance than any individual learner. Behind the various advantages of ensemble learning, several issues need attention, one of which is finding a set of diverse base learners. Recently, clustering has been used as an alternative to bagging for generating diverse base learners. The main advantages of cluster-based ensemble learners are their robustness and versatility. The key parameters when implementing a clustering algorithm are the cluster size and the distance metric. The contribution of this study is to compare four distance metrics, namely the Euclidean, Manhattan, Chebyshev, and Canberra distances, in the clustering method for ensemble generation and to evaluate them in terms of accuracy, purity, and diversity. The methodology is tested on 10 benchmark UCI datasets. The results show that the Chebyshev and Canberra distances achieved superior accuracy to both the Euclidean and Manhattan distances, while in purity and diversity the Chebyshev distance outperformed the other three.
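For reference, the four distance metrics the abstract compares have standard definitions. The sketch below is an illustrative implementation of those definitions only, not the paper's clustering pipeline; the vectors `x` and `y` are made-up sample data.

```python
import math

def euclidean(x, y):
    # sqrt of the sum of squared coordinate differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    # sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(x, y))

def chebyshev(x, y):
    # largest absolute coordinate difference
    return max(abs(a - b) for a, b in zip(x, y))

def canberra(x, y):
    # weighted sum of absolute differences; terms where
    # |a| + |b| == 0 are conventionally skipped
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y) if abs(a) + abs(b) > 0)

# Example points (hypothetical data, not from the paper)
x, y = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(euclidean(x, y))  # 5.0
print(manhattan(x, y))  # 7.0
print(chebyshev(x, y))  # 4.0
```

Note that Canberra distance weights each dimension by the magnitude of the coordinates, which makes it sensitive to differences near zero; this is one reason its clustering behavior can differ markedly from the unweighted metrics.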