Comparison between Two Algorithms for Computing the Weighted Generalized Affinity Coefficient in the Case of Interval Data

Stats | Pub Date: 2023-10-13 | DOI: 10.3390/stats6040068 | IF 0.9 | Q4 (Mathematics, Interdisciplinary Applications)

Áurea Sousa, Osvaldo Silva, Leonor Bacelar-Nicolau, João Cabral, Helena Bacelar-Nicolau


Citations: 1

Abstract

Starting from the affinity coefficient between two discrete probability distributions proposed by Matusita, Bacelar-Nicolau introduced the affinity coefficient in a cluster analysis context and extended it to different types of data, including complex and heterogeneous data within the scope of symbolic data analysis (SDA). In this study, we consider the most significant partitions obtained by hierarchical cluster analysis (h.c.a.) of two well-known datasets taken from the literature on complex (symbolic) data analysis. The h.c.a. is based on the weighted generalized affinity coefficient for interval data and on probabilistic aggregation criteria from a VL parametric family. Two alternative algorithms were used to calculate the values of this coefficient and were then compared. Both algorithms detected clusters of macrodata (data aggregated into groups of interest) that were consistent with those reported in the literature, but one performed better than the other in some specific cases. Moreover, both approaches allow large microdatabases (non-aggregated data) to be treated once the microdata have been transformed into macrodata.
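As a point of reference for the coefficient the abstract starts from, the Matusita affinity between two discrete probability distributions p and q on the same support is the sum over categories of sqrt(p_i * q_i). The sketch below illustrates only that basic quantity; it is not the weighted generalized coefficient for interval data, nor either of the two algorithms compared in the paper. The function name `matusita_affinity` is a hypothetical name chosen for illustration.

```python
import math


def matusita_affinity(p, q):
    """Affinity between two discrete distributions on the same support.

    Computes sum_i sqrt(p_i * q_i). Illustrative helper only; the name and
    the input checks are assumptions, not taken from the paper.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of categories")
    if any(x < 0 for x in p) or any(x < 0 for x in q):
        raise ValueError("probabilities must be non-negative")
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


if __name__ == "__main__":
    # Identical distributions give the maximum affinity.
    print(matusita_affinity([0.5, 0.5], [0.5, 0.5]))
    # Distributions with disjoint support give the minimum affinity.
    print(matusita_affinity([1.0, 0.0], [0.0, 1.0]))
```

For proper probability vectors the value lies between 0 (disjoint supports) and 1 (identical distributions), which is what makes it usable as a similarity measure in clustering.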


Source journal: Stats

CiteScore: 0.60

Self-citation rate: 0.00%

Review time: 7 weeks