Advances in Data Analysis and Classification: Latest Articles

Editorial for ADAC issue 4 of volume 19 (2025)
IF 1.3 | Zone 4 (Computer Science) | Q2 Statistics & Probability | Pub Date: 2025-11-17 | DOI: 10.1007/s11634-025-00660-7
Maurizio Vichi, Andrea Cerioli, Hans A. Kestler
Volume 19, issue 4, pages 855-859
Citations: 0
Editorial for ADAC issue 3 of volume 19 (2025)
IF 1.3 | Zone 4 (Computer Science) | Q2 Statistics & Probability | Pub Date: 2025-09-05 | DOI: 10.1007/s11634-025-00652-7
Maurizio Vichi, Andrea Cerioli, Hans A. Kestler
Volume 19, issue 3, pages 545-549
Citations: 0
Increasing biases can be more efficient than increasing weights
IF 1.3 | Zone 4 (Computer Science) | Q2 Statistics & Probability | Pub Date: 2025-06-16 | DOI: 10.1007/s11634-025-00649-2
Carlo Metta, Marco Fantozzi, Andrea Papini, Gianluca Amato, Matteo Bergamaschi, Andrea Fois, Silvia Giulia Galfrè, Alessandro Marchetti, Michelangelo Vegliò, Maurizio Parton, Francesco Morandin

We introduce a novel computational unit for neural networks that features multiple biases, challenging the traditional perceptron structure. This unit emphasizes the importance of preserving uncorrupted information as it is passed from one unit to the next, applying activation functions later in the process with specialized biases for each unit. Through both empirical and theoretical analyses, we show that focusing on increasing biases rather than weights can significantly enhance a neural network model’s performance. This approach offers an alternative perspective on optimizing information flow within neural networks. The source code is available at https://github.com/CuriosAI/dac-dev (CurioSAI, 2023).

Volume 19, pages 437-468
Citations: 0
Special issue on “Advances in clustering, classification and related methods”
IF 1.3 | Zone 4 (Computer Science) | Q2 Statistics & Probability | Pub Date: 2025-05-29 | DOI: 10.1007/s11634-025-00645-6
Paolo Giordani, Christian Hennig, Julien Jacques, Carla Rampichini
Volume 19, pages 271-273
Citations: 0
Variational inference for estimating dynamic stochastic block models through an evolutionary algorithm
IF 1.3 | Zone 4 (Computer Science) | Q2 Statistics & Probability | Pub Date: 2025-05-23 | DOI: 10.1007/s11634-025-00634-9
Luca Brusa, Fulvia Pennoni

Dynamic temporal networks are important structures for capturing node dependencies and their evolution over time. The dynamic stochastic block model, commonly used with longitudinal network data, is estimated by maximizing the likelihood function through the variational expectation-maximization (VEM) algorithm. However, maximization is challenging due to the presence of multiple local maxima. In this paper, we first conduct a simulation study to assess the performance of six different parameter initialization strategies. Second, we introduce a novel specification of the VEM through a genetic algorithm, enabling a more comprehensive exploration of the parameter space. Results from both simulations and historical data on infectious disease transmission highlight the advantages of this approach in overcoming convergence to local maxima and improving node clustering in temporal network data.

Volume 19, pages 469-492 (open access: https://link.springer.com/content/pdf/10.1007/s11634-025-00634-9.pdf)
Citations: 0
Comparing flexible modelling approaches: the varying-thresholds model versus quantile regression
IF 1.3 | Zone 4 (Computer Science) | Q2 Statistics & Probability | Pub Date: 2025-05-14 | DOI: 10.1007/s11634-025-00635-8
Niccolò Ducci, Leonardo Grilli, Marta Pittavino

The varying-thresholds model (VTM) is a novel methodology proposed by Tutz (Flexible predictive distributions from varying-thresholds modelling, arXiv:2103.13324, 2021, https://doi.org/10.48550/arXiv.2103.13324) that can estimate the whole conditional distribution of a response variable in a regression setting. It can be used for continuous, ordinal and count responses. In this study, conditional quantiles and prediction intervals estimated through VTM are compared with those of quantile regression. The comparison is based on a set of data-generating models to assess the performance of the two methodologies regarding the coverage and width of prediction intervals. The simulation study encompasses settings with several functional forms and types of errors. In addition, a discrete version of the continuous ranked probability score is proposed as a tool to choose the best link function for the binary models used in the fitting of VTM. In summary, the varying-thresholds model is a flexible methodology that can be broadly applied with light assumptions; it is advantageous over quantile regression when the conditional quantile function is misspecified.
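
The two comparison criteria named in this abstract, coverage and width of prediction intervals, can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `interval_coverage_and_width` and the toy numbers are hypothetical, and the intervals are assumed to have been produced already by some model (VTM, quantile regression, or anything else).

```python
def interval_coverage_and_width(y, lower, upper):
    """Empirical coverage and mean width of prediction intervals.

    y, lower, upper: equal-length sequences of observed responses and
    the lower/upper interval bounds predicted for each observation.
    (Hypothetical helper for illustration; not the paper's code.)
    """
    if not (len(y) == len(lower) == len(upper)):
        raise ValueError("y, lower and upper must have the same length")
    if len(y) == 0:
        raise ValueError("at least one observation is required")

    n = len(y)
    # Coverage: share of observations that fall inside their interval.
    covered = sum(1 for yi, lo, hi in zip(y, lower, upper) if lo <= yi <= hi)
    # Width: average length of the intervals.
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    return covered / n, mean_width


# Toy data: two of the four observations fall inside their interval.
coverage, width = interval_coverage_and_width(
    y=[1, 2, 3, 4], lower=[0, 0, 4, 3], upper=[2, 1, 5, 5]
)
```

For a nominal 90% interval, a well-calibrated method would be expected to show coverage close to 0.9; among methods with adequate coverage, narrower intervals are usually preferred.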

Volume 19, pages 493-514 (open access: https://link.springer.com/content/pdf/10.1007/s11634-025-00635-8.pdf)
Citations: 0
When non-response makes estimates from a census a small area estimation problem: the case of the survey on graduates’ employment status in Italy
IF 1.3 | Zone 4 (Computer Science) | Q2 Statistics & Probability | Pub Date: 2025-04-10 | DOI: 10.1007/s11634-025-00630-z
Maria Giovanna Ranalli, Fulvia Pennoni, Francesco Bartolucci, Antonietta Mira

Since 1998, AlmaLaurea—a consortium of 80 Italian universities and a member of the Italian National Statistical System—has conducted an annual census on graduates’ employment status. The survey provides estimates of descriptive indicators at both the population level and for specific subpopulations (domains) of interest, such as degree programmes. Some domains have very few observations due to a small population size and non-response. In this paper, we address this estimation problem within a Small Area Estimation framework. Specifically, we propose using generalized linear mixed models that incorporate two variables as proxies for graduates’ response propensity, making the assumption of non-informative non-response more plausible. Degree programme estimates of employment rates are derived as (semi-parametric) empirical best predictions using a finite mixture of logistic regression models, with their mean squared error estimated via a second-order, bias-corrected, analytical estimator. Sensitivity analysis is conducted to assess the explanatory power of variables modelling response propensity and to evaluate potential correlations between area-specific random effects and observed heterogeneity.

Volume 19, pages 515-543 (open access: https://link.springer.com/content/pdf/10.1007/s11634-025-00630-z.pdf)
Citations: 0
Parametric models for distributional data
IF 1.3 | Zone 4 (Computer Science) | Q2 Statistics & Probability | Pub Date: 2025-03-10 | DOI: 10.1007/s11634-025-00624-x
Paula Brito, A. Pedro Duarte Silva

We present parametric probabilistic models for numerical distributional variables. The proposed models are based on the representation of each distribution by a location measure and inter-quantile ranges, for given quantiles, thereby characterizing the underlying empirical distributions in a flexible way. Multivariate Normal distributions are assumed for the whole set of indicators, considering alternative structures of the variance–covariance matrix. For all cases, maximum likelihood estimators of the corresponding parameters are derived. This modelling allows for hypothesis testing and multivariate parametric analysis. The proposed framework is applied to Analysis of Variance and parametric Discriminant Analysis of distributional data. A simulation study examines the performance of the proposed models in classification problems under different data conditions. Applications to Internet traffic data and Portuguese official data illustrate the relevance of the proposed approach.
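
As a rough illustration of the representation described in this abstract (a location measure plus inter-quantile ranges for given quantiles), the following Python sketch summarizes one empirical distribution. It rests on assumptions not stated in the abstract: the median is used as the location measure, quantiles are linearly interpolated, and the helper names `quantile` and `summarize_distribution` are hypothetical.

```python
def quantile(sorted_values, p):
    """Linearly interpolated quantile of pre-sorted data, with 0 <= p <= 1."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def summarize_distribution(values, probs=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return (median, ranges between consecutive quantiles).

    Assumption for illustration: the median is the location measure and
    `probs` are the chosen quantile levels.
    """
    if not values:
        raise ValueError("values must not be empty")
    data = sorted(values)
    qs = [quantile(data, p) for p in probs]
    median = quantile(data, 0.5)
    # Inter-quantile ranges: differences between consecutive quantiles.
    ranges = [b - a for a, b in zip(qs, qs[1:])]
    return median, ranges


median, ranges = summarize_distribution([5, 1, 4, 2, 3])
```

Each distribution-valued observation is reduced to a short numeric vector, which is the kind of indicator set that a multivariate model could then be fitted to.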

Volume 19, issue 4, pages 1119-1146 (open access: https://link.springer.com/content/pdf/10.1007/s11634-025-00624-x.pdf)
Citations: 0
Editorial for ADAC issue 1 of volume 19 (2025)
IF 1.4 | Zone 4 (Computer Science) | Q2 Statistics & Probability | Pub Date: 2025-02-28 | DOI: 10.1007/s11634-025-00629-6
Maurizio Vichi, Andrea Cerioli, Hans A. Kestler
Volume 19, issue 1, pages 1-4
Citations: 0
Random models for adjusting fuzzy Rand index extensions
IF 1.3 | Zone 4 (Computer Science) | Q2 Statistics & Probability | Pub Date: 2025-02-13 | DOI: 10.1007/s11634-025-00625-w
Ryan DeWolfe, Jeffrey L. Andrews

The adjusted Rand index (ARI) is a widely used method for comparing hard clusterings, but requires a choice of random model that is often left implicit. Several recent works have extended the Rand index to fuzzy clusterings and adjusted for chance agreement with the permutation model, but the assumptions of this random model are difficult to justify for fuzzy clusterings. Previous work on random models for hard clusterings has shown that different random models can impact similarity rankings, so matching the assumptions of the random model to the algorithm is essential. We propose a single framework computing the ARI with three new random models that are intuitive and explainable for both hard and fuzzy clusterings. The theory and assumptions of the proposed models are contrasted with the existing permutation model, and computations on synthetic and benchmark data show that each model has distinct behaviour, meaning accurate model selection is important for the reliability of results.
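
For reference, the classical ARI for two hard clusterings, adjusted under the permutation model that this abstract contrasts with its new proposals, can be sketched in Python as follows. This is the standard textbook formula, not the paper's fuzzy extension or any of its three new random models; the function name is an illustrative choice.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index of two hard clusterings (permutation model)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two objects are required")

    # Pairs of objects placed together in both clusterings.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each clustering separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    # Expected agreement when labels are randomly permuted with
    # cluster sizes held fixed.
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both clusterings are one cluster
    return (together_both - expected) / (maximum - expected)


same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])      # identical up to relabelling
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])   # no pair agrees
```

The `expected` term is where the choice of random model enters: swapping in a different random model changes that expectation and therefore the adjusted score.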

Volume 19, pages 361-385
Citations: 0