Pub Date: 2025-11-17 DOI: 10.1007/s11634-025-00660-7
Maurizio Vichi, Andrea Cerioli, Hans A. Kestler
{"title":"Editorial for ADAC issue 4 of volume 19 (2025)","authors":"Maurizio Vichi, Andrea Cerioli, Hans A. Kestler","doi":"10.1007/s11634-025-00660-7","DOIUrl":"10.1007/s11634-025-00660-7","url":null,"abstract":"","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"19 4","pages":"855 - 859"},"PeriodicalIF":1.3,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-05 DOI: 10.1007/s11634-025-00652-7
Maurizio Vichi, Andrea Cerioli, Hans A. Kestler
{"title":"Editorial for ADAC issue 3 of volume 19 (2025)","authors":"Maurizio Vichi, Andrea Cerioli, Hans A. Kestler","doi":"10.1007/s11634-025-00652-7","DOIUrl":"10.1007/s11634-025-00652-7","url":null,"abstract":"","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"19 3","pages":"545 - 549"},"PeriodicalIF":1.3,"publicationDate":"2025-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145078993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-06-16 DOI: 10.1007/s11634-025-00649-2
Carlo Metta, Marco Fantozzi, Andrea Papini, Gianluca Amato, Matteo Bergamaschi, Andrea Fois, Silvia Giulia Galfrè, Alessandro Marchetti, Michelangelo Vegliò, Maurizio Parton, Francesco Morandin
We introduce a novel computational unit for neural networks that features multiple biases, challenging the traditional perceptron structure. This unit emphasizes the importance of preserving uncorrupted information as it is passed from one unit to the next, applying activation functions later in the process with specialized biases for each unit. Through both empirical and theoretical analyses, we show that by focusing on increasing biases rather than weights, there is potential for significant enhancement in a neural network model’s performance. This approach offers an alternative perspective on optimizing information flow within neural networks. The source code is available at https://github.com/CuriosAI/dac-dev (CuriosAI 2023).
{"title":"Increasing biases can be more efficient than increasing weights","authors":"Carlo Metta, Marco Fantozzi, Andrea Papini, Gianluca Amato, Matteo Bergamaschi, Andrea Fois, Silvia Giulia Galfrè, Alessandro Marchetti, Michelangelo Vegliò, Maurizio Parton, Francesco Morandin","doi":"10.1007/s11634-025-00649-2","DOIUrl":"10.1007/s11634-025-00649-2","url":null,"abstract":"<div><p>We introduce a novel computational unit for neural networks that features multiple biases, challenging the traditional perceptron structure. This unit emphasizes the importance of preserving uncorrupted information as it is passed from one unit to the next, applying activation functions later in the process with specialized biases for each unit. Through both empirical and theoretical analyses, we show that by focusing on increasing biases rather than weights, there is potential for significant enhancement in a neural network model’s performance. This approach offers an alternative perspective on optimizing information flow within neural networks. The source code is available at https://github.com/CuriosAI/dac-dev (CuriosAI 2023).</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"19 2","pages":"437 - 468"},"PeriodicalIF":1.3,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145166365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-05-29 DOI: 10.1007/s11634-025-00645-6
Paolo Giordani, Christian Hennig, Julien Jacques, Carla Rampichini
{"title":"Special issue on “Advances in clustering, classification and related methods”","authors":"Paolo Giordani, Christian Hennig, Julien Jacques, Carla Rampichini","doi":"10.1007/s11634-025-00645-6","DOIUrl":"10.1007/s11634-025-00645-6","url":null,"abstract":"","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"19 2","pages":"271 - 273"},"PeriodicalIF":1.3,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145170373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-05-23 DOI: 10.1007/s11634-025-00634-9
Luca Brusa, Fulvia Pennoni
Dynamic temporal networks are important structures to capture node dependencies and their evolution over time. The dynamic stochastic block model, commonly used with longitudinal network data, is estimated by maximizing the likelihood function through the variational expectation-maximization (VEM) algorithm. However, maximization is challenging due to the presence of multiple local maxima. In this paper, we first conduct a simulation study to assess the performance of six different parameter initialization strategies. Second, we introduce a novel specification of the VEM through a genetic algorithm, enabling a more comprehensive exploration of the parameter space. Results from both simulations and historical data on infectious disease transmission highlight the advantages of this approach in overcoming convergence to local maxima and improving node clustering in temporal network data.
{"title":"Variational inference for estimating dynamic stochastic block models through an evolutionary algorithm","authors":"Luca Brusa, Fulvia Pennoni","doi":"10.1007/s11634-025-00634-9","DOIUrl":"10.1007/s11634-025-00634-9","url":null,"abstract":"<div><p>Dynamic temporal networks are important structures to capture node dependencies and their evolution over time. The dynamic stochastic block model, commonly used with longitudinal network data, is estimated by maximizing the likelihood function through the variational expectation-maximization (VEM) algorithm. However, maximization is challenging due to the presence of multiple local maxima. In this paper, we first conduct a simulation study to assess the performance of six different parameter initialization strategies. Second, we introduce a novel specification of the VEM through a genetic algorithm, enabling a more comprehensive exploration of the parameter space. Results from both simulations and historical data on infectious disease transmission highlight the advantages of this approach in overcoming convergence to local maxima and improving node clustering in temporal network data.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"19 2","pages":"469 - 492"},"PeriodicalIF":1.3,"publicationDate":"2025-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s11634-025-00634-9.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145168374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-05-14 DOI: 10.1007/s11634-025-00635-8
Niccolò Ducci, Leonardo Grilli, Marta Pittavino
The varying-thresholds model (VTM) is a novel methodology, proposed by Tutz (2021, Flexible predictive distributions from varying-thresholds modelling, arXiv:2103.13324, https://doi.org/10.48550/arXiv.2103.13324), capable of estimating the whole conditional distribution of a response variable in a regression setting. It can be used for continuous, ordinal and count responses. In this study, conditional quantiles and prediction intervals estimated through VTM are compared with those of quantile regression. The comparison is based on a set of data-generating models to assess the performance of the two methodologies regarding the coverage and width of prediction intervals. The simulation study encompasses settings with several functional forms and types of errors. In addition, a discrete version of the continuous ranked probability score is proposed as a tool to choose the best link function for the binary models used in the fitting of VTM. In summary, the varying-thresholds model is a flexible methodology that can be broadly applied with light assumptions; it is advantageous over quantile regression when the conditional quantile function is misspecified.
{"title":"Comparing flexible modelling approaches: the varying-thresholds model versus quantile regression","authors":"Niccolò Ducci, Leonardo Grilli, Marta Pittavino","doi":"10.1007/s11634-025-00635-8","DOIUrl":"10.1007/s11634-025-00635-8","url":null,"abstract":"<div><p>The varying-thresholds model (VTM) is a novel methodology, proposed by Tutz (2021, Flexible predictive distributions from varying-thresholds modelling, arXiv:2103.13324, https://doi.org/10.48550/arXiv.2103.13324), capable of estimating the whole conditional distribution of a response variable in a regression setting. It can be used for continuous, ordinal and count responses. In this study, conditional quantiles and prediction intervals estimated through VTM are compared with those of quantile regression. The comparison is based on a set of data-generating models to assess the performance of the two methodologies regarding the coverage and width of prediction intervals. The simulation study encompasses settings with several functional forms and types of errors. In addition, a discrete version of the continuous ranked probability score is proposed as a tool to choose the best link function for the binary models used in the fitting of VTM. In summary, the varying-thresholds model is a flexible methodology that can be broadly applied with light assumptions; it is advantageous over quantile regression when the conditional quantile function is misspecified.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"19 2","pages":"493 - 514"},"PeriodicalIF":1.3,"publicationDate":"2025-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s11634-025-00635-8.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145165120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
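The varying-thresholds abstract above compares methods by the coverage and width of their prediction intervals. As a minimal sketch of those two criteria (not the authors' code; the function name `interval_coverage_and_width` and the toy numbers are hypothetical), empirical coverage is the share of observed responses falling inside their interval, and width is the average interval length:

```python
from typing import Sequence, Tuple


def interval_coverage_and_width(
    y: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
) -> Tuple[float, float]:
    """Return (empirical coverage, mean width) of prediction intervals.

    Hypothetical helper for illustration: coverage is the fraction of
    observations with lower <= y <= upper, width is the mean of upper - lower.
    """
    if not (len(y) == len(lower) == len(upper)) or len(y) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    covered = sum(1 for yi, lo, hi in zip(y, lower, upper) if lo <= yi <= hi)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(y)
    return covered / len(y), mean_width


# Toy example: two of the four observations fall inside their interval.
coverage, width = interval_coverage_and_width(
    y=[1, 2, 3, 4], lower=[0, 0, 4, 3], upper=[2, 1, 5, 5]
)
```

A method is usually preferred when its coverage is close to the nominal level while its intervals stay narrow, which is why the two quantities are reported together.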
Pub Date: 2025-04-10 DOI: 10.1007/s11634-025-00630-z
Maria Giovanna Ranalli, Fulvia Pennoni, Francesco Bartolucci, Antonietta Mira
Since 1998, AlmaLaurea—a consortium of 80 Italian universities and a member of the Italian National Statistical System—has conducted an annual census on graduates’ employment status. The survey provides estimates of descriptive indicators at both the population level and for specific subpopulations (domains) of interest, such as degree programmes. Some domains have very few observations due to a small population size and non-response. In this paper, we address this estimation problem within a Small Area Estimation framework. Specifically, we propose using generalized linear mixed models that incorporate two variables as proxies for graduates’ response propensity, making the assumption of non-informative non-response more plausible. Degree programme estimates of employment rates are derived as (semi-parametric) empirical best predictions using a finite mixture of logistic regression models, with their mean squared error estimated via a second-order, bias-corrected, analytical estimator. Sensitivity analysis is conducted to assess the explanatory power of variables modelling response propensity and to evaluate potential correlations between area-specific random effects and observed heterogeneity.
{"title":"When non-response makes estimates from a census a small area estimation problem: the case of the survey on graduates’ employment status in Italy","authors":"Maria Giovanna Ranalli, Fulvia Pennoni, Francesco Bartolucci, Antonietta Mira","doi":"10.1007/s11634-025-00630-z","DOIUrl":"10.1007/s11634-025-00630-z","url":null,"abstract":"<div><p>Since 1998, AlmaLaurea—a consortium of 80 Italian universities and a member of the Italian National Statistical System—has conducted an annual census on graduates’ employment status. The survey provides estimates of descriptive indicators at both the population level and for specific subpopulations (domains) of interest, such as degree programmes. Some domains have very few observations due to a small population size and non-response. In this paper, we address this estimation problem within a Small Area Estimation framework. Specifically, we propose using generalized linear mixed models that incorporate two variables as proxies for graduates’ response propensity, making the assumption of non-informative non-response more plausible. Degree programme estimates of employment rates are derived as (semi-parametric) empirical best predictions using a finite mixture of logistic regression models, with their mean squared error estimated via a second-order, bias-corrected, analytical estimator. Sensitivity analysis is conducted to assess the explanatory power of variables modelling response propensity and to evaluate potential correlations between area-specific random effects and observed heterogeneity.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"19 2","pages":"515 - 543"},"PeriodicalIF":1.3,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s11634-025-00630-z.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145163429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-10 DOI: 10.1007/s11634-025-00624-x
Paula Brito, A. Pedro Duarte Silva
We present parametric probabilistic models for numerical distributional variables. The proposed models are based on the representation of each distribution by a location measure and inter-quantile ranges, for given quantiles, thereby characterizing the underlying empirical distributions in a flexible way. Multivariate Normal distributions are assumed for the whole set of indicators, considering alternative structures of the variance–covariance matrix. For all cases, maximum likelihood estimators of the corresponding parameters are derived. This modelling allows for hypothesis testing and multivariate parametric analysis. The proposed framework is applied to Analysis of Variance and parametric Discriminant Analysis of distributional data. A simulation study examines the performance of the proposed models in classification problems under different data conditions. Applications to Internet traffic data and Portuguese official data illustrate the relevance of the proposed approach.
{"title":"Parametric models for distributional data","authors":"Paula Brito, A. Pedro Duarte Silva","doi":"10.1007/s11634-025-00624-x","DOIUrl":"10.1007/s11634-025-00624-x","url":null,"abstract":"<div><p>We present parametric probabilistic models for numerical distributional variables. The proposed models are based on the representation of each distribution by a location measure and inter-quantile ranges, for given quantiles, thereby characterizing the underlying empirical distributions in a flexible way. Multivariate Normal distributions are assumed for the whole set of indicators, considering alternative structures of the variance–covariance matrix. For all cases, maximum likelihood estimators of the corresponding parameters are derived. This modelling allows for hypothesis testing and multivariate parametric analysis. The proposed framework is applied to Analysis of Variance and parametric Discriminant Analysis of distributional data. A simulation study examines the performance of the proposed models in classification problems under different data conditions. Applications to Internet traffic data and Portuguese official data illustrate the relevance of the proposed approach.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"19 4","pages":"1119 - 1146"},"PeriodicalIF":1.3,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s11634-025-00624-x.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
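The distributional-data abstract above represents each distribution by a location measure and inter-quantile ranges for given quantiles. The sketch below illustrates one way such a representation could be computed from raw values. It is an assumption-laden illustration rather than the authors' procedure: the median as location measure, the probability grid, the linear-interpolation quantile rule, and the names `quantile` and `summarize_distribution` are all choices made here for the example.

```python
from math import floor
from typing import List, Sequence, Tuple


def quantile(sorted_values: Sequence[float], p: float) -> float:
    """Quantile by linear interpolation between order statistics (assumed rule)."""
    n = len(sorted_values)
    h = (n - 1) * p
    lo = floor(h)
    hi = min(lo + 1, n - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


def summarize_distribution(
    values: Sequence[float],
    probs: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
) -> Tuple[float, List[float]]:
    """Return (location, inter-quantile ranges) for one empirical distribution.

    Assumptions for illustration: the location is the median, and the ranges
    are differences between consecutive quantiles on the grid `probs`.
    """
    if len(values) == 0:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    qs = [quantile(ordered, p) for p in probs]
    location = quantile(ordered, 0.5)
    ranges = [b - a for a, b in zip(qs, qs[1:])]
    return location, ranges


# Toy example: five equally spaced values give four equal ranges.
location, ranges = summarize_distribution([5, 1, 4, 2, 3])
```

Each distribution is thereby reduced to a short numeric vector, and a multivariate model can then be specified on those vectors across units.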
Pub Date: 2025-02-28 DOI: 10.1007/s11634-025-00629-6
Maurizio Vichi, Andrea Cerioli, Hans A. Kestler
{"title":"Editorial for ADAC issue 1 of volume 19 (2025)","authors":"Maurizio Vichi, Andrea Cerioli, Hans A. Kestler","doi":"10.1007/s11634-025-00629-6","DOIUrl":"10.1007/s11634-025-00629-6","url":null,"abstract":"","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"19 1","pages":"1 - 4"},"PeriodicalIF":1.4,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143707011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-02-13 DOI: 10.1007/s11634-025-00625-w
Ryan DeWolfe, Jeffrey L. Andrews
The adjusted Rand index (ARI) is a widely used method for comparing hard clusterings, but requires a choice of random model that is often left implicit. Several recent works have extended the Rand index to fuzzy clusterings and adjusted for chance agreement with the permutation model, but the assumptions of this random model are difficult to justify for fuzzy clusterings. Previous work on random models for hard clusterings has shown that different random models can impact similarity rankings, so matching the assumptions of the random model to the algorithm is essential. We propose a single framework computing the ARI with three new random models that are intuitive and explainable for both hard and fuzzy clusterings. The theory and assumptions of the proposed models are contrasted with the existing permutation model, and computations on synthetic and benchmark data show that each model has distinct behaviour, meaning accurate model selection is important for the reliability of results.
{"title":"Random models for adjusting fuzzy Rand index extensions","authors":"Ryan DeWolfe, Jeffrey L. Andrews","doi":"10.1007/s11634-025-00625-w","DOIUrl":"10.1007/s11634-025-00625-w","url":null,"abstract":"<div><p>The adjusted Rand index (ARI) is a widely used method for comparing hard clusterings, but requires a choice of random model that is often left implicit. Several recent works have extended the Rand index to fuzzy clusterings and adjusted for chance agreement with the permutation model, but the assumptions of this random model are difficult to justify for fuzzy clusterings. Previous work on random models for hard clusterings has shown that different random models can impact similarity rankings, so matching the assumptions of the random model to the algorithm is essential. We propose a single framework computing the ARI with three new random models that are intuitive and explainable for both hard and fuzzy clusterings. The theory and assumptions of the proposed models are contrasted with the existing permutation model, and computations on synthetic and benchmark data show that each model has distinct behaviour, meaning accurate model selection is important for the reliability of results.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"19 2","pages":"361 - 385"},"PeriodicalIF":1.3,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145165581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
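For readers unfamiliar with the baseline that the adjusted Rand index abstract above starts from, the following is a minimal sketch of the standard ARI for hard clusterings under the permutation model. It does not implement the fuzzy extensions or the three new random models proposed in the paper; the function name `adjusted_rand_index` and the convention of returning 1.0 in degenerate cases are choices made here for illustration.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(
    labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]
) -> float:
    """Adjusted Rand index of two hard clusterings (permutation model)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have equal length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treated as perfect agreement here

    # Pairs placed together in both clusterings (contingency-table cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each clustering separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    # Expected agreement when cluster sizes are fixed and labels are permuted.
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # both clusterings trivial (one cluster, or all singletons)
    return (together_both - expected) / (maximum - expected)


# Identical partitions up to relabelling versus fully crossed partitions.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The `expected` term is where the choice of random model enters: replacing it with the expectation under a different model changes the adjustment while leaving the rest of the formula intact.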