Pub Date: 2022-03-01 DOI: 10.1007/s00357-022-09410-2
Paul D. McNicholas
{"title":"Editorial: Journal of Classification Vol. 39-1","authors":"Paul D. McNicholas","doi":"10.1007/s00357-022-09410-2","DOIUrl":"https://doi.org/10.1007/s00357-022-09410-2","url":null,"abstract":"","PeriodicalId":50241,"journal":{"name":"Journal of Classification","volume":"39 1","pages":"1-2"},"PeriodicalIF":2.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138536011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-02-28 DOI: 10.1007/s00357-021-09407-3
J. Andrews, R. Browne, Chelsey D. Hvingelby
{"title":"On Assessments of Agreement Between Fuzzy Partitions","authors":"J. Andrews, R. Browne, Chelsey D. Hvingelby","doi":"10.1007/s00357-021-09407-3","DOIUrl":"https://doi.org/10.1007/s00357-021-09407-3","url":null,"abstract":"","PeriodicalId":50241,"journal":{"name":"Journal of Classification","volume":"39 1","pages":"326-342"},"PeriodicalIF":2.0,"publicationDate":"2022-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45000845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-02-03 DOI: 10.1007/s00357-021-09408-2
R. Giubilei, P. Brutti
{"title":"Supervised Classification for Link Prediction in Facebook Ego Networks With Anonymized Profile Information","authors":"R. Giubilei, P. Brutti","doi":"10.1007/s00357-021-09408-2","DOIUrl":"https://doi.org/10.1007/s00357-021-09408-2","url":null,"abstract":"","PeriodicalId":50241,"journal":{"name":"Journal of Classification","volume":"39 1","pages":"302-325"},"PeriodicalIF":2.0,"publicationDate":"2022-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44759366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-01-17 DOI: 10.1007/s00357-021-09409-1
Julian Rossbroich, Jeffrey Durieux, Tom F. Wilderjans
In various scientific fields, researchers use partitioning methods (e.g., K-means) to disclose the structural mechanisms underlying object-by-variable data. In some instances, however, a grouping of objects into clusters that are allowed to overlap (i.e., assigning objects to multiple clusters) might lead to a better representation of the underlying clustering structure. To obtain an overlapping object clustering from object-by-variable data, Mirkin’s ADditive PROfile CLUStering (ADPROCLUS) model may be used. A major challenge when performing ADPROCLUS is to determine the optimal number of overlapping clusters underlying the data, which is a model selection problem. Up to now, however, this problem has not been systematically investigated, and almost no guidelines can be found in the literature regarding appropriate model selection strategies for ADPROCLUS. Therefore, in this paper, several existing model selection strategies for K-means (among others, CHull, the Caliński-Harabasz index, the Krzanowski-Lai index, the Average Silhouette Width, the Dunn Index, and information-theoretic measures such as AIC and BIC) and two cross-validation-based strategies are tailored to the ADPROCLUS context and compared with each other in an extensive simulation study. The results demonstrate that CHull outperforms all other model selection strategies, especially when the negative log-likelihood, which is associated with a minimal stochastic extension of ADPROCLUS, is used as the (mis)fit measure. The analysis of a post hoc AIC-based model selection strategy revealed that better performance may be obtained when a different, more appropriate definition of model complexity for ADPROCLUS is used.
{"title":"Model Selection Strategies for Determining the Optimal Number of Overlapping Clusters in Additive Overlapping Partitional Clustering","authors":"Julian Rossbroich, Jeffrey Durieux, Tom F. Wilderjans","doi":"10.1007/s00357-021-09409-1","DOIUrl":"https://doi.org/10.1007/s00357-021-09409-1","url":null,"abstract":"<p>In various scientific fields, researchers use partitioning methods (e.g., <i>K</i>-means) to disclose the structural mechanisms underlying object-by-variable data. In some instances, however, a grouping of objects into clusters that are allowed to overlap (i.e., assigning objects to multiple clusters) might lead to a better representation of the underlying clustering structure. To obtain an overlapping object clustering from object-by-variable data, Mirkin’s ADditive PROfile CLUStering (ADPROCLUS) model may be used. A major challenge when performing ADPROCLUS is to determine the optimal number of overlapping clusters underlying the data, which is a model selection problem. Up to now, however, this problem has not been systematically investigated, and almost no guidelines can be found in the literature regarding appropriate model selection strategies for ADPROCLUS. Therefore, in this paper, several existing model selection strategies for <i>K</i>-means (among others, CHull, the Caliński-Harabasz index, the Krzanowski-Lai index, the Average Silhouette Width, the Dunn Index, and information-theoretic measures such as AIC and BIC) and two cross-validation-based strategies are tailored to the ADPROCLUS context and compared with each other in an extensive simulation study. The results demonstrate that CHull outperforms all other model selection strategies, especially when the negative log-likelihood, which is associated with a minimal stochastic extension of ADPROCLUS, is used as the (mis)fit measure. The analysis of a post hoc AIC-based model selection strategy revealed that better performance may be obtained when a different, more appropriate definition of model complexity for ADPROCLUS is used.</p>","PeriodicalId":50241,"journal":{"name":"Journal of Classification","volume":"28 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2022-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138536002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-01-01 Epub Date: 2021-09-08 DOI: 10.1007/s00357-021-09395-4
François Bavaud
The paper presents and analyzes the properties of a new diversity index, the effective entropy, which lowers Shannon entropy by taking into account the presence of similarities between items. Similarities decrease exponentially with the item dissimilarities, with a freely adjustable discriminability parameter controlling various diversity regimes separated by phase transitions. Effective entropies are determined iteratively and turn out to be concave and subadditive, in contrast to the reduced entropy proposed in ecology for similar purposes. Two data sets are used to illustrate the formalism and to underline the role played by the dissimilarity types.
{"title":"Similarity-Reduced Diversities: the Effective Entropy and the Reduced Entropy.","authors":"François Bavaud","doi":"10.1007/s00357-021-09395-4","DOIUrl":"https://doi.org/10.1007/s00357-021-09395-4","url":null,"abstract":"<p>The paper presents and analyzes the properties of a new diversity index, the effective entropy, which lowers Shannon entropy by taking into account the presence of similarities between items. Similarities decrease exponentially with the item dissimilarities, with a freely adjustable discriminability parameter controlling various diversity regimes separated by phase transitions. Effective entropies are determined iteratively and turn out to be concave and subadditive, in contrast to the reduced entropy proposed in ecology for similar purposes. Two data sets are used to illustrate the formalism and to underline the role played by the dissimilarity types.</p>","PeriodicalId":50241,"journal":{"name":"Journal of Classification","volume":"39 1","pages":"100-121"},"PeriodicalIF":2.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8924145/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40305236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-12-28 DOI: 10.1007/s00357-021-09405-5
Sunghoon Kim, Ashley Stadler Blank, W. DeSarbo, J. Vermunt
{"title":"Erratum to: The Spatial Representation of Consumer Dispersion Patterns via a New Multi-level Latent Class Methodology","authors":"Sunghoon Kim, Ashley Stadler Blank, W. DeSarbo, J. Vermunt","doi":"10.1007/s00357-021-09405-5","DOIUrl":"https://doi.org/10.1007/s00357-021-09405-5","url":null,"abstract":"","PeriodicalId":50241,"journal":{"name":"Journal of Classification","volume":"39 1","pages":"240-240"},"PeriodicalIF":2.0,"publicationDate":"2021-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45578789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-22 DOI: 10.1007/s00357-021-09401-9
Xuwen Zhu, Shuchismita Sarkar, Volodymyr Melnykov
Finite mixture modeling, when extended to matrix-valued data, faces several challenges. One of the major concerns is overparameterization resulting from the high number of parameters involved in a matrix mixture. In addition, an appropriate power transformation is very useful if the data are skewed. The R package MatTransMix is a new piece of software devoted to parsimonious models, based on the spectral decomposition of covariance matrices, developed for fitting heterogeneous matrix-valued data and providing model-based clustering results. The package implements a variety of parsimonious models obtained from various combinations of spectral decomposition and skewness parameters. The paper discusses some methodological foundations of the proposed models and illustrates the functions available in the package with carefully chosen examples.
{"title":"MatTransMix: an R Package for Matrix Model-Based Clustering and Parsimonious Mixture Modeling","authors":"Xuwen Zhu, Shuchismita Sarkar, Volodymyr Melnykov","doi":"10.1007/s00357-021-09401-9","DOIUrl":"https://doi.org/10.1007/s00357-021-09401-9","url":null,"abstract":"<p>Finite mixture modeling, when extended to matrix-valued data, faces several challenges. One of the major concerns is overparameterization resulting from the high number of parameters involved in a matrix mixture. In addition, an appropriate power transformation is very useful if the data are skewed. The R package MatTransMix is a new piece of software devoted to parsimonious models, based on the spectral decomposition of covariance matrices, developed for fitting heterogeneous matrix-valued data and providing model-based clustering results. The package implements a variety of parsimonious models obtained from various combinations of spectral decomposition and skewness parameters. The paper discusses some methodological foundations of the proposed models and illustrates the functions available in the package with carefully chosen examples.</p>","PeriodicalId":50241,"journal":{"name":"Journal of Classification","volume":"15 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2021-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138536000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-16 DOI: 10.1007/s00357-021-09400-w
Xuwen Zhu, Yana Melnykov
{"title":"Erratum to: On Finite Mixture Modeling of Change-Point Processes","authors":"Xuwen Zhu, Yana Melnykov","doi":"10.1007/s00357-021-09400-w","DOIUrl":"https://doi.org/10.1007/s00357-021-09400-w","url":null,"abstract":"","PeriodicalId":50241,"journal":{"name":"Journal of Classification","volume":"39 1","pages":"23-23"},"PeriodicalIF":2.0,"publicationDate":"2021-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"51951640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}