Pub Date: 2023-11-30 | DOI: 10.1007/s00357-023-09449-9 | Journal of Classification
Title: Clustered Sparse Structural Equation Modeling for Heterogeneous Data
Ippei Takasawa, Kensuke Tanioka, Hiroshi Yadohisa
Joint analysis with clustering and structural equation modeling is one of the most popular approaches to analyzing heterogeneous data. Methods in this family estimate a path diagram of the same shape for each cluster and interpret the clusters according to the magnitude of the coefficients. However, the coefficients become difficult to interpret as the number of clusters and/or paths increases, and these methods cannot handle situations in which the path diagram differs between clusters. To tackle these problems, we propose two methods that simplify the path structure and facilitate interpretation by using sparse estimation to estimate a differently shaped path diagram for each cluster. The proposed methods and related methods are compared using numerical simulations and real data examples. The proposed methods outperform the existing methods in terms of both fit and interpretability.
Pub Date: 2023-11-25 | DOI: 10.1007/s00357-023-09455-x | Journal of Classification
Title: Classification Under Partial Reject Options
Måns Karlsson, Ola Hössjer
In many applications there is ambiguity about which (if any) of a finite number N of hypotheses best fits an observation. It is then of interest to allow the output to be a whole set of categories, so that the size of the classified set ranges from 0 to N. An empty set corresponds to an outlier, a set of size 1 represents a firm decision that singles out one hypothesis, a set of size N corresponds to a rejection to classify, and sets of sizes 2, …, N − 1 represent a partial rejection to classify, where some hypotheses are excluded from further analysis. In this paper, we review and unify several proposed methods of Bayesian set-valued classification, where the objective is to find the optimal Bayesian classifier that maximizes the expected reward. We study a large class of reward functions that give rewards to sets containing the true category, while additive or multiplicative penalties are incurred depending on the size of the set. For models with one homogeneous block of hypotheses, we provide general expressions for the accompanying Bayesian classifier, several of which extend previous results in the literature. We then derive novel results for the more general setting in which hypotheses are partitioned into blocks and ambiguity within and between blocks is of different severity. We also discuss how well-known classification methods, such as conformal prediction, indifference zones, and hierarchical classification, fit into our framework. Finally, set-valued classification is illustrated using an ornithological data set, with taxa partitioned into blocks and parameters estimated using MCMC. The tuning parameters of the associated reward function are chosen through cross-validation.
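To make the expected-reward idea in the Karlsson and Hössjer abstract concrete, here is a minimal Python sketch of two simple reward functions of the kind it describes: one with an additive size penalty and one with a multiplicative size penalty. It assumes the posterior probabilities of the N hypotheses are already available; the function names, the penalty `lam`, and the size-weight function `g` are hypothetical illustrations, not the authors' classifier or notation.

```python
"""Minimal sketch of Bayesian set-valued classification (illustrative only).

Assumes the posterior probabilities p[i] of N hypotheses are given and
uses two simple, hypothetical reward functions; not the authors' method.
"""
from typing import Callable, List, Sequence


def additive_penalty_set(p: Sequence[float], lam: float) -> List[int]:
    """Optimal set for the reward 1{true class in S} - lam * |S|.

    The expected reward is sum(p[i] for i in S) - lam * |S|, so each
    hypothesis contributes p[i] - lam independently: keep it iff p[i] > lam.
    """
    return [i for i, p_i in enumerate(p) if p_i > lam]


def multiplicative_penalty_set(
    p: Sequence[float], g: Callable[[int], float]
) -> List[int]:
    """Optimal non-empty set for the reward g(|S|) * 1{true class in S}.

    Assumes g(k) >= 0. For a fixed size k the expected reward
    g(k) * sum(p[i] for i in S) is largest for the k most probable
    hypotheses, so only the N nested "top-k" sets need to be compared.
    """
    order = sorted(range(len(p)), key=lambda i: p[i], reverse=True)
    best_set: List[int] = []
    best_value = float("-inf")
    cumulative = 0.0
    for k, i in enumerate(order, start=1):
        cumulative += p[i]
        value = g(k) * cumulative
        if value > best_value:
            best_value, best_set = value, order[:k]
    return sorted(best_set)


if __name__ == "__main__":
    posterior = [0.45, 0.40, 0.10, 0.05]
    print(additive_penalty_set(posterior, lam=0.2))  # [0, 1]
    print(additive_penalty_set(posterior, lam=0.5))  # []
    print(multiplicative_penalty_set(posterior, g=lambda k: 1.0 / k))  # [0]
    print(multiplicative_penalty_set(posterior, g=lambda k: k ** -0.5))  # [0, 1]
```

With the additive penalty the optimal set is every hypothesis whose posterior exceeds `lam`, so its size can be anything from 0 to N; an empty output (no posterior above `lam`) is the situation the abstract associates with an outlier. With the multiplicative penalty, the search shrinks from all subsets to N nested candidates, which is why this kind of reward is cheap to optimize.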
Pub Date: 2023-11-13 | DOI: 10.1007/s00357-023-09453-z | Journal of Classification
Title: Model-Based Clustering with Nested Gaussian Clusters
Jason Hou-Liu, Ryan P. Browne
Pub Date: 2023-11-07 | DOI: 10.1007/s00357-023-09452-0 | Journal of Classification
Title: Logistic Normal Multinomial Factor Analyzers for Clustering Microbiome Data
Wangshu Tu, Sanjeena Subedi
The human microbiome plays an important role in human health and disease. Next-generation sequencing technologies allow the composition of the human microbiome to be quantified. Clustering these microbiome data can provide valuable information by identifying underlying patterns across samples. Recently, Fang and Subedi (2023) proposed a logistic normal multinomial mixture model (LNM-MM) for clustering microbiome data. Because microbiome data tend to be high-dimensional, we develop a family of logistic normal multinomial factor analyzers (LNM-FA) by incorporating a factor analyzer structure into the LNM-MM. This family of models is better suited to high-dimensional data because, when the number of latent factors is assumed to be small, the number of free parameters in LNM-FA is greatly reduced. Parameters are estimated with a computationally efficient variant of the alternating expectation conditional maximization algorithm that uses variational Gaussian approximations. The proposed method is illustrated on simulated and real datasets.
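The Tu and Subedi abstract's claim that a small number of latent factors greatly reduces the number of free parameters can be checked with a quick count. This Python sketch assumes the standard unconstrained factor-analyzer covariance Σ = ΛΛᵀ + Ψ (Ψ diagonal) for a single mixture component with a latent Gaussian of dimension d; the LNM-FA family may impose additional constraints, so its exact counts can differ, and the function names are hypothetical.

```python
"""Covariance parameter counts for one mixture component (illustrative).

Assumes a d-dimensional latent Gaussian with either an unrestricted
covariance or the standard factor-analyzer form Sigma = Lambda Lambda^T + Psi.
"""


def full_covariance_params(d: int) -> int:
    """Free parameters in an unrestricted symmetric d x d covariance matrix."""
    return d * (d + 1) // 2


def factor_analyzer_params(d: int, q: int) -> int:
    """Free parameters in Sigma = Lambda Lambda^T + Psi.

    Lambda is a d x q loading matrix, Psi is diagonal (d entries), and
    q(q - 1)/2 parameters are removed for the rotational indeterminacy of Lambda.
    """
    return d * q + d - q * (q - 1) // 2


if __name__ == "__main__":
    d = 100
    print("full:", full_covariance_params(d))  # full: 5050
    for q in (1, 2, 3):
        # prints q=1: 200, then q=2: 299, then q=3: 397
        print(f"q={q}:", factor_analyzer_params(d, q))
```

For d = 100, an unrestricted covariance needs 5050 parameters per component, whereas three factors need 397: under this assumed form, the count grows linearly in d for fixed q instead of quadratically.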
Pub Date: 2023-09-16 | DOI: 10.1007/s00357-023-09448-w | Journal of Classification
Title: Optimal Band Selection Using Evolutionary Machine Learning to Improve the Accuracy of Hyper-spectral Images Classification: a Novel Migration-Based Particle Swarm Optimization
Milad Vahidi, Sina Aghakhani, Diego Martín, Hossein Aminzadeh, Mehrdad Kaveh
Pub Date: 2023-09-12 | DOI: 10.1007/s00357-023-09445-z | Journal of Classification
Title: On Model-Based Clustering of Directional Data with Heavy Tails
Yingying Zhang, Volodymyr Melnykov, Igor Melnykov
Pub Date: 2023-09-04 | DOI: 10.1007/s00357-023-09447-x | Journal of Classification
Title: Expanding the Class of Global Objective Functions for Dissimilarity-Based Hierarchical Clustering
Sebastien Roch