Interaction Identification and Clique Screening for Classification with Ultra-high Dimensional Discrete Features
An, Baiguo; Feng, Guozhong; Guo, Jianhua
Pub Date: 2021-09-11 | DOI: 10.1007/s00357-021-09399-0 | Journal of Classification
Interactions have greatly influenced recent scientific discoveries, but identifying them is challenging in ultra-high dimensions. In this study, we propose an interaction identification method for classification with ultra-high dimensional discrete features. We use clique sets to capture interactions among features: features in a common clique interact in ways that are informative for classification. The size of a clique equals the number of interacting features, so the method can handle interactions among more than two feature variables. We propose a Kullback-Leibler divergence-based approach that identifies the clique sets correctly with probability tending to 1 as the sample size tends to infinity. A clique screening method is then proposed to filter out clique sets that are useless for classification, and it enjoys the strong sure screening property. Finally, we propose a clique naïve Bayes classifier. Numerical studies show that the proposed approach performs well.
The Spatial Representation of Consumer Dispersion Patterns via a New Multi-level Latent Class Methodology
Sunghoon Kim; Ashley Stadler Blank; W. DeSarbo; J. Vermunt
Pub Date: 2021-09-08 | DOI: 10.1007/s00357-021-09398-1 | Journal of Classification 39: 218–239

Comparing Boosting and Bagging for Decision Trees of Rankings
Plaia, Antonella; Buscemi, Simona; Fürnkranz, Johannes; Mencía, Eneldo Loza
Pub Date: 2021-09-03 | DOI: 10.1007/s00357-021-09397-2 | Journal of Classification
Decision tree learning is among the most popular and most traditional families of machine learning algorithms. While these techniques are intuitive and interpretable, they suffer from instability: small perturbations in the training data can cause large changes in the predictions. Ensemble methods combine the outputs of multiple trees, making predictions more reliable and stable. They have primarily been applied to numeric prediction and classification tasks. In recent years, some attempts to extend ensemble methods to ordinal data have appeared in the literature, but no concrete methodology has been provided for preference data. In this paper, we extend decision trees, and subsequently ensemble methods, to ranking data. In particular, we propose theoretical and computational definitions of bagging and boosting, two of the best-known ensemble methods. In an experimental study using simulated and real-world datasets, our results confirm that known findings from classification, such as boosting outperforming bagging, carry over to the ranking case.
Partition of Interval-Valued Observations Using Regression
Fei Liu; L. Billard
Pub Date: 2021-08-28 | DOI: 10.1007/s00357-021-09394-5 | Journal of Classification 39: 55–77

A Gibbs Sampling Algorithm with Monotonicity Constraints for Diagnostic Classification Models
K. Yamaguchi; J. Templin
Pub Date: 2021-07-31 | DOI: 10.1007/s00357-021-09392-7 | Journal of Classification 39: 24–54

Estimating the Covariance Matrix of the Maximum Likelihood Estimator Under Linear Cluster-Weighted Models
Gabriele Soffritti
Pub Date: 2021-07-31 | DOI: 10.1007/s00357-021-09390-9 | Journal of Classification 38: 594–625

Group-Wise Shrinkage Estimation in Penalized Model-Based Clustering
A. Casa; A. Cappozzo; Michael Fop
Pub Date: 2021-05-17 | DOI: 10.1007/s00357-022-09421-z | Journal of Classification 39: 648–674

A Model-Free Subject Selection Method for Active Learning Classification Procedures
Bo-Shiang Ke; Y. Chang
Pub Date: 2021-05-10 | DOI: 10.1007/s00357-021-09388-3 | Journal of Classification 38: 544–555