Flexible mixture regression with the generalized hyperbolic distribution
Pub Date: 2023-01-04 | DOI: 10.1007/s11634-022-00532-4
Nam-Hwui Kim, Ryan P. Browne
When modeling the functional relationship between a response variable and covariates via linear regression, multiple relationships may be present depending on the underlying component structure. Deploying a flexible mixture distribution can help capture a wide variety of such structures, thereby successfully modeling the response–covariate relationship while accounting for the component structure. In that spirit, a mixture regression model based on the finite mixture of generalized hyperbolic distributions is introduced, and its parameter estimation method is presented. The flexibility of the generalized hyperbolic distribution can identify better-fitting components, which can lead to a more meaningful functional relationship between the response variable and the covariates. In addition, we introduce an iterative component-combining procedure to aid the interpretability of the model. The results from simulated and real data analyses indicate that our method offers a distinctive edge over some of the existing methods, and that it can generate useful insights on the data set at hand for further investigation.
{"title":"Flexible mixture regression with the generalized hyperbolic distribution","authors":"Nam-Hwui Kim, Ryan P. Browne","doi":"10.1007/s11634-022-00532-4","DOIUrl":"10.1007/s11634-022-00532-4","url":null,"abstract":"<div><p>When modeling the functional relationship between a response variable and covariates via linear regression, multiple relationships may be present depending on the underlying component structure. Deploying a flexible mixture distribution can help with capturing a wide variety of such structures, thereby successfully modeling the response–covariate relationship while addressing the components. In that spirit, a mixture regression model based on the finite mixture of generalized hyperbolic distributions is introduced, and its parameter estimation method is presented. The flexibility of the generalized hyperbolic distribution can identify better-fitting components, which can lead to a more meaningful functional relationship between the response variable and the covariates. In addition, we introduce an iterative component combining procedure to aid the interpretability of the model. The results from simulated and real data analyses indicate that our method offers a distinctive edge over some of the existing methods, and that it can generate useful insights on the data set at hand for further investigation.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"18 1","pages":"33 - 60"},"PeriodicalIF":1.4,"publicationDate":"2023-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82422675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sparse correspondence analysis for large contingency tables
Pub Date: 2023-01-02 | DOI: 10.1007/s11634-022-00531-5
Ruiping Liu, Ndeye Niang, Gilbert Saporta, Huiwen Wang
We propose sparse variants of correspondence analysis (CA) for large contingency tables such as the document-term matrices used in text mining. By seeking to obtain many zero coefficients, sparse CA remedies the difficulty of interpreting CA results when the table is large. Since CA is a doubly weighted PCA (for rows and columns) or a weighted generalized SVD, we adapt known sparse versions of these methods, with specific developments to obtain orthogonal solutions and to tune the sparseness parameters. We distinguish two cases depending on whether sparseness is required for both rows and columns, or for only one set.
{"title":"Sparse correspondence analysis for large contingency tables","authors":"Ruiping Liu, Ndeye Niang, Gilbert Saporta, Huiwen Wang","doi":"10.1007/s11634-022-00531-5","DOIUrl":"10.1007/s11634-022-00531-5","url":null,"abstract":"<div><p>We propose sparse variants of correspondence analysis (CA) for large contingency tables like documents-terms matrices used in text mining. By seeking to obtain many zero coefficients, sparse CA remedies to the difficulty of interpreting CA results when the size of the table is large. Since CA is a double weighted PCA (for rows and columns) or a weighted generalized SVD, we adapt known sparse versions of these methods with specific developments to obtain orthogonal solutions and to tune the sparseness parameters. We distinguish two cases depending on whether sparseness is asked for both rows and columns, or only for one set.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"17 4","pages":"1037 - 1056"},"PeriodicalIF":1.6,"publicationDate":"2023-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50003542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Proximal methods for sparse optimal scoring and discriminant analysis
Pub Date: 2022-12-21 | DOI: 10.1007/s11634-022-00530-6
Summer Atkins, Gudmundur Einarsson, Line Clemmensen, Brendan Ames
Linear discriminant analysis (LDA) is a classical method for dimensionality reduction, where discriminant vectors are sought to project data to a lower dimensional space for optimal separability of classes. Several recent papers have outlined strategies, based on exploiting sparsity of the discriminant vectors, for performing LDA in the high-dimensional setting where the number of features exceeds the number of observations in the data. However, many of these proposed methods lack scalable algorithms for solving the underlying optimization problems. We consider an optimization scheme for solving the sparse optimal scoring formulation of LDA based on block coordinate descent. Each iteration of this algorithm requires an update of a scoring vector, which admits an analytic formula, and an update of the corresponding discriminant vector, which requires the solution of a convex subproblem; we propose several variants of this algorithm in which the proximal gradient method or the alternating direction method of multipliers is used to solve this subproblem. We show that the per-iteration cost of these methods scales linearly in the dimension of the data provided restricted regularization terms are employed, and cubically in the dimension of the data in the worst case. Furthermore, we establish that when this block coordinate descent framework generates convergent subsequences of iterates, these subsequences converge to stationary points of the sparse optimal scoring problem. We demonstrate the effectiveness of our new methods with empirical results for classification of Gaussian data and data sets drawn from benchmarking repositories, including time-series and multispectral X-ray data, and provide Matlab and R implementations of our optimization schemes.
{"title":"Proximal methods for sparse optimal scoring and discriminant analysis","authors":"Summer Atkins, Gudmundur Einarsson, Line Clemmensen, Brendan Ames","doi":"10.1007/s11634-022-00530-6","DOIUrl":"10.1007/s11634-022-00530-6","url":null,"abstract":"<div><p>Linear discriminant analysis (LDA) is a classical method for dimensionality reduction, where discriminant vectors are sought to project data to a lower dimensional space for optimal separability of classes. Several recent papers have outlined strategies, based on exploiting sparsity of the discriminant vectors, for performing LDA in the high-dimensional setting where the number of features exceeds the number of observations in the data. However, many of these proposed methods lack scalable methods for solution of the underlying optimization problems. We consider an optimization scheme for solving the sparse optimal scoring formulation of LDA based on block coordinate descent. Each iteration of this algorithm requires an update of a scoring vector, which admits an analytic formula, and an update of the corresponding discriminant vector, which requires solution of a convex subproblem; we will propose several variants of this algorithm where the proximal gradient method or the alternating direction method of multipliers is used to solve this subproblem. We show that the per-iteration cost of these methods scales linearly in the dimension of the data provided restricted regularization terms are employed, and cubically in the dimension of the data in the worst case. Furthermore, we establish that when this block coordinate descent framework generates convergent subsequences of iterates, then these subsequences converge to the stationary points of the sparse optimal scoring problem. We demonstrate the effectiveness of our new methods with empirical results for classification of Gaussian data and data sets drawn from benchmarking repositories, including time-series and multispectral X-ray data, and provide <span>Matlab</span> and <span>R</span> implementations of our optimization schemes.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"17 4","pages":"983 - 1036"},"PeriodicalIF":1.6,"publicationDate":"2022-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50502301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LASSO regularization within the LocalGLMnet architecture
Pub Date: 2022-12-13 | DOI: 10.1007/s11634-022-00529-z
Ronald Richman, Mario V. Wüthrich
Deep learning models have been very successful in the application of machine learning methods, often outperforming classical statistical models such as linear regression models or generalized linear models. On the other hand, deep learning models are often criticized for being neither explainable nor amenable to variable selection. There are two ways of dealing with this problem: either we use post-hoc model interpretability methods, or we design specific deep learning architectures that allow for easier interpretation and explanation. This paper builds on our previous work on the LocalGLMnet, an interpretable deep learning architecture. In the present paper, we show how group LASSO regularization (and other regularization schemes) can be implemented within the LocalGLMnet architecture so that we obtain feature sparsity for variable selection. We benchmark our approach against the recently developed LassoNet of Lemhadri et al. (LassoNet: a neural network with feature sparsity. J Mach Learn Res 22:1–29, 2021).
{"title":"LASSO regularization within the LocalGLMnet architecture","authors":"Ronald Richman, Mario V. Wüthrich","doi":"10.1007/s11634-022-00529-z","DOIUrl":"10.1007/s11634-022-00529-z","url":null,"abstract":"<div><p>Deep learning models have been very successful in the application of machine learning methods, often out-performing classical statistical models such as linear regression models or generalized linear models. On the other hand, deep learning models are often criticized for not being explainable nor allowing for variable selection. There are two different ways of dealing with this problem, either we use post-hoc model interpretability methods or we design specific deep learning architectures that allow for an easier interpretation and explanation. This paper builds on our previous work on the LocalGLMnet architecture that gives an interpretable deep learning architecture. In the present paper, we show how group LASSO regularization (and other regularization schemes) can be implemented within the LocalGLMnet architecture so that we receive feature sparsity for variable selection. We benchmark our approach with the recently developed LassoNet of Lemhadri et al. ( LassoNet: a neural network with feature sparsity. J Mach Learn Res 22:1–29, 2021).</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"17 4","pages":"951 - 981"},"PeriodicalIF":1.6,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50047295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A power-controlled reliability assessment for multi-class probabilistic classifiers
Pub Date: 2022-11-17 | DOI: 10.1007/s11634-022-00528-0
Hyukjun Gweon
In multi-class classification, the output of a probabilistic classifier is a probability distribution over the classes. In this work, we focus on a statistical assessment of the reliability of probabilistic classifiers for multi-class problems. Our approach generates a Pearson χ² statistic based on the k nearest neighbors in the prediction space. Further, we develop a Bayesian approach for estimating the expected power of the reliability test, which can be used to choose an appropriate sample size k. We propose a sampling algorithm and demonstrate that this algorithm obtains a valid prior distribution. The effectiveness of the proposed reliability test and expected power is evaluated through a simulation study. We also provide illustrative examples of the proposed methods with practical applications.
{"title":"A power-controlled reliability assessment for multi-class probabilistic classifiers","authors":"Hyukjun Gweon","doi":"10.1007/s11634-022-00528-0","DOIUrl":"10.1007/s11634-022-00528-0","url":null,"abstract":"<div><p>In multi-class classification, the output of a probabilistic classifier is a probability distribution of the classes. In this work, we focus on a statistical assessment of the reliability of probabilistic classifiers for multi-class problems. Our approach generates a Pearson <span>(chi ^2)</span> statistic based on the <i>k</i>-nearest-neighbors in the prediction space. Further, we develop a Bayesian approach for estimating the expected power of the reliability test that can be used for an appropriate sample size <i>k</i>. We propose a sampling algorithm and demonstrate that this algorithm obtains a valid prior distribution. The effectiveness of the proposed reliability test and expected power is evaluated through a simulation study. We also provide illustrative examples of the proposed methods with practical applications.\u0000</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"17 4","pages":"927 - 949"},"PeriodicalIF":1.6,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50071056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A dual subspace parsimonious mixture of matrix normal distributions
Pub Date: 2022-11-16 | DOI: 10.1007/s11634-022-00526-2
Alex Sharp, Glen Chalatov, Ryan P. Browne
We present a parsimonious dual-subspace clustering approach for a mixture of matrix-normal distributions. By assuming certain principal components of the row and column covariance matrices are equally important, we express the model in fewer parameters without sacrificing discriminatory information. We derive update rules for an ECM algorithm and set forth necessary conditions to ensure identifiability. We use simulation to demonstrate parameter recovery, and we illustrate the parsimony and competitive performance of the model through two data analyses.
{"title":"A dual subspace parsimonious mixture of matrix normal distributions","authors":"Alex Sharp, Glen Chalatov, Ryan P. Browne","doi":"10.1007/s11634-022-00526-2","DOIUrl":"10.1007/s11634-022-00526-2","url":null,"abstract":"<div><p>We present a parsimonious dual-subspace clustering approach for a mixture of matrix-normal distributions. By assuming certain principal components of the row and column covariance matrices are equally important, we express the model in fewer parameters without sacrificing discriminatory information. We derive update rules for an ECM algorithm and set forth necessary conditions to ensure identifiability. We use simulation to demonstrate parameter recovery, and we illustrate the parsimony and competitive performance of the model through two data analyses.\u0000</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"17 3","pages":"801 - 822"},"PeriodicalIF":1.6,"publicationDate":"2022-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50032840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Monitoring photochemical pollutants based on symbolic interval-valued data analysis
Pub Date: 2022-11-12 | DOI: 10.1007/s11634-022-00527-1
Liang-Ching Lin, Meihui Guo, Sangyeol Lee
This study considers monitoring photochemical pollutants for anomaly detection based on symbolic interval-valued data analysis. For this task, we construct control charts based on the principal component scores of symbolic interval-valued data. Herein, the symbolic interval-valued data are assumed to follow a normal distribution, and an approximate expectation formula of order statistics from the normal distribution is used in the univariate case to estimate the mean and variance via the method of moments. In addition, we consider the bivariate case, wherein we use the maximum likelihood estimator calculated from the likelihood function derived under a bivariate copula. We also establish the procedures for the statistical control chart based on univariate and bivariate interval-valued variables, and the procedures are potentially extendable to higher-dimensional cases. Monte Carlo simulations and real data analysis using photochemical pollutants confirm the validity of the proposed method. In particular, the results show its superiority over the conventional method, which uses averages, in identifying the date on which the abnormal maximum occurred.
{"title":"Monitoring photochemical pollutants based on symbolic interval-valued data analysis","authors":"Liang-Ching Lin, Meihui Guo, Sangyeol Lee","doi":"10.1007/s11634-022-00527-1","DOIUrl":"10.1007/s11634-022-00527-1","url":null,"abstract":"<div><p>This study considers monitoring photochemical pollutants for anomaly detection based on symbolic interval-valued data analysis. For this task, we construct control charts based on the principal component scores of symbolic interval-valued data. Herein, the symbolic interval-valued data are assumed to follow a normal distribution, and an approximate expectation formula of order statistics from the normal distribution is used in the univariate case to estimate the mean and variance via the method of moments. In addition, we consider the bivariate case wherein we use the maximum likelihood estimator calculated from the likelihood function derived under a bivariate copula. We also establish the procedures for the statistical control chart based on the univariate and bivariate interval-valued variables, and the procedures are potentially extendable to higher dimensional cases. Monte Carlo simulations and real data analysis using photochemical pollutants confirm the validity of the proposed method. The results particularly show the superiority over the conventional method that uses the averages to identify the date on which the abnormal maximum occurred.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"17 4","pages":"897 - 926"},"PeriodicalIF":1.6,"publicationDate":"2022-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50045936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Editorial for ADAC issue 4 of volume 16 (2022)
Pub Date: 2022-10-31 | DOI: 10.1007/s11634-022-00525-3
Maurizio Vichi, Andrea Ceroli, Hans A. Kestler, Akinori Okada, Claus Weihs
{"title":"Editorial for ADAC issue 4 of volume 16 (2022)","authors":"Maurizio Vichi, Andrea Ceroli, Hans A. Kestler, Akinori Okada, Claus Weihs","doi":"10.1007/s11634-022-00525-3","DOIUrl":"10.1007/s11634-022-00525-3","url":null,"abstract":"","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"16 4","pages":"817 - 821"},"PeriodicalIF":1.6,"publicationDate":"2022-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50529237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Attraction-repulsion clustering: a way of promoting diversity linked to demographic parity in fair clustering
Pub Date: 2022-10-20 | DOI: 10.1007/s11634-022-00516-4
Eustasio del Barrio, Hristo Inouzhe, Jean-Michel Loubes
We consider the problem of diversity-enhancing clustering, i.e., developing clustering methods which produce clusters that favour diversity with respect to a set of protected attributes such as race, sex, age, etc. In the context of fair clustering, diversity plays a major role when fairness is understood as demographic parity. To promote diversity, we introduce perturbations to the distance in the unprotected attributes that account for protected attributes in a way that resembles the attraction-repulsion of charged particles in physics. These perturbations are defined through dissimilarities with a tractable interpretation. Cluster analysis based on attraction-repulsion dissimilarities penalizes homogeneity of the clusters with respect to the protected attributes and leads to an improvement in diversity. An advantage of our approach, which falls into a pre-processing set-up, is its compatibility with a wide variety of clustering methods and with non-Euclidean data. We illustrate the use of our procedures with both synthetic and real data and discuss the relation between diversity, fairness, and cluster structure.
{"title":"Attraction-repulsion clustering: a way of promoting diversity linked to demographic parity in fair clustering","authors":"Eustasio del Barrio, Hristo Inouzhe, Jean-Michel Loubes","doi":"10.1007/s11634-022-00516-4","DOIUrl":"10.1007/s11634-022-00516-4","url":null,"abstract":"<div><p>We consider the problem of <i>diversity enhancing clustering</i>, i.e, developing clustering methods which produce clusters that favour diversity with respect to a set of protected attributes such as race, sex, age, etc. In the context of <i>fair clustering</i>, diversity plays a major role when fairness is understood as demographic parity. To promote diversity, we introduce perturbations to the distance in the unprotected attributes that account for protected attributes in a way that resembles attraction-repulsion of charged particles in Physics. These perturbations are defined through dissimilarities with a tractable interpretation. Cluster analysis based on attraction-repulsion dissimilarities penalizes homogeneity of the clusters with respect to the protected attributes and leads to an improvement in diversity. An advantage of our approach, which falls into a pre-processing set-up, is its compatibility with a wide variety of clustering methods and whit non-Euclidean data. We illustrate the use of our procedures with both synthetic and real data and provide discussion about the relation between diversity, fairness, and cluster structure.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"17 4","pages":"859 - 896"},"PeriodicalIF":1.6,"publicationDate":"2022-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s11634-022-00516-4.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50040006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A structured covariance ensemble for sufficient dimension reduction
Pub Date: 2022-10-19 | DOI: 10.1007/s11634-022-00524-4
Qin Wang, Yuan Xue
Sufficient dimension reduction (SDR) is a useful tool for high-dimensional data analysis. SDR aims at reducing the data dimensionality without loss of regression information between the response and its high-dimensional predictors. Many existing SDR methods are designed for data with continuous responses. Motivated by recent work on aggregate dimension reduction (Wang, Stat Sin 30:1027–1048, 2020), we propose a unified SDR framework for both continuous and binary responses through a structured covariance ensemble. The connection with existing approaches is discussed in detail, and an efficient algorithm is proposed. Numerical examples and a real data application demonstrate its satisfactory performance.
{"title":"A structured covariance ensemble for sufficient dimension reduction","authors":"Qin Wang, Yuan Xue","doi":"10.1007/s11634-022-00524-4","DOIUrl":"10.1007/s11634-022-00524-4","url":null,"abstract":"<div><p>Sufficient dimension reduction (SDR) is a useful tool for high-dimensional data analysis. SDR aims at reducing the data dimensionality without loss of regression information between the response and its high-dimensional predictors. Many existing SDR methods are designed for the data with continuous responses. Motivated by a recent work on aggregate dimension reduction (Wang in Stat Si 30:1027–1048, 2020), we propose a unified SDR framework for both continuous and binary responses through a structured covariance ensemble. The connection with existing approaches is discussed in details and an efficient algorithm is proposed. Numerical examples and a real data application demonstrate its satisfactory performance.\u0000</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"17 3","pages":"777 - 800"},"PeriodicalIF":1.6,"publicationDate":"2022-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50497854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}