New forest-based approaches for sufficient dimension reduction
Authors: Shuang Dai, Ping Wu, Zhou Yu
Journal: Statistics and Computing (impact factor 1.6; JCR Q2, Computer Science, Theory & Methods)
Publication date: 2024-08-30
DOI: 10.1007/s11222-024-10482-w (https://doi.org/10.1007/s11222-024-10482-w)
Citations: 0
Abstract
Sufficient dimension reduction (SDR) aims to reduce the dimensionality of high-dimensional predictor variables while retaining the essential information about the responses. Traditional SDR methods typically employ kernel weighting functions, which makes them susceptible to the curse of dimensionality. To address this issue, we propose novel forest-based approaches for SDR that use a locally adaptive kernel generated by Mondrian forests; that is, we treat the Mondrian forest as an adaptive weighted kernel technique for SDR problems. For the central mean subspace, we combine the methods of Xia et al. (J R Stat Soc Ser B (Stat Methodol) 64(3):363–410, 2002. https://doi.org/10.1111/1467-9868.03411) with Mondrian forest weights to obtain the forest-based outer product of gradients estimation (mf-OPG) and the forest-based minimum average variance estimation (mf-MAVE). For the central subspace, we replace the kernels used in the nonparametric density function estimation of Xia (Ann Stat 35(6):2654–2690, 2007. https://doi.org/10.1214/009053607000000352) with Mondrian forest weights; the resulting techniques are referred to as mf-dOPG and mf-dMAVE. Under regularity conditions, we establish the asymptotic properties of our forest-based estimators, as well as the convergence of the associated algorithms. Through simulation studies and analysis of fully observable data, we demonstrate substantial improvements in computational efficiency and predictive accuracy of our proposals compared with their traditional counterparts.
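To make the weighting idea in the abstract concrete, the following is a minimal sketch of outer product of gradients (OPG) estimation with a pluggable weight matrix. It is an illustration under stated assumptions, not the authors' implementation: the function names (`gaussian_weights`, `opg_directions`), the ridge term, the bandwidth, and the simulated single-index data are all hypothetical, and a fixed Gaussian kernel stands in for the Mondrian forest weights that the paper substitutes for the kernel.

```python
import numpy as np


def gaussian_weights(X, bandwidth):
    """Placeholder weights: W[i, j] is the weight of sample i in the local fit at X[j].

    Assumption: a fixed Gaussian kernel stands in for the Mondrian forest
    weights used in the paper, which are not implemented here.
    """
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def opg_directions(X, y, W, d, ridge=1e-8):
    """Estimate d directions from weighted local linear gradient fits."""
    n, p = X.shape
    M = np.zeros((p, p))
    for j in range(n):
        # Local linear design centred at X[j]: intercept plus p slopes.
        Z = np.hstack([np.ones((n, 1)), X - X[j]])
        w = W[:, j]
        # Weighted least squares; the small ridge (an assumption) guards
        # against a singular system when few samples carry weight.
        A = Z.T @ (w[:, None] * Z) + ridge * np.eye(p + 1)
        coef = np.linalg.solve(A, Z.T @ (w * y))
        grad = coef[1:]  # estimated gradient of the regression function at X[j]
        M += np.outer(grad, grad)
    M /= n
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, ::-1][:, :d]


# Hypothetical single-index example: y depends on X only through X @ beta.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
y = np.sin(X @ beta) + 0.05 * rng.normal(size=300)

B = opg_directions(X, y, gaussian_weights(X, bandwidth=1.5), d=1)
alignment = abs(float(B[:, 0] @ beta))  # close to 1 when the direction is recovered
```

Because the weights enter only through the matrix `W`, any other weighting scheme, including forest-derived weights, could be passed to `opg_directions` in place of `gaussian_weights` without changing the gradient-fitting step.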
About the journal:
Statistics and Computing is a bi-monthly refereed journal which publishes papers covering the range of the interface between the statistical and computing sciences.
In particular, it addresses the use of statistical concepts in computing science, for example in machine learning, computer vision and data analytics, as well as the use of computers in data modelling, prediction and analysis. Specific topics which are covered include: techniques for evaluating analytically intractable problems such as bootstrap resampling, Markov chain Monte Carlo, sequential Monte Carlo, approximate Bayesian computation, search and optimization methods, stochastic simulation and Monte Carlo, graphics, computer environments, statistical approaches to software errors, information retrieval, machine learning, statistics of databases and database technology, huge data sets and big data analytics, computer algebra, graphical models, image processing, tomography, inverse problems and uncertainty quantification.
In addition, the journal contains original research reports, authoritative review papers, discussed papers, and occasional special issues on particular topics or carrying proceedings of relevant conferences. Statistics and Computing also publishes book review and software review sections.