
Journal of Multivariate Analysis: Latest Articles

Parametric convergence rate of a non-parametric estimator in multivariate mixtures of power series distributions under conditional independence
IF 1.4; Zone 3 (Mathematics); Q2, Statistics & Probability; Pub Date: 2026-03-01; Epub Date: 2025-11-19; DOI: 10.1016/j.jmva.2025.105542
Fadoua Balabdaoui, Harald Besdziek, Yong Wang
The conditional independence assumption has recently appeared in a growing body of literature on the estimation of multivariate mixtures. We consider here conditionally independent multivariate mixtures of power series distributions with infinite support, a class that includes Poisson, Geometric and Negative Binomial mixtures. We show that for all these mixtures, the non-parametric maximum likelihood estimator converges to the truth at the rate $(\ln(nd))^{1+d/2} n^{-1/2}$ in the Hellinger distance, where $n$ denotes the size of the observed sample and $d$ represents the dimension of the mixture. Using this result, we then construct a new non-parametric estimator based on the maximum likelihood estimator that converges at the parametric rate $n^{-1/2}$ in all $\ell_p$-distances, for $p \geq 1$. These convergence rates are supported by simulations and the theory is illustrated using the famous Vélib dataset of the bike sharing system of Paris. We also introduce a testing procedure for whether the conditional independence assumption is satisfied for a given sample. This testing procedure is applied to several multivariate mixtures, with varying levels of dependence, and is thereby shown to distinguish well between conditionally independent and dependent mixtures. Finally, we use this testing procedure to investigate whether conditional independence holds for the Vélib dataset.
Journal of Multivariate Analysis, Volume 212, Article 105542
Citations: 0
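The first rate in the abstract above is stated in the Hellinger distance. As a reminder of what that metric measures, here is a minimal sketch for discrete distributions. It is not code from the paper: the function name `hellinger`, the dictionary representation of a probability mass function, and the normalisation by 1/2 (which some authors omit) are all assumptions made for illustration.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete pmfs.

    p and q are dicts mapping support points (ints, or tuples for
    multivariate counts) to probabilities. Assumed convention:
    h(p, q) = sqrt(0.5 * sum_k (sqrt(p_k) - sqrt(q_k))^2), so 0 <= h <= 1.
    """
    support = set(p) | set(q)
    total = sum(
        (math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2
        for k in support
    )
    return math.sqrt(0.5 * total)
```

With this convention the distance is 0 for identical distributions and 1 for distributions with disjoint supports.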
Statistical guarantees for distribution estimation of contaminated data via DNN-based MoM-GANs
IF 1.4; Zone 3 (Mathematics); Q2, Statistics & Probability; Pub Date: 2026-03-01; Epub Date: 2025-11-28; DOI: 10.1016/j.jmva.2025.105571
Fang Xie, Lihu Xu, Qiuran Yao, Huiming Zhang
This paper investigates the distribution estimation of contaminated data using the MoM-GAN method, which leverages the power of generative adversarial nets (GANs) and median-of-means (MoM) estimation. Specifically, we use a deep neural network (DNN) with a ReLU activation function to model the generator and discriminator of the GAN. In terms of theoretical analysis, we derive a non-asymptotic error bound for the DNN-based MoM-GAN estimator, which is measured by integral probability metrics and takes into account the $b$-smoothness Hölder class. The error bound essentially decreases in $n^{-b/p} \vee n^{-1/2}$, where $n$ and $p$ are the sample size and the dimension of the input data, respectively. It provides a rigorous guarantee of the accuracy and robustness of the MoM-GAN estimator, even in the presence of contaminated data. We present an algorithm for the MoM-GAN method and demonstrate its effectiveness in two real-world applications. Our results show that the MoM-GAN method outperforms other competing methods when dealing with contaminated data, highlighting its superior performance and robustness.
Journal of Multivariate Analysis, Volume 212, Article 105571
Citations: 0
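The median-of-means (MoM) idea named in the abstract above can be illustrated on its simplest use case, robust estimation of a scalar mean. The sketch below is the textbook MoM estimator, not the MoM-GAN algorithm of the paper; the function name `median_of_means`, the `n_blocks` parameter and the fixed shuffling seed are assumptions made for illustration.

```python
import random
import statistics


def median_of_means(xs, n_blocks, seed=0):
    """Textbook median-of-means estimate of a scalar mean.

    The sample is shuffled, split into n_blocks roughly equal blocks,
    each block is averaged, and the median of the block means is returned.
    A few gross outliers can spoil only a few blocks, so the median
    of the block means stays close to the bulk of the data.
    """
    xs = list(xs)
    if not 1 <= n_blocks <= len(xs):
        raise ValueError("n_blocks must be between 1 and len(xs)")
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    rng.shuffle(xs)
    blocks = [xs[i::n_blocks] for i in range(n_blocks)]
    return statistics.median(statistics.fmean(block) for block in blocks)
```

For example, with 99 observations equal to 1.0 and a single contaminated value of 1e6, the plain mean exceeds 10000, while `median_of_means(data, 5)` returns 1.0 because the outlier affects only one of the five block means.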
A latent space model for link prediction in statistical citation network
IF 1.4; Zone 3 (Mathematics); Q2, Statistics & Probability; Pub Date: 2026-03-01; Epub Date: 2025-11-28; DOI: 10.1016/j.jmva.2025.105555
Rui Pan, Yuan Gao, Hansheng Wang
Link prediction is of vital importance in network analysis. In this work, we propose a novel latent space model for link prediction in a statistical citation network. Specifically, the model can incorporate the transitivity information of both the citation network and the author-paper network. In addition, nodal features are taken into consideration and a pseudo maximum likelihood estimator of the corresponding parameter is developed. Its asymptotic consistency is established and demonstrated through extensive simulation studies. Link prediction is then performed and the performance of different methods is compared. Finally, a real citation network of statistics is analyzed.
Journal of Multivariate Analysis, Volume 212, Article 105555
Citations: 0
Estimation and testing for fixed effects partially linear nonparametric panel regression model with separable spatially and serially correlated error structure
IF 1.4; Zone 3 (Mathematics); Q2, Statistics & Probability; Pub Date: 2026-03-01; Epub Date: 2025-11-28; DOI: 10.1016/j.jmva.2025.105552
Shuangshuang Li, Jianbao Chen
Panel data collected from “locations” may exhibit spatial and serial correlations. To study such spatial and serial correlations, together with possible nonlinear relationships, a fixed effects partially linear nonparametric panel regression model with a separable spatially and serially correlated error structure is introduced. We obtain profile quasi-maximum likelihood estimators (PQMLEs) of the unknowns. Furthermore, a generalized F-test called $F_{NT}$ is designed for assessing whether the nonparametric component specification is reasonable. Asymptotic properties of the PQMLEs and $F_{NT}$ are provided under several conditions. Monte Carlo trials indicate that our estimators and test statistic perform well in finite samples and that model misspecification may substantially affect the estimates of unknown parameters. The analysis of provincial housing prices in China reveals the presence of nonlinear, spatial and serial correlation relationships.
Journal of Multivariate Analysis, Volume 212, Article 105552
Citations: 0
Testing and measuring the conditional mean (in)dependence for functional data by martingale difference-angle divergence
IF 1.4; Zone 3 (Mathematics); Q2, Statistics & Probability; Pub Date: 2026-03-01; Epub Date: 2025-11-28; DOI: 10.1016/j.jmva.2025.105573
Tingyu Lai, Yingying Wang, Zhongzhan Zhang
We propose a new nonparametric method to test and measure conditional mean (in)dependence for functional data. This new metric has some appealing properties: it is nonnegative and equals zero if and only if conditional mean independence holds; it is invariant under linear transformations of the predictor; it does not require a moment condition on the predictor variable. Based on this measure, two test procedures for conditional mean independence are proposed for functional data. One uses a wild bootstrap while the other uses the limiting standard normal distribution. The tests are consistent and perform well in finite sample simulations. We further propose some requirements for a reasonable conditional mean dependence measure and demonstrate that our metric has those properties. A real data example is introduced to illustrate the application of the proposed method.
Journal of Multivariate Analysis, Volume 212, Article 105573
Citations: 0
Bayesian analysis of nonlinear structured latent factor models with a Gaussian process prior
IF 1.4; Zone 3 (Mathematics); Q2, Statistics & Probability; Pub Date: 2026-03-01; Epub Date: 2025-11-28; DOI: 10.1016/j.jmva.2025.105577
Yimang Zhang, Xiaorui Wang, Jian Qing Shi
Factor analysis models are widely used in social and behavioral sciences, such as psychology, education, and marketing, to measure unobservable latent traits. In this article, we introduce a nonlinear structured latent factor analysis model that is more flexible in characterizing the relationship between manifest variables and latent factors. The confirmatory identifiability of the latent factors is discussed, ensuring the substantive interpretation of these latent factors. A Bayesian approach with a Gaussian process prior is proposed to estimate the unknown nonlinear function and the unknown parameters. Asymptotic results are established, including the structured identifiability of latent factors, as well as the consistency of estimates for the unknown parameters and the unknown nonlinear function. Simulation studies and real data analysis are conducted to evaluate the performance of the proposed method. The simulation results demonstrate that our proposed method performs well in handling nonlinear models and successfully identifies the latent factors. Additionally, the analysis of oil flow data reveals the underlying structure of latent nonlinear patterns.
Journal of Multivariate Analysis, Volume 212, Article 105577
Citations: 0
Recovering Imbalanced Clusters via gradient-based projection pursuit
IF 1.4; Zone 3 (Mathematics); Q2, Statistics & Probability; Pub Date: 2026-03-01; Epub Date: 2025-11-15; DOI: 10.1016/j.jmva.2025.105530
Martin Eppert, Satyaki Mukherjee, Debarghya Ghoshdastidar
Projection Pursuit is a classic exploratory technique for finding interesting projections of a dataset. We propose a method for recovering projections containing either Imbalanced Clusters or a Bernoulli–Rademacher distribution using a gradient-based technique to optimize the projection index. As sample complexity is a major limiting factor in Projection Pursuit, we analyze our algorithm’s sample complexity within a Planted Vector setting where we can observe that Imbalanced Clusters can be recovered more easily than balanced ones. Additionally, we give a generalized result that works for a variety of data distributions and projection indices. We compare these results to computational lower bounds in the Low-Degree-Polynomial Framework. Finally, we experimentally evaluate our method’s applicability to real-world data using FashionMNIST and the Human Activity Recognition Dataset, where our algorithm outperforms others when only a few samples are available.
Journal of Multivariate Analysis, Volume 212, Article 105530
Citations: 0
Robust two-way dimension reduction by Grassmannian barycenter
IF 1.4; Zone 3 (Mathematics); Q2, Statistics & Probability; Pub Date: 2026-03-01; Epub Date: 2025-11-08; DOI: 10.1016/j.jmva.2025.105527
Zeyu Li, Yong He, Xinbing Kong, Xinsheng Zhang
Two-way dimension reduction for well-structured matrix-valued data has grown popular over the past few years. To achieve robustness against individual matrix outliers with large spikes, arising either from heavy-tailed noise or from large individual low-rank signals deviating from the population subspace, we first calculate the leading singular subspaces of each individual matrix and then find the barycenter of the locally estimated subspaces across all observations. This contrasts with existing methods, which first integrate data across observations and then perform an eigenvalue decomposition. In addition, a robust criterion for determining the cut-off dimension is suggested, based on comparing the eigenvalue ratios of the corresponding Euclidean means of the projection matrices. Theoretical properties of the resulting estimators are investigated under mild conditions. Numerical simulation studies justify the advantages and robustness of the proposed methods over the existing tools. Two real examples associated with medical imaging and financial portfolios are given to provide empirical evidence for our arguments and to illustrate the usefulness of the algorithms.
Journal of Multivariate Analysis, Volume 212, Article 105527
Citations: 0
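The two-step idea in the abstract above (estimate a leading singular subspace per matrix, then average the subspaces) can be sketched with NumPy. This is only an illustration under stated assumptions, not the authors' estimator: it averages the projection matrices in the Euclidean sense and handles the row (left) subspace only, and the names `leading_left_subspace` and `projection_mean_subspace` are hypothetical. The barycenter used in the paper may be defined differently.

```python
import numpy as np


def leading_left_subspace(X, k):
    """Orthonormal basis (p x k) of the top-k left singular subspace of X."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]


def projection_mean_subspace(matrices, k):
    """Average the per-matrix projection matrices U U^T (Euclidean mean).

    Returns the top-k eigenvectors of the average and all of its
    eigenvalues in descending order. Every matrix contributes a projection
    with eigenvalues in {0, 1}, so a single matrix with a huge spike
    cannot dominate the average (assumption: this mirrors the robustness
    argument in the abstract, not the paper's exact procedure).
    """
    p = matrices[0].shape[0]
    mean_proj = np.zeros((p, p))
    for X in matrices:
        U = leading_left_subspace(X, k)
        mean_proj += U @ U.T
    mean_proj /= len(matrices)
    eigvals, eigvecs = np.linalg.eigh(mean_proj)  # eigenvalues ascending
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:k]], eigvals[order]
```

The returned eigenvalues can be inspected for a gap between consecutive values, which is the kind of information an eigenvalue-ratio criterion for the cut-off dimension would use.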
On convergence of regularized covariance estimator based on modified Cholesky decomposition
IF 1.4; Zone 3 (Mathematics); Q2, Statistics & Probability; Pub Date: 2026-03-01; Epub Date: 2025-11-28; DOI: 10.1016/j.jmva.2025.105553
Yuli Liang, Deliang Dai, Shaobo Jin
Regularization of the covariance matrix is a widely used technique when estimating large covariance matrices. This paper examines a penalized likelihood method for constructing a statistically efficient covariance matrix estimator. The modified Cholesky decomposition (MCD) is used to parameterize the covariance matrix, and an effective regularization scheme is achieved by combining shrinkage and smoothing penalties on the Cholesky factor. The practical performance of the derived estimators contrasts with the absence of theoretical results for them in the literature. In this work, we aim to fill the gap between theory and practice by establishing convergence properties under regularity conditions. We also provide a simulation study as a numerical illustration.
{"title":"On convergence of regularized covariance estimator based on modified Cholesky decomposition","authors":"Yuli Liang ,&nbsp;Deliang Dai ,&nbsp;Shaobo Jin","doi":"10.1016/j.jmva.2025.105553","DOIUrl":"10.1016/j.jmva.2025.105553","url":null,"abstract":"<div><div>The regularization for covariance matrix is a widely used technique when estimating large covariance matrices. This paper examines a penalized likelihood method for constructing a statistically efficient covariance matrix estimator. Modified Cholesky decomposition (MCD) is used to parameterize the covariance matrix and the effective regularization scheme is achieved by combining both shrinkage and smoothing penalties on the Cholesky factor. The practical performance is at odds with an absence of theoretical properties of the derived estimators in the literature. In this work, we aim to fill the gap between theory and practice by establishing the convergence properties under regularity conditions. We also provide a simulation study as numerical illustrations.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"212 ","pages":"Article 105553"},"PeriodicalIF":1.4,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145681896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
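The modified Cholesky decomposition named in the abstract above can be illustrated with a short NumPy sketch. It computes a unit lower triangular matrix T and positive values d such that T Σ Tᵀ = diag(d). The function name `modified_cholesky` and the AR(1)-type test matrix are assumptions chosen for illustration, and the sketch does not implement the shrinkage and smoothing penalties studied in the paper.

```python
import numpy as np


def modified_cholesky(sigma):
    """Return (T, d) with T unit lower triangular and T @ sigma @ T.T = diag(d)."""
    L = np.linalg.cholesky(sigma)   # sigma = L @ L.T, with L lower triangular
    scale = np.diag(L)              # positive diagonal entries of L
    C = L / scale                   # divide column j by scale[j], so C has unit diagonal
    T = np.tril(np.linalg.inv(C))   # inverse is lower triangular; tril drops round-off
    d = scale ** 2                  # diagonal of D (innovation variances)
    return T, d


# Illustrative input (an assumption): covariance with entries rho^|i-j|.
rho, p = 0.5, 4
idx = np.arange(p)
sigma = rho ** np.abs(idx[:, None] - idx[None, :])

T, d = modified_cholesky(sigma)
T_inv = np.linalg.inv(T)
print(np.allclose(T @ sigma @ T.T, np.diag(d)))      # T diagonalizes sigma
print(np.allclose(T_inv @ np.diag(d) @ T_inv.T, sigma))  # sigma is recovered from (T, d)
```

In this parameterization the below-diagonal entries of T are the negatives of the coefficients obtained by regressing each variable on its predecessors, and d holds the corresponding prediction error variances. Because T is unconstrained below the diagonal and d only needs to be positive, penalties can be placed on these entries without breaking positive definiteness.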
Distributed estimation of spiked eigenvalues in spiked population models
IF 1.4 · Zone 3 (Mathematics) · Q2 STATISTICS & PROBABILITY · Pub Date: 2026-03-01 · Epub Date: 2025-11-29 · DOI: 10.1016/j.jmva.2025.105558
Lu Yan, Jiang Hu
The proliferation of science and technology has led to the prevalence of voluminous data sets distributed across multiple machines. Conventional statistical methodologies may be infeasible for analyzing such massive data sets because of prohibitively long computing times, memory constraints, communication overheads, and confidentiality considerations. In this paper, we propose distributed estimators of the spiked eigenvalues in spiked population models. The consistency and asymptotic normality of the distributed estimators are derived, and a statistical error analysis of the distributed estimators is also provided. The proposed distributed estimation attains the same order of convergence as estimation from the full sample. A simulation study and a real data analysis indicate that the proposed distributed estimation and testing procedures perform very well in terms of estimation accuracy, stability, and transmission efficiency.
Lu Yan, Jiang Hu. "Distributed estimation of spiked eigenvalues in spiked population models." Journal of Multivariate Analysis, vol. 212, Article 105558. Publication date: 2026-03-01. DOI: 10.1016/j.jmva.2025.105558
Citations: 0
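To make the distributed setting in the abstract above concrete, the following sketch shows a naive divide-and-conquer baseline: each machine computes the leading eigenvalues of its own sample covariance matrix and sends only those few numbers to a central node, which averages them. This is not the estimator proposed by Yan and Hu, whose construction is not given in the abstract. The function names, the zero-mean assumption, and the simulated spiked model with population eigenvalues (10, 5, 1, ..., 1) are all assumptions made for illustration.

```python
import numpy as np


def local_top_eigenvalues(X_local, m):
    """Top-m eigenvalues (descending) of a local sample covariance, assuming mean zero."""
    n_local = X_local.shape[0]
    S = X_local.T @ X_local / n_local
    return np.linalg.eigvalsh(S)[::-1][:m]   # eigvalsh returns ascending order


def averaged_top_eigenvalues(X, n_machines, m):
    """Naive baseline: average the local top-m eigenvalues across machines."""
    blocks = np.array_split(X, n_machines, axis=0)   # rows held by each machine
    local = np.array([local_top_eigenvalues(b, m) for b in blocks])
    return local.mean(axis=0)                        # each machine sent only m numbers


# Simulated spiked population model (illustrative values, not from the paper).
rng = np.random.default_rng(0)
n, p, m = 4000, 20, 2
population_eigs = np.ones(p)
population_eigs[:m] = [10.0, 5.0]
X = rng.standard_normal((n, p)) * np.sqrt(population_eigs)

est = averaged_top_eigenvalues(X, n_machines=4, m=m)
full = local_top_eigenvalues(X, m)   # full-sample reference
print("distributed average:", est)
print("full sample:        ", full)
```

Sample eigenvalues are known to be biased upward when the dimension is not negligible relative to the local sample size, and splitting the data makes each local sample smaller. A simple average like this one can therefore inherit a larger bias than the full-sample estimate, which is one reason a dedicated distributed estimator with its own error analysis is of interest.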