Pub Date: 2026-03-01 | Epub Date: 2025-11-19 | DOI: 10.1016/j.jmva.2025.105542
Title: Parametric convergence rate of a non-parametric estimator in multivariate mixtures of power series distributions under conditional independence
Journal: Journal of Multivariate Analysis, Volume 212, Article 105542
Fadoua Balabdaoui, Harald Besdziek, Yong Wang
The conditional independence assumption has recently appeared in a growing body of literature on the estimation of multivariate mixtures. We consider here conditionally independent multivariate mixtures of power series distributions with infinite support, which include Poisson, Geometric, and Negative Binomial mixtures. We show that for all these mixtures, the non-parametric maximum likelihood estimator converges to the truth at the rate $(\ln(nd))^{1+d/2} n^{-1/2}$ in the Hellinger distance, where $n$ denotes the size of the observed sample and $d$ represents the dimension of the mixture. Using this result, we then construct a new non-parametric estimator based on the maximum likelihood estimator that converges at the parametric rate $n^{-1/2}$ in all $\ell_p$-distances, for $p \geq 1$. These convergence rates are supported by simulations, and the theory is illustrated using the well-known Vélib dataset from the bike sharing system of Paris. We also introduce a procedure for testing whether the conditional independence assumption is satisfied for a given sample. This testing procedure is applied to several multivariate mixtures with varying levels of dependence, and is shown to distinguish well between conditionally independent and dependent mixtures. Finally, we use this testing procedure to investigate whether conditional independence holds for the Vélib dataset.
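To make the objects in this abstract concrete, here is a minimal Python sketch; it is not the authors' estimator. It evaluates the pmf of a conditionally independent Poisson mixture, a Hellinger distance between two pmfs, and the quoted rate $(\ln(nd))^{1+d/2} n^{-1/2}$. The finite mixing distribution, all function names, and the 1/2 normalization of the squared Hellinger distance are assumptions made for illustration.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability mass at count k for mean lam > 0 (computed on the log scale)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def mixture_pmf(ks, weights, rates):
    """pmf of a conditionally independent Poisson mixture at the count vector ks.

    weights[j] is the probability of latent component j, and rates[j][i] is the
    Poisson mean of coordinate i given component j. Given the component, the
    coordinates are independent, so the conditional pmf is a product.
    A finite mixing distribution is an assumption of this sketch.
    """
    return sum(
        w * math.prod(poisson_pmf(k, lam) for k, lam in zip(ks, lams))
        for w, lams in zip(weights, rates)
    )


def hellinger(p, q):
    """Hellinger distance between two pmfs listed over the same support points.

    Uses the convention h^2 = (1/2) * sum (sqrt(p) - sqrt(q))^2 (an assumption;
    some texts omit the factor 1/2).
    """
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def npmle_rate(n, d):
    """Rate (ln(n d))^(1 + d/2) * n^(-1/2) quoted in the abstract, up to constants."""
    return math.log(n * d) ** (1 + d / 2) / math.sqrt(n)
```

The product inside `mixture_pmf` is exactly where conditional independence enters: dropping it would require a full joint pmf per component.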
Pub Date: 2026-03-01 | Epub Date: 2025-11-28 | DOI: 10.1016/j.jmva.2025.105571
Title: Statistical guarantees for distribution estimation of contaminated data via DNN-based MoM-GANs
Journal: Journal of Multivariate Analysis, Volume 212, Article 105571
Fang Xie, Lihu Xu, Qiuran Yao, Huiming Zhang
This paper investigates the distribution estimation of contaminated data using the MoM-GAN method, which combines generative adversarial nets (GANs) with median-of-means (MoM) estimation. Specifically, we use a deep neural network (DNN) with a ReLU activation function to model the generator and discriminator of the GAN. In terms of theoretical analysis, we derive a non-asymptotic error bound for the DNN-based MoM-GAN estimator, which is measured by integral probability metrics and takes into account the $b$-smoothness Hölder class. The error bound essentially decays as $n^{-b/p} \vee n^{-1/2}$, where $n$ and $p$ are the sample size and the dimension of the input data, respectively. It provides a rigorous guarantee of the accuracy and robustness of the MoM-GAN estimator, even in the presence of contaminated data. We present an algorithm for the MoM-GAN method and demonstrate its effectiveness in two real-world applications. Our results show that the MoM-GAN method outperforms competing methods when dealing with contaminated data, highlighting its robustness.
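The median-of-means idea that the abstract builds on can be illustrated for a scalar mean. The sketch below is only that basic building block, not the MoM-GAN algorithm of the paper; the function name, the random block assignment, and the `seed` parameter are assumptions of this example.

```python
import random
import statistics


def median_of_means(values, n_blocks, seed=0):
    """Median-of-means estimate of the mean of `values`.

    The data are shuffled, split into `n_blocks` nearly equal blocks, and the
    median of the block means is returned. As long as fewer than half of the
    blocks contain outliers, the median is taken over uncontaminated blocks.
    """
    if not 1 <= n_blocks <= len(values):
        raise ValueError("n_blocks must be between 1 and len(values)")
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    # Striding assigns element i to block i % n_blocks, so block sizes differ by at most 1.
    blocks = [shuffled[i::n_blocks] for i in range(n_blocks)]
    return statistics.median(statistics.fmean(block) for block in blocks)
```

With 10 gross outliers among roughly 1000 points, choosing more than 20 blocks guarantees that a majority of blocks stay clean, whereas the plain sample mean is pulled arbitrarily far by the same outliers.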
Pub Date: 2026-03-01 | Epub Date: 2025-11-28 | DOI: 10.1016/j.jmva.2025.105555
Title: A latent space model for link prediction in statistical citation network
Journal: Journal of Multivariate Analysis, Volume 212, Article 105555
Rui Pan, Yuan Gao, Hansheng Wang
Link prediction is of vital importance in network analysis. In this work, we propose a novel latent space model for link prediction in a statistical citation network. Specifically, the model can incorporate the transitivity information of both the citation network and the author-paper network. In addition, nodal features are taken into consideration, and a pseudo maximum likelihood estimator of the corresponding parameter is developed. Its asymptotic consistency is established and demonstrated through extensive simulation studies. Link prediction is then performed and the performance of different methods is compared. Finally, a real citation network from statistics is analyzed.
Pub Date: 2026-03-01 | Epub Date: 2025-11-28 | DOI: 10.1016/j.jmva.2025.105552
Title: Estimation and testing for fixed effects partially linear nonparametric panel regression model with separable spatially and serially correlated error structure
Journal: Journal of Multivariate Analysis, Volume 212, Article 105552
Shuangshuang Li, Jianbao Chen
Panel data collected from “locations” may exhibit spatial and serial correlations. To study such correlations together with possible nonlinear relationships, a fixed effects partially linear nonparametric panel regression model with a separable spatially and serially correlated error structure is introduced. We obtain profile quasi-maximum likelihood estimators (PQMLEs) of the unknowns. Furthermore, a generalized F-test statistic, $F_{NT}$, is designed to assess the appropriateness of the nonparametric component specification. Asymptotic properties of the PQMLEs and $F_{NT}$ are provided under several conditions. Monte Carlo experiments indicate that our estimators and test statistic perform well in finite samples and that model misspecification may substantially affect the estimates of the unknown parameters. The analysis of provincial housing prices in China reveals the presence of nonlinear, spatial, and serial correlation relationships.
Pub Date: 2026-03-01 | Epub Date: 2025-11-28 | DOI: 10.1016/j.jmva.2025.105573
Title: Testing and measuring the conditional mean (in)dependence for functional data by martingale difference-angle divergence
Journal: Journal of Multivariate Analysis, Volume 212, Article 105573
Tingyu Lai, Yingying Wang, Zhongzhan Zhang
We propose a new nonparametric method to test and measure conditional mean (in)dependence for functional data. The new metric has several appealing properties: it is nonnegative and equals zero if and only if conditional mean independence holds; it is invariant under linear transformations of the predictor; and it does not require moment conditions on the predictor variable. Based on this measure, two test procedures for conditional mean independence are proposed for functional data. One uses a wild bootstrap, while the other uses the limiting standard normal distribution. The tests are consistent and perform well in finite sample simulations. We further propose some requirements for a reasonable conditional mean dependence measure and demonstrate that our metric satisfies them. A real data example illustrates the application of the proposed method.
Pub Date: 2026-03-01 | Epub Date: 2025-11-28 | DOI: 10.1016/j.jmva.2025.105577
Title: Bayesian analysis of nonlinear structured latent factor models with a Gaussian process prior
Journal: Journal of Multivariate Analysis, Volume 212, Article 105577
Yimang Zhang, Xiaorui Wang, Jian Qing Shi
Factor analysis models are widely used in social and behavioral sciences, such as psychology, education, and marketing, to measure unobservable latent traits. In this article, we introduce a nonlinear structured latent factor analysis model that is more flexible in characterizing the relationship between manifest variables and latent factors. The confirmatory identifiability of the latent factors is discussed, ensuring that they have a substantive interpretation. A Bayesian approach with a Gaussian process prior is proposed to estimate the unknown nonlinear function and the unknown parameters. Asymptotic results are established, including the structured identifiability of latent factors, as well as the consistency of estimates for the unknown parameters and the unknown nonlinear function. Simulation studies and real data analysis are conducted to evaluate the performance of the proposed method. The simulation results demonstrate that the proposed method performs well in handling nonlinear models and successfully identifies the latent factors. Additionally, the analysis of oil flow data reveals the underlying structure of latent nonlinear patterns.
Pub Date: 2026-03-01 | Epub Date: 2025-11-15 | DOI: 10.1016/j.jmva.2025.105530
Title: Recovering Imbalanced Clusters via gradient-based projection pursuit
Journal: Journal of Multivariate Analysis, Volume 212, Article 105530
Martin Eppert, Satyaki Mukherjee, Debarghya Ghoshdastidar
Projection Pursuit is a classic exploratory technique for finding interesting projections of a dataset. We propose a method for recovering projections containing either Imbalanced Clusters or a Bernoulli–Rademacher distribution using a gradient-based technique to optimize the projection index. As sample complexity is a major limiting factor in Projection Pursuit, we analyze our algorithm’s sample complexity within a Planted Vector setting, where we observe that Imbalanced Clusters can be recovered more easily than balanced ones. Additionally, we give a generalized result that works for a variety of data distributions and projection indices. We compare these results to computational lower bounds in the Low-Degree-Polynomial Framework. Finally, we experimentally evaluate our method’s applicability to real-world data using FashionMNIST and the Human Activity Recognition Dataset, where our algorithm outperforms others when only a few samples are available.
Pub Date: 2026-03-01 | Epub Date: 2025-11-08 | DOI: 10.1016/j.jmva.2025.105527
Title: Robust two-way dimension reduction by Grassmannian barycenter
Journal: Journal of Multivariate Analysis, Volume 212, Article 105527
Zeyu Li, Yong He, Xinbing Kong, Xinsheng Zhang
Two-way dimension reduction for well-structured matrix-valued data has grown popular in the past few years. To achieve robustness against individual matrix outliers with large spikes, arising either from heavy-tailed noise or from large individual low-rank signals deviating from the population subspace, we first calculate the leading singular subspaces of each individual matrix and then find the barycenter of the locally estimated subspaces across all observations. This contrasts with existing methods, which first integrate data across observations and then perform an eigenvalue decomposition. In addition, a robust criterion for determining the cut-off dimension is suggested, based on comparing the eigenvalue ratios of the corresponding Euclidean means of the projection matrices. Theoretical properties of the resulting estimators are investigated under mild conditions. Numerical simulation studies demonstrate the advantages and robustness of the proposed methods over existing tools. Two real examples from medical imaging and financial portfolios provide empirical evidence for our arguments and illustrate the usefulness of the algorithms.
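The two-step procedure described above (local singular subspaces first, then a barycenter across observations) can be sketched with NumPy as follows. This is a hedged illustration for one mode (row subspaces) only. It assumes the barycenter is taken in the projection (chordal) sense, in which case it is spanned by the top eigenvectors of the Euclidean mean of the projection matrices; the paper's exact definition may differ. The function names are hypothetical.

```python
import numpy as np


def leading_left_subspace(X, k):
    """Orthonormal basis (p x k) of the top-k left singular subspace of a p x q matrix."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]


def subspace_barycenter(matrices, k):
    """Barycenter of the local top-k row subspaces, in the projection-metric sense.

    Each matrix contributes only its projector U_t U_t^T, whose norm does not
    depend on how large the matrix is, so a single matrix with a huge spike
    cannot dominate the average. Returns the top-k eigenvectors of the mean
    projector and all of its eigenvalues in descending order.
    """
    bases = [leading_left_subspace(X, k) for X in matrices]
    mean_projector = np.mean([U @ U.T for U in bases], axis=0)
    eigvals, eigvecs = np.linalg.eigh(mean_projector)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:k]], eigvals[order]
```

The returned eigenvalues of the mean projector are the quantities whose ratios the abstract's cut-off criterion compares; the criterion itself is not implemented here.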
Pub Date: 2026-03-01 | Epub Date: 2025-11-28 | DOI: 10.1016/j.jmva.2025.105553
Title: On convergence of regularized covariance estimator based on modified Cholesky decomposition
Journal: Journal of Multivariate Analysis, Volume 212, Article 105553
Yuli Liang, Deliang Dai, Shaobo Jin
Regularization of the covariance matrix is a widely used technique when estimating large covariance matrices. This paper examines a penalized likelihood method for constructing a statistically efficient covariance matrix estimator. The modified Cholesky decomposition (MCD) is used to parameterize the covariance matrix, and an effective regularization scheme is achieved by combining shrinkage and smoothing penalties on the Cholesky factor. The good practical performance of the derived estimators contrasts with the absence of theoretical results for them in the literature. In this work, we aim to fill this gap between theory and practice by establishing convergence properties under regularity conditions. We also provide a simulation study as a numerical illustration.
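For readers unfamiliar with the parameterization, the modified Cholesky decomposition writes $T \Sigma T^\top = D$, with $T$ unit lower-triangular and $D$ diagonal. The sub-diagonal entries of $T$ are the negatives of the coefficients from regressing each variable on its predecessors, and $D$ holds the innovation variances. The NumPy sketch below computes this decomposition for a known covariance matrix; it does not implement the penalized likelihood estimator of the paper, and the function names are hypothetical.

```python
import numpy as np


def modified_cholesky(sigma):
    """Return (T, d) with T unit lower-triangular and T @ sigma @ T.T == diag(d)."""
    sigma = np.asarray(sigma, dtype=float)
    C = np.linalg.cholesky(sigma)            # sigma = C @ C.T with C lower-triangular
    c = np.diag(C)
    L = C / c                                # scale column j by 1/c[j] -> unit diagonal
    T = np.linalg.solve(L, np.eye(len(c)))   # T = L^{-1}, still unit lower-triangular
    return T, c ** 2                         # d = innovation variances


def covariance_from_mcd(T, d):
    """Rebuild sigma = T^{-1} diag(d) T^{-T} from the MCD parameters."""
    L = np.linalg.inv(T)
    return L @ np.diag(d) @ L.T
```

Because any unit lower-triangular $T$ and positive $d$ map back to a positive definite matrix, penalties can be placed on the entries of $T$ without an explicit positive-definiteness constraint, which is the usual motivation for this parameterization.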
Pub Date: 2026-03-01 | Epub Date: 2025-11-29 | DOI: 10.1016/j.jmva.2025.105558
Title: Distributed estimation of spiked eigenvalues in spiked population models
Journal: Journal of Multivariate Analysis, Volume 212, Article 105558
Lu Yan, Jiang Hu
The proliferation of science and technology has led to the prevalence of voluminous data sets distributed across multiple machines. Conventional statistical methodologies may be infeasible for analyzing such massive data sets due to prohibitively long computing times, memory constraints, communication overheads, and confidentiality considerations. In this paper, we propose distributed estimators of the spiked eigenvalues in spiked population models. The consistency and asymptotic normality of the distributed estimators are derived, and a statistical error analysis of the distributed estimators is also provided. The proposed distributed estimators attain the same order of convergence as estimation from the full sample. A simulation study and a real data analysis indicate that the proposed distributed estimation and testing procedures have excellent properties in terms of estimation accuracy and stability as well as transmission efficiency.