Pub Date: 2025-11-28 DOI: 10.1016/j.jmva.2025.105571
Fang Xie, Lihu Xu, Qiuran Yao, Huiming Zhang
This paper investigates the distribution estimation of contaminated data using the MoM-GAN method, which combines generative adversarial nets (GANs) with median-of-means (MoM) estimation. Specifically, we use deep neural networks (DNNs) with ReLU activation functions to model the generator and discriminator of the GAN. On the theoretical side, we derive a non-asymptotic error bound for the DNN-based MoM-GAN estimator, which is measured by integral probability metrics and takes into account the $b$-smoothness Hölder class. The error bound essentially decays at the rate $n^{-b/p} \vee n^{-1/2}$, where $n$ and $p$ are the sample size and the dimension of the input data, respectively. It provides a rigorous guarantee of the accuracy and robustness of the MoM-GAN estimator, even in the presence of contaminated data. We present an algorithm for the MoM-GAN method and demonstrate its effectiveness in two real-world applications. Our results show that the MoM-GAN method outperforms competing methods on contaminated data.
Title: Statistical guarantees for distribution estimation of contaminated data via DNN-based MoM-GANs. Journal of Multivariate Analysis, Volume 212, Article 105571.
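The median-of-means idea mentioned in the abstract above can be illustrated with a minimal sketch. This is only the generic scalar MoM estimator, not the MoM-GAN training procedure of the paper; the function name `median_of_means`, the use of consecutive blocks, and the toy sample are assumptions made for illustration.

```python
from statistics import mean, median


def median_of_means(values, n_blocks):
    """Median-of-means estimate of the mean of `values`.

    Splits the data into `n_blocks` consecutive blocks of equal size
    (dropping any leftover tail), averages each block, and returns the
    median of the block means.
    """
    if n_blocks < 1 or n_blocks > len(values):
        raise ValueError("n_blocks must be between 1 and len(values)")
    block_size = len(values) // n_blocks
    block_means = [
        mean(values[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]
    return median(block_means)


# Toy contaminated sample (hypothetical): 99 clean observations, one gross outlier.
sample = [1.0] * 99 + [1000.0]
print(mean(sample))                 # pulled far away from 1.0 by the outlier
print(median_of_means(sample, 10))  # only one of the ten block means is affected
```

Because a single outlier can corrupt at most one block mean, the median over blocks stays near the clean value. In practice the data would usually be shuffled before blocking; that step is omitted here to keep the example deterministic.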
Pub Date: 2025-11-28 DOI: 10.1016/j.jmva.2025.105555
Rui Pan, Yuan Gao, Hansheng Wang
Link prediction is of vital importance in network analysis. In this work, we propose a novel latent space model for link prediction in a statistical citation network. Specifically, the model can incorporate the transitivity information of both the citation network and the author-paper network. In addition, nodal features are taken into consideration, and pseudo maximum likelihood estimation of the corresponding parameter is developed. Asymptotic consistency is established and illustrated through extensive simulation studies. Link prediction is then performed, and performance is compared across different methods. Finally, a real citation network from statistics is analyzed.
Title: A latent space model for link prediction in statistical citation network. Journal of Multivariate Analysis, Volume 212, Article 105555.
Pub Date: 2025-11-28 DOI: 10.1016/j.jmva.2025.105554
Jiaxin Shi, Yuan Gao, Rui Pan, Hansheng Wang
In this study, we develop a latent factor model for analyzing high-dimensional binary data. Specifically, a standard probit model is used to describe the regression relationship between the observed binary data and the continuous latent variables. Our method assumes that the dependency structure of the observed binary data can be fully captured by the continuous latent factors. To estimate the model, a moment-based estimation method is developed. The proposed method is able to deal with both discontinuity and high dimensionality. Most importantly, the asymptotic properties of the resulting estimators are rigorously established. Extensive simulation studies are presented to demonstrate the proposed methodology. A real dataset about product descriptions is analyzed for illustration.
Title: A latent factor model for high-dimensional binary data. Journal of Multivariate Analysis, Volume 212, Article 105554.
Pub Date: 2025-11-28 DOI: 10.1016/j.jmva.2025.105573
Tingyu Lai, Yingying Wang, Zhongzhan Zhang
We propose a new nonparametric method to test and measure conditional mean (in)dependence for functional data. The new metric has several appealing properties: it is nonnegative and equals zero if and only if conditional mean independence holds; it is invariant under linear transformations of the predictor; and it does not require a moment condition on the predictor variable. Based on this measure, two test procedures for conditional mean independence are proposed for functional data. One uses a wild bootstrap, while the other uses the limiting standard normal distribution. The tests are consistent and perform well in finite-sample simulations. We further propose some requirements for a reasonable conditional mean dependence measure and demonstrate that our metric satisfies them. A real data example illustrates the application of the proposed method.
Title: Testing and measuring the conditional mean (in)dependence for functional data by martingale difference-angle divergence. Journal of Multivariate Analysis, Volume 212, Article 105573.
Pub Date: 2025-11-28 DOI: 10.1016/j.jmva.2025.105577
Yimang Zhang, Xiaorui Wang, Jian Qing Shi
Factor analysis models are widely used in social and behavioral sciences, such as psychology, education, and marketing, to measure unobservable latent traits. In this article, we introduce a nonlinear structured latent factor analysis model that is more flexible in characterizing the relationship between manifest variables and latent factors. The confirmatory identifiability of the latent factors is discussed, ensuring their substantive interpretation. A Bayesian approach with a Gaussian process prior is proposed to estimate the unknown nonlinear function and the unknown parameters. Asymptotic results are established, including the structured identifiability of latent factors, as well as the consistency of estimates for the unknown parameters and the unknown nonlinear function. Simulation studies and real data analysis are conducted to evaluate the performance of the proposed method. The simulation results demonstrate that the proposed method performs well in handling nonlinear models and successfully identifies the latent factors. Additionally, the analysis of oil flow data reveals the underlying structure of latent nonlinear patterns.
Title: Bayesian analysis of nonlinear structured latent factor models with a Gaussian process prior. Journal of Multivariate Analysis, Volume 212, Article 105577.
Pub Date: 2025-11-28 DOI: 10.1016/j.jmva.2025.105553
Yuli Liang, Deliang Dai, Shaobo Jin
Regularization of the covariance matrix is a widely used technique when estimating large covariance matrices. This paper examines a penalized likelihood method for constructing a statistically efficient covariance matrix estimator. The modified Cholesky decomposition (MCD) is used to parameterize the covariance matrix, and an effective regularization scheme is achieved by combining shrinkage and smoothing penalties on the Cholesky factor. The practical performance of the derived estimators stands in contrast to the absence of theoretical properties for them in the literature. In this work, we aim to fill the gap between theory and practice by establishing the convergence properties under regularity conditions. We also provide a simulation study as a numerical illustration.
Title: On convergence of regularized covariance estimator based on modified Cholesky decomposition. Journal of Multivariate Analysis, Volume 212, Article 105553.
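To make the parameterization in the abstract above concrete, the following is a minimal sketch of an unpenalized modified Cholesky decomposition, which factors a covariance matrix as $\Sigma = L D L^\top$ with $L$ unit lower triangular and $D$ diagonal. It does not reproduce the penalized likelihood estimator of the paper; the function name `modified_cholesky` and the 2-by-2 example matrix are assumptions for illustration.

```python
def modified_cholesky(sigma):
    """Return (L, d) such that sigma = L * diag(d) * L^T.

    `sigma` is a symmetric positive definite matrix given as a list of rows.
    L is unit lower triangular; d holds the diagonal of D.
    """
    n = len(sigma)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Diagonal entry: remove the part explained by earlier columns.
        d[j] = sigma[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        if d[j] <= 0:
            raise ValueError("sigma must be symmetric positive definite")
        # Entries of column j below the diagonal.
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (sigma[i][j] - s) / d[j]
    return L, d


sigma = [[4.0, 2.0], [2.0, 3.0]]  # hypothetical covariance matrix
L, d = modified_cholesky(sigma)
print(L)  # [[1.0, 0.0], [0.5, 1.0]]
print(d)  # [4.0, 2.0]
```

In the usual statistical reading of this decomposition, the below-diagonal entries of $T = L^{-1}$ are the negatives of the coefficients obtained by regressing each variable on its predecessors, and the entries of `d` are the corresponding innovation variances, so the parameters are unconstrained apart from `d` being positive.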
Pub Date: 2025-11-28 DOI: 10.1016/j.jmva.2025.105568
Moming Wang, Ningning Xia, Yong Zhou
Due to the heavy trading volume in financial markets and the limitations of recording mechanisms, the occurrence of multiple transactions during each recording period is a common feature of high-frequency data. This paper investigates how the number of such multiple transactions impacts the behavior of an averaged version of the time-variation adjusted realized covariance (ATVA) matrix in a high-dimensional setting, where the number of stocks and the observation frequency go to infinity proportionally. Using random matrix theory, we derive the limiting spectral distribution (LSD) of ATVA matrices based on high-frequency multiple observations. We demonstrate how the LSD of ATVA matrices depends on the number of multiple transactions. The study of the LSD of random matrices is not only of theoretical interest in itself but also provides better insight into the pre-averaging approach, which is widely used to deal with microstructure noise. Furthermore, we investigate the limits of spiked eigenvalues of ATVA matrices when the covariance matrix of asset prices exhibits a spiked pattern. Finally, the theoretical results are supported by simulation studies.
Title: Limiting spectral distribution of high-dimensional integrated covariance matrices based on high-frequency data with multiple transactions. Journal of Multivariate Analysis, Volume 212, Article 105568.
Pub Date: 2025-11-28 DOI: 10.1016/j.jmva.2025.105560
Chi Tim Ng, Chun Yip Yau, Yuanbo Li, Lei Qin
Threshold autoregressive (TAR) models form an important class of nonlinear time series models and have attracted considerable attention in the literature. To extend threshold modeling to high-dimensional nonlinear time series, a threshold network autoregressive (TNAR) model is proposed in this paper to overcome the difficulty of over-parameterization by exploiting the available information on network relations. The proposed model can characterize the regime-switching feature in nonlinear complex network systems. Sufficient conditions for the strict stationarity and ergodicity of the TNAR model are established. A computationally efficient method based on group LASSO is developed to estimate the multiple thresholds and the parameters. A grouped TNAR model is also proposed to further reduce the number of parameters. The asymptotic behavior of the proposed method is explored, and the estimation consistency of both the number of groups and the group membership structure is established.
Title: Threshold models for high-dimensional time series with network structure. Journal of Multivariate Analysis, Volume 212, Article 105560.
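The regime-switching behavior that threshold models capture can be seen in a minimal sketch of a univariate two-regime TAR(1) process. This is the classical scalar building block only, not the network (TNAR) model or the group LASSO estimator of the paper; the function name `simulate_tar`, its parameters, and the numeric values are assumptions for illustration.

```python
import random


def simulate_tar(n_steps, threshold, phi_low, phi_high, noise_sd, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t, where the coefficient phi is
    phi_low if x_{t-1} <= threshold and phi_high otherwise.

    Returns the path as a list of length n_steps + 1, starting at 0.0.
    """
    rng = random.Random(seed)  # local generator, so runs are reproducible
    path = [0.0]
    for _ in range(n_steps):
        prev = path[-1]
        phi = phi_low if prev <= threshold else phi_high
        path.append(phi * prev + rng.gauss(0.0, noise_sd))
    return path


# Hypothetical parameters: persistent below the threshold, mean-reverting above it.
path = simulate_tar(n_steps=200, threshold=0.0, phi_low=0.6, phi_high=-0.4, noise_sd=1.0)
share_upper = sum(x > 0.0 for x in path[:-1]) / (len(path) - 1)
print(f"share of steps driven by the upper regime: {share_upper:.2f}")
```

With one coefficient per regime for a single series, the parameter count is small; the abstract's point is that doing the same for every pair of series in a high-dimensional system would over-parameterize the model unless network information is used to tie parameters together.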
Pub Date: 2025-11-28 DOI: 10.1016/j.jmva.2025.105575
Xuan Ma, Jianhua Zhao, Changchun Shang, Fen Jiang, Philip L.H. Yu
Factor Analysis based on the multivariate t distribution (tFA) is a useful robust tool for extracting common factors on heavy-tailed or contaminated data. However, tFA is only applicable to vector data. When tFA is applied to matrix data, it is common to first vectorize the matrix observations. This introduces two challenges for tFA: (i) the inherent matrix structure of the data is broken, and (ii) robustness may be lost, as vectorized matrix data typically results in a high data dimension, which could easily lead to the breakdown of tFA. To address these issues, starting from the intrinsic matrix structure of matrix data, a novel robust factor analysis model, namely bilinear factor analysis built on the matrix-variate t distribution (tBFA), is proposed in this paper. The novelty is that it is capable of simultaneously extracting common factors for both row and column variables of interest on heavy-tailed or contaminated matrix data. Two efficient algorithms for maximum likelihood estimation of tBFA are developed. Closed-form expressions for the Fisher information matrix to calculate the accuracy of parameter estimates are derived. Empirical studies are conducted to understand the proposed tBFA model and compare it with related competitors. The results demonstrate the superiority and practicality of tBFA. Importantly, tBFA exhibits a significantly higher breakdown point than tFA, making it more suitable for matrix data.
Title: Robust bilinear factor analysis based on the matrix-variate t distribution. Journal of Multivariate Analysis, Volume 212, Article 105575.
Pub Date: 2025-11-28 DOI: 10.1016/j.jmva.2025.105557
Yong He, Yujie Hou, Yalin Wang, Wen-Xin Zhou
For large-dimensional tensor time series, dimension reduction plays a pivotal role. The tensor factor model describes tensor-valued time series through a low-dimensional projection onto a space of common factors, thereby achieving substantial dimension reduction, and it has a wide range of applications in economics and finance. In this paper, we propose a simple iterative least squares algorithm for estimating the tensor factor model. We first estimate the latent common factors by using deterministic mode-$k$ projection matrices and then estimate the loading matrices by minimizing the squared Frobenius loss function under certain identifiability conditions. The estimated loading matrices are then taken as new mode-$k$ projection matrices, and the above update procedures are executed iteratively until convergence. We also propose a novel eigenvalue ratio method for estimating the number of factors and show the consistency of the estimators. Given the true number of factors, we theoretically establish the convergence rates of the estimated loading matrices and signal components at the $s$th iteration for any $s \geq 1$. Thorough numerical studies are conducted to investigate the finite-sample performance of the proposed method. Analyses of import-export transport networks and lung cancer histopathological image datasets illustrate the empirical usefulness of the proposed method.
Title: Estimation of tensor factor model by iterative least squares. Journal of Multivariate Analysis, Volume 212, Article 105557.
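The idea behind eigenvalue ratio methods for choosing the number of factors can be sketched in a few lines. This is the generic rule (pick the index with the largest ratio of consecutive eigenvalues), not the novel variant proposed in the paper; the function name `eigenvalue_ratio_rank`, the cap `k_max`, and the example spectrum are assumptions for illustration, and all eigenvalues are assumed strictly positive.

```python
def eigenvalue_ratio_rank(eigenvalues, k_max):
    """Estimate the number of factors as the k in 1..k_max that maximizes
    lambda_k / lambda_{k+1}, with eigenvalues sorted in decreasing order.
    """
    ordered = sorted(eigenvalues, reverse=True)
    if k_max < 1 or k_max >= len(ordered):
        raise ValueError("need 1 <= k_max < number of eigenvalues")
    # ratios[i] compares the (i+1)-th largest eigenvalue with the next one.
    ratios = [ordered[k - 1] / ordered[k] for k in range(1, k_max + 1)]
    return 1 + max(range(k_max), key=lambda i: ratios[i])


# Hypothetical spectrum: two large "factor" eigenvalues followed by small noise ones.
spectrum = [10.0, 8.0, 0.5, 0.4, 0.3]
print(eigenvalue_ratio_rank(spectrum, k_max=4))  # 2
```

The largest gap in the example sits between the second and third eigenvalues (ratio 16), so the rule returns 2. In a tensor setting such a rule would typically be applied mode by mode to a matrix built from the data, but how the paper constructs that matrix is not stated in the abstract.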