Pub Date : 2026-01-27DOI: 10.1016/j.jmva.2026.105615
Zhiping Qiu , Wei Lin , Xiaming Tu , Jin-Ting Zhang
In many scientific and technological fields, multivariate functional data are often repeatedly observed under varying conditions over time. A fundamental question is whether the mean vector function remains consistently equal throughout the entire period. This paper introduces two novel global test statistics that leverage an integration technique to address this issue. The asymptotic distributions of the proposed test statistics under the null hypothesis are derived, and their root-n consistency is established. Simulation studies are conducted to evaluate the numerical performance of the proposed tests, which are further illustrated through an analysis of publicly available EEG motion data.
{"title":"Global tests for detecting change in mean vector functions of multivariate functional data with repeated observations","authors":"Zhiping Qiu , Wei Lin , Xiaming Tu , Jin-Ting Zhang","doi":"10.1016/j.jmva.2026.105615","DOIUrl":"10.1016/j.jmva.2026.105615","url":null,"abstract":"<div><div>In many scientific and technological fields, multivariate functional data are often repeatedly observed under varying conditions over time. A fundamental question is whether the mean vector function remains consistently equal throughout the entire period. This paper introduces two novel global testing statistics that leverage integration technique to address this issue. The asymptotic distributions of the proposed test statistics under the null hypothesis are derived, and their root-<span><math><mi>n</mi></math></span> consistency is established. Simulation studies are conducted to evaluate the numerical performance of the proposed tests, which are further illustrated through an analysis of publicly available EEG motion data.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"214 ","pages":"Article 105615"},"PeriodicalIF":1.4,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-23DOI: 10.1016/j.jmva.2026.105614
Jie Zeng , Guozhi Hu , Weihu Cheng
This paper considers the estimation problem in multivariate regression models. Under this framework, we develop a novel two-stage model averaging procedure. In the first stage, we construct a scalable model averaging estimator by transforming the original model via the singular value decomposition. When the dimension of the regressor vector is K, this approach enables us to average the estimators from a candidate model set of size K instead of size 2^K. The second stage finds the optimal weights for averaging by applying a weight choice criterion based on the Kullback–Leibler distance. We prove that the minimum weighted squared loss from the scalable model averaging is asymptotically the same as that from the original model averaging, further demonstrate the asymptotic optimality of the scalable model averaging estimator using Kullback–Leibler-distance-based weights, and derive the rate at which the resulting weights tend to the risk-based optimal weights. In comparison with existing model averaging methods, simulation results show that, in terms of weighted mean squared prediction error and computation time, our proposal is more efficient, especially when the number of candidate models is large and the sample size is small. Moreover, a real data analysis is provided to illustrate the application of our method in practice.
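The second-stage weight choice can be sketched generically: given predictions from K candidate models, search the weight simplex for the combination minimizing squared loss. The sketch below uses fabricated candidate predictions and a coarse grid in place of the paper's Kullback–Leibler criterion and SVD transformation, so it only illustrates the averaging idea, not the authors' estimator.

```python
import random

random.seed(0)
n = 100
y = [random.gauss(0.0, 1.0) for _ in range(n)]
# fabricated candidate predictions: model k = truth + noise of varying size
preds = [[yi + random.gauss(0.0, s) for yi in y] for s in (0.3, 0.6, 1.2)]

def sq_loss(w):
    # mean squared error of the w-weighted average of the K = 3 predictions
    return sum(
        (y[i] - sum(wk * preds[k][i] for k, wk in enumerate(w))) ** 2
        for i in range(n)
    ) / n

# coarse grid search over the weight simplex (step 0.05)
best_w, best_loss = None, float("inf")
step = 0.05
m = round(1 / step)
for a in range(m + 1):
    for b in range(m + 1 - a):
        w = (a * step, b * step, 1.0 - (a + b) * step)
        loss = sq_loss(w)
        if loss < best_loss:
            best_w, best_loss = w, loss
```

As expected, the search puts most weight on the least noisy candidate and beats naive equal weighting.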
{"title":"A scalable model averaging based on Kullback–Leibler distance for multivariate regression models","authors":"Jie Zeng , Guozhi Hu , Weihu Cheng","doi":"10.1016/j.jmva.2026.105614","DOIUrl":"10.1016/j.jmva.2026.105614","url":null,"abstract":"<div><div>This paper considers estimation problem in multivariate regression models. Under this framework, we develop a novel two-stage model averaging procedure. In the first stage, we construct a scalable model averaging estimator which involves transforming the original model based on the singular value decomposition. When the dimension of the regressor vector is <span><math><mi>K</mi></math></span>, this approach enables us to average the estimators from the candidate model set of size <span><math><mi>K</mi></math></span> instead of size <span><math><msup><mrow><mn>2</mn></mrow><mrow><mi>K</mi></mrow></msup></math></span>. The second stage is to find the optimal weights for averaging by applying a weight choice criterion from Kullback–Leibler distance. We prove that the minimum weighted squared loss from the scalable model averaging is asymptotically the same as that from original model averaging, further demonstrate asymptotic optimality of the scalable model averaging estimator using Kullback–Leibler-distance-based weights, and derive the rate of the resulting weights tending to the risk-based optimal weights. In comparison with existing model averaging methods, the simulation results show that, in terms of weighted mean squared prediction error and computation time, our proposal is more efficient, especially under the situation where the number of candidate models is large and the sample size is small. Moreover, a real data analysis is provided to illustrate the application of our method in practice.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"214 ","pages":"Article 105614"},"PeriodicalIF":1.4,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-23DOI: 10.1016/j.jmva.2025.105591
Alejandro Cholaquidis , Antonio Cuevas , Beatriz Pateiro-López
The problem of estimating, from a random sample of points, the dimension of a compact subset S of the Euclidean space is considered. The emphasis is put on consistency results in the statistical sense, that is, statements of convergence to the true dimension value as the sample size grows to infinity. Among the many available definitions of dimension, we focus (on the grounds of statistical tractability) on three notions: the Minkowski dimension, the correlation dimension and the, perhaps less popular, concept of pointwise dimension. We prove the statistical consistency of some natural estimators of these quantities. Our proofs partially rely on the use of an instrumental estimator formulated in terms of the empirical volume function V_n(r), defined as the Lebesgue measure of the set of points whose distance to the sample is at most r. In particular, we explore the case in which the true volume function V(r) of the target set S is a polynomial on some interval starting at zero. An empirical study is also included. Our study aims to provide some theoretical support, and some practical insights, for the problem of deciding whether or not the set S has a dimension smaller than that of the ambient space. This is a major statistical motivation of dimension studies, in connection with the so-called “Manifold Hypothesis”.
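The correlation dimension mentioned above has a classical sample estimator (Grassberger–Procaccia style) that is easy to sketch: compute the correlation integral C(r), the fraction of sample pairs within distance r, and read off the dimension as the log-log slope. The sketch below, which is not the authors' procedure, uses points on a unit circle in R^2, whose true dimension is 1.

```python
import math
import random

random.seed(1)
n = 400
# uniform sample on the unit circle embedded in R^2 (true dimension 1)
pts = [(math.cos(t), math.sin(t))
       for t in (random.uniform(0, 2 * math.pi) for _ in range(n))]

def corr_integral(r):
    # fraction of unordered pairs within distance r of each other
    close = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(pts[i], pts[j]) <= r
    )
    return close / (n * (n - 1) / 2)

# slope of log C(r) between two radii estimates the correlation dimension
r1, r2 = 0.1, 0.4
dim_hat = (math.log(corr_integral(r2)) - math.log(corr_integral(r1))) / (
    math.log(r2) - math.log(r1)
)
```

The estimate lands near 1, well below the ambient dimension 2 — the kind of comparison motivating the "Manifold Hypothesis" discussion in the abstract.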
{"title":"On consistent estimation of dimension values","authors":"Alejandro Cholaquidis , Antonio Cuevas , Beatriz Pateiro-López","doi":"10.1016/j.jmva.2025.105591","DOIUrl":"10.1016/j.jmva.2025.105591","url":null,"abstract":"<div><div>The problem of estimating, from a random sample of points, the dimension of a compact subset <span><math><mi>S</mi></math></span> of the Euclidean space is considered. The emphasis is put on consistency results in the statistical sense. That is, statements of convergence to the true dimension value when the sample size grows to infinity. Among the many available definitions of dimension, we have focused (on the grounds of its statistical tractability) on three notions: the Minkowski dimension, the correlation dimension and the, perhaps less popular, concept of pointwise dimension. We prove the statistical consistency of some natural estimators of these quantities. Our proofs partially rely on the use of an instrumental estimator formulated in terms of the empirical volume function <span><math><mrow><msub><mrow><mi>V</mi></mrow><mrow><mi>n</mi></mrow></msub><mrow><mo>(</mo><mi>r</mi><mo>)</mo></mrow></mrow></math></span>, defined as the Lebesgue measure of the set of points whose distance to the sample is at most <span><math><mi>r</mi></math></span>. In particular, we explore the case in which the true volume function <span><math><mrow><mi>V</mi><mrow><mo>(</mo><mi>r</mi><mo>)</mo></mrow></mrow></math></span> of the target set <span><math><mi>S</mi></math></span> is a polynomial on some interval starting at zero. An empirical study is also included. Our study aims to provide some theoretical support, and some practical insights, for the problem of deciding whether or not the set <span><math><mi>S</mi></math></span> has a dimension smaller than that of the ambient space. This is a major statistical motivation of the dimension studies, in connection with the so-called “Manifold Hypothesis”.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"214 ","pages":"Article 105591"},"PeriodicalIF":1.4,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146026264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-17DOI: 10.1016/j.jmva.2026.105607
Nicolas Marie
This paper presents several situations leading to the observation of multiple correlated copies of a drifted process, and then establishes non-asymptotic risk bounds on nonparametric estimators of the drift function b_0 and its derivative. For drifted Gaussian processes with a regular enough covariance function, a sharper risk bound is established on the estimator of b_0', and a model selection procedure is provided with theoretical guarantees.
{"title":"Nonparametric estimation from correlated copies of a drifted process","authors":"Nicolas Marie","doi":"10.1016/j.jmva.2026.105607","DOIUrl":"10.1016/j.jmva.2026.105607","url":null,"abstract":"<div><div>This paper presents several situations leading to the observation of multiple correlated copies of a drifted process, and then non-asymptotic risk bounds are established on nonparametric estimators of the drift function <span><math><msub><mrow><mi>b</mi></mrow><mrow><mn>0</mn></mrow></msub></math></span> and its derivative. For drifted Gaussian processes with a regular enough covariance function, a sharper risk bound is established on the estimator of <span><math><msubsup><mrow><mi>b</mi></mrow><mrow><mn>0</mn></mrow><mrow><mo>′</mo></mrow></msubsup></math></span>, and a model selection procedure is provided with theoretical guarantees.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"213 ","pages":"Article 105607"},"PeriodicalIF":1.4,"publicationDate":"2026-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145977495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-16DOI: 10.1016/j.jmva.2026.105606
Jia Zhou , Yang Li , Zemin Zheng , Changchun Tan
Reproducible learning of high-dimensional graphical structures is fundamentally important in numerous contemporary applications, as it visually reveals the underlying conditional dependencies among complex network data. In this paper, we introduce a novel procedure called the uniform graphical knockoff filter, which controls the overall false discovery rate (FDR) in Gaussian graph recovery by utilizing knockoff variables and a uniform threshold. Compared to existing methods, it is more robust to varying levels of sparsity in the true graph. We provide theoretical justifications for the procedure, demonstrating that the FDR can be asymptotically controlled and that the power is asymptotically one under mild conditions. Extensive numerical studies confirm the robust and competitive finite-sample performance of the proposed method.
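The knockoff idea of selecting via a data-dependent threshold on feature statistics can be sketched in a few lines. The sketch below implements the generic knockoff+ threshold (large positive W_j is evidence a feature is active); the paper's uniform graphical variant refines this construction and is not reproduced here, and the W values are fabricated for illustration.

```python
def knockoff_plus_threshold(W, q):
    """Smallest t with (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q."""
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        neg = sum(1 for w in W if w <= -t)  # proxy count of false leads
        pos = sum(1 for w in W if w >= t)   # number that would be selected
        if (1 + neg) / max(1, pos) <= q:
            return t
    return float("inf")  # no threshold achieves the target level q

# fabricated feature statistics; select features with W_j above the threshold
W = [3.1, 2.5, 2.2, 1.8, -0.4, 0.3, -0.2, 1.5, -1.0, 2.9]
t = knockoff_plus_threshold(W, q=0.3)
selected = [j for j, w in enumerate(W) if w >= t]
```

For these statistics the threshold settles at t = 1.5, selecting the six clearly positive features while the sign-flipping ratio certifies FDR control at level q.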
{"title":"Uniform knockoff filter for high-dimensional controlled graph recovery","authors":"Jia Zhou , Yang Li , Zemin Zheng , Changchun Tan","doi":"10.1016/j.jmva.2026.105606","DOIUrl":"10.1016/j.jmva.2026.105606","url":null,"abstract":"<div><div>Reproducible learning of high-dimensional graphical structures is fundamentally important in numerous contemporary applications, as it visually reveals the underlying conditional dependencies among complex network data. In this paper, we introduce a novel procedure called the uniform graphical knockoff filter, which controls the overall false discovery rate (FDR) in Gaussian graph recovery by utilizing knockoff variables and a uniform threshold. Compared to existing methods, it is more robust to varying levels of sparsity in the true graph. We provide theoretical justifications for the procedure, demonstrating that the FDR can be asymptotically controlled and that the power is asymptotically one under mild conditions. Extensive numerical studies confirm the robust and competitive finite-sample performance of the proposed method.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"214 ","pages":"Article 105606"},"PeriodicalIF":1.4,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-15DOI: 10.1016/j.jmva.2026.105605
Yuliang Bai, Niansheng Tang
Feature screening is an important tool for identifying active features in ultrahigh dimensional data analysis. Existing feature screening methods mainly focus on fully observed data or responses missing at random. However, in many applied fields such as biomedicine, social science and epidemiological studies, responses might be subject to nonignorable missingness due to various reasons such as dropout. To this end, this paper proposes a new adjusted Spearman rank correlation to screen active features by incorporating the Spearman rank correlation and its conditional expectation in the presence of nonignorable missing responses. To circumvent the notorious identification problem, we introduce instrumental variables into the propensity score (PS) function, which is specified by a more general semiparametric regression model. A nonparametric imputation method is developed to estimate the adjusted Spearman rank correlation. The proposed method has several desirable merits. First, it is model-free. Second, it is robust to outliers, heavy-tailed data and misspecification of the PS function. Third, under regularity conditions weaker than those in the existing missing-data literature, it enjoys the sure screening property and ranking consistency, and controls the false discovery rate well regardless of whether the parameters in the PS function are known or consistently estimated. Simulation studies and two real examples are used to investigate the performance of the proposed methodologies.
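The screening principle underlying the proposal is simple to sketch on fully observed data: rank features by the magnitude of their Spearman correlation with the response and keep the top few. The plain version below omits the paper's adjustment for nonignorable missingness; the synthetic data (two active features, heavy-tailed noise) are fabricated for illustration.

```python
import random

def ranks(v):
    # ranks 1..n; ties broken arbitrarily (fine for continuous data)
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    for rank, i in enumerate(order):
        r[i] = rank + 1.0
    return r

def spearman(x, y):
    # Pearson correlation of the rank vectors; with no ties both rank
    # vectors are permutations of 1..n, so the variances cancel
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    m = (n + 1) / 2
    cov = sum((a - m) * (b - m) for a, b in zip(rx, ry))
    var = sum((a - m) ** 2 for a in rx)
    return cov / var

random.seed(2)
n, p = 200, 10
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
# active features 0 and 1; cubed-Gaussian noise is heavy-tailed
y = [x[0] + 0.9 * x[1] + 0.5 * random.gauss(0, 1) ** 3 for x in X]

scores = [abs(spearman([row[j] for row in X], y)) for j in range(p)]
top2 = sorted(range(p), key=lambda j: -scores[j])[:2]
```

The rank-based statistic recovers the two active features despite the heavy-tailed noise, illustrating the robustness the abstract claims.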
{"title":"Model-free feature screening for ultrahigh dimensional data with responses missing not at random","authors":"Yuliang Bai, Niansheng Tang","doi":"10.1016/j.jmva.2026.105605","DOIUrl":"10.1016/j.jmva.2026.105605","url":null,"abstract":"<div><div>Feature screening method is an important tool for screening active features in ultrahigh dimensional data analysis. Existing feature screening methods mainly focus on the fully observed data or missing responses at random. But in many applied fields such as biomedicine, social science and epidemiological studies, responses might be subject to nonignorable missingness due to various reasons such as dropout. To this end, this paper proposes a new adjusted Spearman rank correlation to screen active features by incorporating the Spearman rank correlation and its conditional expectation in the presence of nonignorable missing responses. To circumvent the notorious identification problem, we introduce instrumental variables into the propensity score (PS) function, which is specified by a more general semiparametric regression model. A nonparametric imputation method is developed to estimate the adjusted Spearman rank correlation. The proposed method has several desirable merits. First, it is model-free. Second, it is robust to outliers, heavy tailed data and the misspecification of the PS function. Third, under some weaker regularity conditions than existing missing data literature, it has sure screening property and ranking consistency, and can well control the false discovery rate regardless of known or consistently estimated parameters in the PS function. Simulation studies and two real examples are used to investigate the performance of the proposed methodologies.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"213 ","pages":"Article 105605"},"PeriodicalIF":1.4,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145977542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-27DOI: 10.1016/j.jmva.2025.105594
Luoyao Yu , Rongzhu Zhao , Jiaqi Huang , Lixing Zhu , Xuehu Zhu
This paper develops a novel penalized matrix estimation method for sparse dimension reduction when detecting change points in high-dimensional data. The strategy is to project high-dimensional data onto a low-dimensional subspace without losing any change point information, enabling efficient change point detection within this dimension-reduced subspace. Theoretical analysis establishes the consistency of the proposed matrix estimation and shows that the important variables that have change points are selected consistently. Numerical studies on synthetic and several real data sets suggest that the dimension reduction strategy enhances the performance of existing approaches. Additionally, the results showcase the efficiency of the proposed algorithm for selecting important variables in high-dimensional sparse data.
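Once the data have been projected to a low-dimensional series, the detection step can be carried out by classical means; the sketch below locates a single mean change in a one-dimensional series with the standard CUSUM statistic. The projection is a fixed toy direction here, not the paper's penalized matrix estimate, and the data are fabricated.

```python
import random

def cusum_changepoint(z):
    """Index k maximizing |S_k - (k/n) S_n| / sqrt(k (n - k) / n)."""
    n = len(z)
    total = sum(z)
    best_k, best_val = 1, -1.0
    prefix = 0.0
    for k in range(1, n):
        prefix += z[k - 1]  # S_k, the running sum of the first k points
        val = abs(prefix - total * k / n) * (n / (k * (n - k))) ** 0.5
        if val > best_val:
            best_k, best_val = k, val
    return best_k

random.seed(4)
# projected series: mean 0 for 100 points, then mean 2 for 100 points
z = [random.gauss(0.0, 0.5) for _ in range(100)] + [
    random.gauss(2.0, 0.5) for _ in range(100)
]
k_hat = cusum_changepoint(z)
```

With a shift this pronounced, the estimated location lands within a few observations of the true change at index 100.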
{"title":"A sparse dimension-reduced subspace-based approach for detecting multiple change points in high-dimensional data","authors":"Luoyao Yu , Rongzhu Zhao , Jiaqi Huang , Lixing Zhu , Xuehu Zhu","doi":"10.1016/j.jmva.2025.105594","DOIUrl":"10.1016/j.jmva.2025.105594","url":null,"abstract":"<div><div>This paper develops a novel penalized matrix estimation method for sparse dimension reduction when detecting change points in high-dimensional data. The strategy is to project high-dimensional data onto a low-dimensional subspace without losing any change point information, enabling efficient change point detection within this dimension-reduced subspace. Theoretical analysis establishes the consistency of the proposed matrix estimation and selects consistently the important variables which have change points. Numerical studies on synthetic and several real data sets suggest that the dimension reduction strategy enhances the performance of existing approaches. Additionally, the results showcase the efficiency of the proposed algorithm for selecting important variables in high-dimensional sparse data.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"213 ","pages":"Article 105594"},"PeriodicalIF":1.4,"publicationDate":"2025-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145880915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-23DOI: 10.1016/j.jmva.2025.105593
Shu-Yu Li, Han-Ying Liang
Based on panel data, we explore partially linear varying-coefficient quantile regression with group effects under high dimension and missing observations. Using generalized estimating equations, we construct oracle estimators, along with their smoothed versions, for the unknown parameter vector, the varying-coefficient functions and the group effects, and establish their asymptotic normality. In the estimation procedure, the within-subject correlations of the panel data are accounted for by introducing a working correlation matrix. We further investigate variable selection via the SCAD penalty for the parameters, varying-coefficient functions and group identification simultaneously, and discuss the oracle properties. Meanwhile, hypothesis tests for the parameters, varying-coefficient functions and group effects are conducted, and the asymptotic distributions of the restricted estimators and test statistics under both the null and local alternative hypotheses are analyzed. A simulation study and a real data analysis are also conducted to evaluate the performance of the proposed methods.
{"title":"Subgroup effect quantile regression with high dimensional missing panel data","authors":"Shu-Yu Li, Han-Ying Liang","doi":"10.1016/j.jmva.2025.105593","DOIUrl":"10.1016/j.jmva.2025.105593","url":null,"abstract":"<div><div>Based on panel data, we explore partially linear varying-coefficient quantile regression with group effects under high dimension and missing observations. Using generalized estimating equations, we construct oracle estimators along with smoothed version for the unknown parameter vector, varying-coefficient functions as well as group effects, and establish their asymptotic normality. In the estimation procedure, the within-subject correlations of the panel data are considered by introducing working correlation matrix. We further investigate variable selection by the SCAD penalty for the parameters, varying-coefficient functions and group identification simultaneously, and discuss oracle properties. Meanwhile, hypothesis tests for the parameter, varying-coefficient functions and group effects are done, asymptotic distributions of the restricted estimators and test statistics under both the null and local alternative hypotheses are analyzed. Also, simulation study and real data analysis are conducted to evaluate the performance of the proposed methods.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"213 ","pages":"Article 105593"},"PeriodicalIF":1.4,"publicationDate":"2025-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145837542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-23DOI: 10.1016/j.jmva.2025.105592
Minxuan Wu , Joseph Antonelli , Zhihua Su
In this article, we extend predictor envelope models to settings with multivariate outcomes and multiple functional predictors. We propose a two-step estimation strategy, which first projects each functional predictor onto a finite-dimensional Euclidean space before fitting the model using existing approaches to envelope models. We first develop an estimator under a linear model with continuous outcomes and then extend this procedure to the more general class of generalized linear models, which allows for a variety of outcome types. We provide asymptotic theory for these estimators, showing that they are root-n consistent and asymptotically normal when the regression coefficient is finite-rank. Additionally, we show that consistency can be obtained even when the rank of the regression coefficient grows with the sample size. Extensive simulation studies confirm our theoretical results and show strong prediction performance of the proposed estimators. Additionally, we provide multiple data analyses showing that the proposed approach performs well in real-world settings under a variety of outcome types compared with existing dimension reduction approaches.
{"title":"Envelope-based partial least squares in functional regression","authors":"Minxuan Wu , Joseph Antonelli , Zhihua Su","doi":"10.1016/j.jmva.2025.105592","DOIUrl":"10.1016/j.jmva.2025.105592","url":null,"abstract":"<div><div>In this article, we extend predictor envelope models to settings with multivariate outcomes and multiple, functional predictors. We propose a two-step estimation strategy, which first projects the function onto a finite-dimensional Euclidean space before fitting the model using existing approaches to envelope models. We first develop an estimator under a linear model with continuous outcomes and then extend this procedure to the more general class of generalized linear models, which allow for a variety of outcome types. We provide asymptotic theory for these estimators showing that they are root-<span><math><mi>n</mi></math></span> consistent and asymptotically normal when the regression coefficient is finite-rank. Additionally we show that consistency can be obtained even when the regression coefficient has rank that grows with the sample size. Extensive simulation studies confirm our theoretical results and show strong prediction performance of the proposed estimators. Additionally, we provide multiple data analyses showing that the proposed approach performs well in real-world settings under a variety of outcome types compared with existing dimension reduction approaches.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"213 ","pages":"Article 105592"},"PeriodicalIF":1.4,"publicationDate":"2025-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145837543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-19DOI: 10.1016/j.jmva.2025.105590
Tetsuya Umino , Kazuyoshi Yata , Makoto Aoshima
Scenarios involving high-dimensional, low-sample-size (HDLSS) data are often encountered in modern scientific fields such as genetic microarrays, medical imaging, and finance, where the number of variables can greatly exceed the number of observations. In such settings, reliable estimation of cross-covariance structures is essential for understanding relationships between variable sets. However, classical estimators often exhibit severe noise accumulation. To address this issue, we propose a novel thresholding estimator of the cross-covariance matrix for HDLSS settings. We consider the asymptotic properties of the sample cross-covariance matrix and show that, in the high-dimensional setting, it contains large amounts of noise, which renders it inconsistent. To solve this problem, we develop a new thresholding estimator based on the automatic sparse estimation methodology and show that the estimator is consistent under mild assumptions. We analyze and evaluate the performance of the proposed estimator through numerical simulations and a real data analysis. The simulations demonstrate that the method attains consistency without requiring the stringent high-dimensional conditions assumed by existing approaches, and the real-data analysis illustrates its applicability to high-dimensional regression problems, wherein improved parameter estimation enhances prediction accuracy. In conclusion, our findings serve as a theoretically sound tool for cross-covariance estimation in HDLSS contexts, with potential implications for a wide range of high-dimensional data analyses.
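The basic operation of a thresholding estimator is easy to sketch: form the sample cross-covariance matrix between two variable sets and zero out its small-magnitude entries. The sketch below uses a fixed cut-off `lam` on fabricated data; the paper's "automatic sparse estimation" chooses the cut-off data-adaptively, which is not reproduced here.

```python
import random

def sample_cross_cov(X, Y):
    # (p x q) sample cross-covariance between the columns of X and Y
    n, p, q = len(X), len(X[0]), len(Y[0])
    mx = [sum(row[j] for row in X) / n for j in range(p)]
    my = [sum(row[k] for row in Y) / n for k in range(q)]
    return [
        [
            sum((X[i][j] - mx[j]) * (Y[i][k] - my[k]) for i in range(n)) / (n - 1)
            for k in range(q)
        ]
        for j in range(p)
    ]

def hard_threshold(S, lam):
    # keep entries at least lam in magnitude, zero the rest
    return [[s if abs(s) >= lam else 0.0 for s in row] for row in S]

random.seed(5)
n = 300
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
# only the first coordinates of X and Y are related (true cross-cov ~ 1)
Y = [[x[0] + 0.2 * random.gauss(0, 1), random.gauss(0, 1)] for x in X]
T = hard_threshold(sample_cross_cov(X, Y), lam=0.3)
```

Thresholding retains the one genuinely nonzero entry and removes the noise entries whose population values are zero, which is the sparsity mechanism the abstract relies on.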
{"title":"Automatic sparse estimation of the high-dimensional cross-covariance matrix","authors":"Tetsuya Umino , Kazuyoshi Yata , Makoto Aoshima","doi":"10.1016/j.jmva.2025.105590","DOIUrl":"10.1016/j.jmva.2025.105590","url":null,"abstract":"<div><div>Scenarios involving high-dimensional, low-sample-size (HDLSS) data are often encountered in modern scientific fields involving genetic microarrays, medical imaging, and finance, where the number of variables can greatly exceed the number of observations. In such settings, a reliable estimation of cross-covariance structures is essential for understanding relationships between variable sets. However, classical estimators often exhibit severe noise accumulation. To address this issue, in this study, we propose a novel thresholding estimator of the cross-covariance matrix for HDLSS settings. We consider the asymptotic properties of the sample cross-covariance matrix and show that the estimator contains large amounts of noise in the high-dimensional setting, which renders it inconsistent. To solve this problem occurring in high-dimensional settings, we develop a new thresholding estimator based on the automatic sparse estimation methodology and show that the estimator is consistent under mild assumptions. We analyze and evaluate the performance of the proposed estimator based on numerical simulations and actual data analysis. The simulations demonstrate that the method attains consistency without requiring the stringent high-dimensional conditions assumed by existing approaches, and the real-data analysis illustrates its applicability to high-dimensional regression problems, wherein improved parameter estimation enhances prediction accuracy. In conclusion, our findings serve as a theoretically sound tool for cross-covariance estimation in HDLSS contexts, with potential implications for a wide range of high-dimensional data analyses.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"213 ","pages":"Article 105590"},"PeriodicalIF":1.4,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145837541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}