Pub Date: 2026-07-01 | Epub Date: 2026-01-23 | DOI: 10.1016/j.jmva.2026.105614
Jie Zeng , Guozhi Hu , Weihu Cheng
This paper considers the estimation problem in multivariate regression models. Under this framework, we develop a novel two-stage model averaging procedure. In the first stage, we construct a scalable model averaging estimator by transforming the original model via the singular value decomposition. When the dimension of the regressor vector is K, this approach enables us to average estimators from a candidate model set of size K instead of size 2^K. The second stage finds the optimal averaging weights by applying a weight choice criterion derived from the Kullback–Leibler distance. We prove that the minimum weighted squared loss from the scalable model averaging is asymptotically the same as that from the original model averaging, further demonstrate the asymptotic optimality of the scalable model averaging estimator using Kullback–Leibler-distance-based weights, and derive the rate at which the resulting weights tend to the risk-based optimal weights. In comparison with existing model averaging methods, simulation results show that, in terms of weighted mean squared prediction error and computation time, our proposal is more efficient, especially when the number of candidate models is large and the sample size is small. Moreover, a real data analysis illustrates the application of our method in practice.
Title: A scalable model averaging based on Kullback–Leibler distance for multivariate regression models (Journal of Multivariate Analysis 214, Article 105614)
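The first-stage construction can be illustrated with a small simulation. The sketch below is illustrative only: it builds the K nested candidate models from SVD-rotated regressors (rather than all 2^K subset models) and combines them with placeholder equal weights standing in for the paper's Kullback–Leibler weight choice criterion; all dimensions and data are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 200, 6
X = rng.normal(size=(n, K))
beta = np.array([2.0, -1.5, 1.0, 0.5, 0.0, 0.0])
y = X @ beta + rng.normal(size=n)

# Transform the regressors with an SVD: X = U S Vt. The columns of X @ Vt.T
# are orthogonal, so the K nested candidate models that use the first
# k transformed regressors (k = 1, ..., K) replace the 2^K subset models.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt.T                       # orthogonal "rotated" regressors

preds = []
for k in range(1, K + 1):
    Zk = Z[:, :k]
    coef, *_ = np.linalg.lstsq(Zk, y, rcond=None)
    preds.append(Zk @ coef)
preds = np.column_stack(preds)     # n x K matrix of candidate fits

# Equal weights here are a placeholder for the paper's
# Kullback-Leibler-distance-based weight choice criterion.
w = np.full(K, 1.0 / K)
y_avg = preds @ w
```

Because the candidate models are nested, the residual sum of squares is non-increasing in k, which is what makes averaging over only K models sensible.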
Pub Date: 2026-07-01 | Epub Date: 2026-02-20 | DOI: 10.1016/j.jmva.2026.105626
David del Val , José R. Berrendero , Alberto Suárez
Partial least squares (PLS) is a dimensionality reduction technique introduced in the field of chemometrics and successfully employed in numerous areas of application. The PLS components are obtained by maximizing the covariance between linear combinations of the regressors and of the target variables. In this work, we focus on its application to scalar regression problems. PLS regression consists of finding the least squares predictor that is a linear combination of a subset of the PLS components. Alternatively, PLS regression can be formulated as a least squares problem restricted to a Krylov subspace. This equivalent formulation is employed to analyze the distance between β̂_PLS^(L), the PLS estimator of the vector of coefficients of the linear regression model based on L PLS components, and β̂_OLS, the one obtained by ordinary least squares (OLS), as a function of L. Specifically, β̂_PLS^(L) is the vector of coefficients in the aforementioned Krylov subspace that is closest to β̂_OLS in terms of the Mahalanobis distance with respect to the covariance matrix of the OLS estimate. We provide a bound on this distance that depends only on the distribution of the eigenvalues of the regressor covariance matrix. Numerical examples on synthetic and real-world data are used to illustrate how the distance between β̂_PLS^(L) and β̂_OLS depends on the number of clusters in which the eigenvalues of the regressor covariance matrix are grouped.
Title: Relation between PLS and OLS regression in terms of the eigenvalue distribution of the regressor covariance matrix (Journal of Multivariate Analysis 214, Article 105626)
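The Krylov-subspace formulation can be checked numerically. With C = XᵀX/n and s = Xᵀy/n, the PLS estimator with L components is the least squares solution restricted to span{s, Cs, …, C^(L−1)s}; when L equals the number of regressors, this subspace generically fills the whole coefficient space and the restricted solution coincides with OLS. A minimal sketch on synthetic data (the QR orthonormalisation of the Krylov basis is a numerical-stability choice, not part of the definition):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 300, 4
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(size=n)

C = X.T @ X / n                      # (uncentred) regressor covariance
s = X.T @ y / n                      # covariance between regressors and target

def beta_krylov(L):
    """Least squares restricted to the Krylov subspace span{s, Cs, ..., C^(L-1) s}."""
    V = np.column_stack([np.linalg.matrix_power(C, k) @ s for k in range(L)])
    Q, _ = np.linalg.qr(V)           # orthonormal basis of the subspace
    gamma = np.linalg.solve(Q.T @ C @ Q, Q.T @ s)
    return Q @ gamma

beta_ols = np.linalg.solve(C, s)     # ordinary least squares solution
```

With L = p the Krylov subspace is (generically) all of R^p, so beta_krylov(p) recovers beta_ols, while small L gives a genuinely restricted estimator.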
Pub Date: 2026-07-01 | Epub Date: 2026-02-02 | DOI: 10.1016/j.jmva.2026.105617
Marléne Baumeister , Konstantin Emil Thiel , Lynn Matits , Georg Zimmermann , Markus Pauly , Paavo Sattler
Evaluating intervention effects on multiple outcomes is a central research goal in a wide range of quantitative sciences. It is common to compare interventions with each other and with a control across several, potentially highly correlated, outcome variables. In this context, researchers are interested in identifying effects at both the global level (across all outcome variables) and the local level (for specific variables). At the same time, potential confounding must be accounted for. This leads to the need for powerful multiple contrast testing procedures (MCTPs) capable of handling multivariate outcomes and covariates. Given this background, we propose an extension of MCTPs within a semiparametric MANCOVA framework that remains applicable beyond multivariate normality, homoscedasticity, or non-singular covariance structures. To realize this, we implement a generalized resampling-based method for determining critical values. We illustrate our approach by analysing multivariate psychological intervention data, evaluating joint physiological and psychological constructs such as heart rate variability.
Title: Multivariate and multiple contrast testing in general covariate-adjusted factorial designs (Journal of Multivariate Analysis 214, Article 105617)
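As a much-simplified illustration of the resampling idea (univariate outcome, no covariate adjustment, and a plain permutation scheme in place of the paper's generalized resampling method), one can compute a max-type statistic over many-to-one contrasts and calibrate it under the global null; all group sizes and effect sizes below are invented:

```python
import numpy as np

rng = np.random.default_rng(2)
# Three treatment groups compared with a control (many-to-one contrasts).
groups = [rng.normal(loc=m, size=50) for m in (0.0, 0.0, 1.2)]
control = rng.normal(size=50)

def max_t(groups, control):
    """Max absolute Welch-type t statistic over all group-vs-control contrasts."""
    ts = []
    for g in groups:
        diff = g.mean() - control.mean()
        se = np.sqrt(g.var(ddof=1) / g.size + control.var(ddof=1) / control.size)
        ts.append(abs(diff) / se)
    return max(ts)

obs = max_t(groups, control)

# Resampling under the global null: permute the pooled observations and
# recompute the max statistic to approximate the 95% critical value.
pooled = np.concatenate(groups + [control])
sizes = [g.size for g in groups]
stats = []
for _ in range(500):
    parts = np.split(rng.permutation(pooled), np.cumsum(sizes))
    stats.append(max_t(parts[:-1], parts[-1]))
crit = np.quantile(stats, 0.95)
```

The max-type statistic is what gives simultaneous global and local conclusions: the global null is rejected when obs exceeds crit, and each contrast whose |t| exceeds crit is flagged locally.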
We offer a novel test of mutual independence based on consistent estimates of the area under the Kendall curve. We also present an index of dependence that allows one to measure the mutual dependence of a d-dimensional random vector with d > 2. The index is based on a d-dimensional Kendall process. We discuss a standardized version of our index of dependence that is easy to interpret, and provide an algorithm for its computation. Based on the proposed index of dependence, we exemplify a novel method for searching for patterns in the dependence structure. We evaluate the performance of our procedures via simulation, and apply our methods to a real data set.
Title: AUK-based test for mutual independence and an index of mutual dependence. Authors: Georgios Afendras, Marianthi Markatou, Nickos Papantonis. DOI: 10.1016/j.jmva.2025.105589 (Journal of Multivariate Analysis 214, Article 105589)
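One common construction behind such Kendall-curve quantities can be sketched as follows (an illustration of the ingredients, not necessarily the authors' exact statistic): W_i is the fraction of sample points dominated componentwise by the i-th observation, the empirical Kendall curve is the cdf of the W_i, and the area under it on [0, 1] equals 1 − mean(W).

```python
import numpy as np

def auk(X):
    """Area under the empirical Kendall curve of the sample X (n x d).

    W_i is the fraction of the other points dominated componentwise by X_i;
    the Kendall curve is the empirical cdf K_n of the W_i, and the area
    under K_n on [0, 1] equals 1 - mean(W).
    """
    n = X.shape[0]
    below = (X[None, :, :] < X[:, None, :]).all(axis=2)  # below[i, j]: X_j < X_i
    W = below.sum(axis=1) / (n - 1)
    return 1.0 - W.mean()

rng = np.random.default_rng(3)
n = 500
indep = rng.uniform(size=(n, 2))       # independent coordinates
u = rng.uniform(size=n)
comono = np.column_stack([u, u])       # perfectly dependent coordinates
```

In two dimensions, independence gives E[W] = 1/4 and hence AUK ≈ 0.75, whereas comonotone data give uniformly spread W_i and AUK ≈ 0.5, so the area separates the two regimes.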
Pub Date: 2026-07-01 | Epub Date: 2026-03-06 | DOI: 10.1016/j.jmva.2026.105630
Jonathan Ansari , Marcus Rockel
The rank correlation ξ(X, Y), recently established by Sourav Chatterjee and already popular in the statistics literature, takes values in [0, 1], where 0 characterises independence of X and Y, and 1 characterises perfect dependence of Y on X. Unlike concordance measures such as Spearman’s ρ, which capture the degree of positive or negative dependence, ξ quantifies the strength of functional dependence. In this paper, we study the attainable set of pairs (ξ(X, Y), ρ(X, Y)). The resulting ξ-ρ-region is a convex set whose boundary is characterised by a novel family of absolutely continuous, asymmetric copulas having a diagonal band structure. Moreover, we prove that ξ(X, Y) ≤ |ρ(X, Y)| whenever Y is stochastically increasing or decreasing in X, and we identify the maximal difference ρ(X, Y) − ξ(X, Y) as exactly 0.4. Our proofs rely on a convex optimisation problem under various equality and inequality constraints, as well as on ordering properties for ξ and ρ. Our results contribute to a better understanding of Chatterjee’s rank correlation, which typically yields substantially smaller values than Spearman’s rho when quantifying positive dependence. In particular, when interpreting the values of Chatterjee’s rank correlation on the scale of ρ, the quantity √ξ appears to be more appropriate.
Title: The exact region and an inequality between Chatterjee’s and Spearman’s rank correlations (Journal of Multivariate Analysis 214, Article 105630)
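Chatterjee's coefficient has a simple closed form: with the data sorted by X and r_i the rank of the i-th Y value in that order, ξ_n = 1 − 3 Σ|r_{i+1} − r_i| / (n² − 1) (no-ties version). A minimal implementation, with invented test data, shows the behaviour the abstract describes: ξ is near 1 for noiseless functional dependence (even non-monotone) and near 0 under independence.

```python
import numpy as np

def xi_chatterjee(x, y):
    """Chatterjee's rank correlation xi_n (version assuming no ties in y)."""
    n = len(x)
    order = np.argsort(x)                        # sort observations by x
    r = np.argsort(np.argsort(y[order])) + 1     # ranks of y in that order
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n**2 - 1)

rng = np.random.default_rng(4)
n = 1000
x = rng.uniform(size=n)
y_func = np.sin(2 * np.pi * x)    # noiseless, non-monotone functional dependence
y_ind = rng.uniform(size=n)       # independent of x
```

Note that even for perfectly monotone data ξ_n is only 1 − 3/(n + 1), illustrating why ξ tends to sit well below Spearman's ρ in finite samples.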
Pub Date: 2026-07-01 | Epub Date: 2026-03-07 | DOI: 10.1016/j.jmva.2026.105631
Yixiao Liu, Pengjian Shang
In this paper, we propose a novel Euclidean-distance-based coefficient, named differential distance correlation, to measure the strength of dependence between a random variable Y ∈ ℝ and a random vector X ∈ ℝ^p. The coefficient has a concise expression and is invariant to arbitrary orthogonal transformations of the random vector. Moreover, the coefficient is a strongly consistent estimator of a simple and interpretable dependence measure, which is 0 if and only if X and Y are independent and 1 if and only if Y determines X almost surely. An alternative approach is also proposed to address the coefficient’s lack of robustness to outliers. Furthermore, the coefficient is asymptotically normal with a simple variance under the independence hypothesis, facilitating fast and accurate p-value computation for testing independence. Three simulation experiments show that the proposed coefficient is more computationally efficient for independence testing and more effective in detecting oscillatory relationships than several competing methods. We also apply our method to a real data example.
Title: Differential distance correlation and its applications (Journal of Multivariate Analysis 214, Article 105631)
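The differential distance correlation itself is not reproduced here; for orientation, the classical distance correlation of Székely et al. — likewise a Euclidean-distance-based coefficient that is invariant to orthogonal transformations of the arguments — can be computed as follows (a sketch on invented data):

```python
import numpy as np

def dcor(x, y):
    """Sample distance correlation (Szekely et al.), a classical
    Euclidean-distance-based dependence coefficient."""
    x = np.atleast_2d(x.T).T              # ensure n x d shape
    y = np.atleast_2d(y.T).T
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=2)
    # double-centre the pairwise distance matrices
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    dvar = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(max(dcov2, 0.0) / dvar)
```

Because only pairwise Euclidean distances enter, rotating the vector argument leaves the coefficient unchanged, which is the invariance property the abstract highlights.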
Pub Date: 2026-07-01 | Epub Date: 2026-03-07 | DOI: 10.1016/j.jmva.2026.105629
Wenshan Wang , Xiufang Liu , Dianliang Deng
This study delves into kernel quantile regression estimation for a semiparametric partially linear time-varying-coefficient model, which incorporates a history process with time-dependent covariates and a right-censored time-to-event variable. We propose a three-stage approach to construct estimators of the parametric portion and the nonparametric time-varying-coefficient function of this model, using the inverse probability of censoring weighting (IPCW) technique. Additionally, we offer a procedure for variable selection among the time-dependent covariates in the parametric segment through the use of an adaptive LASSO penalty. The paper establishes the asymptotic normality of the proposed estimators and demonstrates that the penalized estimators possess the oracle property. A numerical simulation is implemented to evaluate the performance of the proposed estimators. Finally, we apply the developed method to medical cost data from a multicenter automatic defibrillator implantation trial (MADIT) to illustrate its practical utility.
Title: Kernel quantile regression for semiparametric partially linear time-varying-coefficient model based on a history process of longitudinal data (Journal of Multivariate Analysis 214, Article 105629)
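The IPCW ingredient can be sketched generically (this is the standard construction, not the paper's full three-stage estimator): each uncensored observation is weighted by the inverse of a Kaplan–Meier estimate of the censoring survival function G(t) = P(C > t), evaluated just before the observed time, while censored observations receive weight zero.

```python
import numpy as np

def ipcw_weights(time, delta):
    """Inverse-probability-of-censoring weights.

    time  : observed times min(T, C)
    delta : 1 if the event was observed, 0 if censored
    Returns delta_i / G_hat(t_i-), where G_hat is the Kaplan-Meier
    estimate of the censoring survival function (censorings as "events").
    """
    order = np.argsort(time)
    t, d = time[order], delta[order]
    n = len(t)
    G = np.ones(n)
    surv = 1.0
    for i in range(n):
        G[i] = surv                    # left limit G_hat(t_i-)
        if d[i] == 0:                  # a censoring event updates G_hat
            surv *= 1.0 - 1.0 / (n - i)
    w = np.zeros(n)
    w[d == 1] = 1.0 / G[d == 1]
    out = np.empty(n)
    out[order] = w                     # restore the original ordering
    return out
```

Reweighting the check-function loss with these weights is what lets a quantile regression criterion be evaluated on right-censored responses.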
Pub Date: 2026-07-01 | Epub Date: 2026-01-23 | DOI: 10.1016/j.jmva.2025.105591
Alejandro Cholaquidis , Antonio Cuevas , Beatriz Pateiro-López
The problem of estimating, from a random sample of points, the dimension of a compact subset S of Euclidean space is considered. The emphasis is on consistency results in the statistical sense, that is, statements of convergence to the true dimension value as the sample size grows to infinity. Among the many available definitions of dimension, we focus (on the grounds of statistical tractability) on three notions: the Minkowski dimension, the correlation dimension and the, perhaps less popular, concept of pointwise dimension. We prove the statistical consistency of some natural estimators of these quantities. Our proofs partially rely on an instrumental estimator formulated in terms of the empirical volume function V_n(r), defined as the Lebesgue measure of the set of points whose distance to the sample is at most r. In particular, we explore the case in which the true volume function V(r) of the target set S is a polynomial on some interval starting at zero. An empirical study is also included. Our study aims to provide some theoretical support, and some practical insights, for the problem of deciding whether or not the set S has a dimension smaller than that of the ambient space. This is a major statistical motivation of dimension studies, in connection with the so-called “Manifold Hypothesis”.
Title: On consistent estimation of dimension values (Journal of Multivariate Analysis 214, Article 105591)
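For the correlation dimension, a textbook estimator reads off the slope of log C(r) against log r, where C(r) is the fraction of sample pairs at distance below r. The sketch below (with invented radii and sample sizes) recovers dimension ≈ 1 for points on a circle in R², a set of dimension smaller than the ambient space, and ≈ 2 for points filling a disk:

```python
import numpy as np

def correlation_dimension(X, r1, r2):
    """Estimate the correlation dimension as the slope of log C(r)
    between two radii, where C(r) is the fraction of pairs closer than r."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    pairs = d[np.triu_indices(len(X), k=1)]
    return np.log((pairs < r2).mean() / (pairs < r1).mean()) / np.log(r2 / r1)

rng = np.random.default_rng(7)
n = 1500
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
circle = np.column_stack([np.cos(theta), np.sin(theta)])   # 1-dim set in R^2
rad = np.sqrt(rng.uniform(size=n))                         # uniform on the disk
phi = rng.uniform(0.0, 2.0 * np.pi, size=n)
disk = np.column_stack([rad * np.cos(phi), rad * np.sin(phi)])  # 2-dim set
```

Using two fixed radii is the crudest version of the idea; in practice one fits the slope over a range of r, which is closer in spirit to exploiting the polynomial behaviour of the volume function near zero.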
Pub Date: 2026-07-01 | Epub Date: 2026-02-14 | DOI: 10.1016/j.jmva.2026.105624
Xin Wang , Hongxin Zhao , Zhenwei Zhou , Lingchen Kong , Liqun Wang
The estimation of high-dimensional covariance matrices plays an important role in many fields of application such as economics, biology, and the social and health sciences. A mainstream structural assumption for enhancing estimator accuracy is that the covariance matrix is sparse or approximately sparse. This paper proposes an adaptive ℓ_q (0 < q < 1) regularized estimator with a minimum eigenvalue constraint for high-dimensional sparse covariance matrices. The method eliminates the need for the conventional two-stage framework of sequential correlation and covariance matrix estimation. Under appropriate regularity conditions, we analyze its asymptotic and finite-sample properties. The proposed iterative reweighted minimization method and its inexact variant can be employed to find the desired estimate. Simulation studies confirm that the proposed estimator performs better than several state-of-the-art methods.
Title: Adaptive ℓq regularized estimation for high-dimensional sparse covariance matrix (Journal of Multivariate Analysis 214, Article 105624)
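The flavour of such a procedure can be sketched as follows. This is a generic combination of iteratively reweighted soft-thresholding (an ℓ1 surrogate for the ℓq penalty, which penalises large entries less) with projection onto the minimum-eigenvalue constraint; all tuning constants are invented and this is not the authors' exact algorithm:

```python
import numpy as np

def lq_covariance(S, lam=0.1, q=0.5, mu=0.05, iters=10, eps=1e-4):
    """Sketch: iteratively reweighted soft-thresholding of the off-diagonal
    entries of the sample covariance S, then projection onto the set of
    symmetric matrices with smallest eigenvalue >= mu."""
    Sigma = S.copy()
    off = ~np.eye(S.shape[0], dtype=bool)
    for _ in range(iters):
        # reweighted threshold: larger current entries get a smaller threshold
        thr = lam * q / (np.abs(Sigma) + eps) ** (1.0 - q)
        T = Sigma.copy()
        T[off] = np.sign(Sigma[off]) * np.maximum(np.abs(Sigma[off]) - thr[off], 0.0)
        # project onto the minimum-eigenvalue constraint by eigenvalue clipping
        vals, vecs = np.linalg.eigh((T + T.T) / 2)
        Sigma = (vecs * np.maximum(vals, mu)) @ vecs.T
    return Sigma

rng = np.random.default_rng(8)
p, n = 20, 100
true = np.eye(p)
true[0, 1] = true[1, 0] = 0.4        # a single off-diagonal signal
X = rng.multivariate_normal(np.zeros(p), true, size=n)
S = np.cov(X, rowvar=False)
est = lq_covariance(S)
```

The eigenvalue clipping is what enforces the minimum-eigenvalue constraint directly on the covariance scale, avoiding a separate correlation-matrix stage.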
Pub Date : 2026-05-01Epub Date: 2025-12-03DOI: 10.1016/j.jmva.2025.105563
Hanteng Ma , Peijun Sang , Xingdong Feng , Xin Liu
Functional classification has become increasingly helpful for exploring and predicting a response variable with multiple categories. In practice, both functional and scalar covariates may be informative and should be included in the model simultaneously, so a robust multi-categorical functional classifier with statistical guarantees is desirable. However, these two issues have rarely been addressed in previous studies. Motivated by this, we propose a novel large-margin linear mixed functional classifier for responses with multiple categories, which includes both functional and scalar covariates as predictors and accommodates sparsely observed longitudinal functional data. The proposed method not only handles functional classification using a combination of functional and scalar covariates, but also provides a robust multi-categorical mixed functional classifier via a large-margin loss that adapts to the observed samples. Furthermore, we establish statistical theory for the mixed functional classifier, which has received little attention in the existing literature. An efficient algorithm is also proposed for its practical implementation. Numerical investigations support the strong performance of the proposed method on both simulated and real datasets.
{"title":"A robust mixed functional classifier with adaptive large margin loss","authors":"Hanteng Ma , Peijun Sang , Xingdong Feng , Xin Liu","doi":"10.1016/j.jmva.2025.105563","DOIUrl":"10.1016/j.jmva.2025.105563","url":null,"abstract":"<div><div>Functional classification has been increasingly helpful in exploring and predicting a response variable with multiple categories. In fact, both functional and scalar covariates may be useful and should be included in the model simultaneously, and thus developing a robust multi-categorical functional classifier with statistical guarantees is desirable. However, both of these two issues are rarely touched in previous studies. Motivated by these, in this paper we propose a novel large margin linear mixed functional classifier for the response with multiple categories, which includes both functional and scalar covariates as predictors, especially when functional data are sparsely longitudinal. Not only does the proposed method address the functional classification using a combination of both functional and scalar covariates, but also provides a robust multi-categorical mixed functional classifier using a large margin loss adaptive to observed samples. Furthermore, we establish statistical theories of a mixed functional classifier, which have been less considered in existing literature. An efficient algorithm is also proposed for its practical implementation. Numerical investigations have supported the superb performance of the proposed method on both simulated and real datasets.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"213 ","pages":"Article 105563"},"PeriodicalIF":1.4,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145735425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
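The abstract does not specify the classifier's form or its adaptive loss. Purely as a hedged illustration of the general idea, the sketch below trains a linear multiclass classifier with a Crammer–Singer-style hinge loss by plain subgradient descent; in the mixed-covariate setting of the paper, the input feature vector would concatenate functional scores (e.g. FPCA coefficients from the sparsely observed curves) with the scalar covariates. All names, the optimizer, and the hyperparameters are assumptions, and the paper's adaptive large-margin loss is not reproduced here.

```python
import numpy as np

def fit_multiclass_hinge(X, y, n_classes, lr=0.1, reg=1e-3, n_epochs=200):
    """Linear multiclass classifier with a Crammer-Singer hinge loss.

    X: (n, d) features, e.g. [functional scores | scalar covariates].
    y: (n,) integer labels in {0, ..., n_classes-1}.
    """
    n, d = X.shape
    W = np.zeros((n_classes, d))
    for _ in range(n_epochs):
        scores = X @ W.T                                   # (n, n_classes)
        margins = scores - scores[np.arange(n), y][:, None] + 1.0
        margins[np.arange(n), y] = 0.0
        worst = margins.argmax(axis=1)                     # most-violating class
        viol = margins[np.arange(n), worst] > 0
        G = reg * W                                        # ridge subgradient
        for i in np.flatnonzero(viol):
            G[worst[i]] += X[i] / n                        # push violator down
            G[y[i]] -= X[i] / n                            # push true class up
        W -= lr * G
    return W

def predict(W, X):
    return (X @ W.T).argmax(axis=1)
```

A curve represented by its first few basis or FPC scores can simply be stacked with the scalar covariates via `np.hstack` before calling `fit_multiclass_hinge`; the hinge term only penalizes the single most-violating competing class per observation, which is what makes this a large-margin formulation.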