
Journal of Multivariate Analysis: Latest Articles

A scalable model averaging based on Kullback–Leibler distance for multivariate regression models
IF 1.4 | Zone 3, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2026-07-01 | Epub Date: 2026-01-23 | DOI: 10.1016/j.jmva.2026.105614
Jie Zeng, Guozhi Hu, Weihu Cheng
This paper considers the estimation problem in multivariate regression models. Under this framework, we develop a novel two-stage model averaging procedure. In the first stage, we construct a scalable model averaging estimator by transforming the original model using the singular value decomposition. When the dimension of the regressor vector is $K$, this approach enables us to average the estimators from a candidate model set of size $K$ instead of size $2^K$. The second stage finds the optimal averaging weights by applying a weight choice criterion based on the Kullback–Leibler distance. We prove that the minimum weighted squared loss from the scalable model averaging is asymptotically the same as that from the original model averaging, further demonstrate the asymptotic optimality of the scalable model averaging estimator using Kullback–Leibler-distance-based weights, and derive the rate at which the resulting weights tend to the risk-based optimal weights. In comparison with existing model averaging methods, simulation results show that our proposal is more efficient in terms of weighted mean squared prediction error and computation time, especially when the number of candidate models is large and the sample size is small. Moreover, a real data analysis illustrates the application of our method in practice.
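The reduction from $2^K$ to $K$ candidate models can be illustrated with a rough sketch. The abstract does not spell out the construction, so the code below assumes one plausible reading: after a singular value decomposition of the design matrix, the candidates are the $K$ nested models built from the leading orthogonal directions. The function names, the simulated data, and the uniform weights are all hypothetical; in particular, the uniform weights are only a placeholder and are not the Kullback–Leibler-based weights of the paper.

```python
import numpy as np

def nested_svd_fits(X, Y):
    """Fitted values of K nested candidate models on SVD-transformed regressors.

    Assumption (not taken from the paper): candidate k uses the first k
    left singular vectors of X as its regressors.
    """
    U, _, _ = np.linalg.svd(X, full_matrices=False)  # U has orthonormal columns
    coords = U.T @ Y                                 # least squares coefficients on U
    K = X.shape[1]
    return [U[:, :k] @ coords[:k] for k in range(1, K + 1)]

def average_fits(fits, weights):
    """Weighted average of the candidate fitted values."""
    weights = np.asarray(weights, dtype=float)
    return sum(w * f for w, f in zip(weights, fits))

# Simulated multivariate regression: n observations, K regressors, q responses.
rng = np.random.default_rng(0)
n, K, q = 50, 6, 2
X = rng.normal(size=(n, K))
B = rng.normal(size=(K, q))
Y = X @ B + rng.normal(size=(n, q))

fits = nested_svd_fits(X, Y)
uniform = np.full(K, 1.0 / K)        # placeholder weights, not the KL criterion
Y_avg = average_fits(fits, uniform)

# The largest nested candidate reproduces the ordinary least squares fit.
ols_fit = X @ np.linalg.lstsq(X, Y, rcond=None)[0]
print(len(fits), "candidates instead of", 2 ** K)
print("largest candidate equals OLS fit:", np.allclose(fits[-1], ols_fit))
```

Because the transformed regressors are orthogonal, each nested fit reuses the same coefficient matrix, which is what makes averaging over $K$ candidates cheap in this sketch.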
Journal of Multivariate Analysis, Volume 214, Article 105614
Citations: 0
Relation between PLS and OLS regression in terms of the eigenvalue distribution of the regressor covariance matrix
IF 1.4 | Zone 3, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2026-07-01 | Epub Date: 2026-02-20 | DOI: 10.1016/j.jmva.2026.105626
David del Val, José R. Berrendero, Alberto Suárez
Partial least squares (PLS) is a dimensionality reduction technique introduced in the field of chemometrics and successfully employed in numerous areas of application. The PLS components are obtained by maximizing the covariance between linear combinations of the regressors and of the target variables. In this work, we focus on its application to scalar regression problems. PLS regression consists of finding the least squares predictor that is a linear combination of a subset of the PLS components. Alternatively, PLS regression can be formulated as a least squares problem restricted to a Krylov subspace. This equivalent formulation is employed to analyze the distance between $\hat{\beta}_{\mathrm{PLS}}^{(L)}$, the PLS estimator of the vector of coefficients of the linear regression model based on $L$ PLS components, and $\hat{\beta}_{\mathrm{OLS}}$, the one obtained by ordinary least squares (OLS), as a function of $L$. Specifically, $\hat{\beta}_{\mathrm{PLS}}^{(L)}$ is the vector of coefficients in the aforementioned Krylov subspace that is closest to $\hat{\beta}_{\mathrm{OLS}}$ in terms of the Mahalanobis distance with respect to the covariance matrix of the OLS estimate. We provide a bound on this distance that depends only on the distribution of the eigenvalues of the regressor covariance matrix. Numerical examples on synthetic and real-world data are used to illustrate how the distance between $\hat{\beta}_{\mathrm{PLS}}^{(L)}$ and $\hat{\beta}_{\mathrm{OLS}}$ depends on the number of clusters into which the eigenvalues of the regressor covariance matrix are grouped.
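The Krylov-subspace formulation can be sketched numerically. The code below is an illustration under stated assumptions, not the authors' implementation: the data are simulated, no intercept is fitted (as if $X$ and $y$ were already centered), and the helper names `krylov_basis` and `pls_coefficients` are hypothetical. It solves the least squares problem restricted to $\mathrm{span}\{X^\top y, (X^\top X) X^\top y, \ldots, (X^\top X)^{L-1} X^\top y\}$ and reports $(\hat{\beta}_{\mathrm{PLS}}^{(L)} - \hat{\beta}_{\mathrm{OLS}})^\top X^\top X \, (\hat{\beta}_{\mathrm{PLS}}^{(L)} - \hat{\beta}_{\mathrm{OLS}})$, which equals the squared Mahalanobis distance up to the noise variance factor.

```python
import numpy as np

def krylov_basis(S, s, L):
    """Orthonormal basis of span{s, S s, ..., S^(L-1) s} (Arnoldi-style)."""
    Q = np.zeros((len(s), L))
    for j in range(L):
        v = s.copy() if j == 0 else S @ Q[:, j - 1]
        for _ in range(2):  # orthogonalise twice for numerical stability
            v = v - Q[:, :j] @ (Q[:, :j].T @ v)
        Q[:, j] = v / np.linalg.norm(v)
    return Q

def pls_coefficients(X, y, L):
    """Least squares coefficients restricted to the L-dimensional Krylov subspace."""
    S, s = X.T @ X, X.T @ y
    Q = krylov_basis(S, s, L)
    c = np.linalg.solve(Q.T @ S @ Q, Q.T @ s)  # coordinates within the subspace
    return Q @ c

# Simulated scalar regression without intercept (hypothetical data).
rng = np.random.default_rng(1)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
S = X.T @ X
dist = []
for L in range(1, p + 1):
    delta = pls_coefficients(X, y, L) - beta_ols
    dist.append(float(delta @ S @ delta))  # squared distance up to 1/sigma^2

for L, d in enumerate(dist, start=1):
    print(f"L = {L}: distance^2 = {d:.3e}")
```

Since the Krylov subspaces are nested, the distance cannot increase with $L$, and it vanishes once the subspace fills the whole coefficient space.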
Journal of Multivariate Analysis, Volume 214, Article 105626
Citations: 0
Multivariate and multiple contrast testing in general covariate-adjusted factorial designs
IF 1.4 | Zone 3, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2026-07-01 | Epub Date: 2026-02-02 | DOI: 10.1016/j.jmva.2026.105617
Marléne Baumeister, Konstantin Emil Thiel, Lynn Matits, Georg Zimmermann, Markus Pauly, Paavo Sattler
Evaluating intervention effects on multiple outcomes is a central research goal in a wide range of quantitative sciences. It is thus common to compare interventions with each other and with a control across several, potentially highly correlated, outcome variables. In this context, researchers are interested in identifying effects at both the global level (across all outcome variables) and the local level (for specific variables). At the same time, potential confounding must be accounted for. This leads to the need for powerful multiple contrast testing procedures (MCTPs) capable of handling multivariate outcomes and covariates. Given this background, we propose an extension of MCTPs within a semiparametric MANCOVA framework that allows applicability beyond multivariate normality, homoscedasticity, or non-singular covariance structures. To realize this, we implement a generalized resampling-based method for the determination of critical values. We illustrate our approach by analysing multivariate psychological intervention data, evaluating joint physiological and psychological constructs such as heart rate variability.
Journal of Multivariate Analysis, Volume 214, Article 105617
Citations: 0
AUK-based test for mutual independence and an index of mutual dependence
IF 1.4 | Zone 3, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2026-07-01 | Epub Date: 2025-12-22 | DOI: 10.1016/j.jmva.2025.105589
Georgios Afendras, Marianthi Markatou, Nickos Papantonis
We offer a novel test of mutual independence based on consistent estimates of the area under the Kendall curve. We also present an index of dependence that allows one to measure the mutual dependence of a d-dimensional random vector with d>2. The index is based on a d-dimensional Kendall process. We discuss a standardized version of our index of dependence that is easy to interpret, and provide an algorithm for its computation. Based on the proposed index of dependence, we exemplify a novel method for searching for patterns in the dependence structure. We evaluate the performance of our procedures via simulation, and apply our methods to a real data set.
Journal of Multivariate Analysis, Volume 214, Article 105589
Citations: 0
The exact region and an inequality between Chatterjee’s and Spearman’s rank correlations
IF 1.4 | Zone 3, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2026-07-01 | Epub Date: 2026-03-06 | DOI: 10.1016/j.jmva.2026.105630
Jonathan Ansari, Marcus Rockel
The rank correlation $\xi(X,Y)$, recently established by Sourav Chatterjee and already popular in the statistics literature, takes values in $[0,1]$, where 0 characterises independence of $X$ and $Y$, and 1 characterises perfect dependence of $Y$ on $X$. Unlike concordance measures such as Spearman’s $\rho$, which capture the degree of positive or negative dependence, $\xi$ quantifies the strength of functional dependence. In this paper, we study the attainable set of pairs $(\xi(X,Y), \rho(X,Y))$. The resulting $\xi$-$\rho$-region is a convex set whose boundary is characterised by a novel family of absolutely continuous, asymmetric copulas having a diagonal band structure. Moreover, we prove that $\xi(X,Y) \le |\rho(X,Y)|$ whenever $Y$ is stochastically increasing or decreasing in $X$, and we identify the maximal difference $\rho(X,Y) - \xi(X,Y)$ as exactly $0.4$. Our proofs rely on a convex optimisation problem under various equality and inequality constraints, as well as on ordering properties for $\xi$ and $\rho$. Our results contribute to a better understanding of Chatterjee’s rank correlation, which typically yields substantially smaller values than Spearman’s rho when quantifying positive dependencies. In particular, when interpreting the values of Chatterjee’s rank correlation on the scale of $\rho$, the quantity $\sqrt{\xi}$ appears to be more appropriate.
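The contrast between functional dependence and concordance can be seen in a small simulation. The sketch below uses the standard sample version of Chatterjee's coefficient for data without ties, $\xi_n = 1 - 3\sum_{i=1}^{n-1} |r_{i+1} - r_i| / (n^2 - 1)$, where $r_i$ is the rank of the $Y$ value paired with the $i$-th smallest $X$, together with Spearman's $\rho$ computed as the correlation of ranks. The helper names and the three simulated relationships are hypothetical choices for illustration and do not come from the paper.

```python
import numpy as np

def ranks(a):
    """Ranks 1..n of a sample without ties."""
    return np.argsort(np.argsort(a)) + 1

def chatterjee_xi(x, y):
    """Sample version of Chatterjee's rank correlation (no-ties formula)."""
    n = len(x)
    r = ranks(y)[np.argsort(x)]  # ranks of y, ordered by increasing x
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n ** 2 - 1)

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return np.corrcoef(ranks(x), ranks(y))[0, 1]

rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(size=n)
cases = {
    "monotone (y = x)": x,
    "parabola (y = (x - 0.5)^2)": (x - 0.5) ** 2,
    "independent": rng.uniform(size=n),
}
results = {name: (chatterjee_xi(x, y), spearman_rho(x, y)) for name, y in cases.items()}
for name, (xi, rho) in results.items():
    print(f"{name}: xi = {xi:.3f}, rho = {rho:.3f}")
```

In the parabola case $Y$ is a function of $X$, so $\xi_n$ is close to 1 while $\rho$ stays near 0, which is the sense in which $\xi$ measures functional rather than monotone dependence.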
Journal of Multivariate Analysis, Volume 214, Article 105630
Citations: 0
Differential distance correlation and its applications
IF 1.4 | Zone 3, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2026-07-01 | Epub Date: 2026-03-07 | DOI: 10.1016/j.jmva.2026.105631
Yixiao Liu, Pengjian Shang
In this paper, we propose a novel Euclidean-distance-based coefficient, named differential distance correlation, to measure the strength of dependence between a random variable $Y \in \mathbb{R}$ and a random vector $X \in \mathbb{R}^p$. The coefficient has a concise expression and is invariant to arbitrary orthogonal transformations of the random vector. Moreover, the coefficient is a strongly consistent estimator of a simple and interpretable dependence measure, which is 0 if and only if $X$ and $Y$ are independent and equal to 1 if and only if $Y$ determines $X$ almost surely. An alternative approach is also proposed to address the limitation that the coefficient is non-robust to outliers. Furthermore, the coefficient exhibits asymptotic normality with a simple variance under the independence hypothesis, facilitating fast and accurate estimation of the $p$-value for testing independence. Three simulation experiments show that the proposed coefficient is more computationally efficient for independence testing and more effective in detecting oscillatory relationships than several competing methods. We also apply our method to analyze a real data example.
Journal of Multivariate Analysis, Volume 214, Article 105631
Citations: 0
Kernel quantile regression for semiparametric partially linear time-varying-coefficient model based on a history process of longitudinal data
IF 1.4 | Zone 3, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2026-07-01 | Epub Date: 2026-03-07 | DOI: 10.1016/j.jmva.2026.105629
Wenshan Wang, Xiufang Liu, Dianliang Deng
This study investigates kernel quantile regression estimation for a semiparametric partially linear time-varying-coefficient model, which incorporates a history process with time-dependent covariates and a right-censored time-to-event variable. We propose a three-stage approach to construct the estimators of the parametric portion and the nonparametric time-varying-coefficient function for this model, based on the inverse probability of censoring weighting (IPCW) technique. Additionally, we offer a procedure for variable selection among the time-dependent covariates in the parametric segment through the use of an adaptive LASSO penalty. The paper establishes the asymptotic normality of the proposed estimators and demonstrates that the penalized estimators possess the oracle property. A numerical simulation is implemented to evaluate the performance of the proposed estimators. Finally, we apply the developed method to analyze medical cost data from a multicenter automatic defibrillator implantation trial (MADIT) to illustrate its practical utility.
Journal of Multivariate Analysis, Volume 214, Article 105629
Citations: 0
On consistent estimation of dimension values
IF 1.4 | Zone 3, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2026-07-01 | Epub Date: 2026-01-23 | DOI: 10.1016/j.jmva.2025.105591
Alejandro Cholaquidis, Antonio Cuevas, Beatriz Pateiro-López
The problem of estimating, from a random sample of points, the dimension of a compact subset $S$ of the Euclidean space is considered. The emphasis is put on consistency results in the statistical sense, that is, statements of convergence to the true dimension value when the sample size grows to infinity. Among the many available definitions of dimension, we have focused (on the grounds of its statistical tractability) on three notions: the Minkowski dimension, the correlation dimension and the, perhaps less popular, concept of pointwise dimension. We prove the statistical consistency of some natural estimators of these quantities. Our proofs partially rely on the use of an instrumental estimator formulated in terms of the empirical volume function $V_n(r)$, defined as the Lebesgue measure of the set of points whose distance to the sample is at most $r$. In particular, we explore the case in which the true volume function $V(r)$ of the target set $S$ is a polynomial on some interval starting at zero. An empirical study is also included. Our study aims to provide some theoretical support, and some practical insights, for the problem of deciding whether or not the set $S$ has a dimension smaller than that of the ambient space. This is a major statistical motivation of the dimension studies, in connection with the so-called “Manifold Hypothesis”.
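As a rough illustration of dimension estimation from a point sample, the sketch below applies the classical log-log slope estimate of the correlation dimension: the fraction $C_n(r)$ of sample pairs within distance $r$ behaves like $r^d$ for small $r$, so the slope of $\log C_n(r)$ against $\log r$ approximates $d$. This is a textbook-style estimator chosen for illustration and is not necessarily one of the estimators analysed in the paper; the function name, the sample on the unit circle, and the range of radii are assumptions.

```python
import numpy as np

def correlation_dimension(points, radii):
    """Slope of log C_n(r) versus log r, with C_n(r) the fraction of pairs within r."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(points), k=1)      # each unordered pair once
    pair_dist = dist[upper]
    c = np.array([(pair_dist <= r).mean() for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(c), 1)
    return float(slope)

# Sample from the unit circle: a one-dimensional set in the plane.
rng = np.random.default_rng(3)
n = 1000
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
circle = np.column_stack([np.cos(theta), np.sin(theta)])

radii = np.geomspace(0.05, 0.2, 8)                 # hypothetical choice of scales
dim_circle = correlation_dimension(circle, radii)
print(f"estimated dimension of the circle sample: {dim_circle:.2f}")
```

An estimate well below 2 here indicates that the sample concentrates on a set of lower dimension than the ambient plane, which is the kind of decision the abstract relates to the Manifold Hypothesis.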
Citations: 0
Adaptive ℓ_q regularized estimation for high-dimensional sparse covariance matrix
IF 1.4 Zone 3 Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2026-07-01 Epub Date: 2026-02-14 DOI: 10.1016/j.jmva.2026.105624
Xin Wang, Hongxin Zhao, Zhenwei Zhou, Lingchen Kong, Liqun Wang
The estimation of high-dimensional covariance matrices plays an important role in many application fields, such as economics, biology, and the social and health sciences. A mainstream structural assumption for enhancing estimator accuracy is that the covariance matrix is sparse or approximately sparse. This paper proposes an adaptive ℓ_q (0 < q < 1) regularized estimator with a minimum eigenvalue constraint for high-dimensional sparse covariance matrices. This method eliminates the need for the conventional two-stage framework of sequential correlation and covariance matrix estimation. Under appropriate regularity conditions, we analyze its asymptotic and finite sample properties. The proposed iterative reweighted minimization method and its inexact variant can be employed to find a desired estimate. Simulation studies confirm that the proposed estimator performs better than some other state-of-the-art methods.
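The idea of iterative reweighting for an ℓ_q penalty can be illustrated with a small sketch. This is not the authors' algorithm: it drops the minimum eigenvalue constraint and the adaptive weights, and it treats each off-diagonal entry separately by minimizing 0.5(x - s)^2 + λ|x|^q through a sequence of weighted soft-thresholding steps (the concave penalty is linearized at the current iterate). The names `lq_shrink` and `sparse_cov`, the smoothing constant `eps`, and the toy matrix are assumptions made for illustration.

```python
def soft_threshold(s, t):
    """Closed-form minimizer of 0.5 * (x - s)**2 + t * |x|."""
    if s > t:
        return s - t
    if s < -t:
        return s + t
    return 0.0


def lq_shrink(s, lam, q, eps=1e-3, n_iter=50):
    """Iteratively reweighted l1 steps for 0.5 * (x - s)**2 + lam * |x|**q."""
    x = s
    for _ in range(n_iter):
        # Weight from linearizing (|x| + eps)**q at the current iterate.
        w = q * (abs(x) + eps) ** (q - 1.0)
        x = soft_threshold(s, lam * w)
    return x


def sparse_cov(S, lam=0.05, q=0.5):
    """Shrink the off-diagonal entries of S; the diagonal is left untouched."""
    p = len(S)
    est = [row[:] for row in S]
    for i in range(p):
        for j in range(i + 1, p):
            est[i][j] = est[j][i] = lq_shrink(S[i][j], lam, q)
    return est


# Hypothetical sample covariance: one strong and two weak off-diagonal entries.
S = [
    [1.0, 0.5, 0.02],
    [0.5, 1.0, -0.03],
    [0.02, -0.03, 1.0],
]
est = sparse_cov(S)
for row in est:
    print([round(v, 3) for v in row])
```

With these settings the two weak entries are set exactly to zero while the strong entry is only mildly shrunk, which is the behaviour that motivates nonconvex ℓ_q penalties over a plain ℓ_1 penalty. A full estimator would also have to enforce positive definiteness, which this entrywise sketch does not attempt.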
Citations: 0
A robust mixed functional classifier with adaptive large margin loss
IF 1.4 Zone 3 Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2026-05-01 Epub Date: 2025-12-03 DOI: 10.1016/j.jmva.2025.105563
Hanteng Ma, Peijun Sang, Xingdong Feng, Xin Liu
Functional classification has been increasingly helpful in exploring and predicting a response variable with multiple categories. In fact, both functional and scalar covariates may be useful and should be included in the model simultaneously, and thus developing a robust multi-categorical functional classifier with statistical guarantees is desirable. However, both issues have rarely been addressed in previous studies. Motivated by this, we propose a novel large margin linear mixed functional classifier for responses with multiple categories, which includes both functional and scalar covariates as predictors, especially when the functional data are sparse and longitudinal. The proposed method not only addresses functional classification using a combination of functional and scalar covariates, but also provides a robust multi-categorical mixed functional classifier using a large margin loss that adapts to the observed samples. Furthermore, we establish statistical theory for the mixed functional classifier, which has received little attention in the existing literature. An efficient algorithm is also proposed for its practical implementation. Numerical investigations support the strong performance of the proposed method on both simulated and real datasets.
Citations: 0