
Journal of Multivariate Analysis: Latest Articles

Automatic sparse estimation of the high-dimensional cross-covariance matrix
IF 1.4 | Zone 3 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2026-05-01 | Epub Date: 2025-12-19 | DOI: 10.1016/j.jmva.2025.105590
Tetsuya Umino, Kazuyoshi Yata, Makoto Aoshima
Scenarios involving high-dimensional, low-sample-size (HDLSS) data are often encountered in modern scientific fields involving genetic microarrays, medical imaging, and finance, where the number of variables can greatly exceed the number of observations. In such settings, a reliable estimation of cross-covariance structures is essential for understanding relationships between variable sets. However, classical estimators often exhibit severe noise accumulation. To address this issue, in this study, we propose a novel thresholding estimator of the cross-covariance matrix for HDLSS settings. We consider the asymptotic properties of the sample cross-covariance matrix and show that the estimator contains large amounts of noise in the high-dimensional setting, which renders it inconsistent. To solve this problem occurring in high-dimensional settings, we develop a new thresholding estimator based on the automatic sparse estimation methodology and show that the estimator is consistent under mild assumptions. We analyze and evaluate the performance of the proposed estimator based on numerical simulations and actual data analysis. The simulations demonstrate that the method attains consistency without requiring the stringent high-dimensional conditions assumed by existing approaches, and the real-data analysis illustrates its applicability to high-dimensional regression problems, wherein improved parameter estimation enhances prediction accuracy. In conclusion, our findings serve as a theoretically sound tool for cross-covariance estimation in HDLSS contexts, with potential implications for a wide range of high-dimensional data analyses.
Journal of Multivariate Analysis, Volume 213, Article 105590
Citations: 0
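To make the thresholding idea in the abstract by Umino, Yata and Aoshima concrete, here is a minimal sketch of plain hard thresholding applied to a sample cross-covariance matrix. It is not the authors' automatic sparse estimation procedure: the function names and the hand-picked threshold `tau` are assumptions for illustration, whereas the paper's method is described as choosing the sparsity automatically.

```python
import random

def sample_cross_covariance(xs, ys):
    """Sample cross-covariance (p x q) between the columns of xs (n x p) and ys (n x q)."""
    n = len(xs)
    p, q = len(xs[0]), len(ys[0])
    mean_x = [sum(row[j] for row in xs) / n for j in range(p)]
    mean_y = [sum(row[k] for row in ys) / n for k in range(q)]
    cov = [[0.0] * q for _ in range(p)]
    for i in range(n):
        for j in range(p):
            dx = xs[i][j] - mean_x[j]
            for k in range(q):
                cov[j][k] += dx * (ys[i][k] - mean_y[k])
    return [[c / (n - 1) for c in row] for row in cov]

def hard_threshold(matrix, tau):
    """Zero out every entry whose absolute value is at most tau (tau is a hypothetical tuning value)."""
    return [[c if abs(c) > tau else 0.0 for c in row] for row in matrix]

# Toy HDLSS-style data: few observations, more variables than observations.
random.seed(0)
n, p, q = 5, 8, 8
xs = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
# Only the first y-variable depends on the first x-variable; all other pairs are independent.
ys = [[3.0 * row[0] + random.gauss(0, 0.1)] + [random.gauss(0, 1) for _ in range(q - 1)]
      for row in xs]
raw = sample_cross_covariance(xs, ys)
sparse = hard_threshold(raw, tau=1.5)
kept = sum(1 for row in sparse for c in row if c != 0.0)
print(kept, "nonzero entries kept out of", p * q)
```

With so few observations, the raw matrix is typically nonzero in every entry even though only one pair of variables is truly related, which is the noise accumulation the abstract refers to.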
Bounds and identification on direct and indirect effects under partially observed mediator-endpoint confounders
IF 1.4 | Zone 3 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2026-05-01 | Epub Date: 2025-12-03 | DOI: 10.1016/j.jmva.2025.105565
Yu Han, Peng Luo, Wei Zhang, Xiang Gu
The direct effect of a treatment variable and the indirect effect through a mediator variable on an endpoint variable are important for understanding a causal mechanism. The controlled direct effect has a prescriptive interpretation, while the natural direct and indirect effects have a descriptive interpretation. In practice, these three effects are usually very difficult to identify. To tackle this problem, some researchers have investigated upper and lower bounds on these three effects under reasonable identification conditions. For example, Luo and Geng (2016) gave upper and lower bounds on these direct and indirect effects when there is an unobserved mediator-endpoint confounder vector and the endpoint variable is continuous. In this paper, we tighten the bounds on the controlled direct effect in Luo and Geng (2016) when part of the confounders can be observed. Additionally, we give a sufficient condition for identifying the direct and indirect effects when the variables satisfy a linear relationship.
Journal of Multivariate Analysis, Volume 213, Article 105565
Citations: 0
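For readers unfamiliar with the three effects named in the abstract by Han et al., the usual potential-outcome definitions can be written as follows. The notation is an assumption for illustration and may differ from the paper's: a binary treatment $t \in \{0, 1\}$, $M(t)$ the mediator under treatment $t$, and $Y(t, m)$ the endpoint under treatment $t$ with the mediator set to $m$.

```latex
\begin{align*}
\mathrm{CDE}(m) &= E\{Y(1, m)\} - E\{Y(0, m)\},\\
\mathrm{NDE}    &= E\{Y(1, M(0))\} - E\{Y(0, M(0))\},\\
\mathrm{NIE}    &= E\{Y(1, M(1))\} - E\{Y(1, M(0))\},\\
\mathrm{TE}     &= E\{Y(1, M(1))\} - E\{Y(0, M(0))\} = \mathrm{NDE} + \mathrm{NIE}.
\end{align*}
```

The controlled direct effect fixes the mediator at a chosen level $m$, which is why it reads as prescriptive, while the natural effects let the mediator take the value it would have had under a given treatment.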
A general framework to extend sufficient dimension reductions to the cases of the mixture multivariate elliptical distributions
IF 1.4 | Zone 3 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2026-05-01 | Epub Date: 2025-11-28 | DOI: 10.1016/j.jmva.2025.105551
Wenjuan Li, Hongming Pei, Ali Jiang, Fei Chen
In sufficient dimension reduction (SDR), many methods depend on assumptions about the distribution of the predictor vector, such as the linear design condition (L.D.C.) and the assumption of constant conditional variance. Mixture distributions arise frequently in practice, but they may not satisfy these assumptions. In this article, a general framework is proposed to extend various SDR methods to cases where the predictor vector follows a mixture of elliptical distributions, together with asymptotic results on the consistency of the kernel matrix estimators. For illustration, the extensions of several classical SDR approaches under the proposed framework are detailed. Moreover, a method to estimate the structural dimension is given, together with a procedure to check an assumption called homogeneity. The proposed methodology is illustrated by simulated and real examples.
Journal of Multivariate Analysis, Volume 213, Article 105551
Citations: 0
Simultaneous variable selection and estimation of multivariate panel count data
IF 1.4 | Zone 3 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2026-05-01 | Epub Date: 2025-12-02 | DOI: 10.1016/j.jmva.2025.105559
Lei Ge, Rong Liu, Tao Hu, Jianguo Sun
Panel count data are a general type of data arising from studies of recurrent events; they occur when the observed information on each study subject consists only of the numbers of occurrences of the recurrent events between successive examinations. Such data can occur in many fields, including economic studies, medical studies and social sciences. This paper considers regression analysis of multivariate panel count data, with a focus on variable selection and estimation of significant covariate effects. For this problem, a minimum information criterion-based method is proposed and an expectation–maximization algorithm is developed to compute the proposed estimator. Furthermore, the resulting estimator is shown to have the desirable oracle property, and a simulation study confirms the good finite-sample properties of the proposed method. Finally, the method is applied to a set of real data arising from a skin cancer study.
Journal of Multivariate Analysis, Volume 213, Article 105559
Citations: 0
The mean tests with high dimensional data
IF 1.4 | Zone 3 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2026-03-01 | Epub Date: 2025-11-28 | DOI: 10.1016/j.jmva.2025.105564
Wenzhi Yang, Chi Yao, Yiming Liu, Guangming Pan, Wang Zhou
In this paper, we consider mean tests for high-dimensional data and give two new tests, each consisting of three steps. First, we reduce the high-dimensional vectors to many low-dimensional vectors and construct Hotelling’s T² tests. Second, using the distribution or asymptotic distribution of these Hotelling’s tests under the null hypothesis, we transform them into uniformly or asymptotically uniformly distributed random variables. Third, central limit theorems for the normalized sum of these transformations are obtained in the Gaussian and non-Gaussian cases. Moreover, the asymptotic power of the new test is presented for the non-Gaussian case. Compared with existing tests, our tests have both good empirical sizes and high empirical power.
Journal of Multivariate Analysis, Volume 212, Article 105564
Citations: 0
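The three steps described in the abstract by Yang et al. can be sketched under strong simplifying assumptions. This is an illustration of the general recipe, not the authors' test: it assumes Gaussian data, blocks of dimension 2 (so the null distribution of Hotelling's T² has a closed form via the F(2, n − 2) survival function), and independence between blocks so that a plain standardized sum is reasonable. All function names are hypothetical.

```python
import math
import random

def split_into_pairs(data):
    """Step 1a: split n x p data (p even) into p/2 blocks of 2-dimensional observations."""
    p = len(data[0])
    return [[(row[j], row[j + 1]) for row in data] for j in range(0, p, 2)]

def hotelling_t2_2d(block):
    """Step 1b: Hotelling's T^2 for H0: mean = 0, from n observations of a 2-d vector."""
    n = len(block)
    m1 = sum(a for a, _ in block) / n
    m2 = sum(b for _, b in block) / n
    # Unbiased sample covariance entries.
    s11 = sum((a - m1) ** 2 for a, _ in block) / (n - 1)
    s22 = sum((b - m2) ** 2 for _, b in block) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in block) / (n - 1)
    det = s11 * s22 - s12 * s12
    # Quadratic form mean' S^{-1} mean using the closed-form 2 x 2 inverse.
    quad = (s22 * m1 * m1 - 2.0 * s12 * m1 * m2 + s11 * m2 * m2) / det
    return n * quad

def null_p_value(t2, n):
    """Step 2: Gaussian-null p-value of T^2 in dimension 2; Uniform(0, 1) under H0."""
    m = n - 2
    f = m * t2 / (2.0 * (n - 1))               # F statistic with (2, m) degrees of freedom
    return (1.0 + 2.0 * f / m) ** (-m / 2.0)   # survival function of F(2, m)

def normalized_sum(us):
    """Step 3: standardized sum of K variables that are Uniform(0, 1) under H0."""
    k = len(us)
    return (sum(us) - k / 2.0) / math.sqrt(k / 12.0)

# Toy run: n = 10 observations of a p = 40 dimensional vector with mean zero.
random.seed(1)
n, p = 10, 40
data = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
us = [null_p_value(hotelling_t2_2d(b), n) for b in split_into_pairs(data)]
# Large T^2 values give small p-values, so strongly negative z is evidence against H0.
print("z =", normalized_sum(us))
```

Under these assumptions the standardized sum is approximately standard normal when the mean is zero; the paper's own normalization and its treatment of dependence between blocks may differ.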
Iterative sequential screening strategies for sparse recovery with computational advantages
IF 1.4 | Zone 3 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2026-03-01 | Epub Date: 2025-11-29 | DOI: 10.1016/j.jmva.2025.105570
Weixiong Liang, Yuehan Yang
A prominent problem in multi-response models is the presence of complex group structures of the high-dimensional data such as the overlapping group structures. In such models, both responses and predictors are grouped, and each response group is allowed to relate to multiple predictor groups. Ignoring such structures often yields insufficient statistical inference and misleading statistical conclusions. Motivated by practical needs and the sequential canonical correlation search (SCCS) algorithm proposed by Luo and Chen (2020), this paper proposes two computationally attractive feature selection algorithms, reallocating-SCCS (RSCCS) and prescreening-SCCS (PSCCS), for the high-dimensional multi-response models with complex group structures.
The proposed methods, RSCCS and PSCCS, consist of three steps. In the first step, to fully incorporate the information of group structures in the feature selection algorithm, both RSCCS and PSCCS select a non-zero coefficient block according to the canonical correlation between the residual response groups and feature groups. In the second step, RSCCS selects the non-zero coefficient row, while PSCCS conducts screening within the non-zero coefficient block using penalized regularizations. In the third step, RSCCS and PSCCS select features by EBIC based on different situations and different iterations. We demonstrate the advantages of these two methods compared with several existing approaches. The statistical guarantees of RSCCS and PSCCS are established. We provide numerical simulation results and analyze a real data example to compare their performance with other methods.
Journal of Multivariate Analysis, Volume 212, Article 105570
Citations: 0
On the use of the Gram matrix for multivariate functional principal components analysis
IF 1.4 | Zone 3 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2026-03-01 | Epub Date: 2025-11-13 | DOI: 10.1016/j.jmva.2025.105525
Steven Golovkine, Edward Gunning, Andrew J. Simpkin, Norma Bargary
Dimension reduction is crucial in functional data analysis (FDA). The key tool to reduce the dimension of the data is functional principal component analysis. Existing approaches for functional principal component analysis usually involve the diagonalization of the covariance operator. With the increasing size and complexity of functional datasets, estimating the covariance operator has become more challenging. Therefore, there is a growing need for efficient methodologies to estimate the eigencomponents. Using the duality of the space of observations and the space of functional features, we propose to use the inner-product between the curves to estimate the eigenelements of multivariate and multidimensional functional datasets. The relationship between the eigenelements of the covariance operator and those of the inner-product matrix is established. We explore the application of these methodologies in several FDA settings and provide general guidance on their usability.
Journal of Multivariate Analysis, Volume 212, Article 105525
Citations: 0
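The duality mentioned in the abstract by Golovkine et al. has a familiar finite-dimensional analogue, sketched here for an ordinary centered data matrix rather than for functional data. The symbols $X$, $C$, $G$, $u$ and $v$ are assumptions introduced for illustration; the paper works with inner products between curves instead of matrix products.

```latex
Let $X \in \mathbb{R}^{n \times p}$ be a centered data matrix and define
\[
  C = \tfrac{1}{n} X^{\top} X \quad (p \times p), \qquad
  G = \tfrac{1}{n} X X^{\top} \quad (n \times n).
\]
If $G u = \lambda u$ with $\lambda > 0$ and $\|u\| = 1$, set
$v = (n\lambda)^{-1/2} X^{\top} u$. Then
\begin{align*}
  C v &= \frac{1}{n\sqrt{n\lambda}}\, X^{\top} X X^{\top} u
       = \frac{1}{\sqrt{n\lambda}}\, X^{\top} (G u)
       = \lambda\, \frac{X^{\top} u}{\sqrt{n\lambda}}
       = \lambda v, \\
  \|v\|^{2} &= \frac{u^{\top} X X^{\top} u}{n\lambda}
       = \frac{u^{\top} G u}{\lambda} = 1 .
\end{align*}
```

So every nonzero eigenvalue of the Gram matrix $G$ is also an eigenvalue of $C$, and the eigenvectors of $C$ can be recovered from those of $G$. When $n$ is much smaller than $p$, diagonalizing the $n \times n$ matrix is the cheaper route.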
Rank-based combination independence tests for high-dimensional data
IF 1.4 | Zone 3 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2026-03-01 | Epub Date: 2025-11-28 | DOI: 10.1016/j.jmva.2025.105550
Liqi Xia, Ruiyuan Cao, Jiang Du, Ling Liu
This paper proposes a novel approach to enhance the versatility of the max-sum test in high-dimensional data analysis by combining two distinct rank correlation coefficients: Spearman’s ρ and Chatterjee’s ξ. We uncovered the independence between the max-type test and the sum-type test by deriving their joint distribution. This insight enables the development of a comprehensive max-sum test that tackles both sparse and dense alternative correlation structures in an adaptive manner. Leveraging the asymptotic independence between the two coefficients and the intrinsic highlights of two single-coefficient tests, we have strategically implemented Cauchy combination principles to devise a multifunctional testing methodology. This approach can accommodate monotonic and nonmonotonic data types and thus offers a versatile solution to a broad spectrum of analytical requirements. This versatility of our proposed method has been impressively demonstrated through a diverse range of simulation data studies and two real-world data analyses, underscoring its effectiveness and practical utility.
Journal of Multivariate Analysis, Volume 212, Article 105550
Citations: 0
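Two ingredients named in the abstract by Xia et al. are easy to illustrate in isolation: Chatterjee's ξ for a single pair of variables and the Cauchy combination of p-values. The sketch below is not the paper's high-dimensional max-sum procedure; the function names are hypothetical, ties are assumed absent, and equal weights are an assumption.

```python
import math

def chatterjee_xi(x, y):
    """Chatterjee's rank correlation xi_n(X, Y), assuming no ties in x or y."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])       # arrange the pairs by increasing x
    y_sorted = [y[i] for i in order]
    rank_of = {v: r for r, v in enumerate(sorted(y_sorted), start=1)}
    ranks = [rank_of[v] for v in y_sorted]             # rank of each y in that order
    total = sum(abs(ranks[i + 1] - ranks[i]) for i in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)

def cauchy_combination(p_values, weights=None):
    """Combine p-values by the Cauchy combination rule; weights must sum to 1."""
    k = len(p_values)
    if weights is None:
        weights = [1.0 / k] * k
    # Each p-value is mapped to a standard Cauchy quantile, then averaged.
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))
    # Upper tail probability of a standard Cauchy variable at t.
    return 0.5 - math.atan(t) / math.pi

# xi responds to a non-monotonic relationship that a monotone measure can miss.
xs = [i / 10.0 for i in range(-10, 11)]
print("xi for y = x^2 + small tilt:", chatterjee_xi(xs, [v * v + 0.001 * v for v in xs]))
# Combining one small and one large p-value.
print("combined p-value:", cauchy_combination([0.01, 0.40]))
```

The combination rule is attractive because its heavy-tailed transform lets the smallest p-values dominate while the result remains simple to compute.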
Variable selection in mixture regression for longitudinal data based on joint mean–covariance model
IF 1.4 | Zone 3 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2026-03-01 | Epub Date: 2025-11-19 | DOI: 10.1016/j.jmva.2025.105548
Jing Yu, Jianxin Pan
A large number of explanatory variables may be measured when longitudinal data are collected, some of which may not be influential for modeling heterogeneous longitudinal data. For such complex data, not only the mean but also the covariances may be affected by various explanatory variables. A data-driven approach is proposed to model the mean and covariance structures simultaneously while selecting influential explanatory variables. A penalized maximum likelihood method for the joint mean and covariance model is developed within the framework of finite Gaussian mixture regression. An EM algorithm is employed for the numerical calculation. The parameter estimators obtained are shown to be consistent and asymptotically normally distributed, and to have oracle properties with proper choices of penalty function and tuning parameter. Simulation studies show that the proposed method works well and provides accurate and effective parameter estimators by conducting variable selection. For illustration, a real data analysis clustering COVID-19 infection cases in European countries in terms of governmental policy effects demonstrates the usefulness of the proposed method.
{"title":"Variable selection in mixture regression for longitudinal data based on joint mean–covariance model","authors":"Jing Yu ,&nbsp;Jianxin Pan","doi":"10.1016/j.jmva.2025.105548","DOIUrl":"10.1016/j.jmva.2025.105548","url":null,"abstract":"<div><div>A large number of explanatory variables may be measured with the collection of longitudinal data, of which some may not be influential for modeling of heterogeneous longitudinal data. For such complex data, not only their mean but also covariances may be affected by various explanatory variables. A data-driven approach is proposed to model the mean and covariance structures, simultaneously, together with selecting influential explanatory variables. A penalized maximum likelihood method for the joint mean and covariance model is developed within the framework of finite Gaussian mixture regression. EM algorithm is employed for the numerical calculation. The parameter estimators obtained are shown to be consistent and asymptotically normally distributed, and have oracle properties with proper choices of penalty function and tuning parameter. Simulation studies show that the proposed method works very well and provides accurate and effective parameter estimators by conducting variable selection. 
For illustration, real data analysis on clustering COVID-19 infected cases for European countries in terms of governmental policy effects is made to demonstrate the usefulness of the proposed method.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"212 ","pages":"Article 105548"},"PeriodicalIF":1.4,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145571180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Statistical guarantees for distribution estimation of contaminated data via DNN-based MoM-GANs
IF 1.4 Zone 3 Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2026-03-01 Epub Date: 2025-11-28 DOI: 10.1016/j.jmva.2025.105571
Fang Xie, Lihu Xu, Qiuran Yao, Huiming Zhang
This paper investigates the distribution estimation of contaminated data using the MoM-GAN method, which leverages the power of generative adversarial nets (GANs) and median-of-means (MoM) estimation. Specifically, we use a deep neural network (DNN) with a ReLU activation function to model the generator and discriminator of the GAN. In terms of theoretical analysis, we derive a non-asymptotic error bound for the DNN-based MoM-GAN estimator, which is measured by integral probability metrics and takes into account the b-smoothness Hölder class. The error bound essentially decreases in $n^{-b/p} \vee n^{-1/2}$, where n and p are the sample size and the dimension of the input data, respectively. It provides a rigorous guarantee of the accuracy and robustness of the MoM-GAN estimator, even in the presence of contaminated data. We present an algorithm for the MoM-GAN method and demonstrate its effectiveness in two real-world applications. Our results show that the MoM-GAN method outperforms other competing methods when dealing with contaminated data, highlighting its superior performance and robustness.
Journal of Multivariate Analysis, Volume 212, Article 105571.
Citations: 0
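The MoM-GAN abstract above relies on median-of-means (MoM) estimation for robustness to contaminated data. As a rough illustration of that general idea only, the sketch below estimates a scalar mean by splitting a sample into blocks, averaging each block, and taking the median of the block means. It is not the authors' MoM-GAN algorithm: the function name `median_of_means`, the block count, and the toy contamination setup (995 Gaussian draws plus 5 gross outliers) are all assumptions made for this example.

```python
import random
import statistics


def median_of_means(values, n_blocks, seed=None):
    """Median-of-means estimate of the mean of a 1-D sample (illustrative helper)."""
    values = list(values)
    if not 1 <= n_blocks <= len(values):
        raise ValueError("n_blocks must be between 1 and len(values)")
    # Shuffle so that block membership does not depend on the input order.
    rng = random.Random(seed)
    rng.shuffle(values)
    # Strided slicing gives n_blocks disjoint blocks whose sizes differ by at most 1.
    blocks = [values[i::n_blocks] for i in range(n_blocks)]
    block_means = [statistics.fmean(block) for block in blocks]
    # A few contaminated blocks cannot move the median of the block means far.
    return statistics.median(block_means)


# Toy contaminated sample (assumed setup): true mean 1.0, plus 5 gross outliers.
data_rng = random.Random(0)
clean = [data_rng.gauss(1.0, 1.0) for _ in range(995)]
sample = clean + [1000.0] * 5

mom_estimate = median_of_means(sample, n_blocks=25, seed=1)
plain_mean = statistics.fmean(sample)

print(f"median-of-means estimate: {mom_estimate:.3f}")
print(f"plain sample mean:        {plain_mean:.3f}")
```

In this toy setup the 5 outliers can contaminate at most 5 of the 25 blocks, so the median is taken from an uncontaminated block mean, whereas the plain sample mean is pulled far from 1.0. The number of blocks trades robustness (more blocks tolerate more outliers) against the variance of each block mean.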