
Journal of Multivariate Analysis: Latest Articles

A robust mixed functional classifier with adaptive large margin loss
IF 1.4 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2025-12-03 DOI: 10.1016/j.jmva.2025.105563
Hanteng Ma, Peijun Sang, Xingdong Feng, Xin Liu
Functional classification has become increasingly useful for exploring and predicting a response variable with multiple categories. In practice, both functional and scalar covariates may be informative and should be included in the model simultaneously, so a robust multi-categorical functional classifier with statistical guarantees is desirable. However, both issues have rarely been addressed in previous studies. Motivated by this, we propose a novel large margin linear mixed functional classifier for a multi-category response that includes both functional and scalar covariates as predictors, especially when the functional data are sparse and longitudinal. The proposed method not only addresses functional classification using a combination of functional and scalar covariates, but also provides a robust multi-categorical mixed functional classifier through a large margin loss that adapts to the observed samples. Furthermore, we establish statistical theory for the mixed functional classifier, which has received little attention in the existing literature. An efficient algorithm is also proposed for practical implementation. Numerical investigations support the strong performance of the proposed method on both simulated and real datasets.
Journal of Multivariate Analysis, Volume 213, Article 105563
Citations: 0
Bounds and identification on direct and indirect effects under partially observed mediator-endpoint confounders
IF 1.4 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2025-12-03 DOI: 10.1016/j.jmva.2025.105565
Yu Han, Peng Luo, Wei Zhang, Xiang Gu
The direct effect of a treatment variable and the indirect effect through a mediator variable on an endpoint variable are important for understanding a causal mechanism. The controlled direct effect has a prescriptive interpretation, while the natural direct and indirect effects have a descriptive interpretation. In practice, these three effects are usually very difficult to identify. To tackle this problem, some researchers have investigated upper and lower bounds on these three effects under reasonable identification conditions. For example, Luo and Geng (2016) gave upper and lower bounds on these direct and indirect effects when there is an unobserved mediator-endpoint confounder vector and the endpoint variable is continuous. In this paper, we tighten the bounds on the controlled direct effect in Luo and Geng (2016) when part of the confounders can be observed. Additionally, we give a sufficient condition for identifying the direct and indirect effects when the variables satisfy a linear relationship.
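For orientation, the three effects named in the abstract are usually written in potential-outcome notation as below. This is a sketch of the standard textbook definitions for a binary treatment, not the paper's own notation: the symbols $Y(t,m)$ (endpoint under treatment $t$ and mediator value $m$) and $M(t)$ (mediator under treatment $t$) are assumptions introduced here.

```latex
\begin{align*}
\mathrm{CDE}(m) &= \mathbb{E}[Y(1,m)] - \mathbb{E}[Y(0,m)], \\
\mathrm{NDE}    &= \mathbb{E}[Y(1,M(0))] - \mathbb{E}[Y(0,M(0))], \\
\mathrm{NIE}    &= \mathbb{E}[Y(1,M(1))] - \mathbb{E}[Y(1,M(0))], \\
\mathrm{TE}     &= \mathbb{E}[Y(1,M(1))] - \mathbb{E}[Y(0,M(0))] = \mathrm{NDE} + \mathrm{NIE}.
\end{align*}
```

The controlled direct effect fixes the mediator at a chosen value $m$, which is why it reads as prescriptive; the natural effects let the mediator take the value it would have had under a given treatment, which is why they read as descriptive.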
Journal of Multivariate Analysis, Volume 213, Article 105565
Citations: 0
Simultaneous variable selection and estimation of multivariate panel count data
IF 1.4 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2025-12-02 DOI: 10.1016/j.jmva.2025.105559
Lei Ge, Rong Liu, Tao Hu, Jianguo Sun
Panel count data are a general type of data arising from studies of recurrent events. They occur when the observed information on each study subject consists only of the numbers of occurrences of the recurrent events between successive examinations. Such data arise in many fields, including economics, medicine and the social sciences. This paper considers regression analysis of multivariate panel count data, with a focus on variable selection and estimation of significant covariate effects. For this problem, a minimum information criterion-based method is proposed and an expectation–maximization algorithm is developed to compute the proposed estimator. Furthermore, the resulting estimator is shown to have the desirable oracle property, and a simulation study confirms the good finite-sample properties of the proposed method. Finally, the method is applied to a set of real data arising from a skin cancer study.
Journal of Multivariate Analysis, Volume 213, Article 105559
Citations: 0
Robust factor analysis with exponential squared loss
IF 1.4 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2025-12-01 DOI: 10.1016/j.jmva.2025.105567
Jiaqi Hu, Tingyin Wang, Xueqin Wang
The large dimensional factor model, which reduces dimensionality and extracts features through a few latent common factors, has attracted significant interest due to its broad applications. Despite the popularity of traditional methods for factor models, they may yield incorrect estimators for heavy-tailed data. To address this issue, we introduce the exponential squared loss to the factor model and call the resulting method Robust Exponential Factor Analysis (REFA). We propose a modified rank minimization technique to improve the accuracy of estimating the number of factors in finite samples. Consistency properties for factors and loadings are established under mild conditions, without any moment assumptions on the errors. The finite-sample performance of REFA under both light-tailed and heavy-tailed cases is demonstrated through simulation studies. Furthermore, an analysis of a financial dataset of asset returns underscores the advantages of REFA. To facilitate the implementation of the proposed methodology by researchers, we have developed an R package named REFA, which is available on CRAN.
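To see why an exponential squared loss resists heavy tails, the sketch below uses the form 1 − exp(−r²/γ) that is commonly associated with that name. The abstract does not state the exact form or tuning used in REFA, so the function names, the parameter `gamma`, and the toy location problem are all assumptions made for illustration; this is not the REFA estimator or the API of the REFA package.

```python
import math
import statistics


def exp_squared_loss(residual, gamma=1.0):
    """Bounded loss 1 - exp(-r^2 / gamma); gamma is an assumed tuning parameter."""
    return 1.0 - math.exp(-residual ** 2 / gamma)


def robust_location(xs, gamma=1.0, iters=100):
    """Toy location estimate under the loss above (not the REFA algorithm).

    Setting the derivative of sum(exp_squared_loss(x - mu)) to zero gives
    mu = sum(w * x) / sum(w) with w = exp(-(x - mu)^2 / gamma). That fixed
    point is iterated here, starting from the median. The loss is non-convex,
    so the starting point matters.
    """
    mu = statistics.median(xs)
    for _ in range(iters):
        weights = [math.exp(-(x - mu) ** 2 / gamma) for x in xs]
        mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
    return mu


data = [1.0, 1.1, 0.9, 1.05, 0.95, 50.0]  # five clean points and one gross outlier
print(statistics.mean(data))   # the sample mean is pulled far away from 1
print(robust_location(data))   # the bounded loss keeps the estimate close to 1
```

Because the loss is bounded by 1, a single extreme residual contributes at most a fixed amount and receives a weight near zero in the update, whereas under squared loss its contribution grows without bound.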
Journal of Multivariate Analysis, Volume 213, Article 105567
Citations: 0
Uniform designs for experiments with branching and nested factors
IF 1.4 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2025-12-01 DOI: 10.1016/j.jmva.2025.105576
Feng Yang, Zheng Zhou, Yongdao Zhou
Factors that exist only at certain levels of other factors are called nested factors, and the factors that give rise to them are called branching factors. Experiments with branching and nested factors occur frequently in practical applications. Designing such experiments is challenging because of the special relationship between the branching and nested factors. In this paper, we propose uniform designs for experiments involving branching and nested factors. A novel criterion is introduced to measure the uniformity of such designs, and the corresponding lower bound is also given. Construction methods for uniform designs with branching and nested factors are provided, and their effectiveness is verified by simulation comparisons and a practical manufacturing experiment. The proposed method allows each of the branching, nested and shared factors to be either qualitative or quantitative. Moreover, the run size and the levels of the quantitative factors are very flexible, so the method works well for both physical and computer experiments.
Journal of Multivariate Analysis, Volume 212, Article 105576
Citations: 0
Iterative sequential screening strategies for sparse recovery with computational advantages
IF 1.4 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2025-11-29 DOI: 10.1016/j.jmva.2025.105570
Weixiong Liang, Yuehan Yang
A prominent problem in multi-response models is the presence of complex group structures of the high-dimensional data such as the overlapping group structures. In such models, both responses and predictors are grouped, and each response group is allowed to relate to multiple predictor groups. Ignoring such structures often yields insufficient statistical inference and misleading statistical conclusions. Motivated by practical needs and the sequential canonical correlation search (SCCS) algorithm proposed by Luo and Chen (2020), this paper proposes two computationally attractive feature selection algorithms, reallocating-SCCS (RSCCS) and prescreening-SCCS (PSCCS), for the high-dimensional multi-response models with complex group structures.
The proposed methods, RSCCS and PSCCS, consist of three steps. In the first step, to fully incorporate the information of group structures in the feature selection algorithm, both RSCCS and PSCCS select a non-zero coefficient block according to the canonical correlation between the residual response groups and feature groups. In the second step, RSCCS selects the non-zero coefficient row, while PSCCS conducts screening within the non-zero coefficient block using penalized regularizations. In the third step, RSCCS and PSCCS select features by EBIC based on different situations and different iterations. We demonstrate the advantages of these two methods compared with several existing approaches. The statistical guarantees of RSCCS and PSCCS are established. We provide numerical simulation results and analyze a real data example to compare their performance with other methods.
Journal of Multivariate Analysis, Volume 212, Article 105570
Citations: 0
Distributed estimation of spiked eigenvalues in spiked population models
IF 1.4 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2025-11-29 DOI: 10.1016/j.jmva.2025.105558
Lu Yan, Jiang Hu
Advances in science and technology have made voluminous data sets distributed across multiple machines commonplace. Conventional statistical methodologies may be infeasible for analyzing such massive data sets because of prohibitively long computing times, memory constraints, communication overheads, and confidentiality considerations. In this paper, we propose distributed estimators of the spiked eigenvalues in spiked population models. The consistency and asymptotic normality of the distributed estimators are derived, and a statistical error analysis of the distributed estimators is also provided. The proposed distributed estimation attains the same order of convergence as estimation from the full sample. A simulation study and a real data analysis indicate that the proposed distributed estimation and testing procedures have excellent properties in terms of estimation accuracy and stability as well as transmission efficiency.
Journal of Multivariate Analysis, Volume 212, Article 105558
Citations: 0
The mean tests with high dimensional data
IF 1.4 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2025-11-28 DOI: 10.1016/j.jmva.2025.105564
Wenzhi Yang, Chi Yao, Yiming Liu, Guangming Pan, Wang Zhou
In this paper, we consider mean tests for high dimensional data and give two new tests, each consisting of three steps. First, we reduce the high dimensional vectors to many low dimensional vectors and construct Hotelling's T² tests. Second, using the distribution or asymptotic distribution of these Hotelling's tests under the null hypothesis, we transform them into random variables that are uniformly or asymptotically uniformly distributed. Third, central limit theorems for the normalized sum of these transformations are obtained in the Gaussian and non-Gaussian cases. Moreover, the asymptotic power of the new tests is also presented for the non-Gaussian case. Compared with existing tests, our tests not only have good empirical sizes but also high empirical power.
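The three steps can be sketched in a deliberately simplified setting. The code below is only an illustration under stated assumptions, not the authors' procedure: it uses non-overlapping blocks of dimension 2, tests a zero mean, relies on the exact Gaussian null law of Hotelling's T² (for dimension 2 the F(2, n − 2) distribution function has the closed form used in `block_to_uniform`), and normalizes the sum as if the blocks were independent. All function names are hypothetical.

```python
import math
import random


def hotelling_t2_2d(block):
    """One-sample Hotelling T^2 for 2-dimensional rows, testing mean = (0, 0)."""
    n = len(block)
    m0 = sum(r[0] for r in block) / n
    m1 = sum(r[1] for r in block) / n
    s00 = sum((r[0] - m0) ** 2 for r in block) / (n - 1)
    s11 = sum((r[1] - m1) ** 2 for r in block) / (n - 1)
    s01 = sum((r[0] - m0) * (r[1] - m1) for r in block) / (n - 1)
    det = s00 * s11 - s01 ** 2
    # xbar' S^{-1} xbar, written out with the explicit 2x2 inverse.
    quad = (s11 * m0 * m0 - 2.0 * s01 * m0 * m1 + s00 * m1 * m1) / det
    return n * quad


def block_to_uniform(block):
    """Step 2: map T^2 to (0, 1) through its Gaussian null distribution.

    For dimension 2, (n - 2) / (2 (n - 1)) * T^2 follows F(2, n - 2), whose
    distribution function reduces to 1 - (1 + T^2 / (n - 1)) ** (-(n - 2) / 2).
    """
    n = len(block)
    t2 = hotelling_t2_2d(block)
    return 1.0 - (1.0 + t2 / (n - 1)) ** (-(n - 2) / 2.0)


def blockwise_mean_statistic(data):
    """Steps 1 and 3: split columns into pairs, then standardize the sum.

    Assumes an even number of columns and treats blocks as independent, so the
    sum of K uniforms is centered at K / 2 and scaled by sqrt(K / 12).
    """
    p = len(data[0])
    us = [block_to_uniform([(row[j], row[j + 1]) for row in data])
          for j in range(0, p, 2)]
    k = len(us)
    return (sum(us) - k / 2.0) / math.sqrt(k / 12.0)


def simulate(n, p, shift, seed):
    """Independent N(shift, 1) entries; a toy data generator for the demo."""
    rng = random.Random(seed)
    return [[rng.gauss(shift, 1.0) for _ in range(p)] for _ in range(n)]


z_null = blockwise_mean_statistic(simulate(100, 40, 0.0, seed=1))
z_alt = blockwise_mean_statistic(simulate(100, 40, 0.5, seed=1))
print(z_null, z_alt)  # large positive values point away from the zero-mean null
```

Under a mean shift each block's transformed value moves toward 1, so the standardized sum grows; this is why a one-sided rejection region for large values is natural in this toy version.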
Journal of Multivariate Analysis, Volume 212, Article 105564
Citations: 0
Rank-based combination independence tests for high-dimensional data
IF 1.4 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2025-11-28 DOI: 10.1016/j.jmva.2025.105550
Liqi Xia, Ruiyuan Cao, Jiang Du, Ling Liu
This paper proposes a novel approach to enhance the versatility of the max-sum test in high-dimensional data analysis by combining two distinct rank correlation coefficients: Spearman’s ρ and Chatterjee’s ξ. We uncover the independence between the max-type test and the sum-type test by deriving their joint distribution. This insight enables the development of a comprehensive max-sum test that handles both sparse and dense alternative correlation structures in an adaptive manner. Leveraging the asymptotic independence between the two coefficients and the respective strengths of the two single-coefficient tests, we apply the Cauchy combination principle to devise a multifunctional testing methodology. This approach accommodates both monotonic and nonmonotonic dependence and thus offers a versatile solution to a broad spectrum of analytical requirements. The versatility of the proposed method is demonstrated through a diverse range of simulation studies and two real-world data analyses, underscoring its effectiveness and practical utility.
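The building blocks named in the abstract can be sketched for a single pair of variables. The code below is a minimal illustration, assuming no ties in the data and equal weights in the combination; it computes the textbook forms of Spearman's ρ and Chatterjee's ξ and applies the Cauchy combination rule to two p-values. It does not reproduce the paper's high-dimensional max-sum statistics, and the function names are hypothetical.

```python
import math


def ranks(values):
    """Ranks 1..n of the values, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_rho(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n (n^2 - 1)), no ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def chatterjee_xi(x, y):
    """Chatterjee's xi: sort by x, then 1 - 3 * sum|r[i+1] - r[i]| / (n^2 - 1)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ry = ranks(y)
    r = [ry[i] for i in order]
    total = sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)


def cauchy_combination(p_values):
    """Equal-weight Cauchy combination of p-values into a single p-value."""
    t = sum(math.tan((0.5 - p) * math.pi) for p in p_values) / len(p_values)
    return 0.5 - math.atan(t) / math.pi


x = [-3.0, -2.0, -1.0, 0.5, 1.5, 2.5]
y = [v * v for v in x]          # nonmonotonic dependence, no ties
print(spearman_rho(x, y))       # weak, because the relationship is not monotone
print(chatterjee_xi(x, y))      # positive, because y is a function of x
print(cauchy_combination([0.30, 0.01]))  # hypothetical p-values from two tests
```

The combination rule is dominated by the smallest p-value, which is what lets a combined test borrow power from whichever coefficient is sensitive to the dependence actually present.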
本文提出了一种新的方法,通过结合两个不同的秩相关系数:Spearman 's ρ和Chatterjee 's ξ,来增强高维数据分析中最大和检验的通用性。我们通过推导最大型检验和和型检验的联合分布,揭示了它们之间的独立性。这种见解使得开发一个全面的最大和测试能够以自适应的方式处理稀疏和密集的替代相关结构。利用两个系数之间的渐近独立性和两个单系数测试的内在亮点,我们有策略地实施柯西组合原理来设计多功能测试方法。这种方法可以容纳单调和非单调的数据类型,因此为广泛的分析需求提供了一个通用的解决方案。我们提出的方法的多功能性已经通过各种模拟数据研究和两个真实世界的数据分析得到了令人印象深刻的证明,强调了其有效性和实用性。
Published in: Journal of Multivariate Analysis, vol. 212, Article 105550.
Citations: 0
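The abstract above combines a Spearman-type test and a Chatterjee-type test through the Cauchy combination principle. The sketch below illustrates that idea for a single pair of variables only; it is not the authors' high-dimensional max-sum procedure. All function names are hypothetical, and the p-values rely on the standard asymptotic null distributions (√(n−1)·ρ ≈ N(0, 1) and √n·ξ ≈ N(0, 2/5)), which assume continuous data without ties.

```python
import math
import numpy as np

def rank(a):
    """Ranks 1..n of the entries of a (assumes no ties)."""
    return np.argsort(np.argsort(a)) + 1

def spearman_rho(x, y):
    """Spearman's rho via the no-ties rank-difference formula."""
    n = len(x)
    d = rank(x) - rank(y)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))

def chatterjee_xi(x, y):
    """Chatterjee's xi: ranks of y taken in increasing order of x (no ties)."""
    n = len(x)
    r = rank(y)[np.argsort(x)]
    return 1.0 - 3.0 * np.sum(np.abs(np.diff(r))) / (n ** 2 - 1)

def normal_sf(z):
    """Upper tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def cauchy_combine(pvals):
    """Equal-weight Cauchy combination of p-values."""
    p = np.clip(pvals, 1e-15, 1.0 - 1e-15)  # keep tan() finite
    t = np.mean(np.tan((0.5 - p) * np.pi))
    return 0.5 - math.atan(t) / math.pi

def combined_pvalue(x, y):
    """Pairwise illustration: combine the rho and xi independence tests."""
    n = len(x)
    # Two-sided test for rho, since dependence may be increasing or decreasing.
    p_rho = 2.0 * normal_sf(abs(spearman_rho(x, y)) * math.sqrt(n - 1))
    # One-sided test for xi, since large xi indicates dependence.
    p_xi = normal_sf(chatterjee_xi(x, y) * math.sqrt(n / 0.4))
    return cauchy_combine([p_rho, p_xi])
```

For a nonmonotonic relationship such as y = x² on a symmetric range, ρ tends to be close to zero while ξ is large, so the combined p-value is driven by the ξ component; for a monotone relationship both components contribute.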
Estimation and testing for fixed effects partially linear nonparametric panel regression model with separable spatially and serially correlated error structure
IF 1.4 · Zone 3 (Mathematics) · Q2 Statistics & Probability · Pub Date: 2025-11-28 · DOI: 10.1016/j.jmva.2025.105552
Shuangshuang Li, Jianbao Chen
Panel data collected from “locations” may exhibit spatial and serial correlations. To study such correlations, together with possible nonlinear relationships, we introduce a fixed effects partially linear nonparametric panel regression model with a separable spatially and serially correlated error structure. We obtain profile quasi-maximum likelihood estimators (PQMLEs) of the unknowns. Furthermore, a generalized F-test, denoted F_NT, is designed to assess whether the specification of the nonparametric component is reasonable. Asymptotic properties of the PQMLEs and F_NT are established under several conditions. Monte Carlo trials indicate that our estimators and test statistic perform well in finite samples, and that model misspecification may substantially affect the estimates of the unknown parameters. An analysis of provincial housing prices in China reveals nonlinear, spatial, and serial correlation relationships.
Published in: Journal of Multivariate Analysis, vol. 212, Article 105552.
Citations: 0