
Journal of Multivariate Analysis: Latest Articles

Data depth functions for non-standard data by use of formal concept analysis
IF 1.4 | Zone 3, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2024-09-19 | DOI: 10.1016/j.jmva.2024.105372
Hannah Blocher, Georg Schollmeyer
In this article we introduce a notion of depth functions for data types that are not given in standard statistical data formats. We focus on data that cannot be represented by one specific data structure, such as normed vector spaces. This covers a wide range of different data types, which we refer to as non-standard data. Depth functions have been studied intensively for normed vector spaces. However, a discussion of depth functions for non-standard data is lacking. In this article, we address this gap by using formal concept analysis to obtain a unified data representation. Building on this representation, we then define depth functions for non-standard data. Furthermore, we provide a systematic basis by introducing structural properties using the data representation provided by formal concept analysis. Finally, we embed the generalised Tukey depth into our concept of data depth and analyse it using the introduced structural properties. Thus, this article presents the mathematical formalisation of centrality and outlyingness for non-standard data and increases the number of spaces in which centrality can be discussed. In particular, we provide a basis for defining further depth functions and statistical inference methods for non-standard data.
Journal of Multivariate Analysis, Volume 205, Article 105372
Citations: 0
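Background sketch, not the paper's method: in a vector space, the classical Tukey (halfspace) depth of a point x is the smallest fraction of observations lying in a closed halfplane whose boundary passes through x; the generalised Tukey depth named in the abstract generalises this notion to non-standard data via formal concept analysis, which is not reproduced here. The Python code below approximates the classical version in ℝ² by scanning a regular grid of directions. Because only finitely many directions are checked, it returns an upper bound on the exact depth. The function name `approx_tukey_depth` and its `n_directions` parameter are hypothetical.

```python
import math


def approx_tukey_depth(x, points, n_directions=360):
    """Approximate classical Tukey (halfspace) depth of x w.r.t. 2-D points.

    For each unit direction u on a regular grid, count observations p with
    u . (p - x) >= 0, i.e. those in the closed halfplane whose boundary passes
    through x. The depth is the minimum count divided by n. Checking only
    finitely many directions yields an upper bound on the exact depth.
    """
    if not points:
        raise ValueError("need at least one observation")
    best = len(points)
    for k in range(n_directions):
        theta = 2.0 * math.pi * k / n_directions
        ux, uy = math.cos(theta), math.sin(theta)
        count = sum(
            1 for px, py in points
            if ux * (px - x[0]) + uy * (py - x[1]) >= 0.0
        )
        best = min(best, count)
    return best / len(points)


data = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
print(approx_tukey_depth((0.0, 0.0), data))  # central point -> 0.5
print(approx_tukey_depth((5.0, 5.0), data))  # outlying point -> 0.0
```

The central point gets depth 0.5, while a point outside the convex hull of the data gets depth 0, which is the centrality-versus-outlyingness contrast that depth functions formalise.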
Scaled envelope models for multivariate time series
IF 1.4 | Zone 3, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2024-09-13 | DOI: 10.1016/j.jmva.2024.105370
H.M. Wiranthe B. Herath, S. Yaser Samadi

Vector autoregressive (VAR) models have become a popular choice for modeling multivariate time series data due to their simplicity and ease of use. Efficient estimation of VAR coefficients is an important problem. The envelope technique for VAR models has been shown to have the potential to yield significant gains in efficiency and accuracy by accounting for linear combinations of the response vector that are essentially immaterial to the estimation of the VAR coefficients. However, inferences based on envelope VAR (EVAR) models are neither invariant nor equivariant under rescaling of the VAR responses, which limits their application to time series measured in the same or similar units. When VAR responses are measured on different scales, the efficiency improvements promised by envelopes are not always guaranteed. To address this limitation, we introduce the scaled envelope VAR (SEVAR) model, which preserves the efficiency-boosting capabilities of standard envelope techniques while remaining invariant to scale changes. The asymptotic properties of the proposed estimators are established under different error assumptions. Simulation studies and real-data analyses demonstrate the efficiency and effectiveness of the proposed model, and the numerical results corroborate our theoretical findings.

Journal of Multivariate Analysis, Volume 205, Article 105370. Open access PDF: https://www.sciencedirect.com/science/article/pii/S0047259X24000770/pdfft?md5=bcb10a9c98d350b55789c52bc615d145&pid=1-s2.0-S0047259X24000770-main.pdf
Citations: 0
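The scale issue can be made concrete with plain least squares for a VAR(1) without intercept, which, unlike the envelope VAR described above, is equivariant under rescaling: if z_t = D y_t for a diagonal D, the least-squares estimate computed from z equals D Â D⁻¹, where Â is the estimate from y. The pure-Python sketch below checks this numerically. It does not implement EVAR or SEVAR; the coefficient matrix, sample size, seed, scale factors and helper names are arbitrary illustrative choices.

```python
import random


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def ols_var1(series):
    """Least-squares A in y_t = A y_{t-1} + e_t (no intercept):
    A_hat = (sum_t y_t y_{t-1}^T) (sum_t y_{t-1} y_{t-1}^T)^{-1}."""
    sxy = [[0.0, 0.0], [0.0, 0.0]]
    sxx = [[0.0, 0.0], [0.0, 0.0]]
    for prev, cur in zip(series[:-1], series[1:]):
        for i in range(2):
            for j in range(2):
                sxy[i][j] += cur[i] * prev[j]
                sxx[i][j] += prev[i] * prev[j]
    return matmul2(sxy, inv2(sxx))


def simulate_var1(a, n, seed=0):
    """Simulate n steps of a bivariate VAR(1) with standard normal noise."""
    rng = random.Random(seed)
    y = [[0.0, 0.0]]
    for _ in range(n):
        prev = y[-1]
        y.append([a[i][0] * prev[0] + a[i][1] * prev[1] + rng.gauss(0.0, 1.0)
                  for i in range(2)])
    return y


A = [[0.5, 0.1], [0.2, 0.3]]
y = simulate_var1(A, 500)
d = [1.0, 100.0]  # e.g. the second series re-expressed in smaller units
z = [[d[0] * v[0], d[1] * v[1]] for v in y]

A_y = ols_var1(y)
A_z = ols_var1(z)
# Equivariance check: A_z[i][j] should equal d_i * A_y[i][j] / d_j.
for i in range(2):
    for j in range(2):
        print(i, j, A_z[i][j], d[i] * A_y[i][j] / d[j])
```

Each printed pair agrees up to floating-point rounding, which is the equivariance property that, per the abstract, EVAR inference lacks and SEVAR is designed to recover.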
A bias-corrected Srivastava-type test for cross-sectional independence
IF 1.4 | Zone 3, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2024-09-13 | DOI: 10.1016/j.jmva.2024.105371
Kai Xu, Mingxiang Cao, Qing Cheng

This paper proposes a test for cross-sectional independence in high-dimensional panel data. It uses the random-matrix-theory-based approach of Srivastava (2005) in the presence of a large number of cross-sectional units and time series observations. Because the errors are unobservable, the test is built on residuals from the panel data regression model. We develop a bias-corrected test after adjusting for the contribution of the regressors. Using the martingale central limit theorem, we prove that, under mild conditions, the limiting null distribution of the proposed test statistic is normal as the cross-sectional and time dimensions tend to infinity jointly. We further study the asymptotic relative efficiency of the proposed test with respect to the state-of-the-art Lagrange multiplier test. An interesting finding is that the new test can have a substantial power gain when the underlying variances are not identical across units.

Journal of Multivariate Analysis, Volume 205, Article 105371. Open access PDF: https://www.sciencedirect.com/science/article/pii/S0047259X24000782/pdfft?md5=792309b6f97ca51742555998cfec1771&pid=1-s2.0-S0047259X24000782-main.pdf
Citations: 0
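As background for the comparison in the abstract, the sketch below computes two standard LM-type statistics for cross-sectional dependence from panel regression residuals: the Breusch-Pagan LM statistic, LM = T Σ_{i<j} ρ̂²_ij, and Pesaran's scaled version, sqrt(1/(N(N−1))) Σ_{i<j} (T ρ̂²_ij − 1), where ρ̂_ij is the sample correlation between the residual series of units i and j. Whether the paper's "state-of-the-art Lagrange multiplier test" is exactly one of these is an assumption, and the proposed bias-corrected Srivastava-type statistic is not implemented here; the function names are hypothetical.

```python
import math


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def lm_statistics(residuals):
    """residuals: list of N residual series (one per unit), each of length T.

    Returns (Breusch-Pagan LM, Pesaran's scaled LM):
      LM   = T * sum_{i<j} rho_ij^2
      LM_s = sqrt(1 / (N (N - 1))) * sum_{i<j} (T * rho_ij^2 - 1)
    """
    n_units = len(residuals)
    t_len = len(residuals[0])
    lm = 0.0
    scaled_sum = 0.0
    for i in range(n_units - 1):
        for j in range(i + 1, n_units):
            r2 = pearson(residuals[i], residuals[j]) ** 2
            lm += t_len * r2
            scaled_sum += t_len * r2 - 1.0
    scaled = math.sqrt(1.0 / (n_units * (n_units - 1))) * scaled_sum
    return lm, scaled


# Toy residuals for N = 2 units and T = 4 periods (perfectly correlated).
lm, scaled = lm_statistics([[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]])
print(lm, scaled)
```

For fixed N and large T, the Breusch-Pagan statistic is referred to a chi-squared distribution with N(N−1)/2 degrees of freedom; the scaled version is designed for settings where N also grows, which is the high-dimensional regime the paper targets.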
Invariant correlation under marginal transforms
IF 1.4 | Zone 3, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2024-08-30 | DOI: 10.1016/j.jmva.2024.105361
Takaaki Koike, Liyuan Lin, Ruodu Wang

A useful property of independent samples is that their correlation remains the same after marginal transforms are applied. This invariance property plays a fundamental role in statistical inference, but it does not hold in general for dependent samples. In this paper, we study this invariance property for the Pearson correlation coefficient and its applications. A multivariate random vector is said to have an invariant correlation if its pairwise correlation coefficients remain unchanged under any common marginal transform. For the bivariate case, we characterize all models of such a random vector via a certain combination of comonotonicity (the strongest form of positive dependence) and independence. In particular, we show that the class of exchangeable copulas with invariant correlation is precisely described by what we call positive Fréchet copulas. In the general multivariate case, we characterize the set of all invariant correlation matrices via the clique partition polytope. We also propose a positive regression dependent model that admits any prescribed invariant correlation matrix. Finally, we show that all our characterization results for invariant correlation, except one special case, remain the same if the common marginal transforms are restricted to increasing ones.

Journal of Multivariate Analysis, Volume 204, Article 105361. Open access PDF: https://www.sciencedirect.com/science/article/pii/S0047259X2400068X/pdfft?md5=87348d6db627c38f7dec7cb4cd435464&pid=1-s2.0-S0047259X2400068X-main.pdf
Citations: 0
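A quick numerical check of the invariance idea, assuming the simplest model built from the two ingredients the abstract names: with probability p the pair is comonotone (Y = X), and otherwise Y is an independent copy of X. Then E[f(X)f(Y)] = p E[f(X)²] + (1 − p) E[f(X)]², so Cov(f(X), f(Y)) = p Var(f(X)) and the correlation equals p for every non-constant common transform f. The sketch computes this exactly on a five-point support; it illustrates one invariant-correlation model and is not the paper's full characterization. The function name and support are hypothetical.

```python
import math


def corr_under_common_transform(values, p, f):
    """Exact Pearson correlation of (f(X), f(Y)) under the mixture model

        P(X = v_i, Y = v_j) = p * 1{i == j} / n + (1 - p) / n**2,

    i.e. comonotone with probability p, independent otherwise. Both margins
    are uniform on `values`, so f(X) and f(Y) have the same variance.
    """
    n = len(values)
    fx = [f(v) for v in values]
    mean = sum(fx) / n
    second = sum(a * a for a in fx) / n
    cross = 0.0
    for i in range(n):
        for j in range(n):
            prob = (1.0 - p) / (n * n) + (p / n if i == j else 0.0)
            cross += prob * fx[i] * fx[j]
    var = second - mean * mean  # common variance of f(X) and f(Y)
    return (cross - mean * mean) / var


support = [0.0, 1.0, 2.0, 3.0, 4.0]
transforms = [lambda x: x, lambda x: x ** 3, math.exp, lambda x: float(x > 2)]
for g in transforms:
    print(corr_under_common_transform(support, 0.3, g))  # 0.3 up to rounding
```

Replacing the mixture weight on the comonotone part with, say, a countermonotone component would break the identity above, which is consistent with the abstract's emphasis on positive dependence.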
Grouped feature screening for ultrahigh-dimensional classification via Gini distance correlation
IF 1.4 | Zone 3, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2024-08-17 | DOI: 10.1016/j.jmva.2024.105360
Yongli Sang, Xin Dang

Gini distance correlation (GDC) was recently proposed to measure the dependence between a categorical variable, Y, and a numerical random vector, X. It characterizes independence between X and Y. In this article, we use the GDC to construct a feature screening procedure for ultrahigh-dimensional discriminant analysis in which the response variable is categorical. It can be used to screen individual features as well as grouped features. The proposed procedure has several appealing properties. It is model-free: no model specification is needed. It has the sure independence screening property and the ranking consistency property. The proposed screening method can also handle a response whose number of categories diverges. We conduct several Monte Carlo simulation studies to examine the finite-sample performance of the proposed screening procedure, and we present analyses of two real-life datasets.

Journal of Multivariate Analysis, Volume 204, Article 105360
Citations: 0
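As we understand earlier work on this measure, the Gini distance correlation between a numerical vector X and a categorical Y can be written as (Δ − Σ_k p_k Δ_k) / Δ, where Δ is the mean Euclidean distance between two independent draws of X and Δ_k is the same quantity within class k. The sketch below computes a plug-in version from pairwise sample distances and ranks candidate features, either single columns or groups of columns treated as one vector, by that score. The estimator details, function names and the `top_k` cutoff are illustrative assumptions; the paper's exact estimator and screening threshold may differ.

```python
import math
from itertools import combinations


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of points."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def gini_distance_correlation(x, y):
    """Plug-in GDC: (Delta - sum_k p_k * Delta_k) / Delta.

    x: list of feature vectors (tuples); a grouped feature is a tuple of
       several coordinates. y: list of class labels of the same length.
    """
    n = len(x)
    delta = mean_pairwise_distance(x)
    if delta == 0.0:
        return 0.0
    within = 0.0
    for label in set(y):
        group = [xi for xi, yi in zip(x, y) if yi == label]
        within += (len(group) / n) * mean_pairwise_distance(group)
    return (delta - within) / delta


def screen(features, y, top_k):
    """Return the names of the top_k features ranked by GDC with y."""
    scores = {name: gini_distance_correlation(cols, y)
              for name, cols in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


y = [0, 0, 0, 1, 1, 1]
features = {
    "signal": [(0.0,), (0.0,), (0.0,), (10.0,), (10.0,), (10.0,)],
    "noise": [(1.0,), (2.0,), (3.0,), (1.0,), (2.0,), (3.0,)],
    "group": [(0.0, 1.0), (0.5, 2.0), (0.2, 3.0),
              (4.0, 1.0), (4.5, 2.0), (4.2, 3.0)],
}
print(screen(features, y, top_k=2))
```

Because the score only needs distances, a group of columns is handled exactly like a single column, which mirrors the abstract's point that the same procedure screens individual and grouped features.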
The inner partial least square: An exploration of the “necessary” dimension reduction
IF 1.4 | Zone 3, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2024-08-14 | DOI: 10.1016/j.jmva.2024.105356
Yunjian Yin, Lan Liu

The partial least square (PLS) algorithm retains the combinations of predictors that maximize the covariance with the outcome. Cook et al. (2013) showed that PLS yields a predictor envelope, which is the smallest reducing subspace of the predictors’ covariance that contains the coefficients. However, PLS and the predictor envelope both target a space that contains the regression coefficients, so they may sometimes be too conservative in reducing the dimension of the predictors. In this paper, we propose a new method that may improve the estimation efficiency of regression coefficients when both PLS and the predictor envelope fail to do so. Specifically, our method yields the largest reducing subspace of the predictors’ covariance that is contained in the coefficient matrix space. Interestingly, the moment-based algorithm for our proposed method can be obtained by changing the max in PLS to a min. We call the modified PLS the inner PLS and the resulting space the inner predictor envelope space. We provide the theoretical properties of the proposed methods and demonstrate their use with data from the China Health and Nutrition Survey.

Journal of Multivariate Analysis, Volume 204, Article 105356
Citations: 0
Cross projection test for mean vectors via multiple random splits in high dimensions
IF 1.4 | Zone 3, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2024-08-06 | DOI: 10.1016/j.jmva.2024.105358
Guanpeng Wang, Jiujing Wu, Hengjian Cui

In this article, the cross projection test (CPT) technique, first proposed by Wang and Cui (2024), is extended to high-dimensional two-sample mean tests. A data-splitting strategy is used to find projection directions that reduce the data from a high-dimensional space to a low-dimensional one, which helps overcome “the curse of dimensionality”. Once both samples are randomly split, two correlated cross projection statistics can be constructed following the CPT mechanism; likewise, test statistics constructed from multiple random splits are correlated with one another. To handle this correlation and to improve empirical power by removing the randomness of a single data split, we further use a powerful Cauchy combination test based on multiple data splits. Theoretically, we prove the asymptotic properties of the proposed test statistic. Furthermore, for sparse alternatives, we apply the power enhancement technique, via marginal screening on the full data, to the ensemble algorithm based on the Cauchy combination test. Numerical studies, including Monte Carlo simulations and two real data examples, illustrate the utility of the proposed ensemble algorithm.

Journal of Multivariate Analysis, Volume 204, Article 105358
Citations: 0
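The Cauchy combination test of Liu and Xie (2020) turns p-values that may be arbitrarily dependent, such as those from several random splits of the same data, into one combined p-value: T = Σ_i w_i tan((0.5 − p_i)π) with non-negative weights summing to one, and the combined p-value is approximately 0.5 − arctan(T)/π. The sketch below assumes equal weights and uses hypothetical per-split p-values; the weighting and the power enhancement step used in the paper are not reproduced.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine possibly dependent p-values with the Cauchy combination test.

    T = sum_i w_i * tan((0.5 - p_i) * pi), weights non-negative and summing
    to one (equal weights by default); combined p ~= 0.5 - arctan(T) / pi.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0 / m] * m
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))
    return 0.5 - math.atan(t) / math.pi


# Hypothetical p-values from five random splits of the same two samples.
split_p_values = [0.012, 0.034, 0.21, 0.008, 0.05]
print(cauchy_combination(split_p_values))
```

If every split returns the same p-value, the combined p-value equals it, so the combination removes split-to-split randomness without changing a result that does not depend on the split.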
Bayesian covariance structure modeling of interval-censored multi-way nested survival data
IF 1.4 | Zone 3, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2024-08-06 | DOI: 10.1016/j.jmva.2024.105359
Stef Baas, Jean-Paul Fox, Richard J. Boucherie

A Bayesian covariance structure model (BCSM) is proposed for interval-censored multi-way nested survival data. This flexible modeling framework generalizes mixed effects survival models by allowing positive and negative associations among clustered observations. Conjugate shifted-inverse gamma priors are proposed for the covariance parameters, implying inverse gamma priors for the eigenvalues of the covariance matrix, which ensures a positive definite covariance matrix under posterior analysis. A numerically efficient Gibbs sampling procedure is defined for balanced nested designs. This requires sampling latent variables from their marginal full conditional distributions, which are derived through a recursive formula. This makes the estimation procedure suitable for interval-censored data with large cluster sizes. For unbalanced nested designs, a novel (balancing) data augmentation procedure is introduced to improve the efficiency of the Gibbs sampler. The Gibbs sampling procedure is validated in two simulation studies. The linear transformation BCSM (LT-BCSM) was applied to two-way nested interval-censored event times to analyze differences in adverse events between three groups of patients who were randomly allocated to treatment with different stents (BIO-RESORT). The parameters of the structured covariance matrix represented unobserved heterogeneity in treatment effects and were examined to detect differential treatment effects. A comparison was made with inference results under a random effects linear transformation model. It was concluded that the LT-BCSM yielded inferences with higher posterior credibility, provided a more profound way of quantifying evidence for the risk equivalence of the three treatments, and was more robust to prior specifications.

{"title":"Bayesian covariance structure modeling of interval-censored multi-way nested survival data","authors":"Stef Baas ,&nbsp;Jean-Paul Fox ,&nbsp;Richard J. Boucherie","doi":"10.1016/j.jmva.2024.105359","DOIUrl":"10.1016/j.jmva.2024.105359","url":null,"abstract":"<div><p>A Bayesian covariance structure model (BCSM) is proposed for interval-censored multi-way nested survival data. This flexible modeling framework generalizes mixed effects survival models by allowing positive and negative associations among clustered observations. Conjugate shifted-inverse gamma priors are proposed for the covariance parameters, implying inverse gamma priors for the eigenvalues of the covariance matrix, which ensures a positive definite covariance matrix under posterior analysis. A numerically efficient Gibbs sampling procedure is defined for balanced nested designs. This requires sampling latent variables from their marginal full conditional distributions, which are derived through a recursive formula. This makes the estimation procedure suitable for interval-censored data with large cluster sizes. For unbalanced nested designs, a novel (balancing) data augmentation procedure is introduced to improve the efficiency of the Gibbs sampler. The Gibbs sampling procedure is validated in two simulation studies. The linear transformation BCSM (LT-BCSM) was applied to two-way nested interval-censored event times to analyze differences in adverse events between three groups of patients, who were randomly allocated to treatment with different stents (BIO-RESORT). The parameters of the structured covariance matrix represented unobserved heterogeneity in treatment effects and were examined to detect differential treatment effects. A comparison was made with inference results under a random effects linear transformation model. 
It was concluded that the LT-BCSM led to inferences with higher posterior credibility, a more profound way of quantifying evidence for risk equivalence of the three treatments, and it was more robust to prior specifications.</p></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"204 ","pages":"Article 105359"},"PeriodicalIF":1.4,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0047259X24000666/pdfft?md5=ba8eccdffa71a651c495cfe20091f2f0&pid=1-s2.0-S0047259X24000666-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141993461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Conjugacy properties of multivariate unified skew-elliptical distributions
IF 1.4 | Zone 3, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2024-08-06 | DOI: 10.1016/j.jmva.2024.105357
Maicon J. Karling, Daniele Durante, Marc G. Genton

The family of multivariate unified skew-normal (SUN) distributions has recently been shown to possess fundamental conjugacy properties. When used as priors for the vector of coefficients in probit, tobit, and multinomial probit models, these distributions yield posteriors that still belong to the SUN family. Although this result has led to important advances in Bayesian inference and computation, its applicability beyond likelihoods associated with fully observed, discretized, or censored realizations from multivariate Gaussian models remains unexplored. This article fills this gap by proving that the wider family of multivariate unified skew-elliptical (SUE) distributions, which extends SUNs to more general perturbations of elliptical densities, guarantees conjugacy for broader classes of models than those relying on fully observed, discretized, or censored Gaussians. The proof leverages the closure of the SUE family under linear combinations, conditioning, and marginalization to show that this family is conjugate to the likelihood induced by regression models for fully observed, censored, or dichotomized realizations from skew-elliptical distributions. This advance extends conjugate Bayesian inference to general formulations arising from elliptical and skew-elliptical families, including the multivariate Student's t and skew-t, among others.
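The simplest instance of the SUN conjugacy described above can be checked numerically. The sketch below assumes a single probit observation $(x, y)$ with $y \in \{0, 1\}$ and a scalar coefficient with a zero-mean Gaussian prior $N(0, \omega^2)$. In that case the posterior is Azzalini's skew-normal with shape $\alpha = \pm x\omega$ (plus sign for $y = 1$), which is a special case of the SUN family, and the normalizing constant equals 1/2. The function names are hypothetical, and the code only verifies this closed form; it does not implement the multivariate SUN or SUE results of the paper.

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def unnormalized_posterior(beta, x, y, omega):
    """Prior N(0, omega^2) density times the probit likelihood of one observation (x, y)."""
    prior = std_normal_pdf(beta / omega) / omega
    sign = 1.0 if y == 1 else -1.0  # P(y=0 | beta) = 1 - Phi(x*beta) = Phi(-x*beta)
    return prior * std_normal_cdf(sign * x * beta)


def skew_normal_pdf(beta, omega, alpha):
    """Azzalini skew-normal density with location 0, scale omega, shape alpha."""
    return 2.0 / omega * std_normal_pdf(beta / omega) * std_normal_cdf(alpha * beta / omega)


def max_discrepancy(x=1.3, y=1, omega=2.0, lo=-20.0, hi=20.0, m=40001):
    """Normalize the posterior on a grid and compare it with the closed-form skew-normal.

    Returns (largest absolute density difference, numerical normalizing constant).
    """
    step = (hi - lo) / (m - 1)
    grid = [lo + k * step for k in range(m)]
    vals = [unnormalized_posterior(b, x, y, omega) for b in grid]
    # Trapezoidal rule for the normalizing constant (1/2 in this zero-mean case).
    z = step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    alpha = (1.0 if y == 1 else -1.0) * x * omega
    diff = max(abs(v / z - skew_normal_pdf(b, omega, alpha)) for b, v in zip(grid, vals))
    return diff, z
```

With a nonzero prior mean the posterior becomes an extended skew-normal, and with several observations it is a SUN whose latent dimension equals the number of observations; this sketch covers neither case.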

Journal of Multivariate Analysis, Vol. 204, Article 105357 (published 2024-08-06).
Citations: 0
Mean and covariance estimation for discretely observed high-dimensional functional data: Rates of convergence and division of observational regimes
IF 1.4 | Zone 3, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2024-08-05 | DOI: 10.1016/j.jmva.2024.105355
Alexander Petersen

Estimation of the mean and covariance parameters for functional data is a critical task, with local linear smoothing being a popular choice. In recent years, many scientific domains have been producing multivariate functional data for which $p$, the number of curves per subject, is often much larger than the sample size $n$. In this high-dimensional functional data setting, much of the existing methodology relies on preliminary estimates of the unknown mean functions and of the auto- and cross-covariance functions. This paper investigates the convergence rates of local linear estimators of the mean and covariance functions, measured by the maximal error across components and across pairs of components, respectively, in both the $L^2$ and uniform metrics. The local linear estimators use a generic weighting scheme that can adjust for differing numbers of discrete observations $N_{ij}$ across curves $j$ and subjects $i$, where the $N_{ij}$ vary with $n$. Particular attention is given to the equal-weight-per-observation (OBS) and equal-weight-per-subject (SUBJ) weighting schemes. The theoretical results rely on novel applications of concentration inequalities for functional data and demonstrate that, as for univariate functional data, the order of the $N_{ij}$ relative to $p$ and $n$ divides high-dimensional functional data into three regimes (sparse, dense, and ultra-dense), with the high-dimensional parametric convergence rate $(\log(p)/n)^{1/2}$ attainable in the latter two.
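The OBS and SUBJ weighting schemes mentioned above can be illustrated with a minimal sketch of a local linear mean estimator for a single component $j$. It assumes an Epanechnikov kernel and the per-observation weights commonly used for these schemes in the functional data literature: $1/\sum_i N_{ij}$ for OBS and $1/(n N_{ij})$ for SUBJ. The paper's generic weighting scheme, its covariance estimators, and bandwidth selection are not reproduced, and the function names are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def local_linear_mean(t, times, values, h, scheme="OBS"):
    """Local linear estimate of the mean function at t for one curve component j.

    times[i] and values[i] hold the N_ij discrete observations of subject i.
    scheme="OBS": each observation gets weight 1 / sum_i N_ij.
    scheme="SUBJ": each subject gets total weight 1 / n, i.e. 1 / (n * N_ij) per observation.
    """
    n = len(times)
    total = sum(len(ti) for ti in times)
    s0 = s1 = s2 = r0 = r1 = 0.0
    for ti, yi in zip(times, values):
        if scheme == "OBS":
            w_i = 1.0 / total
        elif scheme == "SUBJ":
            w_i = 1.0 / (n * len(ti))
        else:
            raise ValueError("scheme must be 'OBS' or 'SUBJ'")
        for tk, yk in zip(ti, yi):
            d = tk - t
            w = w_i * epanechnikov(d / h) / h  # subject weight times K_h(T_ijk - t)
            s0 += w
            s1 += w * d
            s2 += w * d * d
            r0 += w * yk
            r1 += w * d * yk
    denom = s0 * s2 - s1 * s1
    if denom <= 0.0:
        raise ValueError("too few distinct observations in the window; increase h")
    # Intercept of the weighted least-squares line fitted around t.
    return (s2 * r0 - s1 * r1) / denom
```

Because a local linear fit reproduces linear functions exactly, noise-free observations of $m(t) = 1 + 2t$ give an estimate of 2 at $t = 0.5$ under either scheme, which makes a convenient sanity check; the two schemes differ only once subjects with many observations would otherwise dominate noisy data.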

Journal of Multivariate Analysis, Vol. 204, Article 105355 (published 2024-08-05).
Citations: 0