Journal of Multivariate Analysis: Latest Articles

Generalized implementation of invariant coordinate selection with positive semi-definite scatter matrices
IF 1.4; Zone 3 (Mathematics); Q2 STATISTICS & PROBABILITY; Pub Date: 2026-01-01; Epub Date: 2025-11-08; DOI: 10.1016/j.jmva.2025.105520
Aurore Archimbaud
Invariant coordinate selection is an unsupervised multivariate data transformation useful in many contexts such as outlier detection or clustering. It is based on the simultaneous diagonalization of two affine equivariant and positive definite scatter matrices. Its classical implementation relies on a non-symmetric eigenvalue problem by diagonalizing one scatter relative to the other. In case of collinearity, at least one of the scatter matrices is singular, making the problem unsolvable. To address this limitation, three approaches are proposed using: a Moore–Penrose pseudo inverse, a dimension reduction, and a generalized singular value decomposition. Their properties are investigated both theoretically and through various empirical applications. Overall, the extension based on the generalized singular value decomposition seems the most promising, even though it restricts the choice of scatter matrices to those that can be expressed as cross-products. In practice, some of the approaches also appear suitable in the context of high-dimension low-sample-size data.
Journal of Multivariate Analysis, vol. 211, Article 105520.
Citations: 0
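For orientation, the classical problem that the abstract above describes as "diagonalizing one scatter relative to the other" can be written in the following standard form. The symbols S_1, S_2, B, D, b_j and \rho_j are notation assumed here for illustration; they do not appear in the listing.

```latex
% Classical ICS with two p x p scatter matrices S_1 (positive definite) and S_2:
% the rows b_j^T of B solve a non-symmetric eigenvalue problem.
\[
  S_1^{-1} S_2 \, b_j = \rho_j \, b_j , \qquad j = 1, \dots, p ,
\]
% Equivalently, B diagonalizes both scatters simultaneously:
\[
  B \, S_1 \, B^{\top} = I_p ,
  \qquad
  B \, S_2 \, B^{\top} = D = \operatorname{diag}(\rho_1, \dots, \rho_p) .
\]
```

When S_1 is singular, as happens under collinearity, the inverse in the first display does not exist, which is the limitation the three proposed approaches are meant to work around.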
Ultra-high dimensional semiparametric dynamic high-order spatial autoregressive models
IF 1.4; Zone 3 (Mathematics); Q2 STATISTICS & PROBABILITY; Pub Date: 2026-01-01; Epub Date: 2025-10-30; DOI: 10.1016/j.jmva.2025.105516
Feng Luo , Hongxia Xu , Guoliang Fan , Liping Zhu
Motivated by the need to effectively characterize complex spatial dependencies inherent in ultra-high dimensional data, this paper develops a sparse semiparametric framework for modeling dynamic high-order spatial autoregressive processes. In this framework, the number of covariates in the linear component grows at a rate much faster than the sample size under a sparsity assumption, whereas the nonparametric component remains of fixed dimension. The varying coefficients are approximated using B-spline basis functions. To address the endogeneity arising from spatial lag terms, two-stage sieve least squares together with instrumental variable methods are employed. We investigate the theoretical properties of the oracle estimator, assuming that the true sparsity structure is known, and establish its convergence rates and asymptotic normality. Further, we propose a nonconvex penalized estimation procedure that simultaneously performs variable selection and estimates both the linear and spatial autoregressive parameters, and we show that it possesses the oracle property under mild conditions. The effectiveness of the proposed method is demonstrated through simulation studies and an empirical application to the Communities and Crime data set from the UCI Machine Learning Repository.
Journal of Multivariate Analysis, vol. 211, Article 105516.
Citations: 0
tSNE-Spec: A new classification method for multivariate time series data
IF 1.4; Zone 3 (Mathematics); Q2 STATISTICS & PROBABILITY; Pub Date: 2026-01-01; Epub Date: 2025-11-08; DOI: 10.1016/j.jmva.2025.105537
Shubhajit Sen , Soudeep Deb
Classification of multivariate time series (MTS) data has applications in various domains, for example, medical sciences, finance, sports analytics, etc. In this work, we propose a new technique that uses the advantages of dimension reduction through the t-distributed stochastic neighbor embedding (t-SNE) method, coupled with the attractive properties of the spectral density estimates of a time series, and k-nearest neighbor algorithm. We transform each MTS to a lower dimensional time series using t-SNE, making it useful for visualizing and retaining the temporal patterns, and subsequently use that in classification. Then, we extend the standard univariate spectral density-based classification in the multivariate setting and prove its theoretical consistency. Empirically, at first, we establish that the pairwise structure of the multivariate spectral density based distance matrix is retained in the t-SNE transformed spectral density-based distance calculation method, thus indicating that the consistency derived based on multivariate spectral density is transferable to our proposed method. The performance of our proposed method is shown by comparing it against other widely used methods and we find that the proposed algorithm achieves superior classification accuracy across various settings. We also demonstrate the superiority of our method in a real-life health dataset where the task is to classify epilepsy seizures from other activities like walking and running based on accelerometer data.
Journal of Multivariate Analysis, vol. 211, Article 105537.
Citations: 0
Projection pursuit Bayesian regression for symmetric matrix predictors
IF 1.4; Zone 3 (Mathematics); Q2 STATISTICS & PROBABILITY; Pub Date: 2026-01-01; Epub Date: 2025-11-11; DOI: 10.1016/j.jmva.2025.105539
Xiaomeng Ju, Hyung G. Park, Thaddeus Tarpey
This paper develops a novel Bayesian approach for nonlinear regression with symmetric matrix predictors, often used to encode connectivity of different nodes. Unlike methods that vectorize matrices as predictors that result in a large number of model parameters and unstable estimation, we propose a Bayesian multi-index regression method, resulting in a projection-pursuit-type estimator that leverages the structure of matrix-valued predictors. We establish the model identifiability conditions and impose a sparsity-inducing prior on the projection directions for sparse sampling to prevent overfitting and enhance interpretability of the parameter estimates. Posterior inference is conducted through Bayesian backfitting. The performance of the proposed method is evaluated through simulation studies and a case study investigating the relationship between brain connectivity features and cognitive scores.
Journal of Multivariate Analysis, vol. 211, Article 105539.
Citations: 0
ICS for complex data with application to outlier detection for density data
IF 1.4; Zone 3 (Mathematics); Q2 STATISTICS & PROBABILITY; Pub Date: 2026-01-01; Epub Date: 2025-11-08; DOI: 10.1016/j.jmva.2025.105522
Camille Mondon , Huong Thi Trinh , Anne Ruiz-Gazen , Christine Thomas-Agnan
Invariant coordinate selection (ICS) is a dimension reduction method, used as a preliminary step for clustering and outlier detection. It has been primarily applied to multivariate data. This work introduces a coordinate-free definition of ICS in an abstract Euclidean space and extends the method to complex data. Functional and distributional data are preprocessed into a finite-dimensional subspace. For example, in the framework of Bayes Hilbert spaces, distributional data are smoothed into compositional spline functions through the Maximum Penalised Likelihood method. We describe an outlier detection procedure for complex data and study the impact of some preprocessing parameters on the results. We compare our approach with other outlier detection methods through simulations, producing promising results in scenarios with a low proportion of outliers. ICS allows detecting abnormal climate events in a sample of daily maximum temperature distributions recorded across the provinces of Northern Vietnam between 1987 and 2016.
Journal of Multivariate Analysis, vol. 211, Article 105522.
Citations: 0
Random correlation matrices generated via partial correlation C-vines
IF 1.4; Zone 3 (Mathematics); Q2 STATISTICS & PROBABILITY; Pub Date: 2026-01-01; Epub Date: 2025-11-12; DOI: 10.1016/j.jmva.2025.105519
Harry Joe , Dorota Kurowicka
The method for generating random d×d correlation matrices with a partial correlation C-vine is extended so that each correlation can have a distribution that is asymmetric on (−1,1) or on (0,1). With the recursion formulas from the partial correlation C-vine to the correlation matrix, first and second moments can be derived, in the case of the same distribution for each partial correlation in tree ℓ of the vine (1 ≤ ℓ < d). Algorithms and conditions are given so that, after a permutation step, all random correlations have a common mean and second moment. The algorithms can be useful for simulation experiments to generate random correlation matrices that cover the whole space or with the restriction that each correlation is positive.
Journal of Multivariate Analysis, vol. 211, Article 105519.
Citations: 0
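The recursion from C-vine partial correlations to a correlation matrix, mentioned in the abstract above, can be sketched as follows. This is a minimal illustration of the well-known C-vine construction, not the authors' extended algorithm: the Beta(a, b) choice for the partial correlations, the function names, and the `positive` switch are assumptions made here, and the permutation step that equalizes the moments is not implemented.

```python
import math
import random


def random_corr_cvine(d, a=2.0, b=3.0, positive=False, rng=None):
    """Draw a d x d correlation matrix from C-vine partial correlations.

    Assumption for illustration: every partial correlation is Beta(a, b),
    kept on (0, 1) if `positive` is True, otherwise rescaled to (-1, 1).
    With a != b the distribution is asymmetric.
    """
    rng = rng or random.Random()
    # P[k][i] holds the partial correlation of variables k and i
    # given variables 0..k-1 (tree k+1 of the C-vine).
    P = [[0.0] * d for _ in range(d)]
    R = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for k in range(d - 1):
        for i in range(k + 1, d):
            u = rng.betavariate(a, b)
            P[k][i] = u if positive else 2.0 * u - 1.0
            rho = P[k][i]
            # Remove the conditioning variables one at a time,
            # from the last one (k-1) down to the first (0).
            for l in range(k - 1, -1, -1):
                rho = (rho * math.sqrt((1 - P[l][i] ** 2) * (1 - P[l][k] ** 2))
                       + P[l][i] * P[l][k])
            R[k][i] = R[i][k] = rho
    return R


def is_positive_definite(m):
    """Check positive definiteness with a plain Cholesky factorization."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][t] * L[j][t] for t in range(j))
            if i == j:
                if s <= 0:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


if __name__ == "__main__":
    R = random_corr_cvine(4, rng=random.Random(1))
    for row in R:
        print(" ".join(f"{v: .3f}" for v in row))
    print("positive definite:", is_positive_definite(R))
```

Because every partial correlation lies strictly inside (−1, 1), the resulting matrix is a valid correlation matrix; with `positive=True` each step of the recursion adds only positive terms, so all correlations come out positive, matching the restricted case the abstract mentions.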
Estimating singular functions of kernel cross-covariance operators: An investigation of the Nyström method
IF 1.4; Zone 3 (Mathematics); Q2 STATISTICS & PROBABILITY; Pub Date: 2026-01-01; Epub Date: 2025-10-10; DOI: 10.1016/j.jmva.2025.105514
Min Xu , Qi-Hang Zhou , Qin Fang , Zhuo-Xi Shi
We investigate the Nyström method as an efficient means of overcoming the computational bottleneck inherent in estimating the singular functions of kernel cross-covariance operators, which play a central role in tasks such as covariate shift correction and multi-view learning. We present a Nyström-type approximation of the kernel cross-covariance operator, and establish its convergence rate. Furthermore, we derive a novel bound on the weighted sum of squared estimation errors of all associated singular functions, providing tighter control than traditional bounds that treat each error individually. Our theoretical analysis reveals that the Nyström-based singular function estimators attain the same statistical accuracy as their full empirical counterparts, while offering significant computational savings. Numerical experiments further confirm the practical effectiveness of the proposed approach.
Journal of Multivariate Analysis, vol. 211, Article 105514.
Citations: 0
Identifying differential networks through high-dimensional two-sample inference
IF 1.4; Zone 3 (Mathematics); Q2 STATISTICS & PROBABILITY; Pub Date: 2026-01-01; Epub Date: 2025-09-24; DOI: 10.1016/j.jmva.2025.105511
Hui Chen , Yinxu Jia
In this article, we identify differential networks within the Gaussian graphical model framework by examining the equivalence of two precision matrices. This is challenging when the dimension of the precision matrix increases with the sample size. Existing methods typically assume sparsity in the precision matrix structure, a condition often unmet in real data. To address this issue, we introduce a statistic based on a debiased estimator of the high-dimensional precision matrix and employ the multiplier bootstrap to approximate the null distribution of the proposed statistic. The proposed method can be easily coupled with various estimation algorithms for high-dimensional precision matrices. In comparison with existing methods, the superiority of the proposed approach lies in its mild structural constraints on the unknown precision matrix, making it robust to intricate conditional dependence structures in real data. Additionally, we introduce a cross-fitting procedure that utilizes full data information, leading to enhanced statistical power. Theoretical justification is provided to ensure the validity of the proposed method without restrictive assumptions. We showcase the effectiveness of our proposed method through simulation and a real data example, which provide evidence of the method's usefulness and potential for application in various domains.
Journal of Multivariate Analysis, vol. 211, Article 105511.
Citations: 0
Distance correlation in the presence of measurement errors
IF 1.4; Zone 3 (Mathematics); Q2 STATISTICS & PROBABILITY; Pub Date: 2026-01-01; Epub Date: 2025-11-03; DOI: 10.1016/j.jmva.2025.105518
Xilin Zhang , Guoliang Fan , Liping Zhu
Independence testing is a fundamental issue in statistics. In practice, almost all observations are measured with random errors. The independence test in the presence of measurement errors is an important issue but is rarely addressed in the literature. This paper focuses on distance correlation in the presence of measurement errors. We show that distance covariance is underestimated in the presence of measurement errors and is a strictly decreasing function of the dispersion of measurement errors. Furthermore, the powers of independence tests based on distance covariance and distance correlation are both strictly decreasing functions of the dispersion of measurement errors. Extensive numerical simulations and real data analysis support the conclusions drawn in this paper.
独立性检验是统计学中的一个基本问题。实际上,几乎所有的观测结果都带有随机误差。存在测量误差的独立性检验是一个重要的问题,但在文献中很少涉及。本文主要研究存在测量误差时的距离相关问题。我们表明,距离协方差在存在测量误差时被低估,并且是测量误差分散的严格递减函数。此外,基于距离协方差和距离相关的独立性检验的幂都是测量误差离散度的严格递减函数。大量的数值模拟和实际数据分析支持了本文的结论。
{"title":"Distance correlation in the presence of measurement errors","authors":"Xilin Zhang ,&nbsp;Guoliang Fan ,&nbsp;Liping Zhu","doi":"10.1016/j.jmva.2025.105518","DOIUrl":"10.1016/j.jmva.2025.105518","url":null,"abstract":"<div><div>Independence testing is a fundamental issue in statistics. In practice, almost all observations are measured with random errors. The independence test in the presence of measurement errors is an important issue but is rarely addressed in the literature. This paper focuses on distance correlation in the presence of measurement errors. We show that distance covariance is underestimated in the presence of measurement errors and is a strictly decreasing function of the dispersion of measurement errors. Furthermore, the powers of independence tests based on distance covariance and distance correlation are both strictly decreasing functions of the dispersion of measurement errors. Extensive numerical simulations and real data analysis support the conclusions drawn in this paper.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"211 ","pages":"Article 105518"},"PeriodicalIF":1.4,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145465407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Tree Pólya Splitting distributions for multivariate count data
IF 1.4 Zone 3, Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2026-01-01 Epub Date: 2025-09-12 DOI: 10.1016/j.jmva.2025.105507
Samuel Valiquette, Jean Peyhardi, Éric Marchand, Gwladys Toulemonde, Frédéric Mortier
In this article, we develop a new class of multivariate distributions adapted for count data, called Tree Pólya Splitting. This class results from the combination of a univariate distribution and singular multivariate distributions along a fixed partition tree. Known distributions, including the Dirichlet-multinomial, the generalized Dirichlet-multinomial and the Dirichlet-tree multinomial, are particular cases within this class. As we will demonstrate, these distributions offer some flexibility, allowing for the modeling of complex dependence structures (positive, negative, or null) at the observation level. Specifically, we present theoretical properties of Tree Pólya Splitting distributions by focusing primarily on marginal distributions, factorial moments, and dependence structures (covariance and correlations). A dataset of abundance of Trichoptera is used, on one hand, as a benchmark to illustrate the theoretical properties developed in this article, and on the other hand, to demonstrate the interest of these types of models, notably by comparing them to other approaches for fitting multivariate data, such as the Poisson-lognormal model in ecology or singular multivariate distributions used in microbial analysis.
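The construction in this abstract, a univariate law for a total combined with singular multivariate splits along a fixed partition tree, can be illustrated with a small sampler. This is a hedged sketch and not the authors' implementation: it assumes a Poisson total and a Dirichlet-multinomial split at every internal node (one member of the Pólya family), and the tree shape, the `alpha` parameters, and the leaf names `sp_A`, `sp_B`, `sp_C` are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def dirichlet_multinomial(rng, n, alpha):
    """Split n items into len(alpha) cells with Dirichlet-distributed weights."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    probs = [g / total for g in gammas]   # one Dirichlet(alpha) draw
    counts = [0] * len(alpha)
    for idx in rng.choices(range(len(alpha)), weights=probs, k=n):
        counts[idx] += 1
    return counts


def split_along_tree(rng, n, node):
    """Recursively split the count n down the partition tree to its leaves."""
    if isinstance(node, str):             # a leaf is identified by its name
        return {node: n}
    counts = dirichlet_multinomial(rng, n, node["alpha"])
    leaves = {}
    for child, c in zip(node["children"], counts):
        leaves.update(split_along_tree(rng, c, child))
    return leaves


# Hypothetical partition tree: root -> {(sp_A, sp_B), sp_C}.
TREE = {
    "alpha": [2.0, 1.0],
    "children": [
        {"alpha": [1.0, 1.0], "children": ["sp_A", "sp_B"]},
        "sp_C",
    ],
}

rng = random.Random(0)
total = sample_poisson(rng, 20.0)         # assumed univariate law for the total
counts = split_along_tree(rng, total, TREE)
print(total, counts)
```

By construction the leaf counts always sum to the sampled total. Swapping the Poisson sampler for another univariate law, or changing the `alpha` values at a node, changes the dependence between leaves without altering the recursion.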
Citations: 0