
Journal of Multivariate Analysis: Latest Publications

A unified framework of principal component analysis and factor analysis
IF 1.4 · CAS Tier 3 (Mathematics) · Q2 STATISTICS & PROBABILITY · Pub Date: 2025-11-08 · DOI: 10.1016/j.jmva.2025.105529
Shifeng Xiong
Principal component analysis and factor analysis are fundamental multivariate analysis methods. In this paper, a unified framework that connects them is introduced. Under a general latent variable model, we present matrix optimization problems from the viewpoint of loss function minimization, and show that the two methods can be viewed as solutions to these optimization problems with specific loss functions. Specifically, principal component analysis can be derived from a broad class of loss functions including the ℓ2 norm, while factor analysis corresponds to a modified ℓ0 norm problem. Related problems are discussed, including algorithms, penalized maximum likelihood estimation under the latent variable model, and a principal component factor model. These results can lead to new data analysis tools and research topics.
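For readers who want to see the loss-minimization view concretely, here is a minimal sketch (my own illustration, not the paper's algorithm): under the squared ℓ2 (Frobenius) reconstruction loss, the best rank-k fit of a centered data matrix is its truncated SVD, which is exactly classical PCA.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 200, 6, 2
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))   # correlated toy data
Xc = X - X.mean(axis=0)                                         # center the data

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:k].T                    # loadings: top-k right singular vectors
Z = Xc @ W                      # principal component scores
X_hat = Z @ W.T                 # best rank-k approximation under the squared ell_2 loss

loss = np.linalg.norm(Xc - X_hat, "fro") ** 2
print("ell_2 reconstruction loss:", round(loss, 3))
print("equals the sum of discarded squared singular values:",
      bool(np.isclose(loss, np.sum(s[k:] ** 2))))
```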
Citations: 0
Generalized implementation of invariant coordinate selection with positive semi-definite scatter matrices
IF 1.4 · CAS Tier 3 (Mathematics) · Q2 STATISTICS & PROBABILITY · Pub Date: 2025-11-08 · DOI: 10.1016/j.jmva.2025.105520
Aurore Archimbaud
Invariant coordinate selection is an unsupervised multivariate data transformation useful in many contexts, such as outlier detection or clustering. It is based on the simultaneous diagonalization of two affine equivariant and positive definite scatter matrices. Its classical implementation relies on a non-symmetric eigenvalue problem obtained by diagonalizing one scatter relative to the other. In the case of collinearity, at least one of the scatter matrices is singular, making the problem unsolvable. To address this limitation, three approaches are proposed, based on a Moore–Penrose pseudo-inverse, a dimension reduction, and a generalized singular value decomposition. Their properties are investigated both theoretically and through various empirical applications. Overall, the extension based on the generalized singular value decomposition seems the most promising, even though it restricts the choice of scatter matrices to those that can be expressed as cross-products. In practice, some of the approaches also appear suitable for high-dimension, low-sample-size data.
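A minimal sketch of the first of the three proposed routes, with an illustrative scatter pair of my own choosing (covariance and a fourth-moment scatter) rather than the paper's implementation: the singular first scatter is pseudo-inverted with the Moore–Penrose inverse before being diagonalized against the second.

```python
import numpy as np

def cov4(X):
    """Fourth-moment scatter matrix: weighted covariance with squared Mahalanobis weights."""
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)
    d2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.pinv(S), Xc)   # squared Mahalanobis distances
    return (Xc * d2[:, None]).T @ Xc / (X.shape[0] * (X.shape[1] + 2))

def ics_pinv(X):
    """ICS via the Moore-Penrose pseudo-inverse of the first scatter (covariance)."""
    Xc = X - X.mean(axis=0)
    M = np.linalg.pinv(np.cov(Xc, rowvar=False)) @ cov4(X)     # possibly non-symmetric
    eigvals, B = np.linalg.eig(M)
    order = np.argsort(eigvals.real)[::-1]
    return eigvals.real[order], B.real[:, order]

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 4))
X[:15] += 6.0                                   # a small cluster of outlying rows
X = np.column_stack([X, X[:, 0] + X[:, 1]])     # exact collinearity: covariance is singular

rho, B = ics_pinv(X)
scores = (X - X.mean(axis=0)) @ B               # invariant coordinates
print("generalized kurtosis values:", np.round(rho, 3))
print("mean |first coordinate|, outlying rows:", round(np.abs(scores[:15, 0]).mean(), 2))
print("mean |first coordinate|, clean rows:   ", round(np.abs(scores[15:, 0]).mean(), 2))
```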
Citations: 0
Robust two-way dimension reduction by Grassmannian barycenter
IF 1.4 · CAS Tier 3 (Mathematics) · Q2 STATISTICS & PROBABILITY · Pub Date: 2025-11-08 · DOI: 10.1016/j.jmva.2025.105527
Zeyu Li, Yong He, Xinbing Kong, Xinsheng Zhang
Two-way dimension reduction for well-structured matrix-valued data has grown popular in the past few years. To achieve robustness against individual matrix outliers with large spikes, arising either from heavy-tailed noise or from large individual low-rank signals deviating from the population subspace, we first calculate the leading singular subspaces of each individual matrix and then find the barycenter of the locally estimated subspaces across all observations, in contrast to existing methods, which first integrate data across observations and then perform eigenvalue decomposition. In addition, a robust cut-off dimension determination criterion is suggested, based on comparing the eigenvalue ratios of the corresponding Euclidean means of the projection matrices. Theoretical properties of the resulting estimators are investigated under mild conditions. Numerical simulation studies justify the advantages and robustness of the proposed methods over existing tools. Two real examples associated with medical imaging and financial portfolios are given to provide empirical evidence for our arguments and to illustrate the usefulness of the algorithms.
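A minimal sketch, under toy assumptions of my own rather than the authors' algorithm, of the "local subspaces first, then barycenter" idea: estimate each matrix's leading left singular subspace, average the corresponding projection matrices across observations, and read the common subspace and its dimension off the eigen-decomposition of that Euclidean mean; a single wildly spiked matrix barely moves the estimate because it contributes only a projection.

```python
import numpy as np

def barycenter_subspace(samples, k):
    """samples: array of shape (n, p, q); returns the eigenvalues of the mean projection
    matrix and a p x k orthonormal basis of the estimated common column subspace."""
    n_obs, p, _ = samples.shape
    P_bar = np.zeros((p, p))
    for Xi in samples:
        U, _, _ = np.linalg.svd(Xi, full_matrices=False)
        Uk = U[:, :k]
        P_bar += Uk @ Uk.T                       # projector onto the local subspace
    P_bar /= n_obs
    evals, evecs = np.linalg.eigh(P_bar)
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order[:k]]

rng = np.random.default_rng(2)
n, p, q, k = 100, 8, 10, 2
R = np.linalg.qr(rng.standard_normal((p, k)))[0]           # true left singular subspace of the signal
C = np.linalg.qr(rng.standard_normal((q, k)))[0]
samples = np.stack([R @ np.diag(rng.uniform(3, 5, k)) @ C.T
                    + 0.3 * rng.standard_normal((p, q)) for _ in range(n)])
samples[0] *= 25.0                                          # one wildly spiked outlier matrix

evals, R_hat = barycenter_subspace(samples, k)
print("eigenvalues of the mean projector (a gap after the k-th suggests the dimension):")
print(np.round(evals, 3))
print("distance to the true subspace:",
      round(np.linalg.norm(R @ R.T - R_hat @ R_hat.T), 3))
```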
Citations: 0
ICS for complex data with application to outlier detection for density data
IF 1.4 · CAS Tier 3 (Mathematics) · Q2 STATISTICS & PROBABILITY · Pub Date: 2025-11-08 · DOI: 10.1016/j.jmva.2025.105522
Camille Mondon, Huong Thi Trinh, Anne Ruiz-Gazen, Christine Thomas-Agnan
Invariant coordinate selection (ICS) is a dimension reduction method used as a preliminary step for clustering and outlier detection. It has been primarily applied to multivariate data. This work introduces a coordinate-free definition of ICS in an abstract Euclidean space and extends the method to complex data. Functional and distributional data are preprocessed into a finite-dimensional subspace. For example, in the framework of Bayes Hilbert spaces, distributional data are smoothed into compositional spline functions through the Maximum Penalised Likelihood method. We describe an outlier detection procedure for complex data and study the impact of some preprocessing parameters on the results. We compare our approach with other outlier detection methods through simulations, producing promising results in scenarios with a low proportion of outliers. ICS allows the detection of abnormal climate events in a sample of daily maximum temperature distributions recorded across the provinces of Northern Vietnam between 1987 and 2016.
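A minimal sketch of the preprocessing step only, with stand-in choices of my own (grid-discretized densities and a centred log-ratio map instead of the paper's compositional spline smoothing in a Bayes Hilbert space): each density becomes a finite-dimensional coordinate vector, after which a standard ICS-based outlier detection procedure can be run on ordinary multivariate data.

```python
import numpy as np

rng = np.random.default_rng(3)
grid = np.linspace(-4, 4, 80)
dx = grid[1] - grid[0]

def gaussian_density(mu, sigma):
    d = np.exp(-0.5 * ((grid - mu) / sigma) ** 2)
    return d / (d.sum() * dx)                       # normalize on the grid

# 60 "typical" temperature-like densities plus 3 abnormal ones with a shifted mode
densities = [gaussian_density(rng.normal(0, 0.3), rng.uniform(0.8, 1.2)) for _ in range(60)]
densities += [gaussian_density(2.5, 0.5) for _ in range(3)]
D = np.array(densities)

clr = np.log(D) - np.log(D).mean(axis=1, keepdims=True)   # centred log-ratio coordinates
clr_c = clr - clr.mean(axis=0)
_, _, Vt = np.linalg.svd(clr_c, full_matrices=False)
coords = clr_c @ Vt[:5].T                                 # 5-dimensional coordinate vectors

print("coordinate matrix ready for ICS-based outlier detection:", coords.shape)
print("first coordinate, abnormal densities:", np.round(coords[-3:, 0], 2))
print("first coordinate, typical densities (range):",
      np.round([coords[:60, 0].min(), coords[:60, 0].max()], 2))
```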
Citations: 0
tSNE-Spec: A new classification method for multivariate time series data
IF 1.4 · CAS Tier 3 (Mathematics) · Q2 STATISTICS & PROBABILITY · Pub Date: 2025-11-08 · DOI: 10.1016/j.jmva.2025.105537
Shubhajit Sen, Soudeep Deb
Classification of multivariate time series (MTS) data has applications in various domains, for example, medical sciences, finance, and sports analytics. In this work, we propose a new technique that combines the advantages of dimension reduction through the t-distributed stochastic neighbor embedding (t-SNE) method with the attractive properties of the spectral density estimates of a time series and the k-nearest neighbor algorithm. We transform each MTS to a lower-dimensional time series using t-SNE, which is useful for visualizing and retaining the temporal patterns, and subsequently use it in classification. Then, we extend the standard univariate spectral-density-based classification to the multivariate setting and prove its theoretical consistency. Empirically, we first establish that the pairwise structure of the multivariate spectral-density-based distance matrix is retained by the t-SNE-transformed spectral-density-based distance calculation, thus indicating that the consistency derived from the multivariate spectral density is transferable to our proposed method. The performance of the proposed method is demonstrated by comparing it against other widely used methods, and we find that the proposed algorithm achieves superior classification accuracy across various settings. We also demonstrate the superiority of our method on a real-life health dataset where the task is to classify epilepsy seizures from other activities, such as walking and running, based on accelerometer data.
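A minimal sketch of the kind of pipeline the abstract describes, with hypothetical tuning choices and not the authors' code: t-SNE compresses each multivariate series into a two-dimensional series, Welch periodograms summarize its spectral content, and a leave-one-out 1-nearest-neighbour rule classifies series by distances between those spectral signatures.

```python
import numpy as np
from scipy.signal import welch
from sklearn.manifold import TSNE

rng = np.random.default_rng(4)

def make_mts(freq, T=256, p=6):
    """A toy multivariate series: a common sinusoid at `freq` cycles plus noise in each channel."""
    base = np.sin(2 * np.pi * freq * np.arange(T) / T)
    return np.column_stack([base + 0.5 * rng.standard_normal(T) for _ in range(p)])

series = [make_mts(5) for _ in range(10)] + [make_mts(20) for _ in range(10)]
labels = np.array([0] * 10 + [1] * 10)

def spectral_signature(X):
    """t-SNE compresses the T x p series to T x 2, then Welch spectra summarize it."""
    Z = TSNE(n_components=2, perplexity=20, init="pca", random_state=0).fit_transform(X)
    _, Pxx = welch(Z, axis=0, nperseg=64)
    return Pxx.ravel()

sigs = np.array([spectral_signature(X) for X in series])

# leave-one-out 1-nearest-neighbour on Euclidean distances between spectral signatures
D = np.linalg.norm(sigs[:, None, :] - sigs[None, :, :], axis=2)
np.fill_diagonal(D, np.inf)
pred = labels[D.argmin(axis=1)]
print("leave-one-out accuracy:", (pred == labels).mean())
```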
Citations: 0
Invariant Coordinate Selection and Fisher discriminant subspace beyond the case of two groups
IF 1.4 · CAS Tier 3 (Mathematics) · Q2 STATISTICS & PROBABILITY · Pub Date: 2025-11-08 · DOI: 10.1016/j.jmva.2025.105521
Colombe Becquart, Aurore Archimbaud, Anne Ruiz-Gazen, Luka Prilć, Klaus Nordhausen
Invariant Coordinate Selection (ICS) is a multivariate technique that relies on the simultaneous diagonalization of two scatter matrices. It serves various purposes, including its use as a dimension reduction tool prior to clustering or outlier detection. ICS’s theoretical foundation establishes why and when the identified subspace should contain relevant information by demonstrating its connection with the Fisher discriminant subspace (FDS). These general results have been examined in detail primarily for specific scatter combinations within a two-cluster framework. In this study, we expand these investigations to include more clusters and scatter combinations. Our analysis reveals the importance of distinguishing whether the group centers matrix has full rank. In the full-rank case, we establish deeper connections between ICS and FDS. We provide a detailed study of these relationships for three clusters when the group centers matrix has full rank and when it does not. Based on these expanded theoretical insights and supported by numerical studies, we conclude that ICS is indeed suitable for recovering the FDS under very general settings and cases of failure seem rare.
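A minimal sketch (my own illustration, not taken from the paper) of the objects involved for three groups: the Fisher discriminant subspace is spanned by the leading eigenvectors of W^{-1}B, and its dimension equals the rank of the matrix of centred group means, which is the quantity behind the abstract's full-rank distinction.

```python
import numpy as np

rng = np.random.default_rng(5)
means = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 4.0, 0.0]])   # three groups in R^3
X = np.vstack([m + rng.standard_normal((100, 3)) for m in means])
g = np.repeat([0, 1, 2], 100)

grand = X.mean(axis=0)
B = np.zeros((3, 3))            # between-group scatter
W = np.zeros((3, 3))            # within-group scatter
for k in range(3):
    Xk = X[g == k]
    mk = Xk.mean(axis=0)
    B += len(Xk) * np.outer(mk - grand, mk - grand)
    W += (Xk - mk).T @ (Xk - mk)

centred_means = np.array([X[g == k].mean(axis=0) - grand for k in range(3)])
print("rank of the centred group-means matrix:", np.linalg.matrix_rank(centred_means))

evals, evecs = np.linalg.eig(np.linalg.solve(W, B))      # eigenproblem for W^{-1} B
order = np.argsort(evals.real)[::-1]
fds_basis = evecs.real[:, order[:2]]                     # spans the Fisher discriminant subspace
print("leading eigenvalues of W^{-1} B:", np.round(evals.real[order], 3))
print("FDS basis vectors (columns):")
print(np.round(fds_basis, 3))
```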
Citations: 0
On the fourth cumulant tensor in projection pursuit for a flexible class of skewed models
IF 1.4 · CAS Tier 3 (Mathematics) · Q2 STATISTICS & PROBABILITY · Pub Date: 2025-11-08 · DOI: 10.1016/j.jmva.2025.105532
Jorge M. Arevalillo, Hilario Navarro
Projection pursuit is an exploratory data analysis approach for summarizing multivariate data through the search for interesting data projections. It relies on the maximization of an abnormality measure that quantifies the relevance of a projection for capturing non-normal features of the data. The need to expand the estimation approaches that address projection pursuit has motivated its study within parametric frameworks. This is a follow-up work aimed at exploring the problem under a general class of distributions, namely the scale mixtures of the skew-normal family. Projection pursuit is examined by exploring the path going from the role played by the fourth cumulant tensor in addressing the problem to its connection with the model parameters. The paper contributes to building a triangulation among linear algebra, projection pursuit, and parametric statistics. A simulation study and an example with real data are also provided.
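A minimal sketch of moment-based projection pursuit, swapping the paper's fourth-cumulant tensor machinery for the simpler univariate kurtosis index (my own illustration): after whitening, search for the unit direction whose projection has the largest absolute excess kurtosis, a standard fourth-moment non-normality measure.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import kurtosis

rng = np.random.default_rng(6)
n = 2000
X = rng.standard_normal((n, 4))
X[:, 0] = rng.exponential(size=n)               # one skewed, heavy-tailed coordinate

Xc = X - X.mean(axis=0)
L = np.linalg.cholesky(np.cov(Xc, rowvar=False))
Z = Xc @ np.linalg.inv(L).T                     # whitened data

def neg_abs_kurtosis(w):
    w = w / np.linalg.norm(w)
    return -abs(kurtosis(Z @ w))                # excess kurtosis of the 1-D projection

# multi-start local optimization over (unnormalized) direction vectors
best = min((minimize(neg_abs_kurtosis, rng.standard_normal(4)) for _ in range(10)),
           key=lambda res: res.fun)
w_hat = best.x / np.linalg.norm(best.x)
print("pursued direction (whitened coordinates):", np.round(w_hat, 3))
print("absolute excess kurtosis along it:", round(-best.fun, 3))
```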
Citations: 0
Wasserstein projection pursuit of non-Gaussian signals
IF 1.4 · CAS Tier 3 (Mathematics) · Q2 STATISTICS & PROBABILITY · Pub Date: 2025-11-08 · DOI: 10.1016/j.jmva.2025.105535
Satyaki Mukherjee, Soumendu Sundar Mukherjee, Debarghya Ghoshdastidar
We consider the general dimensionality reduction problem of locating, in a high-dimensional data cloud, a k-dimensional non-Gaussian subspace of interesting features. We use a projection pursuit approach: we search for mutually orthogonal unit directions that maximise the q-Wasserstein distance between the empirical distribution of the data projections along these directions and a standard Gaussian. Under a generative model, where there is an underlying (unknown) low-dimensional non-Gaussian subspace, we prove rigorous statistical guarantees on the accuracy of approximating this unknown subspace by the directions found by our projection pursuit approach. Our results operate in the regime where the data dimensionality is comparable to the sample size, and thus supplement the recent literature on the non-feasibility of locating interesting directions via projection pursuit in the complementary regime where the data dimensionality is much larger than the sample size.
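A minimal sketch (my own simplification, with a crude random search instead of a proper optimizer) of Wasserstein projection pursuit for a single direction: score each candidate unit direction by the 1-Wasserstein distance between the projected whitened data and a standard Gaussian reference sample, and keep the best-scoring direction.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(7)
n, p = 1000, 6
X = rng.standard_normal((n, p))
X[:, 0] = rng.uniform(-3, 3, size=n)             # one clearly non-Gaussian coordinate

Xc = X - X.mean(axis=0)
L = np.linalg.cholesky(np.cov(Xc, rowvar=False))
Z = Xc @ np.linalg.inv(L).T                      # whitened data
ref = rng.standard_normal(20000)                 # standard Gaussian reference sample

best_w, best_score = None, -np.inf
for _ in range(500):                             # crude random search over unit directions
    w = rng.standard_normal(p)
    w /= np.linalg.norm(w)
    score = wasserstein_distance(Z @ w, ref)     # 1-Wasserstein distance to the Gaussian
    if score > best_score:
        best_w, best_score = w, score

print("best direction:", np.round(best_w, 2))
print("its W1 distance from the Gaussian reference:", round(best_score, 3))
```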
Citations: 0
Testing for patterns and structures in covariance and correlation matrices
IF 1.4 · CAS Tier 3 (Mathematics) · Q2 STATISTICS & PROBABILITY · Pub Date: 2025-11-06 · DOI: 10.1016/j.jmva.2025.105517
Paavo Sattler, Dennis Dobler
Covariance matrices of random vectors contain information that is crucial for modeling. Specific structures and patterns of the covariances (or correlations) may be used to justify parametric models, e.g., autoregressive models. Until now, there have been only a few approaches for testing such covariance structures, and most of them can only be used for one particular structure. In the present paper, we propose a systematic and unified testing procedure that works, among others, for the large class of linear covariance structures. Our approach requires only weak distributional assumptions. It covers common structures such as diagonal matrices, Toeplitz matrices, and compound symmetry, as well as the more involved autoregressive matrices. We exemplify the approach for all these structures. We prove the correctness of these tests for large sample sizes and use bootstrap techniques for a better small-sample approximation. Moreover, the proposed tests invite adaptations to other covariance patterns by choosing the hypothesis matrix appropriately. With the help of a simulation study, we also assess the small-sample properties of the tests. Finally, we illustrate the procedure in an application to a real data set.
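A minimal sketch of the "choose the hypothesis matrix" idea, with a construction of my own rather than the paper's estimator or bootstrap: compound symmetry of a 3 x 3 covariance can be written as linear restrictions C vech(Sigma) = 0, and swapping C adapts the same formulation to other linear covariance structures such as Toeplitz or diagonal matrices.

```python
import numpy as np

def vech(S):
    """Stack the lower triangle of S row by row: (0,0),(1,0),(1,1),(2,0),(2,1),(2,2) for p = 3."""
    return S[np.tril_indices(S.shape[0])]

# Hypothesis matrix for compound symmetry of a 3 x 3 covariance:
# equal variances (positions 0, 2, 5 of vech) and equal covariances (positions 1, 3, 4).
C = np.array([
    [1, 0, -1,  0,  0,  0],   # sigma_11 - sigma_22 = 0
    [1, 0,  0,  0,  0, -1],   # sigma_11 - sigma_33 = 0
    [0, 1,  0, -1,  0,  0],   # sigma_21 - sigma_31 = 0
    [0, 1,  0,  0, -1,  0],   # sigma_21 - sigma_32 = 0
])

Sigma_cs = 0.5 * np.ones((3, 3)) + 0.5 * np.eye(3)                        # compound symmetry
Sigma_no = np.array([[1.0, 0.5, 0.1], [0.5, 1.0, 0.5], [0.1, 0.5, 1.0]])  # violates the structure
print("restrictions under compound symmetry:", C @ vech(Sigma_cs))   # all zeros
print("restrictions for the other matrix:   ", C @ vech(Sigma_no))   # one nonzero entry
```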
随机向量的协方差矩阵包含对建模至关重要的信息。协方差(或相关性)的特定结构和模式可以用来证明参数模型,例如,自回归模型。到目前为止,只有少数方法可以测试这种协方差结构,而且大多数方法只能用于一个特定的结构。在本文中,我们提出了一种系统的、统一的检验方法,适用于大类别的线性协方差结构。我们的方法只需要弱的分布假设。它涵盖了常见的结构,如对角矩阵、Toeplitz矩阵和复合对称,以及更复杂的自回归矩阵。我们举例说明了所有这些结构的方法。我们证明了这些测试对于大样本量的正确性,并使用自举技术来获得更好的小样本近似。此外,所提出的检验邀请适应其他协方差模式通过选择适当的假设矩阵。在模拟研究的帮助下,我们还评估了测试的小样本特性。最后,我们在一个实际数据集的应用中说明了该过程。
{"title":"Testing for patterns and structures in covariance and correlation matrices","authors":"Paavo Sattler ,&nbsp;Dennis Dobler","doi":"10.1016/j.jmva.2025.105517","DOIUrl":"10.1016/j.jmva.2025.105517","url":null,"abstract":"<div><div>Covariance matrices of random vectors contain information that is crucial for modeling. Specific structures and patterns of the covariances (or correlations) may be used to justify parametric models, e.g., autoregressive models. Until now, there have been only a few approaches for testing such covariance structures and most of them can only be used for one particular structure. In the present paper, we propose a systematic and unified testing procedure working among others for the large class of linear covariance structures. Our approach requires only weak distributional assumptions. It covers common structures such as diagonal matrices, Toeplitz matrices, and compound symmetry, as well as the more involved autoregressive matrices. We exemplify the approach for all these structures. We prove the correctness of these tests for large sample sizes and use bootstrap techniques for a better small-sample approximation. Moreover, the proposed tests invite adaptations to other covariance patterns by choosing the hypothesis matrix appropriately. With the help of a simulation study, we also assess the small sample properties of the tests. Finally, we illustrate the procedure in an application to a real data set.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"211 ","pages":"Article 105517"},"PeriodicalIF":1.4,"publicationDate":"2025-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145516969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Distance correlation in the presence of measurement errors
IF 1.4 · CAS Tier 3 (Mathematics) · Q2 STATISTICS & PROBABILITY · Pub Date: 2025-11-03 · DOI: 10.1016/j.jmva.2025.105518
Xilin Zhang, Guoliang Fan, Liping Zhu
Independence testing is a fundamental issue in statistics. In practice, almost all observations are measured with random errors. Independence testing in the presence of measurement errors is an important issue, yet it is rarely addressed in the literature. This paper focuses on distance correlation in the presence of measurement errors. We show that distance covariance is underestimated in the presence of measurement errors and is a strictly decreasing function of the dispersion of the measurement errors. Furthermore, the powers of independence tests based on distance covariance and distance correlation are both strictly decreasing functions of the dispersion of the measurement errors. Extensive numerical simulations and real data analysis support the conclusions drawn in this paper.
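A minimal simulation sketch illustrating the attenuation claim (using the standard V-statistic estimator of distance correlation, not the paper's derivations): as the measurement-error standard deviation tau grows, the sample distance correlation between the error-prone predictor and the response decreases.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) between two univariate samples."""
    def centred(v):
        D = np.abs(v[:, None] - v[None, :])
        return D - D.mean(axis=0) - D.mean(axis=1)[:, None] + D.mean()
    A, B = centred(x), centred(y)
    dcov2 = (A * B).mean()
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

rng = np.random.default_rng(9)
n = 500
x = rng.standard_normal(n)
y = x ** 2 + 0.2 * rng.standard_normal(n)        # nonlinear dependence on the true x

for tau in [0.0, 0.5, 1.0, 2.0]:                 # measurement-error standard deviation
    x_obs = x + tau * rng.standard_normal(n)     # only the error-prone version is observed
    print(f"tau = {tau:3.1f}   dCor(x_obs, y) = {distance_correlation(x_obs, y):.3f}")
```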
Citations: 0