
Latest publications in the Journal of Multivariate Analysis

Dimension reduction for outlier detection in high-dimensional data
IF 1.4 | CAS Tier 3 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2025-11-11 | DOI: 10.1016/j.jmva.2025.105531
Santiago Ortiz, Henry Laniado, Daniel Peña, Francisco J. Prieto
The work introduces the KASP (Kurtosis and Skewness Projections) procedure, a method for detecting outliers in high-dimensional multivariate data based on dimension reduction techniques. The procedure involves finding projections that maximize non-normality measures in the distribution of the observations. These projections are based on three directions: one that maximizes a combination of the squared skewness and kurtosis coefficients, one that minimizes the kurtosis coefficient, and one that maximizes the squared skewness coefficient. The study demonstrates that these directions include the optimal way to identify outliers for many different contamination structures. The performance of the KASP procedure is compared with alternative methods in correctly identifying and falsely detecting outliers in high-dimensional data sets. Additionally, the paper presents three practical examples to illustrate the effectiveness of the procedure in outlier detection in high dimensions.
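As a rough illustration of the idea, the sketch below numerically searches for a direction maximizing the squared skewness coefficient of the projected data (one of the three KASP directions). It is a minimal sketch on assumed toy contamination, not the authors' KASP implementation; the objective and optimizer choices are mine.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
# Clean 5-dimensional Gaussian data plus a small cluster of outliers
# shifted along the first coordinate (illustrative contamination).
X = rng.normal(size=(200, 5))
X[:10, 0] += 6.0

def neg_sq_skewness(a, X):
    # Negative squared skewness coefficient of the projection X @ a
    # (a is renormalized to unit length inside the objective).
    a = a / np.linalg.norm(a)
    z = X @ a
    z = (z - z.mean()) / z.std()
    return -np.mean(z**3) ** 2

res = minimize(neg_sq_skewness, x0=np.ones(X.shape[1]), args=(X,))
direction = res.x / np.linalg.norm(res.x)
max_sq_skewness = -res.fun   # squared skewness along the found direction
```

Ranking the observations by the one-dimensional projection X @ direction makes the shifted points stand out; the full procedure additionally uses the kurtosis-based directions and a formal outlier-labeling rule.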
Citations: 0
Projection pursuit Bayesian regression for symmetric matrix predictors
IF 1.4 | CAS Tier 3 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2025-11-11 | DOI: 10.1016/j.jmva.2025.105539
Xiaomeng Ju, Hyung G. Park, Thaddeus Tarpey
This paper develops a novel Bayesian approach for nonlinear regression with symmetric matrix predictors, which are often used to encode connectivity between different nodes. Unlike methods that vectorize matrices as predictors, which results in a large number of model parameters and unstable estimation, we propose a Bayesian multi-index regression method, resulting in a projection-pursuit-type estimator that leverages the structure of matrix-valued predictors. We establish the model identifiability conditions and impose a sparsity-inducing prior on the projection directions for sparse sampling to prevent overfitting and enhance interpretability of the parameter estimates. Posterior inference is conducted through Bayesian backfitting. The performance of the proposed method is evaluated through simulation studies and a case study investigating the relationship between brain connectivity features and cognitive scores.
Citations: 0
Star products and dimension reduction
IF 1.4 | CAS Tier 3 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2025-11-11 | DOI: 10.1016/j.jmva.2025.105523
Nicola Loperfido
The star product of two matrices is the linear combination of the blocks in the second matrix, with the corresponding elements of the first matrix as coefficients. In probability and statistics, the star product appeared in conjunction with measures of multivariate skewness and kurtosis, within the frameworks of model-based clustering, multivariate normality testing, outlier detection, invariant coordinate selection and independent component analysis. In this paper, we investigate some properties of the star product and their applications to dimension reduction techniques, including common principal components and invariant coordinate selection. The connections of the star product with tensor concepts and three-way data are also considered. The theoretical results are illustrated with the Iris Flowers and the Swiss Banknotes datasets.
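The verbal definition translates directly into code. The following sketch forms the star product as the A-weighted sum of the blocks of B; the function name and the convention that B partitions into m-by-n blocks of size p-by-q are my assumptions, since the abstract does not fix notation.

```python
import numpy as np

def star_product(A, B, p, q):
    """A * B: the linear combination of the p-by-q blocks of B, with the
    entries of A as coefficients (illustrative helper following the
    verbal definition above)."""
    m, n = A.shape
    assert B.shape == (m * p, n * q), "B must partition into m-by-n blocks"
    out = np.zeros((p, q))
    for i in range(m):
        for j in range(n):
            out += A[i, j] * B[i * p:(i + 1) * p, j * q:(j + 1) * q]
    return out

# With A equal to the identity, the star product reduces to the sum of
# the diagonal blocks of B, a convenient sanity check.
A = np.eye(2)
B = np.arange(16.0).reshape(4, 4)
C = star_product(A, B, 2, 2)
```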
Citations: 0
Parsimonious multivariate structural spatial models with intra-location feedback
IF 1.4 | CAS Tier 3 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2025-11-10 | DOI: 10.1016/j.jmva.2025.105541
Hossein Asgharian, Krzysztof Podgórski, Nima Shariati
In univariate spatial stochastic models, parameter space dimension is reduced through structural models with a known adjacency matrix. This structural reduction is also applied in multivariate spatial settings, where matrix-valued observations represent locations along one coordinate and multivariate variables along the other. However, such reduction often goes too far, omitting parameters that capture natural and important dependencies. Widely used models, including the spatial error and spatial lag models, lack parameters for intra-location dependencies. In a spatial econometric context, for example, while parameters link inflation and interest rates across economies, there is no explicit way to represent the effect of inflation on interest rates within a given economy. Through examples and analytical arguments, it is shown that when intralocation feedback exists in the data, standard models fail to capture it, leading to serious misrepresentation of other effects. As a remedy, this paper develops multivariate spatial models that incorporate feedback between variables at the same location. Given the high-dimensional nature of structural models, the challenge is to introduce such effects without substantially enlarging the parameter space, thereby avoiding overparameterization or non-identifiability. This is achieved by adding a single parameter that accounts for intralocation feedback. The proposed models are well-defined under a general second-order framework, accommodating non-Gaussian distributions. Dimensions of the parameter space, model identification, and other fundamental properties are established. Statistical inference is discussed using both empirical precision matrix methods and maximum likelihood. While the main contribution lies in static models, extensions to time-dependent data are also formulated, showing that dynamic generalizations are straightforward.
Citations: 0
Enhancing spatial functional linear regression with robust dimension reduction methods
IF 1.4 | CAS Tier 3 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2025-11-10 | DOI: 10.1016/j.jmva.2025.105538
Ufuk Beyaztas, Abhijit Mandal, Han Lin Shang
This paper introduces a robust estimation strategy for the spatial functional linear regression model using dimension reduction methods, specifically functional principal component analysis (FPCA) and functional partial least squares (FPLS). These techniques are designed to address challenges associated with spatially correlated functional data, particularly the impact of outliers on parameter estimation. By projecting the infinite-dimensional functional predictor onto a finite-dimensional space defined by orthonormal basis functions and employing M-estimation to mitigate outlier effects, our approach improves the accuracy and reliability of parameter estimates in the spatial functional linear regression context. Simulation studies and empirical data analysis substantiate the effectiveness of our methods. Fisher consistency and the influence function of the FPCA-based approach are established under regularity conditions. The rfsac package in R implements these robust estimation strategies, ensuring practical applicability for researchers and practitioners.
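The robustification step, replacing least-squares fitting of the dimension-reduced model with M-estimation, can be illustrated in a toy scalar setting. This is a generic Huber M-estimate, not the rfsac implementation; the data-generating model and the tuning constant c = 1.345 (the usual default for Huber's loss) are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
# Toy scalar regression standing in for regression on dimension-reduced
# scores: y = 2 x + noise, with a few grossly contaminated responses.
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.5, size=100)
y[:5] += 15.0                                   # outliers

def huber(r, c=1.345):
    # Huber's loss: quadratic near zero, linear in the tails.
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * a - 0.5 * c**2)

beta_ls = np.sum(x * y) / np.sum(x * x)                 # least-squares slope
beta_m = minimize(lambda b: huber(y - b[0] * x).sum(),  # Huber M-estimate
                  x0=[0.0]).x[0]
```

Because Huber's loss grows only linearly in the tails, the contaminated responses exert bounded influence on the M-estimate, whereas their pull on the least-squares slope grows without bound in the size of the contamination.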
Citations: 0
Cover it up! Bipartite graphs uncover identifiability in sparse factor analysis
IF 1.4 | CAS Tier 3 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2025-11-08 | DOI: 10.1016/j.jmva.2025.105536
Darjus Hosszejni, Sylvia Frühwirth-Schnatter
Factor models are an indispensable tool in dimension reduction in multivariate statistical analysis. Methodological research for factor models is often concerned with identifying rotations that provide the best interpretation of the loadings. This focus on rotational invariance, however, does not ensure unique variance decomposition, which is crucial in many applications where separating common and idiosyncratic variation is key. The present paper provides conditions for variance identification based solely on a counting rule for the binary zero–nonzero pattern of the factor loading matrix which underpins subsequent inference and interpretability. By connecting factor analysis with some classical elements from graph and network theory, it is proven that this condition is sufficient for variance identification without imposing any conditions on the factor loading matrix. An efficient algorithm is designed to verify the seemingly intractable condition in a polynomial number of steps. To illustrate the practical relevance of these new insights, the paper makes an explicit connection to post-processing in sparse Bayesian factor analysis. A simulation study and a real-world data analysis of financial returns with a time-varying factor model illustrate that verifying variance identification is highly relevant for statistical factor analysis, in particular when the factor dimension is unknown.
Citations: 0
Projection pursuit via kernel mean embeddings
IF 1.4 | CAS Tier 3 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2025-11-08 | DOI: 10.1016/j.jmva.2025.105534
Oliver Warth, Lutz Dümbgen
Detecting and visualizing interesting structures in high-dimensional data is a ubiquitous challenge. If one aims for linear projections onto low-dimensional spaces, a well-known problematic phenomenon is the Diaconis–Freedman effect: under mild conditions, most projections do not reveal interesting structures but look like scale mixtures of spherically symmetric Gaussian distributions. We present a method which combines global search strategies and local projection pursuit via maximizing the maximum mean discrepancy (MMD) between the empirical distribution of the projected data and a data-driven Gaussian mixture distribution. Here, MMD is based on kernel mean embeddings with Gaussian kernels.
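The projection index can be made concrete with a biased empirical estimate of MMD² under a Gaussian kernel. The sketch below compares two candidate one-dimensional projections against a reference Gaussian sample; in the paper the comparison is against a data-driven Gaussian mixture and is embedded in global and local search, so this is only a minimal illustration with assumed bandwidth and sample sizes.

```python
import numpy as np

def mmd_sq(x, y, bandwidth=1.0):
    # Biased empirical MMD^2 between 1-D samples x and y, using the
    # Gaussian kernel k(a, b) = exp(-(a - b)^2 / (2 * bandwidth^2)).
    def gram(a, b):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bandwidth**2))
    return gram(x, x).mean() + gram(y, y).mean() - 2 * gram(x, y).mean()

rng = np.random.default_rng(1)
ref = rng.normal(size=500)                      # reference Gaussian sample
gauss = rng.normal(size=500)                    # "uninteresting" projection
bimodal = np.concatenate([rng.normal(-3, 1, 250),
                          rng.normal(3, 1, 250)])  # "interesting" projection
```

A clearly non-Gaussian (here bimodal) projection scores a larger MMD² against the Gaussian reference than another Gaussian sample does, which is what makes MMD usable as a projection-pursuit index.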
Citations: 0
Skewness and kurtosis projection pursuit for the multivariate extended skew-normal and skew-Student distributions
IF 1.4 | CAS Tier 3 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2025-11-08 | DOI: 10.1016/j.jmva.2025.105533
C.J. Adcock
This paper reports the results of a study into projection pursuit for the multivariate extended skew-normal and skew-Student distributions. Computation of the projection pursuit vectors is done using an algorithm that exploits the structure of the moments. Detailed results are reported for a range of values of the shape vector, the extension parameter and degrees of freedom. The required scale matrix and shape vectors are based on data reported in a study of diabetes. The same parameters and data are used to illustrate the role that projection pursuit can play in variable selection for regression. The differences between third and fourth order projection pursuit are not great, this being a consequence of the structure of the moments induced by the form of the distribution. There are differences depending on the choice of parameterization. Use of the central parameterization changes the structure of both the covariance matrix and the shape vector.
Citations: 0
Unsupervised linear discrimination using skewness
IF 1.4 | CAS Tier 3 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2025-11-08 | DOI: 10.1016/j.jmva.2025.105524
Una Radojičić, Klaus Nordhausen, Joni Virta
It is well-known that, in Gaussian two-group separation, the optimally discriminating projection direction can be estimated without any knowledge of the group labels. In this work, we gather several such unsupervised estimators based on skewness and derive their limiting distributions. As one of our main results, we show that all affine equivariant estimators of the optimal direction have proportional asymptotic covariance matrices, making their comparison straightforward. Two of our four estimators are novel and two have been proposed earlier. We use simulations to verify our results and to inspect the finite-sample behaviors of the estimators.
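A minimal numerical illustration of the phenomenon (this is not one of the paper's four estimators): in a two-group Gaussian mixture with common covariance and unequal group sizes, the unit direction maximizing the absolute skewness of the projection aligns with the discriminating direction, with no labels used. The group sizes, separation, and optimizer are assumed for the toy example.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
# Two Gaussian groups with common identity covariance, mixed 3:1 so the
# mixture is skewed along the (unknown) separating direction, here the
# first coordinate axis.
X = np.vstack([rng.normal(size=(300, 3)),
               rng.normal(size=(100, 3)) + np.array([3.0, 0.0, 0.0])])

def neg_abs_skew(a, X):
    # Negative absolute skewness coefficient of the projection X @ a.
    a = a / np.linalg.norm(a)
    z = X @ a
    z = (z - z.mean()) / z.std()
    return -abs(np.mean(z**3))

res = minimize(neg_abs_skew, x0=np.ones(3), args=(X,))
b = res.x / np.linalg.norm(res.x)   # estimated discriminating direction
```

Since the true separating direction here is the first coordinate axis, the recovered b should load mostly on its first entry.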
Citations: 0
A unified framework of principal component analysis and factor analysis
IF 1.4 | CAS Tier 3 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2025-11-08 | DOI: 10.1016/j.jmva.2025.105529
Shifeng Xiong
Principal component analysis and factor analysis are fundamental multivariate analysis methods. In this paper a unified framework to connect them is introduced. Under a general latent variable model, we present matrix optimization problems from the viewpoint of loss function minimization, and show that the two methods can be viewed as solutions to the optimization problems with specific loss functions. Specifically, principal component analysis can be derived from a broad class of loss functions including the ℓ2 norm, while factor analysis corresponds to a modified ℓ0 norm problem. Related problems are discussed, including algorithms, penalized maximum likelihood estimation under the latent variable model, and a principal component factor model. These results can lead to new tools of data analysis and research topics.
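The ℓ2 side of this correspondence is the classical Eckart-Young property: the rank-k PCA reconstruction minimizes the squared Frobenius (ℓ2) reconstruction loss over all rank-k approximations. A minimal sketch of that fact (my illustration, not the paper's latent-variable formulation):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6)) @ rng.normal(size=(6, 6))  # correlated data
Xc = X - X.mean(axis=0)                                  # center

# Rank-k PCA reconstruction from the top-k singular vectors.
k = 2
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Xk = (U[:, :k] * s[:k]) @ Vt[:k]

# Any other rank-k reconstruction has at least as large a squared loss;
# compare against projection onto a random k-dimensional subspace.
P = np.linalg.qr(rng.normal(size=(6, k)))[0]
X_rand = Xc @ P @ P.T

pca_loss = np.sum((Xc - Xk) ** 2)    # equals the sum of discarded s_i**2
rand_loss = np.sum((Xc - X_rand) ** 2)
```

The PCA loss equals the sum of the squared discarded singular values, and no competing rank-k reconstruction can beat it, which is exactly the sense in which PCA solves an ℓ2 loss minimization problem.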
{"title":"A unified framework of principal component analysis and factor analysis","authors":"Shifeng Xiong","doi":"10.1016/j.jmva.2025.105529","DOIUrl":"10.1016/j.jmva.2025.105529","url":null,"abstract":"<div><div>Principal component analysis and factor analysis are fundamental multivariate analysis methods. In this paper a unified framework to connect them is introduced. Under a general latent variable model, we present matrix optimization problems from the viewpoint of loss function minimization, and show that the two methods can be viewed as solutions to the optimization problems with specific loss functions. Specifically, principal component analysis can be derived from a broad class of loss functions including the <span><math><msub><mrow><mi>ℓ</mi></mrow><mrow><mn>2</mn></mrow></msub></math></span> norm, while factor analysis corresponds to a modified <span><math><msub><mrow><mi>ℓ</mi></mrow><mrow><mn>0</mn></mrow></msub></math></span> norm problem. Related problems are discussed, including algorithms, penalized maximum likelihood estimation under the latent variable model, and a principal component factor model. These results can lead to new tools of data analysis and research topics.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"211 ","pages":"Article 105529"},"PeriodicalIF":1.4,"publicationDate":"2025-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145516862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
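The loss-minimization view of PCA mentioned in this abstract can be made concrete with the classical Eckart–Young fact: the truncated SVD minimizes the squared ℓ2 (Frobenius) loss over all rank-k score/loading factorizations of the centered data matrix. The sketch below shows only this ℓ2 endpoint of the framework; the modified ℓ0 problem that the paper links to factor analysis is not reproduced here, and the function name is illustrative.

```python
import numpy as np

def pca_rank_k(X, k):
    """Rank-k PCA of X via truncated SVD of the centered data matrix.

    By Eckart-Young, scores @ loadings minimizes the squared l2
    (Frobenius) loss ||Xc - S L||_F^2 over all rank-k factorizations,
    which is the loss-minimization view of PCA the abstract refers to.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]      # n x k principal component scores
    loadings = Vt[:k]              # k x p orthonormal principal axes
    return scores, loadings
```

A quick check of the optimality claim: refitting the loadings by least squares for any other rank-k score matrix still incurs at least the PCA reconstruction loss.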