
Latest Publications in the Journal of Multivariate Analysis

Cover it up! Bipartite graphs uncover identifiability in sparse factor analysis
IF 1.4, Zone 3 (Mathematics), Q2 Statistics & Probability. Pub Date: 2025-11-08. DOI: 10.1016/j.jmva.2025.105536
Darjus Hosszejni, Sylvia Frühwirth-Schnatter
Factor models are an indispensable tool for dimension reduction in multivariate statistical analysis. Methodological research on factor models is often concerned with identifying rotations that provide the best interpretation of the loadings. This focus on rotational invariance, however, does not ensure a unique variance decomposition, which is crucial in many applications where separating common and idiosyncratic variation is key. The present paper provides conditions for variance identification based solely on a counting rule for the binary zero–nonzero pattern of the factor loading matrix, which underpins subsequent inference and interpretability. By connecting factor analysis with classical elements of graph and network theory, it is proven that this condition is sufficient for variance identification without imposing any conditions on the values of the factor loading matrix. An efficient algorithm is designed to verify the seemingly intractable condition in a polynomial number of steps. To illustrate the practical relevance of these new insights, the paper makes an explicit connection to post-processing in sparse Bayesian factor analysis. A simulation study and a real-world data analysis of financial returns with a time-varying factor model illustrate that verifying variance identification is highly relevant for statistical factor analysis, in particular when the factor dimension is unknown.
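The flavor of such a counting rule can be seen in a toy check on a binary loading pattern. The sketch below uses a generic row-counting heuristic (every subset of k factor columns must be covered by at least 2k+1 variables); this threshold and the brute-force subset scan are illustrative assumptions, not the paper's exact sufficient condition or its polynomial-time algorithm.

```python
from itertools import combinations

def counting_rule_ok(pattern):
    """Illustrative row-counting heuristic on a binary zero-nonzero pattern.

    Requires every nonempty subset S of factor columns to be 'covered' by
    at least 2*|S| + 1 variables (rows with a nonzero in some column of S).
    NOT the paper's condition; brute force over all column subsets.
    """
    n_fac = len(pattern[0])
    for k in range(1, n_fac + 1):
        for S in combinations(range(n_fac), k):
            covered = sum(1 for row in pattern if any(row[j] for j in S))
            if covered < 2 * k + 1:
                return False
    return True

# Two factors, five variables; each factor loads on three variables.
pattern = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1]]
```

The pattern above passes (each single factor is covered by three variables, the pair by all five); a pattern whose factor is covered by only two variables fails the check.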
Citations: 0
Projection pursuit via kernel mean embeddings
IF 1.4, Zone 3 (Mathematics), Q2 Statistics & Probability. Pub Date: 2025-11-08. DOI: 10.1016/j.jmva.2025.105534
Oliver Warth, Lutz Dümbgen
Detecting and visualizing interesting structures in high-dimensional data is a ubiquitous challenge. If one aims for linear projections onto low-dimensional spaces, a well-known problematic phenomenon is the Diaconis–Freedman effect: under mild conditions, most projections do not reveal interesting structures but look like scale mixtures of spherically symmetric Gaussian distributions. We present a method that combines global search strategies and local projection pursuit by maximizing the maximum mean discrepancy (MMD) between the empirical distribution of the projected data and a data-driven Gaussian mixture distribution. Here, MMD is based on kernel mean embeddings with Gaussian kernels.
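For a one-dimensional projection, the MMD criterion with a Gaussian kernel can be sketched as follows. The bandwidth h and the biased (V-statistic) estimator are illustrative choices, and the comparison here is against an arbitrary second sample rather than the paper's data-driven Gaussian mixture reference.

```python
import math

def gauss_kernel(x, y, h=1.0):
    # Gaussian kernel on the real line with bandwidth h
    return math.exp(-((x - y) ** 2) / (2.0 * h * h))

def mmd2(xs, ys, h=1.0):
    # Biased (V-statistic) estimate of squared MMD between two 1-D samples
    kxx = sum(gauss_kernel(a, b, h) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gauss_kernel(a, b, h) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gauss_kernel(a, b, h) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy
```

Projection pursuit would then search over directions for the projection whose sample maximizes mmd2 against the reference sample; identical samples score zero, and well-separated samples score high.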
Citations: 0
Skewness and kurtosis projection pursuit for the multivariate extended skew-normal and skew-Student distributions
IF 1.4, Zone 3 (Mathematics), Q2 Statistics & Probability. Pub Date: 2025-11-08. DOI: 10.1016/j.jmva.2025.105533
C.J. Adcock
This paper reports the results of a study of projection pursuit for the multivariate extended skew-normal and skew-Student distributions. The projection pursuit vectors are computed using an algorithm that exploits the structure of the moments. Detailed results are reported for a range of values of the shape vector, the extension parameter and the degrees of freedom. The required scale matrix and shape vectors are based on data reported in a study of diabetes. The same parameters and data are used to illustrate the role that projection pursuit can play in variable selection for regression. The differences between third- and fourth-order projection pursuit are not great, a consequence of the structure of the moments induced by the form of the distribution. There are differences depending on the choice of parameterization: use of the central parameterization changes the structure of both the covariance matrix and the shape vector.
Citations: 0
Unsupervised linear discrimination using skewness
IF 1.4, Zone 3 (Mathematics), Q2 Statistics & Probability. Pub Date: 2025-11-08. DOI: 10.1016/j.jmva.2025.105524
Una Radojičić , Klaus Nordhausen , Joni Virta
It is well known that, in Gaussian two-group separation, the optimally discriminating projection direction can be estimated without any knowledge of the group labels. In this work, we gather several such unsupervised estimators based on skewness and derive their limiting distributions. As one of our main results, we show that all affine equivariant estimators of the optimal direction have proportional asymptotic covariance matrices, making their comparison straightforward. Two of our four estimators are novel and two have been proposed earlier. We use simulations to verify our results and to inspect the finite-sample behavior of the estimators.
Citations: 0
A unified framework of principal component analysis and factor analysis
IF 1.4, Zone 3 (Mathematics), Q2 Statistics & Probability. Pub Date: 2025-11-08. DOI: 10.1016/j.jmva.2025.105529
Shifeng Xiong
Principal component analysis and factor analysis are fundamental multivariate analysis methods. In this paper, a unified framework connecting them is introduced. Under a general latent variable model, we present matrix optimization problems from the viewpoint of loss function minimization, and show that the two methods can be viewed as solutions to these optimization problems with specific loss functions. Specifically, principal component analysis can be derived from a broad class of loss functions including the ℓ2 norm, while factor analysis corresponds to a modified ℓ0 norm problem. Related problems are discussed, including algorithms, penalized maximum likelihood estimation under the latent variable model, and a principal component factor model. These results can lead to new tools for data analysis and new research topics.
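The ℓ2 side of this correspondence is the familiar fact that the leading principal component minimizes squared reconstruction error, equivalently maximizes projected variance. A minimal power-iteration sketch on a 2×2 covariance (illustrative, not one of the paper's algorithms):

```python
def leading_pc(cov, iters=200):
    # Power iteration for the leading eigenvector of a symmetric 2x2 matrix;
    # this direction minimizes the l2 reconstruction error of rank-1 PCA.
    v = [1.0, 1.0]
    for _ in range(iters):
        w = [cov[0][0] * v[0] + cov[0][1] * v[1],
             cov[1][0] * v[0] + cov[1][1] * v[1]]
        norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
        v = [w[0] / norm, w[1] / norm]
    return v

cov = [[4.0, 1.0], [1.0, 2.0]]
v = leading_pc(cov)  # eigenvector for the eigenvalue 3 + sqrt(2)
```

For this covariance, the leading eigenvector has component ratio v[0]/v[1] = 1 + sqrt(2), which the iteration recovers to high precision.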
Citations: 0
Generalized implementation of invariant coordinate selection with positive semi-definite scatter matrices
IF 1.4, Zone 3 (Mathematics), Q2 Statistics & Probability. Pub Date: 2025-11-08. DOI: 10.1016/j.jmva.2025.105520
Aurore Archimbaud
Invariant coordinate selection is an unsupervised multivariate data transformation useful in many contexts, such as outlier detection or clustering. It is based on the simultaneous diagonalization of two affine equivariant and positive definite scatter matrices. Its classical implementation relies on a non-symmetric eigenvalue problem, diagonalizing one scatter relative to the other. In the case of collinearity, at least one of the scatter matrices is singular, making the problem unsolvable. To address this limitation, three approaches are proposed, using a Moore–Penrose pseudoinverse, a dimension reduction, and a generalized singular value decomposition. Their properties are investigated both theoretically and through various empirical applications. Overall, the extension based on the generalized singular value decomposition seems the most promising, even though it restricts the choice of scatter matrices to those that can be expressed as cross-products. In practice, some of the approaches also appear suitable for high-dimension, low-sample-size data.
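In the nonsingular case, the classical implementation described above amounts to the eigendecomposition of S1^{-1} S2. A hand-rolled 2×2 sketch of its eigenvalues (the singular case, which the paper handles via pseudoinverse, dimension reduction, or GSVD, is deliberately out of scope here):

```python
def ics_eigenvalues(s1, s2):
    # Eigenvalues of inv(S1) @ S2 for 2x2 positive definite scatters,
    # via the trace/determinant quadratic formula.
    det1 = s1[0][0] * s1[1][1] - s1[0][1] * s1[1][0]
    inv1 = [[s1[1][1] / det1, -s1[0][1] / det1],
            [-s1[1][0] / det1, s1[0][0] / det1]]
    m = [[inv1[i][0] * s2[0][j] + inv1[i][1] * s2[1][j] for j in range(2)]
         for i in range(2)]
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = (tr * tr / 4.0 - det) ** 0.5
    return tr / 2.0 + disc, tr / 2.0 - disc
```

With S1 = diag(2, 1) and S2 = diag(4, 1), the invariant eigenvalues are 2 and 1, flagging the first coordinate as the interesting one.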
Citations: 0
Robust two-way dimension reduction by Grassmannian barycenter
IF 1.4, Zone 3 (Mathematics), Q2 Statistics & Probability. Pub Date: 2025-11-08. DOI: 10.1016/j.jmva.2025.105527
Zeyu Li , Yong He , Xinbing Kong , Xinsheng Zhang
Two-way dimension reduction for well-structured matrix-valued data has grown popular in the past few years. To achieve robustness against individual matrix outliers with large spikes, arising either from heavy-tailed noise or from large individual low-rank signals deviating from the population subspace, we first calculate the leading singular subspaces of each individual matrix, and then find the barycenter of the locally estimated subspaces across all observations, in contrast to existing methods, which first integrate data across observations and then perform eigenvalue decomposition. In addition, a robust cut-off dimension determination criterion is suggested, based on comparing the eigenvalue ratios of the corresponding Euclidean means of the projection matrices. Theoretical properties of the resulting estimators are investigated under mild conditions. Numerical simulation studies justify the advantages and robustness of the proposed methods over existing tools. 
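The subspace-averaging step can be sketched for one-dimensional subspaces of the plane: average the projection matrices of the locally estimated subspaces and take the leading eigenvector of the mean. This Euclidean (chordal) mean is a simple stand-in for the paper's Grassmannian barycenter, and the angles below are illustrative data.

```python
import math

def projector(u):
    # 2x2 orthogonal projector onto the line spanned by the unit vector u
    return [[u[0] * u[0], u[0] * u[1]], [u[1] * u[0], u[1] * u[1]]]

def leading_eigvec(m, iters=200):
    # Power iteration for the leading eigenvector of a symmetric 2x2 matrix
    v = [1.0, 0.7]
    for _ in range(iters):
        w = [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]
        n = math.hypot(w[0], w[1])
        v = [w[0] / n, w[1] / n]
    return v

# Locally estimated 1-D subspaces: four near the x-axis, one outlier
angles = [0.05, -0.04, 0.08, 0.01, 1.45]
mean = [[0.0, 0.0], [0.0, 0.0]]
for t in angles:
    p = projector((math.cos(t), math.sin(t)))
    for i in range(2):
        for j in range(2):
            mean[i][j] += p[i][j] / len(angles)
bary = leading_eigvec(mean)  # stays close to the x-axis despite the outlier
```

Averaging projectors rather than the raw observations is what blunts the influence of the single deviating subspace here.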
Citations: 0
ICS for complex data with application to outlier detection for density data
IF 1.4, Zone 3 (Mathematics), Q2 Statistics & Probability. Pub Date: 2025-11-08. DOI: 10.1016/j.jmva.2025.105522
Camille Mondon , Huong Thi Trinh , Anne Ruiz-Gazen , Christine Thomas-Agnan
Invariant coordinate selection (ICS) is a dimension reduction method, used as a preliminary step for clustering and outlier detection. It has been primarily applied to multivariate data. This work introduces a coordinate-free definition of ICS in an abstract Euclidean space and extends the method to complex data. Functional and distributional data are preprocessed into a finite-dimensional subspace. For example, in the framework of Bayes Hilbert spaces, distributional data are smoothed into compositional spline functions through the Maximum Penalised Likelihood method. We describe an outlier detection procedure for complex data and study the impact of some preprocessing parameters on the results. We compare our approach with other outlier detection methods through simulations, producing promising results in scenarios with a low proportion of outliers. ICS allows the detection of abnormal climate events in a sample of daily maximum temperature distributions recorded across the provinces of Northern Vietnam between 1987 and 2016.
Citations: 0
tSNE-Spec: A new classification method for multivariate time series data
IF 1.4, Zone 3 (Mathematics), Q2 Statistics & Probability. Pub Date: 2025-11-08. DOI: 10.1016/j.jmva.2025.105537
Shubhajit Sen , Soudeep Deb
Classification of multivariate time series (MTS) data has applications in various domains, for example medical sciences, finance, and sports analytics. In this work, we propose a new technique that combines the advantages of dimension reduction through the t-distributed stochastic neighbor embedding (t-SNE) method with the attractive properties of the spectral density estimates of a time series and the k-nearest neighbor algorithm. We transform each MTS to a lower-dimensional time series using t-SNE, making it useful for visualizing and retaining the temporal patterns, and subsequently use that in classification. Then, we extend the standard univariate spectral-density-based classification to the multivariate setting and prove its theoretical consistency. Empirically, we first establish that the pairwise structure of the multivariate spectral-density-based distance matrix is retained by the t-SNE-transformed spectral-density-based distance, indicating that the consistency derived for the multivariate spectral density transfers to our proposed method. The performance of the proposed method is assessed against other widely used methods, and we find that the proposed algorithm achieves superior classification accuracy across various settings. 
We also demonstrate the superiority of our method on a real-life health dataset, where the task is to classify epilepsy seizures from other activities, such as walking and running, based on accelerometer data.
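A stripped-down sketch of the spectral-density-plus-nearest-neighbor idea, using raw periodogram features and 1-NN in place of the paper's t-SNE pipeline and consistent spectral estimates (all names and data are illustrative):

```python
import cmath
import math

def periodogram(x):
    # Squared DFT magnitudes at Fourier frequencies 1..n//2 (phase-invariant)
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2 / n
            for k in range(1, n // 2 + 1)]

def nn_classify(train, labels, query):
    # 1-nearest neighbor on squared Euclidean distance between periodograms
    q = periodogram(query)
    dists = [sum((a - b) ** 2 for a, b in zip(periodogram(s), q)) for s in train]
    return labels[dists.index(min(dists))]

n = 16
slow = [math.sin(2 * math.pi * 1 * t / n) for t in range(n)]
fast = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
query = [math.sin(2 * math.pi * 5 * t / n + 0.7) for t in range(n)]  # phase-shifted
```

Because the periodogram discards phase, the phase-shifted query is matched to the training series with the same dominant frequency.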
Citations: 0
Invariant Coordinate Selection and Fisher discriminant subspace beyond the case of two groups
IF 1.4, Zone 3 (Mathematics), Q2 Statistics & Probability. Pub Date: 2025-11-08. DOI: 10.1016/j.jmva.2025.105521
Colombe Becquart , Aurore Archimbaud , Anne Ruiz-Gazen , Luka Prilć , Klaus Nordhausen
Invariant Coordinate Selection (ICS) is a multivariate technique that relies on the simultaneous diagonalization of two scatter matrices. It serves various purposes, including its use as a dimension reduction tool prior to clustering or outlier detection. ICS’s theoretical foundation establishes why and when the identified subspace should contain relevant information by demonstrating its connection with the Fisher discriminant subspace (FDS). These general results have been examined in detail primarily for specific scatter combinations within a two-cluster framework. In this study, we expand these investigations to include more clusters and scatter combinations. Our analysis reveals the importance of distinguishing whether the group centers matrix has full rank. In the full-rank case, we establish deeper connections between ICS and FDS. We provide a detailed study of these relationships for three clusters when the group centers matrix has full rank and when it does not. Based on these expanded theoretical insights and supported by numerical studies, we conclude that ICS is indeed suitable for recovering the FDS under very general settings and cases of failure seem rare.
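For the two-group case that anchors these results, the FDS is spanned by the single Fisher direction w proportional to Sw^{-1}(m1 - m2). A hand-rolled 2-D sketch with an explicit 2×2 inverse (the data and helper names are illustrative):

```python
def fisher_direction(g1, g2):
    # w proportional to inv(Sw) (m1 - m2), Sw the pooled within-group scatter
    def mean(g):
        return [sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g)]

    def add_scatter(sw, g, m):
        for p in g:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    sw[i][j] += d[i] * d[j]
        return sw

    m1, m2 = mean(g1), mean(g2)
    sw = add_scatter(add_scatter([[0.0, 0.0], [0.0, 0.0]], g1, m1), g2, m2)
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    dm = [m1[0] - m2[0], m1[1] - m2[1]]
    return [(sw[1][1] * dm[0] - sw[0][1] * dm[1]) / det,
            (-sw[1][0] * dm[0] + sw[0][0] * dm[1]) / det]

# Two groups separated along the x-axis, with most variation along y
g1 = [(0.0, -1.0), (0.2, 1.0), (-0.2, 0.5), (0.0, -0.5)]
g2 = [(3.0, -1.0), (3.2, 1.0), (2.8, 0.5), (3.0, -0.5)]
w = fisher_direction(g1, g2)  # dominated by the x-component
```

Weighting by the inverse within-group scatter is what downweights the large but uninformative y-variation, leaving a direction dominated by the separating x-axis.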
Citations: 0