A unified framework of principal component analysis and factor analysis
Pub Date : 2025-11-08 DOI: 10.1016/j.jmva.2025.105529
Shifeng Xiong
Principal component analysis and factor analysis are fundamental multivariate analysis methods. In this paper, a unified framework connecting them is introduced. Under a general latent variable model, we present matrix optimization problems from the viewpoint of loss function minimization, and show that the two methods can be viewed as solutions to these optimization problems with specific loss functions. Specifically, principal component analysis can be derived from a broad class of loss functions including the ℓ2 norm, while factor analysis corresponds to a modified ℓ0 norm problem. Related problems are discussed, including algorithms, penalized maximum likelihood estimation under the latent variable model, and a principal component factor model. These results can lead to new data-analysis tools and research topics.
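A minimal numpy sketch of the classical ℓ2 fact underlying the PCA side of this framework (not the paper's general latent variable formulation): the rank-k truncated SVD minimizes the squared ℓ2 (Frobenius) reconstruction loss over all rank-k matrices, by Eckart–Young. All names and sizes below are illustrative.

```python
# PCA as the minimizer of ||X - L||_F^2 over rank-k matrices L,
# recovered from the truncated SVD (Eckart-Young theorem).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X = X - X.mean(axis=0)           # center the data

k = 3
U, s, Vt = np.linalg.svd(X, full_matrices=False)
L = U[:, :k] * s[:k] @ Vt[:k]    # rank-k PCA reconstruction

# Any other rank-k matrix has at least this l2 (Frobenius) loss,
# which equals the sum of the squared discarded singular values.
loss = np.linalg.norm(X - L, "fro") ** 2
print(loss, np.sum(s[k:] ** 2))  # the two agree up to rounding
```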
{"title":"A unified framework of principal component analysis and factor analysis","authors":"Shifeng Xiong","doi":"10.1016/j.jmva.2025.105529","DOIUrl":"10.1016/j.jmva.2025.105529","url":null,"abstract":"<div><div>Principal component analysis and factor analysis are fundamental multivariate analysis methods. In this paper a unified framework to connect them is introduced. Under a general latent variable model, we present matrix optimization problems from the viewpoint of loss function minimization, and show that the two methods can be viewed as solutions to the optimization problems with specific loss functions. Specifically, principal component analysis can be derived from a broad class of loss functions including the <span><math><msub><mrow><mi>ℓ</mi></mrow><mrow><mn>2</mn></mrow></msub></math></span> norm, while factor analysis corresponds to a modified <span><math><msub><mrow><mi>ℓ</mi></mrow><mrow><mn>0</mn></mrow></msub></math></span> norm problem. Related problems are discussed, including algorithms, penalized maximum likelihood estimation under the latent variable model, and a principal component factor model. These results can lead to new tools of data analysis and research topics.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"211 ","pages":"Article 105529"},"PeriodicalIF":1.4,"publicationDate":"2025-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145516862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generalized implementation of invariant coordinate selection with positive semi-definite scatter matrices
Pub Date : 2025-11-08 DOI: 10.1016/j.jmva.2025.105520
Aurore Archimbaud
Invariant coordinate selection is an unsupervised multivariate data transformation useful in many contexts, such as outlier detection or clustering. It is based on the simultaneous diagonalization of two affine equivariant and positive definite scatter matrices. Its classical implementation relies on a non-symmetric eigenvalue problem obtained by diagonalizing one scatter relative to the other. In the case of collinearity, at least one of the scatter matrices is singular, making the problem unsolvable. To address this limitation, three approaches are proposed, using a Moore–Penrose pseudo-inverse, a dimension reduction, and a generalized singular value decomposition. Their properties are investigated both theoretically and through various empirical applications. Overall, the extension based on the generalized singular value decomposition seems the most promising, even though it restricts the choice of scatter matrices to those that can be expressed as cross-products. In practice, some of the approaches also appear suitable for high-dimension low-sample-size data.
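A sketch of the first of the three approaches above, the Moore–Penrose variant: diagonalize one scatter relative to the other, replacing inverses by pseudo-inverses so that collinear (singular) data do not break the computation. The scatter pair (Cov and a fourth-moment scatter) and all names are illustrative, not the paper's exact estimators.

```python
# ICS with pseudo-inverses in place of inverses (illustrative sketch).
import numpy as np

def ics_pinv(X):
    Xc = X - X.mean(axis=0)
    n, p = Xc.shape
    S1 = np.cov(Xc, rowvar=False)
    S1_pinv = np.linalg.pinv(S1)
    # Fourth-moment scatter Cov4, with Mahalanobis distances taken
    # with respect to the pseudo-inverse of S1.
    d2 = np.einsum("ij,jk,ik->i", Xc, S1_pinv, Xc)
    S2 = (Xc * d2[:, None]).T @ Xc / (n * (p + 2))
    # Non-symmetric eigenproblem pinv(S1) S2 b = lambda b.
    evals, B = np.linalg.eig(S1_pinv @ S2)
    order = np.argsort(evals.real)[::-1]
    return evals.real[order], B.real[:, order]

rng = np.random.default_rng(1)
Z = rng.normal(size=(100, 3))
X = np.hstack([Z, Z[:, :1] + Z[:, 1:2]])  # exactly collinear 4th column
evals, B = ics_pinv(X)                    # runs despite singular covariance
```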
{"title":"Generalized implementation of invariant coordinate selection with positive semi-definite scatter matrices","authors":"Aurore Archimbaud","doi":"10.1016/j.jmva.2025.105520","DOIUrl":"10.1016/j.jmva.2025.105520","url":null,"abstract":"<div><div>Invariant coordinate selection is an unsupervised multivariate data transformation useful in many contexts such as outlier detection or clustering. It is based on the simultaneous diagonalization of two affine equivariant and positive definite scatter matrices. Its classical implementation relies on a non-symmetric eigenvalue problem by diagonalizing one scatter relatively to the other. In case of collinearity, at least one of the scatter matrices is singular, making the problem unsolvable. To address this limitation, three approaches are proposed using: a Moore–Penrose pseudo inverse, a dimension reduction, and a generalized singular value decomposition. Their properties are investigated both theoretically and through various empirical applications. Overall, the extension based on the generalized singular value decomposition seems the most promising, even though it restricts the choice of scatter matrices to those that can be expressed as cross-products. In practice, some of the approaches also appear suitable in the context of data in high-dimension low-sample-size data.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"211 ","pages":"Article 105520"},"PeriodicalIF":1.4,"publicationDate":"2025-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145516970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust two-way dimension reduction by Grassmannian barycenter
Pub Date : 2025-11-08 DOI: 10.1016/j.jmva.2025.105527
Zeyu Li , Yong He , Xinbing Kong , Xinsheng Zhang
Two-way dimension reduction for well-structured matrix-valued data has grown popular in the past few years. To achieve robustness against individual matrix outliers with large spikes, arising either from heavy-tailed noise or from large individual low-rank signals deviating from the population subspace, we first calculate the leading singular subspaces of each individual matrix, and then find the barycenter of the locally estimated subspaces across all observations, in contrast to existing methods, which first integrate data across observations and then perform an eigenvalue decomposition. In addition, a robust cut-off dimension determination criterion is suggested, based on comparing the eigenvalue ratios of the corresponding Euclidean means of the projection matrices. Theoretical properties of the resulting estimators are investigated under mild conditions. Numerical simulation studies justify the advantages and robustness of the proposed methods over existing tools. Two real examples from medical imaging and financial portfolios provide empirical evidence for our arguments and illustrate the usefulness of the algorithms.
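A sketch of the barycenter idea described above: take the leading left singular subspace of each observed matrix, average the corresponding projection matrices (their Euclidean mean), and read the common subspace and an eigenvalue-ratio dimension criterion off the eigen-structure of that mean. Function names and sizes are illustrative.

```python
# Grassmannian barycenter via the Euclidean mean of projection matrices.
import numpy as np

def left_barycenter(Xs, k):
    """Xs: list of (p x q) matrices; returns a p x k barycenter basis."""
    p = Xs[0].shape[0]
    P_bar = np.zeros((p, p))
    for X in Xs:
        U, _, _ = np.linalg.svd(X, full_matrices=False)
        Uk = U[:, :k]
        P_bar += Uk @ Uk.T           # projection onto the local subspace
    P_bar /= len(Xs)
    evals, evecs = np.linalg.eigh(P_bar)
    evals, evecs = evals[::-1], evecs[:, ::-1]
    # Eigenvalue-ratio criterion for choosing the dimension (illustrative).
    ratios = evals[:-1] / np.maximum(evals[1:], 1e-12)
    return evecs[:, :k], ratios

rng = np.random.default_rng(2)
U0 = np.linalg.qr(rng.normal(size=(8, 2)))[0]   # common 2-dim left subspace
Xs = [U0 @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(8, 6))
      for _ in range(50)]
basis, ratios = left_barycenter(Xs, k=2)        # ratios peak at the true k
```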
{"title":"Robust two-way dimension reduction by Grassmannian barycenter","authors":"Zeyu Li , Yong He , Xinbing Kong , Xinsheng Zhang","doi":"10.1016/j.jmva.2025.105527","DOIUrl":"10.1016/j.jmva.2025.105527","url":null,"abstract":"<div><div>Two-way dimension reduction for well-structured matrix-valued data is growing popular in the past few years. To achieve robustness against individual matrix outliers with large spikes, arising either from heavy-tailed noise or large individual low-rank signals deviating from the population subspace, we first calculate the leading singular subspaces of each individual matrix, and then find the barycenter of the locally estimated subspaces across all observations, in contrast to the existing methods which first integrate data across observations and then do eigenvalue decomposition. In addition, a robust cut-off dimension determination criteria is suggested based on comparing the eigenvalue ratios of the corresponding Euclidean means of the projection matrices. Theoretical properties of the resulting estimators are investigated under mild conditions. Numerical simulation studies justify the advantages and robustness of the proposed methods over the existing tools. Two real examples associated with medical imaging and financial portfolios are given to provide empirical evidence on our arguments and also to illustrate the usefulness of the algorithms.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"212 ","pages":"Article 105527"},"PeriodicalIF":1.4,"publicationDate":"2025-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145616470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ICS for complex data with application to outlier detection for density data
Pub Date : 2025-11-08 DOI: 10.1016/j.jmva.2025.105522
Camille Mondon , Huong Thi Trinh , Anne Ruiz-Gazen , Christine Thomas-Agnan
Invariant coordinate selection (ICS) is a dimension reduction method, used as a preliminary step for clustering and outlier detection. It has been primarily applied to multivariate data. This work introduces a coordinate-free definition of ICS in an abstract Euclidean space and extends the method to complex data. Functional and distributional data are preprocessed into a finite-dimensional subspace. For example, in the framework of Bayes Hilbert spaces, distributional data are smoothed into compositional spline functions through the Maximum Penalised Likelihood method. We describe an outlier detection procedure for complex data and study the impact of some preprocessing parameters on the results. We compare our approach with other outlier detection methods through simulations, producing promising results in scenarios with a low proportion of outliers. ICS allows the detection of abnormal climate events in a sample of daily maximum temperature distributions recorded across the provinces of Northern Vietnam between 1987 and 2016.
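A loose sketch of the pipeline, with a crude stand-in for the Bayes-space spline step: each density is represented by clr-transformed histogram coordinates (expressed in an orthonormal contrast basis so the covariance is non-singular), and standard ICS is then run on those coordinates. Bin counts, the scatter pair, and the outlier rule are all illustrative assumptions, not the authors' preprocessing.

```python
# Density data -> finite-dimensional coordinates -> standard ICS.
import numpy as np

def clr_coords(samples, bins):
    hist, _ = np.histogram(samples, bins=bins, density=True)
    hist = np.maximum(hist, 1e-8)          # guard the log against empty bins
    return np.log(hist) - np.log(hist).mean()

def helmert(p):
    # Orthonormal basis of the subspace orthogonal to the constant vector.
    H = np.linalg.qr(np.vstack([np.ones(p), np.eye(p)[:-1]]).T)[0]
    return H[:, 1:]                        # p x (p-1)

rng = np.random.default_rng(3)
bins = np.linspace(-4, 6, 13)
densities = [rng.normal(size=500) for _ in range(60)]
densities += [rng.normal(2.0, 1.5, size=500) for _ in range(3)]  # outliers
C = np.array([clr_coords(s, bins) for s in densities]) @ helmert(12)

# Standard ICS on the coordinates: diagonalize Cov4 relative to Cov.
Xc = C - C.mean(axis=0)
n, p = Xc.shape
S1 = np.cov(Xc, rowvar=False)
d2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(S1), Xc)
S2 = (Xc * d2[:, None]).T @ Xc / (n * (p + 2))
evals, B = np.linalg.eig(np.linalg.inv(S1) @ S2)
scores = Xc @ B.real[:, np.argsort(evals.real)[::-1][:2]]
# Large scores on the first invariant coordinates flag candidate outliers.
```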
{"title":"ICS for complex data with application to outlier detection for density data","authors":"Camille Mondon , Huong Thi Trinh , Anne Ruiz-Gazen , Christine Thomas-Agnan","doi":"10.1016/j.jmva.2025.105522","DOIUrl":"10.1016/j.jmva.2025.105522","url":null,"abstract":"<div><div>Invariant coordinate selection (ICS) is a dimension reduction method, used as a preliminary step for clustering and outlier detection. It has been primarily applied to multivariate data. This work introduces a coordinate-free definition of ICS in an abstract Euclidean space and extends the method to complex data. Functional and distributional data are preprocessed into a finite-dimensional subspace. For example, in the framework of Bayes Hilbert spaces, distributional data are smoothed into compositional spline functions through the Maximum Penalised Likelihood method. We describe an outlier detection procedure for complex data and study the impact of some preprocessing parameters on the results. We compare our approach with other outlier detection methods through simulations, producing promising results in scenarios with a low proportion of outliers. ICS allows detecting abnormal climate events in a sample of daily maximum temperature distributions recorded across the provinces of Northern Vietnam between 1987 and 2016.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"211 ","pages":"Article 105522"},"PeriodicalIF":1.4,"publicationDate":"2025-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145516868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
tSNE-Spec: A new classification method for multivariate time series data
Pub Date : 2025-11-08 DOI: 10.1016/j.jmva.2025.105537
Shubhajit Sen , Soudeep Deb
Classification of multivariate time series (MTS) data has applications in various domains, for example, medical sciences, finance, and sports analytics. In this work, we propose a new technique that combines the advantages of dimension reduction through the t-distributed stochastic neighbor embedding (t-SNE) method with the attractive properties of the spectral density estimates of a time series and the k-nearest neighbor algorithm. We transform each MTS into a lower-dimensional time series using t-SNE, making it useful for visualization while retaining the temporal patterns, and subsequently use that representation in classification. Then, we extend the standard univariate spectral density-based classification to the multivariate setting and prove its theoretical consistency. Empirically, we first establish that the pairwise structure of the multivariate spectral density-based distance matrix is retained under the t-SNE-transformed spectral density-based distance calculation, indicating that the consistency derived for the multivariate spectral density transfers to our proposed method. We compare the proposed method against other widely used methods and find that it achieves superior classification accuracy across various settings. We also demonstrate its superiority on a real-life health dataset, where the task is to classify epilepsy seizures from other activities such as walking and running based on accelerometer data.
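A loose sketch of the pipeline as described above (not the authors' exact distance): each multivariate series is reduced to a 2-dimensional series with t-SNE, spectral densities of the reduced components are estimated with Welch's method, and a k-nearest-neighbor classifier is run on the stacked log-spectra. Series lengths, t-SNE settings, and the toy data generator are illustrative assumptions.

```python
# t-SNE dimension reduction per series -> spectral features -> kNN.
import numpy as np
from scipy.signal import welch
from sklearn.manifold import TSNE
from sklearn.neighbors import KNeighborsClassifier

def spectral_features(series, n_components=2, nperseg=64):
    """series: (T, p) array -> flattened log spectral densities."""
    low = TSNE(n_components=n_components, perplexity=20,
               random_state=0).fit_transform(series)   # (T, 2) series
    feats = [np.log(welch(low[:, j], nperseg=nperseg)[1] + 1e-12)
             for j in range(n_components)]
    return np.concatenate(feats)

rng = np.random.default_rng(4)
T, p = 200, 6
def make_series(freq):
    t = np.arange(T)
    base = np.sin(2 * np.pi * freq * t / T)[:, None]
    return base + 0.5 * rng.normal(size=(T, p))

X = np.array([spectral_features(make_series(f))
              for f in [3] * 20 + [15] * 20])   # two spectral classes
y = np.array([0] * 20 + [1] * 20)
clf = KNeighborsClassifier(n_neighbors=3).fit(X[::2], y[::2])
print(clf.score(X[1::2], y[1::2]))              # held-out accuracy
```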
{"title":"tSNE-Spec: A new classification method for multivariate time series data","authors":"Shubhajit Sen , Soudeep Deb","doi":"10.1016/j.jmva.2025.105537","DOIUrl":"10.1016/j.jmva.2025.105537","url":null,"abstract":"<div><div>Classification of multivariate time series (MTS) data has applications in various domains, for example, medical sciences, finance, sports analytics, etc. In this work, we propose a new technique that uses the advantages of dimension reduction through the t-distributed stochastic neighbor embedding (t-SNE) method, coupled with the attractive properties of the spectral density estimates of a time series, and k-nearest neighbor algorithm. We transform each MTS to a lower dimensional time series using t-SNE, making it useful for visualizing and retaining the temporal patterns, and subsequently use that in classification. Then, we extend the standard univariate spectral density-based classification in the multivariate setting and prove its theoretical consistency. Empirically, at first, we establish that the pairwise structure of the multivariate spectral density based distance matrix is retained in the t-SNE transformed spectral density-based distance calculation method, thus indicating that the consistency derived based on multivariate spectral density is transferable to our proposed method. The performance of our proposed method is shown by comparing it against other widely used methods and we find that the proposed algorithm achieves superior classification accuracy across various settings. We also demonstrate the superiority of our method in a real-life health dataset where the task is to classify epilepsy seizures from other activities like walking and running based on accelerometer data.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"211 ","pages":"Article 105537"},"PeriodicalIF":1.4,"publicationDate":"2025-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145516864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Invariant Coordinate Selection and Fisher discriminant subspace beyond the case of two groups
Pub Date : 2025-11-08 DOI: 10.1016/j.jmva.2025.105521
Colombe Becquart , Aurore Archimbaud , Anne Ruiz-Gazen , Luka Prilć , Klaus Nordhausen
Invariant Coordinate Selection (ICS) is a multivariate technique that relies on the simultaneous diagonalization of two scatter matrices. It serves various purposes, including its use as a dimension reduction tool prior to clustering or outlier detection. ICS’s theoretical foundation establishes why and when the identified subspace should contain relevant information by demonstrating its connection with the Fisher discriminant subspace (FDS). These general results have been examined in detail primarily for specific scatter combinations within a two-cluster framework. In this study, we expand these investigations to include more clusters and scatter combinations. Our analysis reveals the importance of distinguishing whether the group centers matrix has full rank. In the full-rank case, we establish deeper connections between ICS and FDS. We provide a detailed study of these relationships for three clusters when the group centers matrix has full rank and when it does not. Based on these expanded theoretical insights and supported by numerical studies, we conclude that ICS is indeed suitable for recovering the FDS under very general settings and cases of failure seem rare.
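A small numerical check in the spirit of these results: for a three-cluster Gaussian mixture, compare the subspace spanned by the extreme invariant coordinates (scatter pair Cov and Cov4) with the Fisher discriminant subspace via principal angles. Settings are purely illustrative; the FDS is computed here from the population group centers for simplicity.

```python
# ICS subspace vs. Fisher discriminant subspace, via principal angles.
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(5)
centers = np.array([[0, 0, 0, 0], [4, 0, 0, 0], [0, 4, 0, 0]], dtype=float)
X = np.vstack([c + rng.normal(size=(150, 4)) for c in centers])

Xc = X - X.mean(axis=0)
n, p = Xc.shape
S1 = np.cov(Xc, rowvar=False)
d2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(S1), Xc)
S2 = (Xc * d2[:, None]).T @ Xc / (n * (p + 2))
evals, B = np.linalg.eig(np.linalg.inv(S1) @ S2)
order = np.argsort(evals.real)
ics_dirs = B.real[:, [order[0], order[-1]]]     # the two extreme components

# FDS: span of the (total-scatter-whitened) group-center contrasts; the
# group centers matrix has full rank 2 here.
M = centers - centers.mean(axis=0)
fds = np.linalg.solve(S1, M.T)[:, :2]
print(np.rad2deg(subspace_angles(ics_dirs, fds)))  # small angles expected
```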
{"title":"Invariant Coordinate Selection and Fisher discriminant subspace beyond the case of two groups","authors":"Colombe Becquart , Aurore Archimbaud , Anne Ruiz-Gazen , Luka Prilć , Klaus Nordhausen","doi":"10.1016/j.jmva.2025.105521","DOIUrl":"10.1016/j.jmva.2025.105521","url":null,"abstract":"<div><div>Invariant Coordinate Selection (ICS) is a multivariate technique that relies on the simultaneous diagonalization of two scatter matrices. It serves various purposes, including its use as a dimension reduction tool prior to clustering or outlier detection. ICS’s theoretical foundation establishes why and when the identified subspace should contain relevant information by demonstrating its connection with the Fisher discriminant subspace (FDS). These general results have been examined in detail primarily for specific scatter combinations within a two-cluster framework. In this study, we expand these investigations to include more clusters and scatter combinations. Our analysis reveals the importance of distinguishing whether the group centers matrix has full rank. In the full-rank case, we establish deeper connections between ICS and FDS. We provide a detailed study of these relationships for three clusters when the group centers matrix has full rank and when it does not. Based on these expanded theoretical insights and supported by numerical studies, we conclude that ICS is indeed suitable for recovering the FDS under very general settings and cases of failure seem rare.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"211 ","pages":"Article 105521"},"PeriodicalIF":1.4,"publicationDate":"2025-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145516867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the fourth cumulant tensor in projection pursuit for a flexible class of skewed models
Pub Date : 2025-11-08 DOI: 10.1016/j.jmva.2025.105532
Jorge M. Arevalillo , Hilario Navarro
Projection pursuit is an exploratory data analysis approach for summarizing multivariate data through the search for interesting data projections. It relies on the maximization of an abnormality measure that quantifies the relevance of a projection for capturing non-normal features of the data. The need to expand the estimation approaches for projection pursuit has motivated its study within parametric frameworks. This is a follow-up work aimed at exploring the problem under a general class of distributions, namely the scale mixture of skew-normal family. Projection pursuit is examined by exploring the path from the role played by the fourth cumulant tensor in addressing the problem to its connection with the model parameters. The paper builds a triangulation among linear algebra, projection pursuit, and parametric statistics. A simulation study and an example with real data are also provided.
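A minimal kurtosis-based projection pursuit sketch, the classical scalar special case of the fourth-cumulant approach discussed above: search for a unit direction whose projection has extreme excess kurtosis. The optimizer, the random restarts, and the skew-normal toy data are illustrative assumptions, not the paper's machinery.

```python
# Kurtosis-based projection pursuit via multi-start optimization.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import kurtosis, skewnorm

def neg_abs_kurtosis(w, X):
    w = w / np.linalg.norm(w)        # objective is scale-invariant in w
    return -abs(kurtosis(X @ w))     # excess kurtosis of the projection

rng = np.random.default_rng(6)
X = np.column_stack([skewnorm.rvs(8, size=1000, random_state=7),
                     rng.normal(size=(1000, 4))])
X = X - X.mean(axis=0)

best = min((minimize(neg_abs_kurtosis, rng.normal(size=5), args=(X,))
            for _ in range(10)), key=lambda r: r.fun)
w = best.x / np.linalg.norm(best.x)
print(w)   # should load mainly on the first (skew-normal) coordinate
```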
{"title":"On the fourth cumulant tensor in projection pursuit for a flexible class of skewed models","authors":"Jorge M. Arevalillo , Hilario Navarro","doi":"10.1016/j.jmva.2025.105532","DOIUrl":"10.1016/j.jmva.2025.105532","url":null,"abstract":"<div><div>Projection pursuit is an exploratory data analysis approach for summarizing multivariate data through the search of interesting data projections. It relies on the maximization of an abnormality measure that quantifies the relevance of a projection to capture data non-normal features. The need to expand the estimation approaches to address projection pursuit has motivated its study within parametric frameworks. This is a follow-up work aimed at exploring the problem under a general class of distributions as it is the scale mixture of skew-normal family. Projection pursuit is examined by exploring the path going from the role played by the fourth cumulant tensor for addressing the problem to its connection with model parameters. The paper contributes to build a triangulation among linear algebra, projection pursuit and parametric statistics. A simulation study and an example with real data are also provided.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"211 ","pages":"Article 105532"},"PeriodicalIF":1.4,"publicationDate":"2025-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145516975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wasserstein projection pursuit of non-Gaussian signals
Pub Date : 2025-11-08 DOI: 10.1016/j.jmva.2025.105535
Satyaki Mukherjee , Soumendu Sundar Mukherjee , Debarghya Ghoshdastidar
We consider the general dimensionality reduction problem of locating, in a high-dimensional data cloud, a k-dimensional non-Gaussian subspace of interesting features. We use a projection pursuit approach: we search for mutually orthogonal unit directions which maximise the q-Wasserstein distance of the empirical distribution of data projections along these directions from a standard Gaussian. Under a generative model, where there is an underlying (unknown) low-dimensional non-Gaussian subspace, we prove rigorous statistical guarantees on the accuracy of approximating this unknown subspace by the directions found by our projection pursuit approach. Our results operate in the regime where the data dimensionality is comparable to the sample size, and thus supplement the recent literature on the non-feasibility of locating interesting directions via projection pursuit in the complementary regime where the data dimensionality is much larger than the sample size.
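A rough sketch of the criterion above for q = 1: estimate the 1-Wasserstein distance between standardized projections and a standard Gaussian, and keep the best of many random unit directions. The random search stands in for a real optimizer, and the data model and settings are illustrative assumptions.

```python
# Wasserstein projection pursuit: maximize W1(projection, N(0,1)).
import numpy as np
from scipy.stats import wasserstein_distance, norm

def w1_to_gaussian(proj):
    z = (proj - proj.mean()) / proj.std()
    grid = norm.ppf((np.arange(1, len(z) + 1) - 0.5) / len(z))
    return wasserstein_distance(z, grid)

rng = np.random.default_rng(8)
n, d = 2000, 5
X = rng.normal(size=(n, d))
X[:, 0] = rng.exponential(size=n) - 1.0   # hidden non-Gaussian coordinate

best_w, best_dir = -np.inf, None
for _ in range(1000):                     # crude random-direction search
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)
    dist = w1_to_gaussian(X @ u)
    if dist > best_w:
        best_w, best_dir = dist, u
print(best_w, best_dir)   # the best direction tends to load on coordinate 0
```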
{"title":"Wasserstein projection pursuit of non-Gaussian signals","authors":"Satyaki Mukherjee , Soumendu Sundar Mukherjee , Debarghya Ghoshdastidar","doi":"10.1016/j.jmva.2025.105535","DOIUrl":"10.1016/j.jmva.2025.105535","url":null,"abstract":"<div><div>We consider the general dimensionality reduction problem of locating in a high-dimensional data cloud, a <span><math><mi>k</mi></math></span>-dimensional non-Gaussian subspace of interesting features. We use a projection pursuit approach—we search for mutually orthogonal unit directions which maximise the <span><math><mi>q</mi></math></span>-Wasserstein distance of the empirical distribution of data-projections along these directions from a standard Gaussian. Under a generative model, where there is a underlying (unknown) low-dimensional non-Gaussian subspace, we prove rigorous statistical guarantees on the accuracy of approximating this unknown subspace by the directions found by our projection pursuit approach. Our results operate in the regime where the data dimensionality is comparable to the sample size, and thus supplement the recent literature on the non-feasibility of locating interesting directions via projection pursuit in the complementary regime where the data dimensionality is much larger than the sample size.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"212 ","pages":"Article 105535"},"PeriodicalIF":1.4,"publicationDate":"2025-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145537558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Testing for patterns and structures in covariance and correlation matrices
Pub Date : 2025-11-06 DOI: 10.1016/j.jmva.2025.105517
Paavo Sattler , Dennis Dobler
Covariance matrices of random vectors contain information that is crucial for modeling. Specific structures and patterns of the covariances (or correlations) may be used to justify parametric models, e.g., autoregressive models. Until now, there have been only a few approaches for testing such covariance structures, and most of them can only be used for one particular structure. In the present paper, we propose a systematic and unified testing procedure that works, among others, for the large class of linear covariance structures. Our approach requires only weak distributional assumptions. It covers common structures such as diagonal matrices, Toeplitz matrices, and compound symmetry, as well as the more involved autoregressive matrices. We exemplify the approach for all these structures. We prove the correctness of these tests for large sample sizes and use bootstrap techniques for a better small-sample approximation. Moreover, the proposed tests invite adaptations to other covariance patterns by choosing the hypothesis matrix appropriately. With the help of a simulation study, we also assess the small-sample properties of the tests. Finally, we illustrate the procedure in an application to a real data set.
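A sketch of a bootstrap test in the spirit of this procedure, for the compound-symmetry hypothesis on a 3x3 covariance: the hypothesis is encoded as C vech(Sigma) = 0 through a contrast matrix C, and the statistic n·||C vech(S)||² is calibrated with a nonparametric bootstrap. The specific statistic, contrast construction, and settings are illustrative assumptions, not the paper's exact test.

```python
# Hypothesis-matrix test of compound symmetry with bootstrap calibration.
import numpy as np

def vech(S):
    return S[np.tril_indices(S.shape[0])]

def contrast_matrix(p):
    r, c = np.tril_indices(p)
    diag = np.flatnonzero(r == c)
    off = np.flatnonzero(r != c)
    rows = []
    for group in (diag, off):           # equal entries within each group
        for j in group[1:]:
            e = np.zeros(len(r))
            e[group[0]], e[j] = 1.0, -1.0
            rows.append(e)
    return np.array(rows)

rng = np.random.default_rng(9)
n, p = 200, 3
Sigma = 0.5 * np.ones((p, p)) + 0.5 * np.eye(p)   # compound symmetry holds
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

C = contrast_matrix(p)
v = vech(np.cov(X, rowvar=False))
T = n * np.sum((C @ v) ** 2)

B, count = 999, 0
for _ in range(B):
    Xb = X[rng.integers(0, n, n)]                 # resample rows
    vb = vech(np.cov(Xb, rowvar=False))
    count += n * np.sum((C @ (vb - v)) ** 2) >= T # centered bootstrap stat
print("p-value:", (count + 1) / (B + 1))
```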
{"title":"Testing for patterns and structures in covariance and correlation matrices","authors":"Paavo Sattler , Dennis Dobler","doi":"10.1016/j.jmva.2025.105517","DOIUrl":"10.1016/j.jmva.2025.105517","url":null,"abstract":"<div><div>Covariance matrices of random vectors contain information that is crucial for modeling. Specific structures and patterns of the covariances (or correlations) may be used to justify parametric models, e.g., autoregressive models. Until now, there have been only a few approaches for testing such covariance structures and most of them can only be used for one particular structure. In the present paper, we propose a systematic and unified testing procedure working among others for the large class of linear covariance structures. Our approach requires only weak distributional assumptions. It covers common structures such as diagonal matrices, Toeplitz matrices, and compound symmetry, as well as the more involved autoregressive matrices. We exemplify the approach for all these structures. We prove the correctness of these tests for large sample sizes and use bootstrap techniques for a better small-sample approximation. Moreover, the proposed tests invite adaptations to other covariance patterns by choosing the hypothesis matrix appropriately. With the help of a simulation study, we also assess the small sample properties of the tests. Finally, we illustrate the procedure in an application to a real data set.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"211 ","pages":"Article 105517"},"PeriodicalIF":1.4,"publicationDate":"2025-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145516969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distance correlation in the presence of measurement errors
Pub Date : 2025-11-03 DOI: 10.1016/j.jmva.2025.105518
Xilin Zhang , Guoliang Fan , Liping Zhu
Independence testing is a fundamental issue in statistics. In practice, almost all observations are measured with random errors. Independence testing in the presence of measurement errors is an important issue but is rarely addressed in the literature. This paper focuses on distance correlation in the presence of measurement errors. We show that distance covariance is underestimated in the presence of measurement errors and is a strictly decreasing function of the dispersion of the measurement errors. Furthermore, the powers of independence tests based on distance covariance and distance correlation are both strictly decreasing functions of the dispersion of the measurement errors. Extensive numerical simulations and real data analysis support the conclusions drawn in this paper.
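A sketch illustrating the attenuation effect described above: a plain O(n²) implementation of sample distance correlation, evaluated as the dispersion of additive measurement error grows. The data-generating model is an illustrative assumption.

```python
# Distance correlation shrinks as measurement-error dispersion grows.
import numpy as np
from scipy.spatial.distance import cdist

def dcor(X, Y):
    def centered(D):   # double-centered distance matrix
        return D - D.mean(0) - D.mean(1)[:, None] + D.mean()
    A = centered(cdist(X, X))
    B = centered(cdist(Y, Y))
    dcov2 = (A * B).mean()
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

rng = np.random.default_rng(10)
n = 500
x = rng.normal(size=(n, 1))
y = x ** 2 + 0.1 * rng.normal(size=(n, 1))   # dependent but uncorrelated

for sigma in [0.0, 0.5, 1.0, 2.0]:           # measurement-error dispersion
    xe = x + sigma * rng.normal(size=(n, 1))
    ye = y + sigma * rng.normal(size=(n, 1))
    print(sigma, round(dcor(xe, ye), 3))     # decreases as sigma grows
```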
{"title":"Distance correlation in the presence of measurement errors","authors":"Xilin Zhang , Guoliang Fan , Liping Zhu","doi":"10.1016/j.jmva.2025.105518","DOIUrl":"10.1016/j.jmva.2025.105518","url":null,"abstract":"<div><div>Independence testing is a fundamental issue in statistics. In practice, almost all observations are measured with random errors. The independence test in the presence of measurement errors is an important issue but is rarely addressed in the literature. This paper focuses on distance correlation in the presence of measurement errors. We show that distance covariance is underestimated in the presence of measurement errors and is a strictly decreasing function of the dispersion of measurement errors. Furthermore, the powers of independence tests based on distance covariance and distance correlation are both strictly decreasing functions of the dispersion of measurement errors. Extensive numerical simulations and real data analysis support the conclusions drawn in this paper.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"211 ","pages":"Article 105518"},"PeriodicalIF":1.4,"publicationDate":"2025-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145465407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}