Generalized implementation of invariant coordinate selection with positive semi-definite scatter matrices
Aurore Archimbaud
Pub Date: 2026-01-01 | Epub Date: 2025-11-08 | DOI: 10.1016/j.jmva.2025.105520
Journal of Multivariate Analysis, Volume 211, Article 105520

Invariant coordinate selection is an unsupervised multivariate data transformation useful in many contexts, such as outlier detection or clustering. It is based on the simultaneous diagonalization of two affine equivariant and positive definite scatter matrices. Its classical implementation relies on a non-symmetric eigenvalue problem, diagonalizing one scatter relative to the other. In the case of collinearity, at least one of the scatter matrices is singular and the problem becomes unsolvable. To address this limitation, three approaches are proposed, based on a Moore–Penrose pseudo-inverse, a dimension reduction, and a generalized singular value decomposition. Their properties are investigated both theoretically and through various empirical applications. Overall, the extension based on the generalized singular value decomposition seems the most promising, even though it restricts the choice of scatter matrices to those that can be expressed as cross-products. In practice, some of the approaches also appear suitable for high-dimension low-sample-size data.
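To make the classical implementation concrete, here is a minimal NumPy sketch of ICS with the common scatter pair Cov and Cov4 (the FOBI choice), solved as a symmetric eigenproblem after whitening. This is the standard construction the abstract starts from, not the paper's pseudo-inverse/GSVD extensions; the simulated data and function names are illustrative.

```python
import numpy as np

def cov4(X):
    """Kurtosis-based scatter matrix (the FOBI choice for the second scatter)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(Xc, rowvar=False))
    r2 = np.einsum("ij,jk,ik->i", Xc, S_inv, Xc)   # squared Mahalanobis distances
    return (Xc * r2[:, None]).T @ Xc / (n * (p + 2))

def ics(X):
    """Classical ICS: jointly diagonalize S1 = Cov and S2 = Cov4.

    Requires S1 positive definite; the singular case is exactly what the
    pseudo-inverse / dimension-reduction / GSVD extensions address."""
    S1 = np.cov(X, rowvar=False)
    S2 = cov4(X)
    s, U = np.linalg.eigh(S1)
    W = U @ np.diag(s ** -0.5) @ U.T               # S1^{-1/2} (whitening matrix)
    lam, V = np.linalg.eigh(W @ S2 @ W)            # symmetric eigenproblem
    B = W @ V
    order = np.argsort(lam)[::-1]                  # sort by decreasing generalized kurtosis
    return lam[order], B[:, order]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
kurt, B = ics(X)
```

The columns of `B` simultaneously diagonalize both scatters: `B.T @ S1 @ B` is the identity and `B.T @ S2 @ B` is diagonal.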
Ultra-high dimensional semiparametric dynamic high-order spatial autoregressive models
Feng Luo, Hongxia Xu, Guoliang Fan, Liping Zhu
Pub Date: 2026-01-01 | Epub Date: 2025-10-30 | DOI: 10.1016/j.jmva.2025.105516
Journal of Multivariate Analysis, Volume 211, Article 105516

Motivated by the need to effectively characterize complex spatial dependencies inherent in ultra-high dimensional data, this paper develops a sparse semiparametric framework for modeling dynamic high-order spatial autoregressive processes. In this framework, the number of covariates in the linear component grows at a rate much faster than the sample size under a sparsity assumption, whereas the nonparametric component remains of fixed dimension. The varying coefficients are approximated using B-spline basis functions. To address the endogeneity arising from spatial lag terms, two-stage sieve least squares together with instrumental variable methods are employed. We investigate the theoretical properties of the oracle estimator, assuming that the true sparsity structure is known, and establish its convergence rates and asymptotic normality. Further, we propose a nonconvex penalized estimation procedure that simultaneously performs variable selection and estimates both the linear and spatial autoregressive parameters, and we show that it possesses the oracle property under mild conditions. The effectiveness of the proposed method is demonstrated through simulation studies and an empirical application to the Communities and Crime data set from the UCI Machine Learning Repository.
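The endogeneity correction rests on instrumental variables; a bare-bones linear two-stage least squares sketch (not the paper's sieve estimator with B-spline bases) illustrates the mechanics. The simulated design and names are of our own choosing.

```python
import numpy as np

def two_stage_least_squares(y, X, Z):
    """Basic 2SLS: regress the endogenous X on instruments Z, then OLS on the fit.

    beta_hat = (Xhat' Xhat)^{-1} Xhat' y  with  Xhat = P_Z X."""
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first stage: fitted X
    return np.linalg.lstsq(Xhat, y, rcond=None)[0]    # second stage

# Simulated endogeneity: x is correlated with the error through the confounder u.
rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=(n, 2))                  # instruments (exogenous)
u = rng.normal(size=n)                       # unobserved confounder
x = z @ np.array([1.0, -0.5]) + u + rng.normal(size=n)
y = 2.0 * x + u + rng.normal(size=n)         # true coefficient is 2
beta_ols = np.linalg.lstsq(np.c_[x], y, rcond=None)[0]   # biased upward by u
beta_2sls = two_stage_least_squares(y, np.c_[x], z)      # consistent
```

With the instruments valid, the 2SLS estimate lands near the true value of 2 while plain OLS is visibly biased.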
tSNE-Spec: A new classification method for multivariate time series data
Shubhajit Sen, Soudeep Deb
Pub Date: 2026-01-01 | Epub Date: 2025-11-08 | DOI: 10.1016/j.jmva.2025.105537
Journal of Multivariate Analysis, Volume 211, Article 105537

Classification of multivariate time series (MTS) data has applications in various domains, for example, medical sciences, finance, and sports analytics. In this work, we propose a new technique that combines dimension reduction through the t-distributed stochastic neighbor embedding (t-SNE) method with the attractive properties of the spectral density estimates of a time series and the k-nearest neighbor algorithm. We transform each MTS to a lower-dimensional time series using t-SNE, which is useful for visualization while retaining the temporal patterns, and subsequently use that representation in classification. We then extend the standard univariate spectral density-based classification to the multivariate setting and prove its theoretical consistency. Empirically, we first establish that the pairwise structure of the multivariate spectral density-based distance matrix is retained by the t-SNE-transformed spectral density-based distances, indicating that the consistency derived for the multivariate spectral density transfers to our proposed method. We compare the proposed method against other widely used methods and find that it achieves superior classification accuracy across various settings. We also demonstrate its superiority on a real-life health dataset, where the task is to classify epilepsy seizures from other activities, such as walking and running, based on accelerometer data.
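A stripped-down caricature of the spectral-density classifier (log-periodogram features plus 1-nearest-neighbor, omitting the t-SNE embedding step the paper adds) can be sketched as follows; all names and the two-class simulated data are illustrative.

```python
import numpy as np

def log_periodogram(x):
    """Log-periodogram of a univariate series: a crude spectral density estimate."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    pgram = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    return np.log(pgram[1:] + 1e-12)              # drop the zero frequency

def knn_classify(train_feats, train_labels, test_feat, k=1):
    """Majority vote among the k training features closest to test_feat."""
    d = np.linalg.norm(train_feats - test_feat, axis=1)
    nearest = np.argsort(d)[:k]
    vals, counts = np.unique(train_labels[nearest], return_counts=True)
    return vals[np.argmax(counts)]

rng = np.random.default_rng(2)
n, T = 40, 256
t = np.arange(T)
# Two classes with different dominant frequencies plus noise.
series = [np.sin(2 * np.pi * (0.05 if i < n // 2 else 0.20) * t)
          + rng.normal(scale=0.5, size=T) for i in range(n)]
labels = np.array([0] * (n // 2) + [1] * (n // 2))
feats = np.array([log_periodogram(s) for s in series])
pred = knn_classify(feats[1:], labels[1:], feats[0])
```

Because the classes differ in their spectra, leave-one-out 1-NN on these features separates them almost perfectly.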
Projection pursuit Bayesian regression for symmetric matrix predictors
Xiaomeng Ju, Hyung G. Park, Thaddeus Tarpey
Pub Date: 2026-01-01 | Epub Date: 2025-11-11 | DOI: 10.1016/j.jmva.2025.105539
Journal of Multivariate Analysis, Volume 211, Article 105539

This paper develops a novel Bayesian approach for nonlinear regression with symmetric matrix predictors, often used to encode connectivity between nodes. Unlike methods that vectorize matrix predictors, which result in a large number of model parameters and unstable estimation, we propose a Bayesian multi-index regression method, yielding a projection-pursuit-type estimator that leverages the structure of matrix-valued predictors. We establish model identifiability conditions and impose a sparsity-inducing prior on the projection directions to prevent overfitting and enhance the interpretability of the parameter estimates. Posterior inference is conducted through Bayesian backfitting. The performance of the proposed method is evaluated through simulation studies and a case study investigating the relationship between brain connectivity features and cognitive scores.
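The single-index building block y ≈ g(w'Xw) for a symmetric matrix predictor X can be illustrated with a frequentist caricature: with the projection direction known, the link g is recovered by a polynomial least-squares fit (the paper instead samples directions and links by Bayesian backfitting). Direction, link, and data below are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 5, 300
w_true = np.array([1.0, 0.0, -1.0, 0.0, 0.5])
w_true /= np.linalg.norm(w_true)

# Symmetric matrix predictors (e.g. connectivity matrices) and a
# nonlinear single-index response y = g(w' X w) + noise.
A = rng.normal(size=(n, p, p))
X = (A + A.transpose(0, 2, 1)) / 2                 # symmetrize each predictor
z = np.einsum("i,nij,j->n", w_true, X, w_true)     # the scalar index  w' X_i w
y = np.sin(z) + 0.1 * rng.normal(size=n)

# With the direction known, the link is recovered by a degree-5 polynomial fit.
g_hat = np.polynomial.Polynomial.fit(z, y, deg=5)
resid = y - g_hat(z)
```

The residual mean square is close to the noise variance, showing that the scalar index carries essentially all the signal despite the matrix-valued input.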
ICS for complex data with application to outlier detection for density data
Camille Mondon, Huong Thi Trinh, Anne Ruiz-Gazen, Christine Thomas-Agnan
Pub Date: 2026-01-01 | DOI: 10.1016/j.jmva.2025.105522
Journal of Multivariate Analysis, Volume 211, Article 105522

Invariant coordinate selection (ICS) is a dimension reduction method, used as a preliminary step for clustering and outlier detection. It has been primarily applied to multivariate data. This work introduces a coordinate-free definition of ICS in an abstract Euclidean space and extends the method to complex data. Functional and distributional data are preprocessed into a finite-dimensional subspace. For example, in the framework of Bayes Hilbert spaces, distributional data are smoothed into compositional spline functions through the Maximum Penalised Likelihood method. We describe an outlier detection procedure for complex data and study the impact of some preprocessing parameters on the results. We compare our approach with other outlier detection methods through simulations, producing promising results in scenarios with a low proportion of outliers. ICS allows the detection of abnormal climate events in a sample of daily maximum temperature distributions recorded across the provinces of Northern Vietnam between 1987 and 2016.
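One simple way to map discretized density data into an ordinary Euclidean space, in the spirit of the Bayes-space preprocessing described above (though far cruder than the compositional-spline smoothing the paper uses), is the centred log-ratio transform; this sketch and its bin count are illustrative.

```python
import numpy as np

def clr(density, eps=1e-12):
    """Centred log-ratio transform of a discretized density.

    Maps a composition on the simplex into a zero-sum vector of R^k,
    where Euclidean tools such as ICS can be applied."""
    d = np.asarray(density, dtype=float) + eps     # guard against empty bins
    d = d / d.sum()
    logd = np.log(d)
    return logd - logd.mean()

# A histogram-type density over 10 bins, e.g. one station's temperature distribution.
rng = np.random.default_rng(4)
hist, _ = np.histogram(rng.normal(size=1000), bins=10, density=True)
v = clr(hist)
```

The transform is invertible up to normalization: exponentiating and renormalizing `v` recovers the composition, and the clr vector always sums to zero.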
Random correlation matrices generated via partial correlation C-vines
Harry Joe, Dorota Kurowicka
Pub Date: 2026-01-01 | Epub Date: 2025-11-12 | DOI: 10.1016/j.jmva.2025.105519
Journal of Multivariate Analysis, Volume 211, Article 105519

The method for generating random d×d correlation matrices with a partial correlation C-vine is extended so that each correlation can have a distribution that is asymmetric on (−1, 1) or on (0, 1). With the recursion formulas from the partial correlation C-vine to the correlation matrix, first and second moments can be derived in the case of the same distribution for each partial correlation in tree ℓ of the vine (1 ≤ ℓ < d). Algorithms and conditions are given so that, after a permutation step, all random correlations have a common mean and second moment. The algorithms can be useful in simulation experiments to generate random correlation matrices that cover the whole space, or with the restriction that each correlation is positive.
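The basic C-vine generator underlying the paper can be sketched as follows: partial correlations are drawn tree by tree, here as 2·Beta(a, b) − 1 so that a ≠ b gives an asymmetric law on (−1, 1), and the standard recursion converts them to unconditional correlations. The specific (a, b) values are illustrative, and the permutation step that equalizes moments is omitted.

```python
import numpy as np

def cvine_corr(d, a=2.0, b=2.0, rng=None):
    """Random d x d correlation matrix from a partial correlation C-vine.

    P[k, i] holds the partial correlation of variables k and i given 0..k-1;
    the inner recursion peels off the conditioning set to recover the
    unconditional correlation R[k, i]. Any partials in (-1, 1) yield a
    valid (positive definite) correlation matrix."""
    rng = np.random.default_rng() if rng is None else rng
    P = np.zeros((d, d))
    R = np.eye(d)
    for k in range(d - 1):
        for i in range(k + 1, d):
            P[k, i] = 2.0 * rng.beta(a, b) - 1.0    # asymmetric on (-1, 1) if a != b
            rho = P[k, i]
            for l in range(k - 1, -1, -1):          # remove conditioning on l
                rho = (rho * np.sqrt((1 - P[l, i] ** 2) * (1 - P[l, k] ** 2))
                       + P[l, i] * P[l, k])
            R[k, i] = R[i, k] = rho
    return R

R = cvine_corr(5, a=3.0, b=1.5, rng=np.random.default_rng(5))
```

By construction the output is symmetric with unit diagonal and strictly positive eigenvalues, whatever distribution the partials are drawn from.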
Estimating singular functions of kernel cross-covariance operators: An investigation of the Nyström method
Min Xu, Qi-Hang Zhou, Qin Fang, Zhuo-Xi Shi
Pub Date: 2026-01-01 | Epub Date: 2025-10-10 | DOI: 10.1016/j.jmva.2025.105514
Journal of Multivariate Analysis, Volume 211, Article 105514

We investigate the Nyström method as an efficient means of overcoming the computational bottleneck inherent in estimating the singular functions of kernel cross-covariance operators, which play a central role in tasks such as covariate shift correction and multi-view learning. We present a Nyström-type approximation of the kernel cross-covariance operator, and establish its convergence rate. Furthermore, we derive a novel bound on the weighted sum of squared estimation errors of all associated singular functions, providing tighter control than traditional bounds that treat each error individually. Our theoretical analysis reveals that the Nyström-based singular function estimators attain the same statistical accuracy as their full empirical counterparts, while offering significant computational savings. Numerical experiments further confirm the practical effectiveness of the proposed approach.
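The computational idea is the generic Nyström low-rank approximation of a Gram matrix, K ≈ K_nm K_mm⁺ K_mn, built from a subset of m landmark points. The sketch below applies it to a plain RBF Gram matrix rather than the paper's cross-covariance operator; sizes, kernel, and bandwidth are illustrative.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian RBF kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_gram(X, landmarks, gamma=0.5, jitter=1e-10):
    """Nystrom approximation  K ~ K_nm K_mm^+ K_mn  of the full Gram matrix.

    Only an n x m and an m x m kernel block are formed, so the cost is
    O(n m^2) instead of the O(n^2) (or worse) of the exact computation."""
    Knm = rbf_kernel(X, landmarks, gamma)
    Kmm = rbf_kernel(landmarks, landmarks, gamma)
    return Knm @ np.linalg.pinv(Kmm + jitter * np.eye(len(landmarks))) @ Knm.T

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 3))
K = rbf_kernel(X, X)
landmarks = X[rng.choice(200, size=50, replace=False)]
K50 = nystrom_gram(X, landmarks)
rel_err = np.linalg.norm(K - K50) / np.linalg.norm(K)
```

With all points as landmarks the approximation is exact; with a 50-point subset the relative Frobenius error is already modest because the kernel spectrum decays quickly.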
Identifying differential networks through high-dimensional two-sample inference
Hui Chen, Yinxu Jia
Pub Date: 2026-01-01 | Epub Date: 2025-09-24 | DOI: 10.1016/j.jmva.2025.105511
Journal of Multivariate Analysis, Volume 211, Article 105511

In this article, we identify differential networks within the Gaussian graphical model framework by testing the equality of two precision matrices. This is challenging when the dimension of the precision matrix increases with the sample size. Existing methods typically assume sparsity in the precision matrix structure, a condition often unmet in real data. To address this issue, we introduce a statistic based on a debiased estimator of the high-dimensional precision matrix and employ a multiplier bootstrap to approximate the null distribution of the proposed statistic. The proposed method can easily be coupled with various estimation algorithms for the high-dimensional precision matrix. In comparison with existing methods, the superiority of the proposed approach lies in its mild structural constraints on the unknown precision matrix, making it robust to intricate conditional dependence structures in real data. Additionally, we introduce a cross-fitting procedure that utilizes the full data information, leading to enhanced statistical power. Theoretical justification is provided to ensure the validity of the proposed method without restrictive assumptions. We showcase the effectiveness of the proposed method through simulations and a real-data example, which provide evidence of its usefulness and potential for application in various domains.
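The multiplier bootstrap step can be illustrated on a generic max-type mean statistic, which is much simpler than the paper's precision-matrix contrast but uses the same mechanism: Gaussian multipliers applied to centered scores reproduce the null distribution of the maximum. Sizes, the shift pattern, and names are illustrative.

```python
import numpy as np

def multiplier_bootstrap_pvalue(X, stat_obs, n_boot=2000, rng=None):
    """Gaussian multiplier bootstrap for T = sqrt(n) * max_j |mean_j|.

    X is an n x p score matrix; centering the columns makes the bootstrap
    mimic the null even when the observed data carry a signal. Each draw
    multiplies the rows by i.i.d. N(0, 1) weights and recomputes T."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    boots = np.empty(n_boot)
    for b in range(n_boot):
        e = rng.normal(size=n)
        boots[b] = np.sqrt(n) * np.abs((e[:, None] * Xc).mean(axis=0)).max()
    return (boots >= stat_obs).mean()

rng = np.random.default_rng(7)
X0 = rng.normal(size=(200, 50))                    # null: all coordinate means zero
T0 = np.sqrt(200) * np.abs(X0.mean(axis=0)).max()
p_null = multiplier_bootstrap_pvalue(X0, T0, rng=rng)
X1 = X0 + np.r_[np.ones(5), np.zeros(45)] * 0.8    # alternative: shift five coordinates
T1 = np.sqrt(200) * np.abs(X1.mean(axis=0)).max()
p_alt = multiplier_bootstrap_pvalue(X1, T1, rng=rng)
```

Under the shifted alternative the observed maximum dwarfs every bootstrap draw, so the p-value collapses to zero, while under the null it stays unremarkable.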
Distance correlation in the presence of measurement errors
Xilin Zhang, Guoliang Fan, Liping Zhu
Pub Date: 2026-01-01 | Epub Date: 2025-11-03 | DOI: 10.1016/j.jmva.2025.105518
Journal of Multivariate Analysis, Volume 211, Article 105518

Independence testing is a fundamental issue in statistics. In practice, almost all observations are measured with random errors. The independence test in the presence of measurement errors is an important issue but is rarely addressed in the literature. This paper focuses on distance correlation in the presence of measurement errors. We show that distance covariance is underestimated in the presence of measurement errors and is a strictly decreasing function of the dispersion of measurement errors. Furthermore, the powers of independence tests based on distance covariance and distance correlation are both strictly decreasing functions of the dispersion of measurement errors. Extensive numerical simulations and real data analysis support the conclusions drawn in this paper.
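A minimal NumPy implementation of sample distance correlation (the Székely-Rizzo V-statistic, univariate case) makes the attenuation phenomenon easy to see: contaminating one variable with measurement error shrinks the estimated dependence. The simulated variables are illustrative.

```python
import numpy as np

def _dcenter(D):
    """Double-centre a pairwise distance matrix."""
    return D - D.mean(axis=0) - D.mean(axis=1)[:, None] + D.mean()

def distance_correlation(x, y):
    """Sample distance correlation for two univariate samples (V-statistic)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = _dcenter(np.abs(x[:, None] - x[None, :]))
    B = _dcenter(np.abs(y[:, None] - y[None, :]))
    dcov2 = (A * B).mean()
    dvarx = (A * A).mean()
    dvary = (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvarx * dvary))

rng = np.random.default_rng(8)
x = rng.normal(size=500)
y_dep = x ** 2                                  # dependent on x, yet uncorrelated
z = rng.normal(size=500)                        # independent of x
noise_attenuation = distance_correlation(x + rng.normal(size=500), y_dep)
```

Distance correlation of a variable with itself is 1, it is clearly larger for the dependent pair than the independent one, and adding measurement error to `x` attenuates it, which is the effect the paper quantifies.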
Tree Pólya Splitting distributions for multivariate count data
Samuel Valiquette, Jean Peyhardi, Éric Marchand, Gwladys Toulemonde, Frédéric Mortier
Pub Date: 2026-01-01 | Epub Date: 2025-09-12 | DOI: 10.1016/j.jmva.2025.105507
Journal of Multivariate Analysis, Volume 211, Article 105507

In this article, we develop a new class of multivariate distributions adapted to count data, called Tree Pólya Splitting. This class results from the combination of a univariate distribution and singular multivariate distributions along a fixed partition tree. Known distributions, including the Dirichlet-multinomial, the generalized Dirichlet-multinomial, and the Dirichlet-tree multinomial, are particular cases within this class. As we demonstrate, these distributions offer flexibility, allowing for the modeling of complex dependence structures (positive, negative, or null) at the observation level. Specifically, we present theoretical properties of Tree Pólya Splitting distributions, focusing primarily on marginal distributions, factorial moments, and dependence structures (covariance and correlations). A Trichoptera abundance dataset is used, on the one hand, as a benchmark to illustrate the theoretical properties developed in this article and, on the other hand, to demonstrate the practical interest of these models, notably by comparing them to other approaches for fitting multivariate count data, such as the Poisson-lognormal model in ecology or the singular multivariate distributions used in microbial analysis.
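The splitting mechanism can be sketched by recursively dividing a random total down a fixed partition tree, with a Dirichlet-multinomial draw at each internal node. This toy version uses a single concentration parameter, a negative binomial total, and made-up species names; it illustrates the construction, not the paper's full class.

```python
import numpy as np

def tree_polya_split(n_total, tree, alpha, rng):
    """Recursively split a total count down a fixed partition tree.

    `tree` is a nested list whose leaves are category names; at each internal
    node the incoming count is divided among the children by a
    Dirichlet-multinomial draw (the Polya-splitting step)."""
    if not isinstance(tree, list):                        # leaf: assign the count
        return {tree: int(n_total)}
    probs = rng.dirichlet(np.full(len(tree), alpha))      # Dirichlet weights for the children
    child_counts = rng.multinomial(n_total, probs)        # multinomial split given the weights
    out = {}
    for child, c in zip(tree, child_counts):
        out.update(tree_polya_split(c, child, alpha, rng))
    return out

rng = np.random.default_rng(9)
n_total = rng.negative_binomial(5, 0.1)                   # univariate law for the total
tree = [["sp1", "sp2"], ["sp3", ["sp4", "sp5"]]]          # fixed partition tree
counts = tree_polya_split(n_total, tree, alpha=1.0, rng=rng)
```

With the Dirichlet weights integrated out, each split is Dirichlet-multinomial, so with this balanced binary tree the leaf vector follows a Dirichlet-tree multinomial given the total; the leaf counts always sum back to the sampled total.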