Journal of Multivariate Analysis: Latest Articles

Low-rank tensor regression for selection of grouped variables
IF 1.6; Zone 3 (Mathematics); Q2 Mathematics; Pub Date: 2024-06-05; DOI: 10.1016/j.jmva.2024.105339
Yang Chen, Ziyan Luo, Lingchen Kong

Low-rank tensor regression (LRTR) problems are widely studied in statistics and machine learning, in which the regressors are generally grouped by clustering strongly correlated variables or variables corresponding to different levels of the same predictive factor in many practical applications. By virtue of the idea of group selection in the classical linear regression framework, we propose an LRTR method for adaptive selection of grouped variables in this article, which is formulated as a group SLOPE penalized low-rank, orthogonally decomposable tensor optimization problem. Moreover, we introduce the notion of tensor group false discovery rate (TgFDR) to measure the group selection performance. The proposed regression method provably controls TgFDR and achieves the asymptotically minimax estimate under the assumption that variable groups are orthogonal to each other. Finally, an alternating minimization algorithm is developed for efficient problem resolution. We demonstrate the performance of our proposed method in group selection and low-rank estimation through simulation studies and real dataset analysis.

Citations: 0
Bias correction for kernel density estimation with spherical data
IF 1.6; Zone 3 (Mathematics); Q2 Mathematics; Pub Date: 2024-06-01; DOI: 10.1016/j.jmva.2024.105338
Yasuhito Tsuruta

Kernel density estimations with spherical data can flexibly estimate the shape of an underlying density, including rotationally symmetric, skewed, and multimodal distributions. Standard estimators are generally based on rotationally symmetric kernel functions such as the von Mises kernel function. Unfortunately, their mean integrated squared error does not have root-n consistency and increasing the dimension slows its convergence rate. Therefore, this study aims to improve its accuracy by correcting this bias. It proposes bias correction methods by applying the generalized jackknifing method that can be generated from the von Mises kernel function. We also obtain the asymptotic mean integrated squared errors of the proposed estimators. We find that the convergence rates of the proposed estimators are higher than those of previous estimators. Further, a numerical experiment shows that the proposed estimators perform better than the von Mises kernel density estimators in finite samples in scenarios that are mixtures of von Mises densities.

Citations: 0
Adaptive directional estimator of the density in $\mathbb{R}^d$ for independent and mixing sequences
IF 1.6; Zone 3 (Mathematics); Q2 Mathematics; Pub Date: 2024-05-28; DOI: 10.1016/j.jmva.2024.105332
Sinda Ammous, Jérôme Dedecker, Céline Duval

A new multivariate density estimator for stationary sequences is obtained by Fourier inversion of the thresholded empirical characteristic function. This estimator does not depend on the choice of parameters related to the smoothness of the density; it is directly adaptive. We establish oracle inequalities valid for independent, α-mixing and τ-mixing sequences, which allows us to derive optimal convergence rates, up to a logarithmic loss. On general anisotropic Sobolev classes, the estimator adapts to the regularity of the unknown density but also achieves directional adaptivity. More precisely, the estimator is able to reach the convergence rate induced by the best Sobolev regularity of the density of AX, where A belongs to a class of invertible matrices describing all the possible directions. The estimator is easy to implement and numerically efficient. It depends on the calibration of a parameter for which we propose an innovative numerical selection procedure, using the Euler characteristic of the thresholded areas.

Citations: 0
Ordinal pattern dependence and multivariate measures of dependence
IF 1.6; Zone 3 (Mathematics); Q2 Mathematics; Pub Date: 2024-05-28; DOI: 10.1016/j.jmva.2024.105337
Angelika Silbernagel, Alexander Schnurr

Ordinal pattern dependence has been introduced in order to capture co-monotonic behavior between two time series. This concept has several features one would intuitively demand from a dependence measure. It was believed that ordinal pattern dependence satisfies the axioms which Grothe et al. (2014) proclaimed for a multivariate measure of dependence. In the present article we show that this is not true and that there is a mistake in the article by Betken et al. (2021). Furthermore, we show that ordinal pattern dependence satisfies a slightly modified set of axioms.
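
As a rough illustration of what co-monotonic behavior means here, the sketch below extracts ordinal patterns (the rank order of the values inside a sliding window) from two series and reports how often the two patterns coincide. This is only a simplified empirical frequency and omits any normalization used in the formal definition of ordinal pattern dependence; the function names `ordinal_pattern` and `pattern_coincidence_rate` and the window parameter `h` are assumptions introduced for illustration, not notation from the article.

```python
def ordinal_pattern(window):
    """Return the ordinal pattern of a window: the positions of its
    values listed from smallest to largest (ties broken by position)."""
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))


def pattern_coincidence_rate(x, y, h=2):
    """Fraction of sliding windows of length h + 1 in which the two
    series x and y show the same ordinal pattern (hypothetical helper)."""
    n = min(len(x), len(y))
    windows = n - h
    if windows <= 0:
        raise ValueError("series too short for the chosen pattern order h")
    hits = sum(
        ordinal_pattern(x[t:t + h + 1]) == ordinal_pattern(y[t:t + h + 1])
        for t in range(windows)
    )
    return hits / windows


if __name__ == "__main__":
    up_down = [1, 3, 2, 5, 4, 6]
    scaled = [10 * v for v in up_down]   # same ups and downs
    print(pattern_coincidence_rate(up_down, scaled))
    print(pattern_coincidence_rate([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))
```

A rate near 1 indicates that the two series move up and down together within most windows, while a rate near 0 indicates that they rarely do.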

Citations: 0
Parametric dependence between random vectors via copula-based divergence measures
IF 1.6; Zone 3 (Mathematics); Q2 Mathematics; Pub Date: 2024-05-24; DOI: 10.1016/j.jmva.2024.105336
Steven De Keyser, Irène Gijbels

This article proposes copula-based dependence quantification between multiple groups of random variables of possibly different sizes via the family of Φ-divergences. An axiomatic framework for this purpose is provided, after which we focus on the absolutely continuous setting assuming copula densities exist. We consider parametric and semi-parametric frameworks, discuss estimation procedures, and report on asymptotic properties of the proposed estimators. In particular, we first concentrate on a Gaussian copula approach yielding explicit and attractive dependence coefficients for specific choices of Φ, which are more amenable for estimation. Next, general parametric copula families are considered, with special attention to nested Archimedean copulas, being a natural choice for dependence modelling of random vectors. The results are illustrated by means of examples. Simulations and a real-world application on financial data are provided as well.

Citations: 0
Tensor recovery in high-dimensional Ising models
IF 1.6; Zone 3 (Mathematics); Q2 Mathematics; Pub Date: 2024-05-23; DOI: 10.1016/j.jmva.2024.105335
Tianyu Liu, Somabha Mukherjee, Rahul Biswas

The k-tensor Ising model is a multivariate exponential family on a p-dimensional binary hypercube for modeling dependent binary data, where the sufficient statistic consists of all k-fold products of the observations, and the parameter is an unknown k-fold tensor, designed to capture higher-order interactions between the binary variables. In this paper, we describe an approach based on a penalization technique that helps us recover the signed support of the tensor parameter with high probability, assuming that no entry of the true tensor is too close to zero. The method is based on an $\ell_1$-regularized node-wise logistic regression that recovers the signed neighborhood of each node with high probability. Our analysis is carried out in the high-dimensional regime, which allows the dimension p of the Ising model, as well as the interaction factor k, to potentially grow to $\infty$ with the sample size n. We show that if the minimum interaction strength is not too small, then consistent recovery of the entire signed support is possible if one takes $n = \Omega\big((k!)^8 d^3 \log\binom{p-1}{k-1}\big)$ samples, where d denotes the maximum degree of the hypernetwork in question. Our results are validated in two simulation settings and applied to a real neurobiological dataset consisting of multi-array electro-physiological recordings from the mouse visual cortex, to model higher-order interactions between the brain regions.

Citations: 0
Distribution-on-distribution regression with Wasserstein metric: Multivariate Gaussian case
IF 1.6; Zone 3 (Mathematics); Q2 Mathematics; Pub Date: 2024-05-22; DOI: 10.1016/j.jmva.2024.105334
Ryo Okano, Masaaki Imaizumi

Distribution data refer to a data set in which each sample is represented as a probability distribution, a subject area that has received increasing interest in the field of statistics. Although several studies have developed distribution-to-distribution regression models for univariate variables, the multivariate scenario remains under-explored due to technical complexities. In this study, we introduce models for regression from one Gaussian distribution to another, using the Wasserstein metric. These models are constructed using the geometry of the Wasserstein space, which enables the transformation of Gaussian distributions into components of a linear matrix space. Owing to their linear regression frameworks, our models are intuitively understandable, and their implementation is simplified because of the optimal transport problem’s analytical solution between Gaussian distributions. We also explore a generalization of our models to encompass non-Gaussian scenarios. We establish the convergence rates of in-sample prediction errors for the empirical risk minimizations in our models. In comparative simulation experiments, our models demonstrate superior performance over a simpler alternative method that transforms Gaussian distributions into matrices. We present an application of our methodology using weather data for illustration purposes.
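
The analytical solution mentioned in the abstract can be illustrated with the well-known closed form of the 2-Wasserstein distance between two Gaussians N(m1, S1) and N(m2, S2): W2² = ‖m1 − m2‖² + tr(S1 + S2 − 2 (S1^{1/2} S2 S1^{1/2})^{1/2}). The sketch below is not the authors' regression model; it only evaluates this distance, and it is restricted to the bivariate case so that the trace of the matrix square root can be computed with the standard library alone, using the fact that for a 2×2 positive semidefinite matrix M, tr(M^{1/2}) = sqrt(tr M + 2 sqrt(det M)). The function name `gaussian_w2_2d` is hypothetical.

```python
import math


def gaussian_w2_2d(m1, S1, m2, S2):
    """2-Wasserstein distance between the bivariate Gaussians N(m1, S1)
    and N(m2, S2). Means are length-2 sequences; covariances are 2x2
    symmetric positive semidefinite matrices given as nested lists."""
    mean_term = (m1[0] - m2[0]) ** 2 + (m1[1] - m2[1]) ** 2
    tr1 = S1[0][0] + S1[1][1]
    tr2 = S2[0][0] + S2[1][1]
    # S1^{1/2} S2 S1^{1/2} has the same eigenvalues as S1 S2, so its trace
    # and determinant follow directly from S1 and S2.
    tr12 = sum(S1[i][j] * S2[j][i] for i in range(2) for j in range(2))
    det1 = S1[0][0] * S1[1][1] - S1[0][1] * S1[1][0]
    det2 = S2[0][0] * S2[1][1] - S2[0][1] * S2[1][0]
    # Trace of the square root of a 2x2 PSD matrix (guarded against
    # tiny negative values caused by rounding).
    cross = math.sqrt(max(tr12 + 2.0 * math.sqrt(max(det1 * det2, 0.0)), 0.0))
    w2_squared = mean_term + tr1 + tr2 - 2.0 * cross
    return math.sqrt(max(w2_squared, 0.0))


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # Same covariance, shifted mean: the distance is the Euclidean shift.
    print(gaussian_w2_2d([0.0, 0.0], identity, [3.0, 4.0], identity))
```

For dimensions above two, the same formula applies, but the matrix square roots would need a numerical linear algebra routine rather than the 2×2 shortcut used here.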

Citations: 0
Sparse subspace clustering in diverse multiplex network model
IF 1.6; Zone 3 (Mathematics); Q2 Mathematics; Pub Date: 2024-05-17; DOI: 10.1016/j.jmva.2024.105333
Majid Noroozi, Marianna Pensky

The paper considers the DIverse MultiPLEx (DIMPLE) network model, where all layers of the network have the same collection of nodes and are equipped with Stochastic Block Models (SBMs). In addition, all layers can be partitioned into groups with the same community structures, although the layers in the same group may have different matrices of block connection probabilities. To the best of our knowledge, the DIMPLE model, introduced in Pensky and Wang (2021), is the broadest SBM-equipped binary multilayer network model on the same set of nodes and thus generalizes a multitude of papers that study more restrictive settings. Under the DIMPLE model, the main task is to identify the groups of layers with the same community structures, since the matrices of block connection probabilities act as nuisance parameters. The main contribution of the paper is achieving strongly consistent between-layer clustering by using Sparse Subspace Clustering (SSC), a well-developed technique in computer vision. In addition, SSC can handle much larger networks than spectral clustering and is well suited to parallel computing. Moreover, our paper is the first to obtain precision guarantees for SSC when it is applied to binary data.

Citations: 0
On the Mai–Wang stochastic decomposition for ℓp-norm symmetric survival functions on the positive orthant
IF 1.6; Zone 3 (Mathematics); Q2 Mathematics; Pub Date: 2024-05-17; DOI: 10.1016/j.jmva.2024.105331
Christian Genest, Johanna G. Nešlehová

Recently, Mai and Wang (2021) investigated a class of ℓp-norm symmetric survival functions on the positive orthant. In their paper, they claim that the generator of these functions must be d-monotone. This note explains that this is not true in general. Luckily, most of the results in Mai and Wang (2021) are not affected by this oversight.

Citations: 0
Tuning-free sparse clustering via alternating hard-thresholding
IF 1.6, Zone 3 Mathematics, Q2 Mathematics, Pub Date: 2024-05-15, DOI: 10.1016/j.jmva.2024.105330
Wei Dong, Chen Xu, Jinhan Xie, Niansheng Tang

Model-based clustering is a commonly used technique to partition heterogeneous data into homogeneous groups. When the analysis is to be conducted with a large number of features, analysts face simultaneous challenges in model interpretability, clustering accuracy, and computational efficiency. Several Bayesian and penalization methods have been proposed to select important features for model-based clustering. However, the performance of those methods relies on careful algorithmic tuning, which can be time-consuming in high-dimensional cases. In this paper, we propose a new sparse clustering method based on alternating hard-thresholding. The new method is conceptually simple and tuning-free. With a user-specified sparsity level, it efficiently detects a set of key features by eliminating a large number of features that are less useful for clustering. Based on the selected key features, one can readily obtain an effective clustering of the original high-dimensional data under a general sparse covariance structure. Under mild conditions, we show that the new method leads to clusters with a misclassification rate consistent with the optimal rate, as if the underlying true model were used. The promising performance of the new method is supported by both simulated and real data examples.
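
The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea of alternating between clustering and hard-thresholding features at a user-specified sparsity level `s`. It is not the authors' method: it swaps in plain k-means and a between-cluster sum-of-squares score for the model-based machinery, and every function name (`maximin_init`, `kmeans_labels`, `between_cluster_scores`, `sparse_cluster`) is hypothetical.

```python
import numpy as np


def maximin_init(X, k):
    """Deterministic start: first row, then repeatedly the row farthest from the chosen centers."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers, dtype=float)


def kmeans_labels(X, k, n_iter=25):
    """Plain Lloyd iterations (a stand-in for a model-based clustering step)."""
    centers = maximin_init(X, k)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def between_cluster_scores(X, labels, k):
    """Per-feature between-cluster sum of squares (an assumed usefulness score)."""
    overall = X.mean(axis=0)
    scores = np.zeros(X.shape[1])
    for j in range(k):
        members = X[labels == j]
        if len(members):
            scores += len(members) * (members.mean(axis=0) - overall) ** 2
    return scores


def sparse_cluster(X, k, s, n_rounds=5):
    """Alternate: score features, keep the s best (hard-threshold), re-cluster on them."""
    labels = kmeans_labels(X, k)  # initial clustering on all features
    selected = np.arange(X.shape[1])
    for _ in range(n_rounds):
        scores = between_cluster_scores(X, labels, k)
        selected = np.sort(np.argsort(scores)[-s:])  # keep the s largest scores
        labels = kmeans_labels(X[:, selected], k)
    return selected, labels


# Toy data (assumed for illustration): 2 groups, 20 features, only the first 3 informative.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 20))
X[:, :3] += np.where(truth[:, None] == 1, 4.0, -4.0)

selected, labels = sparse_cluster(X, k=2, s=3)
print("selected features:", selected.tolist())
```

The only quantity the user supplies besides the number of clusters is the sparsity level `s`, which mirrors the "user-specified sparsity level" in the abstract; everything else in the sketch is an assumption made to keep the example short and self-contained.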

Citations: 0