Journal of Multivariate Analysis: Latest Articles


Bayesian inference of graph-based dependencies from mixed-type data

IF 1.6 · Zone 3 (Mathematics) · Q2 Statistics & Probability

Journal of Multivariate Analysis

Pub Date : 2024-05-06 DOI: 10.1016/j.jmva.2024.105323

Chiara Galimberti, Stefano Peluso, Federico Castelletti

Mixed data comprise measurements of different types, with both categorical and continuous variables, and arise in various areas, such as the life sciences and industrial processes. Inferring conditional independencies from the data is crucial to understanding how these variables relate to each other. To this end, graphical models provide an effective framework, which adopts a graph-based representation of the joint distribution to encode such dependence relations. This framework has been extensively studied in the Gaussian and categorical settings separately; by contrast, the literature addressing this problem in the presence of mixed data remains limited. We propose a Bayesian model for the analysis of mixed data based on the notion of the Conditional Gaussian (CG) distribution. Our method is based on a canonical parameterization of the CG distribution, which allows for posterior inference on the parameters indexing the (marginal) distributions of the continuous and categorical variables, as well as those expressing the interactions between the two types of variables. We derive the limiting Gaussian distributions, centered on the correct unknown value and with vanishing variance, for the Bayesian estimators of the canonical parameters expressing continuous, discrete and mixed interactions. In addition, we implement the proposed method for structure learning purposes, namely to infer the underlying graph of conditional independencies. When compared to alternative frequentist methods, our approach shows favorable results both in a simulation setting and in real-data applications, besides allowing for a coherent uncertainty quantification around parameter estimates.

Volume 203, Article 105323.

Citations: 0

Enhanced Laplace approximation

Pub Date : 2024-04-26 DOI: 10.1016/j.jmva.2024.105321

Jeongseop Han, Youngjo Lee

The Laplace approximation has been proposed as a method for approximating the marginal likelihood of statistical models with latent variables. However, the approximate maximum likelihood estimators derived from the Laplace approximation are often biased for binary or temporally and/or spatially correlated data. Additionally, the corresponding Hessian matrix tends to underestimate the standard errors of these approximate maximum likelihood estimators. While higher-order approximations have been suggested, they are not applicable to complex models, such as correlated random effects models, and fail to provide consistent variance estimators. In this paper, we propose an enhanced Laplace approximation that provides the true maximum likelihood estimator and its consistent variance estimator. We study its relationship with the variational Bayes method. We also define a new restricted maximum likelihood estimator for estimating dispersion parameters and study its asymptotic properties. The enhanced Laplace approximation generally demonstrates how to obtain the true restricted maximum likelihood estimators and their variance estimators. Our numerical studies indicate that the enhanced Laplace approximation provides a satisfactory maximum likelihood estimator and restricted maximum likelihood estimator, as well as their variance estimators, from the frequentist perspective. The maximum likelihood estimator and restricted maximum likelihood estimator can also be interpreted as the posterior mode and marginal posterior mode under flat priors, respectively. Furthermore, we present some comparisons with Bayesian procedures under different priors.
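For orientation, the ordinary first-order Laplace approximation that the abstract takes as its starting point can be sketched as follows. This is not the enhanced approximation proposed in the paper. The model (a single Poisson count with one Gaussian random intercept), the function names, and the quadrature limits are all assumptions chosen for illustration.

```python
import math

def log_joint(v, y, beta, sigma):
    """log p(y | v) + log p(v) for y | v ~ Poisson(exp(beta + v)), v ~ N(0, sigma^2)."""
    eta = beta + v
    log_lik = y * eta - math.exp(eta) - math.lgamma(y + 1)
    log_prior = -0.5 * math.log(2 * math.pi * sigma**2) - v**2 / (2 * sigma**2)
    return log_lik + log_prior

def laplace_log_marginal(y, beta, sigma, iters=50):
    """First-order Laplace approximation to log of the integral of p(y | v) p(v) over v."""
    v = 0.0
    for _ in range(iters):  # Newton's method for the mode of the log joint
        mu = math.exp(beta + v)
        grad = y - mu - v / sigma**2
        hess = -mu - 1.0 / sigma**2  # strictly negative: the log joint is concave in v
        v -= grad / hess
    hess = -math.exp(beta + v) - 1.0 / sigma**2
    # Gaussian integral around the mode: exp(h(v_hat)) * sqrt(2 * pi / -h''(v_hat))
    return log_joint(v, y, beta, sigma) + 0.5 * math.log(2 * math.pi / -hess)

def quadrature_log_marginal(y, beta, sigma, lo=-10.0, hi=10.0, n=20001):
    """Brute-force trapezoidal reference value for the same integral."""
    step = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * math.exp(log_joint(lo + k * step, y, beta, sigma))
    return math.log(total * step)
```

Comparing `laplace_log_marginal(3, 0.5, 1.0)` with `quadrature_log_marginal(3, 0.5, 1.0)` shows the size of the approximation error in this toy setting. The bias the abstract describes concerns estimators built from such approximations in richer models, which this sketch does not reproduce.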

Volume 202, Article 105321.

Citations: 0

Multivariate unified skew-t distributions and their properties

Pub Date : 2024-04-26 DOI: 10.1016/j.jmva.2024.105322

Kesen Wang, Maicon J. Karling, Reinaldo B. Arellano-Valle, Marc G. Genton

The unified skew-$t$ (SUT) is a flexible parametric multivariate distribution that accounts for skewness and heavy tails in the data. A few of its properties can be found scattered in the literature or in a parameterization that does not follow the original one for unified skew-normal (SUN) distributions, yet a systematic study is lacking. In this work, explicit properties of the multivariate SUT distribution are presented, such as its stochastic representations, moments, SUN-scale mixture representation, linear transformation, additivity, marginal distribution, canonical form, quadratic form, conditional distribution, change of latent dimensions, Mardia measures of multivariate skewness and kurtosis, and non-identifiability issue. These results are given in a parameterization that reduces to the original SUN distribution as a sub-model, hence facilitating the use of the SUT for applications. Several models based on the SUT distribution are provided for illustration.

Citations: 0

Testing distributional equality for functional random variables

Pub Date : 2024-04-22 DOI: 10.1016/j.jmva.2024.105318

Bilol Banerjee

In this article, we present a nonparametric method for the general two-sample problem involving functional random variables modeled as elements of a separable Hilbert space $H$. First, we present a general recipe based on linear projections to construct a measure of dissimilarity between two probability distributions on $H$. In particular, we consider a measure based on the energy statistic and present some of its nice theoretical properties. A plug-in estimator of this measure is used as the test statistic to construct a general two-sample test. The large-sample distribution of this statistic is derived under both the null and alternative hypotheses. However, since the quantiles of the limiting null distribution are analytically intractable, the test is calibrated using the permutation method. We prove the large-sample consistency of the resulting permutation test under fairly general assumptions. We also study the efficiency of the proposed test by establishing a new local asymptotic normality result for functional random variables. Using that result, we derive the asymptotic distribution of the permuted test statistic and the asymptotic power of the permutation test under local contiguous alternatives. This establishes that the permutation test is statistically efficient in the Pitman sense. Extensive simulation studies are carried out and a real data set is analyzed to compare the performance of our proposed test with some state-of-the-art methods.
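The permutation calibration described above can be sketched on a simplified setting. The example below is an illustration only: it uses the plain energy statistic with the Euclidean norm on curves sampled at a common grid, not the projection-based measure constructed in the paper, and every function name is hypothetical.

```python
import math
import random

def l2_dist(f, g):
    """Euclidean distance between two curves sampled on a common grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)))

def energy_statistic(dist, idx_x, idx_y):
    """V-statistic estimate of 2E|X-Y| - E|X-X'| - E|Y-Y'| from a distance matrix."""
    def mean_dist(rows, cols):
        return sum(dist[i][j] for i in rows for j in cols) / (len(rows) * len(cols))
    return 2 * mean_dist(idx_x, idx_y) - mean_dist(idx_x, idx_x) - mean_dist(idx_y, idx_y)

def permutation_test(xs, ys, n_perm=200, seed=0):
    """Return (observed statistic, permutation p-value) for H0: equal distributions."""
    rng = random.Random(seed)
    pooled = xs + ys
    n, m = len(xs), len(pooled)
    # Pairwise distances are computed once; permuting only relabels indices.
    dist = [[l2_dist(pooled[i], pooled[j]) for j in range(m)] for i in range(m)]
    idx = list(range(m))
    observed = energy_statistic(dist, idx[:n], idx[n:])
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        if energy_statistic(dist, idx[:n], idx[n:]) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (exceed + 1) / (n_perm + 1)
```

Here `xs` and `ys` are lists of equal-length lists of function values. Precomputing the distance matrix keeps each permutation cheap, which matters because the limiting null quantiles are not available in closed form and the permutation loop has to stand in for them.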

Volume 203, Article 105318.

Citations: 0

A fast and accurate kernel-based independence test with applications to high-dimensional and functional data

Pub Date : 2024-04-20 DOI: 10.1016/j.jmva.2024.105320

Jin-Ting Zhang, Tianming Zhu

Testing the dependency between two random variables is an important inference problem in statistics since many statistical procedures rely on the assumption that the two samples are independent. To test whether two samples are independent, a so-called HSIC (Hilbert–Schmidt Independence Criterion)-based test has been proposed. Its null distribution is approximated either by permutation or by a Gamma approximation. In this paper, a new HSIC-based test is proposed. Its asymptotic null and alternative distributions are established. It is shown that the proposed test is root-$n$ consistent. A three-cumulant matched chi-squared approximation is adopted to approximate the null distribution of the test statistic. By choosing a proper reproducing kernel, the proposed test can be applied to many different types of data, including multivariate, high-dimensional, and functional data. Three simulation studies and two real data applications show that in terms of level accuracy, power, and computational cost, the proposed test outperforms several existing tests for multivariate, high-dimensional, and functional data.
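As a point of reference, the existing permutation-calibrated HSIC test mentioned in the abstract can be sketched as below. This is not the paper's test: the three-cumulant chi-squared approximation is not implemented here, and the Gaussian kernel, the fixed bandwidth, and all function names are assumptions made for the sketch.

```python
import math
import random

def gaussian_gram(points, bandwidth):
    """Gram matrix of the Gaussian kernel exp(-|a - b|^2 / (2 * bandwidth^2))."""
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [[math.exp(-sqdist(a, b) / (2 * bandwidth**2)) for b in points] for a in points]

def center(gram):
    """Double-center a symmetric Gram matrix: H K H with H = I - 11^T / n."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    grand = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - row_means[j] + grand for j in range(n)]
            for i in range(n)]

def hsic(kc, l_gram, perm=None):
    """Biased HSIC estimate tr(K H L H) / n^2; `perm` reorders the second sample."""
    n = len(kc)
    p = perm if perm is not None else range(n)
    return sum(kc[i][j] * l_gram[p[i]][p[j]] for i in range(n) for j in range(n)) / n**2

def hsic_permutation_test(xs, ys, bandwidth=1.0, n_perm=200, seed=0):
    """Return (observed HSIC, permutation p-value) for H0: X and Y are independent."""
    rng = random.Random(seed)
    kc = center(gaussian_gram(xs, bandwidth))
    l_gram = gaussian_gram(ys, bandwidth)
    observed = hsic(kc, l_gram)
    perm = list(range(len(xs)))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # break the pairing between the two samples
        if hsic(kc, l_gram, perm) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Each observation is passed as a tuple or list of coordinates, for example `xs = [(0.3,), (1.2,), ...]`. Only one Gram matrix needs centering because tr(K H L H) = tr((H K H) L). Every permutation still costs on the order of n^2 operations, which is the computational burden a closed-form null approximation is meant to avoid.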

Volume 202, Article 105320.

Citations: 0

Multivariate directional tail-weighted dependence measures

Pub Date : 2024-04-18 DOI: 10.1016/j.jmva.2024.105319

Xiaoting Li, Harry Joe

We propose a new family of directional dependence measures for multivariate distributions. The family of dependence measures is indexed by $\alpha \geq 1$. When $\alpha = 1$, they measure the strength of dependence along different paths to the joint upper or lower orthant. For large $\alpha$, they become tail-weighted dependence measures that put more weight in the joint upper or lower tails of the distribution. As $\alpha \to \infty$, we show the convergence of the directional dependence measures to the multivariate tail dependence function and characterize the convergence pattern with an asymptotic expansion. This expansion leads to a method to estimate the multivariate tail dependence function using weighted least squares regression. We develop rank-based sample estimators for the tail-weighted dependence measures and establish their asymptotic distributions. The practical utility of the tail-weighted dependence measures in multivariate tail inference is further demonstrated through their application to a financial dataset.

Volume 203, Article 105319. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0047259X24000265/pdfft?md5=b41054186655fc814404cc641ffc0dfe&pid=1-s2.0-S0047259X24000265-main.pdf

Citations: 0

A uniform kernel trick for high and infinite-dimensional two-sample problems

Pub Date : 2024-04-12 DOI: 10.1016/j.jmva.2024.105317

Javier Cárcamo, Antonio Cuevas, Luis-Alberto Rodríguez

We use a suitable version of the so-called "kernel trick" to devise two-sample tests, especially focussed on high-dimensional and functional data. Our proposal entails a simplification of the practical problem of selecting an appropriate kernel function. Specifically, we apply a uniform variant of the kernel trick which involves the supremum within a class of kernel-based distances. We obtain the asymptotic distribution of the test statistic under the null and alternative hypotheses. The proofs rely on empirical process theory, combined with the delta method and Hadamard directional differentiability techniques, and functional Karhunen–Loève-type expansions of the underlying processes. This methodology has some advantages over other standard approaches in the literature. We also give some experimental insight into the performance of our proposal compared to other kernel-based approaches (the original proposal by Borgwardt et al. (2006) and some variants based on splitting methods) as well as tests based on energy distances (Rizzo and Székely, 2017).
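The idea of taking a supremum over a class of kernel-based distances can be illustrated with a small sketch. The assumptions are substantial: the class is replaced by a finite grid of Gaussian bandwidths, the distance is the biased estimate of the squared maximum mean discrepancy (MMD), and the function names are hypothetical. The paper's statistic and its asymptotic calibration are not reproduced.

```python
import math

def sqdist(a, b):
    """Squared Euclidean distance between two coordinate sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def mmd2(xs, ys, bandwidth):
    """Biased (V-statistic) estimate of the squared MMD under a Gaussian kernel."""
    def mean_kernel(ps, qs):
        total = sum(math.exp(-sqdist(p, q) / (2 * bandwidth**2)) for p in ps for q in qs)
        return total / (len(ps) * len(qs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)

def uniform_mmd2(xs, ys, bandwidths):
    """Largest squared MMD over a finite grid of bandwidths (stand-in for a supremum)."""
    return max(mmd2(xs, ys, h) for h in bandwidths)
```

A call such as `uniform_mmd2(xs, ys, [0.5, 1.0, 2.0])` removes the need to commit to a single bandwidth, which is the practical simplification the abstract refers to. The cost is that the null distribution is that of a maximum, not of any single kernel statistic, so critical values must account for the whole class.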

Volume 202, Article 105317. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0047259X24000241/pdfft?md5=19f44db706891c9aa40d12d1b8b7030a&pid=1-s2.0-S0047259X24000241-main.pdf

Citations: 0

Sparse online regression algorithm with insensitive loss functions

Pub Date : 2024-04-03 DOI: 10.1016/j.jmva.2024.105316

Ting Hu, Jing Xiong

Online learning is an efficient approach in machine learning and statistics, which iteratively updates models upon the observation of a sequence of training examples. A representative online learning algorithm is online gradient descent, which has found wide applications due to its low complexity and scalability to large datasets. Kernel-based learning methods have proven quite successful in dealing with nonlinearity in the data and multivariate optimization. In this paper we present a class of kernel-based online gradient descent algorithms for addressing regression problems, which generate sparse estimators in an iterative way to reduce the algorithmic complexity for training streaming datasets and model selection in large-scale learning scenarios. In the setting of support vector regression (SVR), we design the sparse online learning algorithm by introducing a sequence of insensitive distance-based loss functions. We prove consistency and error bounds quantifying the generalization performance of such algorithms under mild conditions. The theoretical results demonstrate the interplay between statistical accuracy and the sparsity property during learning processes. We show that the insensitive parameter plays a crucial role in providing sparsity as well as fast convergence rates. The numerical experiments also support our theoretical results.
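A minimal sketch of how an insensitive loss produces sparsity in kernel online gradient descent is given below. It assumes scalar inputs, a fixed insensitivity parameter and a constant step size, with no regularization term. The paper works with a sequence of loss functions and its exact update rule is not given in the abstract, so the class and parameter names here are hypothetical.

```python
import math

def gaussian_kernel(a, b, bandwidth=0.5):
    """Gaussian kernel on scalar inputs."""
    return math.exp(-(a - b) ** 2 / (2 * bandwidth**2))

class OnlineEpsilonSVR:
    """Kernel online subgradient descent on the loss max(0, |f(x) - y| - epsilon)."""

    def __init__(self, epsilon=0.1, step=0.1, kernel=gaussian_kernel):
        self.epsilon, self.step, self.kernel = epsilon, step, kernel
        self.centers, self.coefs = [], []  # kernel expansion f = sum_i c_i K(z_i, .)

    def predict(self, x):
        return sum(c * self.kernel(z, x) for z, c in zip(self.centers, self.coefs))

    def update(self, x, y):
        residual = self.predict(x) - y
        if abs(residual) <= self.epsilon:
            return  # zero subgradient inside the tube: no new term is added
        # Outside the tube the subgradient is sign(residual) * K(x, .)
        self.centers.append(x)
        self.coefs.append(-self.step * math.copysign(1.0, residual))
```

Feeding a stream with `model.update(x, y)` grows the expansion only when a prediction misses the target by more than `epsilon`, so a wider tube yields fewer stored centers at the price of a coarser fit. That is the accuracy and sparsity trade-off the abstract attributes to the insensitive parameter.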

Volume 202, Article 105316.

Citations: 0

Efficient calibration of computer models with multivariate output

Pub Date : 2024-03-21 DOI: 10.1016/j.jmva.2024.105315

Yang Sun, Xiangzhong Fang

Classical calibration procedures for computer models address only univariate output, which is often insufficient in practice. Multivariate output is increasingly prevalent in a wide range of real-world applications, which motivates us to develop a new calibration procedure that extends the classical methods to the multivariate case. In this work, we propose an efficient calibration procedure for multivariate output within restricted correlation. First, we construct an estimator of the discrepancy function between the true process and the computer model by local linear approximation; we then obtain an estimator of the calibration parameter by weighted profile least squares and establish its asymptotic properties. In addition, we develop an estimator of the calibration parameter for a special case and derive its asymptotic normality. Numerical studies, including simulations and an application to composite fuselage simulation, verify the efficiency of the proposed calibration procedure.

Citations: 0

On extreme quantile region estimation under heavy-tailed elliptical distributions

Pub Date : 2024-03-20 DOI: 10.1016/j.jmva.2024.105314

Jaakko Pere, Pauliina Ilmonen, Lauri Viitasaari

Consider the estimation of an extreme quantile region corresponding to a very small probability. Estimation of extreme quantile regions is important but difficult since extreme regions contain only a few or no observations. In this article, we propose an affine equivariant extreme quantile region estimator for heavy-tailed elliptical distributions. The estimator is constructed by extending a well-known univariate extreme quantile estimator. Consistency of the estimator is proved under estimated location and scatter. The practicality of the developed estimator is illustrated with simulations and a real data example.

Citations: 0
