
arXiv - STAT - Statistics Theory: Latest Papers

Functional Adaptive Huber Linear Regression
Pub Date : 2024-09-17 DOI: arxiv-2409.11053
Ling Peng, Xiaohui Liu, Heng Lian
Robust estimation has played an important role in statistical and machine learning. However, its applications to functional linear regression are still under-developed. In this paper, we focus on Huber's loss with a diverging robustness parameter, which was previously used in parametric models. Compared to other robust methods such as median regression, the proposed method differs in that it aims to estimate the conditional mean robustly, instead of the conditional median. We only require a $(1+\kappa)$-th moment assumption ($\kappa>0$) on the noise distribution, and the established error bounds match the optimal rate in the least-squares case as soon as $\kappa\ge 1$. We establish the convergence rate in probability when the functional predictor has a finite 4th moment, and a finite-sample bound with exponential tail when the functional predictor is Gaussian, in terms of both prediction error and $L^2$ error. The results also extend to the case of functional estimation in a reproducing kernel Hilbert space (RKHS).
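As a rough illustration of the mechanism described above, the following Python sketch fits a scalar linear model $y = a + bx$ (not a functional predictor) by gradient descent on the Huber loss. The score $\psi_\tau$ is the identity on $[-\tau, \tau]$ and clips larger residuals, so a gross outlier contributes at most $\tau$; letting $\tau$ grow with $n$ keeps the clipping bias small, which is why the target remains the conditional mean rather than the median. The growth rate $\sqrt{n/\log n}$ and the constant in `adaptive_tau` are illustrative assumptions, not the paper's tuning rule, and all function names are hypothetical.

```python
import math


def huber_psi(r, tau):
    """Derivative of the Huber loss: identity inside [-tau, tau], clipped outside."""
    if r > tau:
        return tau
    if r < -tau:
        return -tau
    return r


def adaptive_tau(n, scale=1.0):
    """Illustrative diverging robustness parameter tau_n = scale * sqrt(n / log n).

    Both the rate and the constant `scale` are assumptions for this sketch.
    """
    return scale * math.sqrt(n / math.log(n))


def huber_fit(xs, ys, tau, step=1.0, iters=5000):
    """Fit y ~ a + b*x by gradient descent on the mean Huber loss.

    step=1.0 assumes the predictor is scaled to roughly [0, 1].
    """
    n = len(xs)
    a, b = 0.0, 0.0
    for _ in range(iters):
        # Negative gradient of (1/n) * sum huber(y - a - b*x) with respect to (a, b).
        ga = gb = 0.0
        for x, y in zip(xs, ys):
            p = huber_psi(y - a - b * x, tau)
            ga += p
            gb += p * x
        a += step * ga / n
        b += step * gb / n
    return a, b


def ols_fit(xs, ys):
    """Closed-form least squares, for comparison."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b
```

With 51 equally spaced points on $[0, 1]$, $y = 1 + 2x$ and a single outlier of $+100$ at $x = 0.5$, `huber_fit(xs, ys, adaptive_tau(51))` stays near $(1, 2)$, whereas the least-squares intercept from `ols_fit` moves to roughly $2.96$.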
Citations: 0
Permutation groups, partition lattices and block structures
Pub Date : 2024-09-16 DOI: arxiv-2409.10461
Marina Anagnostopoulou-Merkouri, R. A. Bailey, Peter J. Cameron
Let $G$ be a transitive permutation group on $\Omega$. The $G$-invariant partitions form a sublattice of the lattice of all partitions of $\Omega$, having the further property that all its elements are uniform (that is, have all parts of the same size). If, in addition, all the equivalence relations defining the partitions commute, then the relations form an \emph{orthogonal block structure}, a concept from statistics; in this case the lattice is modular. If it is distributive, then we have a \emph{poset block structure}, whose automorphism group is a \emph{generalised wreath product}. We examine permutation groups with these properties, which we call the \emph{OB property} and \emph{PB property} respectively, and in particular investigate when direct and wreath products of groups with these properties also have these properties. A famous theorem on permutation groups asserts that a transitive imprimitive group $G$ is embeddable in the wreath product of two factors obtained from the group (the group induced on a block by its setwise stabiliser, and the group induced on the set of blocks by $G$). We extend this theorem to groups with the PB property, embedding them into generalised wreath products. We show that the map from posets to generalised wreath products preserves intersections and inclusions. We have included background and historical material on these concepts.
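Two of the conditions in this abstract can be checked mechanically on small examples: $G$-invariance of a partition, and commutation of the equivalence relations of two partitions (the condition for an orthogonal block structure). The following brute-force Python sketch assumes permutations are given as tuples on $\{0, 1, \dots, n-1\}$ and partitions as lists of blocks; the function names are illustrative only.

```python
from itertools import product


def is_uniform(partition):
    """True if all parts of the partition have the same size."""
    return len({len(block) for block in partition}) == 1


def is_g_invariant(partition, generators):
    """True if every generator maps each block onto a block.

    A permutation g is a tuple with g[x] the image of point x. For a finite group,
    checking the generators is enough, because block preservation is closed under
    composition and inverses are powers of the generators.
    """
    blocks = {frozenset(b) for b in partition}
    return all(frozenset(g[x] for x in b) in blocks for g in generators for b in blocks)


def relation(partition):
    """Equivalence relation of a partition as a set of ordered pairs (x, y)."""
    return {(x, y) for block in partition for x, y in product(block, repeat=2)}


def compose(r, s):
    """Relational composition: pairs (x, z) with x r y and y s z for some y."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}


def commute(p, q):
    """True if the equivalence relations of partitions p and q commute."""
    r, s = relation(p), relation(q)
    return compose(r, s) == compose(s, r)
```

For the cyclic group generated by the 4-cycle `c4 = (1, 2, 3, 0)`, the partition `[[0, 2], [1, 3]]` is invariant while `[[0, 1], [2, 3]]` is not, and `commute([[0, 1], [2, 3]], [[0, 2], [1, 3]])` returns `True`. Brute force is only practical for very small $\Omega$.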
Citations: 0
Variance Residual Life Ageing Intensity Function
Pub Date : 2024-09-16 DOI: arxiv-2409.10591
Ashutosh Singh
Quantitative measurement of ageing across systems and components is crucial for accurately assessing reliability and predicting failure probabilities. This measurement supports effective maintenance scheduling, performance optimisation, and cost management. Examining the ageing characteristics of a system that operates beyond a specified time $t > 0$ yields valuable insights. This paper introduces a novel metric for ageing, termed the Variance Residual Life Ageing Intensity (VRLAI) function, and explores its properties across various probability distributions. Additionally, we characterise the closure properties of the two ageing classes defined by the VRLAI function. We propose a new ordering, called the Variance Residual Life Ageing Intensity (VRLAI) ordering, and discuss its various properties. Furthermore, we examine the closure of the VRLAI order under coherent systems.
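The VRLAI function is built on the variance residual life $\sigma^2(t) = \operatorname{Var}(X - t \mid X > t)$. Since the abstract does not state the VRLAI formula itself, this Python sketch stops at computing the mean and variance residual life from a survival function $S$, using $E[X - t \mid X > t] = \int_t^\infty S(u)\,du / S(t)$ and $E[(X - t)^2 \mid X > t] = 2\int_t^\infty (u - t) S(u)\,du / S(t)$. Truncating the integrals at `t + horizon` assumes the tail beyond it is negligible; the function names are hypothetical.

```python
def _simpson(f, lo, hi, n=10_000):
    """Composite Simpson's rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def residual_life_moments(surv, t, horizon):
    """Mean and variance of the residual life X - t given X > t.

    surv is the survival function S; integrals are truncated at t + horizon,
    which is assumed to make the remaining tail negligible.
    """
    st = surv(t)
    mean = _simpson(surv, t, t + horizon) / st
    second = 2 * _simpson(lambda u: (u - t) * surv(u), t, t + horizon) / st
    return mean, second - mean * mean
```

For an exponential lifetime with rate $\lambda$ both quantities are constant in $t$ ($1/\lambda$ and $1/\lambda^2$), the memoryless baseline against which ageing is usually measured.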
Citations: 0
Learning large softmax mixtures with warm start EM
Pub Date : 2024-09-16 DOI: arxiv-2409.09903
Xin Bing, Florentina Bunea, Jonathan Niles-Weed, Marten Wegkamp
Mixed multinomial logits are discrete mixtures introduced several decades ago to model the probability of choosing an attribute from $p$ possible candidates in heterogeneous populations. The model has recently attracted attention in the AI literature under the name softmax mixtures, where it is routinely used in the final layer of a neural network to map a large number $p$ of vectors in $\mathbb{R}^L$ to a probability vector. Despite its wide applicability and empirical success, statistically optimal estimators of the mixture parameters, obtained via algorithms whose running time scales polynomially in $L$, are not known. This paper provides a solution to this problem for contemporary applications, such as large language models, in which the mixture has a large number $p$ of support points, and the size $N$ of the sample observed from the mixture is also large. Our proposed estimator combines two classical estimators, obtained respectively via a method of moments (MoM) and the expectation-maximization (EM) algorithm. Although both estimator types have been studied, from a theoretical perspective, for Gaussian mixtures, no similar results exist for softmax mixtures for either procedure. We develop a new MoM parameter estimator based on latent moment estimation that is tailored to our model, and provide the first theoretical analysis for a MoM-based procedure in softmax mixtures. Although consistent, MoM for softmax mixtures can exhibit poor numerical performance, as observed in other mixture models. Nevertheless, as MoM is provably in a neighborhood of the target, it can be used as a warm start for any iterative algorithm. We study in detail the EM algorithm, and provide its first theoretical analysis for softmax mixtures. Our final proposal for parameter estimation is the EM algorithm with a MoM warm start.
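The following Python sketch shows one generalized EM iteration for a $K$-component softmax mixture with fixed choice features $x_1, \dots, x_p \in \mathbb{R}^L$ and $P(j) = \sum_k \pi_k \exp(\langle x_j, \theta_k\rangle) / \sum_l \exp(\langle x_l, \theta_k\rangle)$. Because the component M-step has no closed form, it takes a few gradient-ascent steps, which keeps the log-likelihood non-decreasing when `step` is at most $1/\max_j \|x_j\|^2$ (an assumption of this sketch). The MoM warm start is not implemented: the initial `weights` and `thetas` are simply passed in, and every name here is hypothetical.

```python
import math


def softmax_probs(theta, feats):
    """Choice probabilities P(j) proportional to exp(<x_j, theta>), max-subtracted for stability."""
    scores = [sum(a * b for a, b in zip(x, theta)) for x in feats]
    top = max(scores)
    w = [math.exp(s - top) for s in scores]
    z = sum(w)
    return [v / z for v in w]


def log_likelihood(weights, thetas, feats, counts):
    """Mixture log-likelihood sum_j n_j log sum_k pi_k P_k(j) for choice counts n_j."""
    probs = [softmax_probs(th, feats) for th in thetas]
    return sum(
        c * math.log(sum(pi * pk[j] for pi, pk in zip(weights, probs)))
        for j, c in enumerate(counts)
        if c > 0
    )


def em_step(weights, thetas, feats, counts, step=0.5, inner=5):
    """One generalized-EM iteration: exact update of weights, gradient ascent on thetas."""
    n_total = sum(counts)
    p = len(feats)
    probs = [softmax_probs(th, feats) for th in thetas]
    # E-step: responsibility of component k for choice j.
    resp = []
    for j in range(p):
        joint = [pi * pk[j] for pi, pk in zip(weights, probs)]
        s = sum(joint)
        resp.append([v / s for v in joint])
    new_weights, new_thetas = [], []
    for k, theta in enumerate(thetas):
        # Normalised weights w_j = n_j r_jk / N; their sum is the new pi_k.
        wj = [counts[j] * resp[j][k] / n_total for j in range(p)]
        wk = sum(wj)
        new_weights.append(wk)
        th = list(theta)
        target = [sum(wj[j] * feats[j][d] for j in range(p)) for d in range(len(th))]
        for _ in range(inner):
            pk = softmax_probs(th, feats)
            # Gradient of sum_j w_j log P_theta(j) is sum_j w_j x_j - wk * E_theta[x].
            grad = [target[d] - wk * sum(pk[j] * feats[j][d] for j in range(p))
                    for d in range(len(th))]
            th = [a + step * g for a, g in zip(th, grad)]
        new_thetas.append(th)
    return new_weights, new_thetas
```

In practice one would iterate `em_step` from the warm start until the log-likelihood stabilises; a plain gradient step is used here only because it makes the monotonicity argument easy to state.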
Citations: 0
Extending the Gini Index to Higher Dimensions via Whitening Processes
Pub Date : 2024-09-16 DOI: arxiv-2409.10119
Gennaro Auricchio, Paolo Giudici, Giuseppe Toscani
Measuring the degree of inequality expressed by a multivariate statistical distribution is a challenging problem, which appears in many fields of science and engineering. In this paper, we propose to extend the well-known univariate Gini coefficient to multivariate distributions while maintaining most of its properties. Our extension is based on the application of whitening processes that possess the property of scale stability.
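The abstract does not state the whitening-based multivariate index, so this Python sketch covers only its univariate starting point, $G = \sum_i \sum_j |x_i - x_j| / (2 n^2 \mu)$ for nonnegative data with mean $\mu > 0$, and the scale invariance that the extension is meant to preserve. The function name is hypothetical.

```python
def gini(xs):
    """Univariate Gini coefficient: mean absolute difference divided by twice the mean.

    Assumes nonnegative data with a positive mean. O(n^2); a sorted version is O(n log n).
    """
    n = len(xs)
    mean = sum(xs) / n
    mad = sum(abs(a - b) for a in xs for b in xs)
    return mad / (2 * n * n * mean)
```

A constant sample gives $0$, and concentrating everything on one of $n$ units gives $(n-1)/n$, for example $0.75$ for `[0, 0, 0, 1]`.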
Citations: 0
Learning with Sparsely Permuted Data: A Robust Bayesian Approach
Pub Date : 2024-09-16 DOI: arxiv-2409.10678
Abhisek Chakraborty, Saptati Datta
Data dispersed across multiple files are commonly integrated through probabilistic linkage methods, where even minimal error rates in record matching can significantly contaminate subsequent statistical analyses. In regression problems, we examine scenarios where the identifiers of predictors or responses are subject to an unknown permutation, challenging the assumption of correspondence. Many emerging approaches in the literature focus on sparsely permuted data, where only a small subset of pairs ($k \ll n$) is affected by the permutation, treating these permuted entries as outliers to restore the original correspondence and obtain consistent estimates of regression parameters. In this article, we complement the existing literature by introducing a novel generalized robust Bayesian formulation of the problem. We develop an efficient posterior sampling scheme by adapting the fractional posterior framework and addressing key computational bottlenecks via careful use of discrete optimal transport and sampling in the space of binary matrices with fixed margins. Further, we establish new posterior contraction results within this framework, providing theoretical guarantees for our approach. The utility of the proposed framework is demonstrated via extensive numerical experiments.
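The fractional posterior mentioned above tempers the likelihood by a power $\alpha \in (0, 1]$, $\pi_\alpha(\beta \mid D) \propto L(\beta; D)^\alpha \, \pi(\beta)$. In a conjugate Gaussian model this has a closed form, which the Python sketch below uses to show the effect of $\alpha$: it scales the data precision, widening the posterior and shrinking more toward the prior. The single-coefficient model, the noise variance and the $N(0, \tau^2)$ prior are assumptions for illustration only; the sketch does not model the unknown permutation, the optimal-transport step or the binary-matrix sampler, and the function name is hypothetical.

```python
def fractional_posterior(xs, ys, alpha, sigma2=1.0, tau2=10.0):
    """Mean and variance of the alpha-fractional posterior for y = beta * x + noise.

    Assumed model: noise ~ N(0, sigma2), prior beta ~ N(0, tau2).
    Raising the Gaussian likelihood to the power alpha multiplies its precision by alpha.
    """
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    precision = alpha * sxx / sigma2 + 1.0 / tau2
    mean = (alpha * sxy / sigma2) / precision
    return mean, 1.0 / precision
```

With $\alpha = 1$ this reduces to the usual conjugate posterior; smaller $\alpha$ downweights the data, which is the source of the robustness to misspecification (here, mismatched pairs) that fractional posteriors are used for.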
Citations: 0
Consistent complete independence test in high dimensions based on Chatterjee correlation coefficient
Pub Date : 2024-09-16 DOI: arxiv-2409.10315
Liqi Xia, Ruiyuan Cao, Jiang Du, Jun Dai
In this article, we consider the complete independence test of high-dimensional data. Based on the Chatterjee coefficient, we pioneer the development of a quadratic test and an extreme value test, which possess good testing performance for oscillatory data, and establish the corresponding large-sample properties under both null and alternative hypotheses. In order to overcome the shortcomings of the quadratic statistic and the extreme value statistic, we propose a testing method, termed the power enhancement test, by adding a screening statistic to the quadratic statistic. The proposed method does not reduce the testing power under dense alternative hypotheses, but can enhance the power significantly under sparse alternative hypotheses. Three synthetic data examples and two real data examples are further used to illustrate the performance of our proposed methods.
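Chatterjee's coefficient has a simple closed form when there are no ties: sort the pairs by $x$, let $r_i$ be the rank of the corresponding $y$, and set $\xi_n = 1 - 3\sum_{i=1}^{n-1} |r_{i+1} - r_i| / (n^2 - 1)$. The Python sketch below computes it, plus two hypothetical pairwise aggregates (a sum of squares and a maximum over ordered pairs of coordinates) that only mimic the shape of quadratic and extreme-value statistics; the paper's actual statistics, centring, scaling and critical values are not reproduced.

```python
def chatterjee_xi(xs, ys):
    """Chatterjee's xi_n(X, Y), assuming no ties in xs or ys."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    # Rank of each y among all ys (1 = smallest).
    y_rank = {i: r for r, i in enumerate(sorted(range(n), key=lambda i: ys[i]), start=1)}
    ranks = [y_rank[i] for i in order]
    total = sum(abs(ranks[i + 1] - ranks[i]) for i in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)


def pairwise_xi_stats(columns):
    """Hypothetical, uncalibrated aggregates over ordered pairs of coordinates.

    Returns (sum of xi^2, max of xi); xi is asymmetric, so both orders are used.
    """
    xis = [chatterjee_xi(columns[a], columns[b])
           for a in range(len(columns)) for b in range(len(columns)) if a != b]
    return sum(x * x for x in xis), max(xis)
```

For any strictly monotone relation, the rank differences are all 1 and $\xi_n = 1 - 3/(n+1)$, for example $0.7$ at $n = 9$; this tends to 1 only as $n$ grows.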
Citations: 0
Privately Learning Smooth Distributions on the Hypercube by Projections
Pub Date : 2024-09-16 DOI: arxiv-2409.10083
Clément Lalanne (TSE-R), Sébastien Gadat (TSE-R, IUF)
Fueled by the ever-increasing need for statistics that guarantee the privacy of their training sets, this article studies the centrally private estimation of Sobolev-smooth probability densities over the hypercube in dimension $d$. The contributions of this article are twofold. First, it generalizes the one-dimensional results of (Lalanne et al., 2023) to non-integer levels of smoothness and to a high-dimensional setting, which is important for two reasons: it is better suited to modern learning tasks, and it allows understanding the relations between privacy, dimensionality and smoothness, which is a central question in differential privacy. Second, this article presents a data-driven private estimation strategy (usually referred to as adaptive in statistics) that privately chooses an estimator achieving a good bias-variance trade-off among a finite family of private projection estimators, without prior knowledge of the ground-truth smoothness $\beta$. This is achieved by adapting the Lepskii method for private selection, adding a new penalization term that makes the estimation privacy-aware.
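To make the two ingredients concrete (projection onto a basis, and a privacy mechanism applied to the projection coefficients), here is a one-dimensional Python sketch on $[0, 1]$ rather than the $d$-dimensional hypercube. It assumes the cosine basis, pure $\varepsilon$-differential privacy via the Laplace mechanism, and replace-one neighbouring datasets; the Lepskii-type selection of the number of coefficients is not implemented, so `n_coef` is fixed by hand. All names are hypothetical.

```python
import math
import random


def cosine_basis(j, x):
    """Orthonormal cosine basis on [0, 1]: phi_0 = 1, phi_j = sqrt(2) cos(pi j x)."""
    return 1.0 if j == 0 else math.sqrt(2.0) * math.cos(math.pi * j * x)


def laplace(rng, scale):
    """Laplace(0, scale) draw as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_projection_density(samples, n_coef, epsilon, rng):
    """epsilon-DP projection density estimate on [0, 1] via the Laplace mechanism.

    Coefficients a_j = mean_i phi_j(X_i), j = 1..n_coef (a_0 = 1 needs no noise).
    Replacing one sample moves each a_j by at most 2*sqrt(2)/n, so the L1 sensitivity
    of the coefficient vector is 2*sqrt(2)*n_coef/n and the noise scale is that / epsilon.
    """
    n = len(samples)
    scale = 2.0 * math.sqrt(2.0) * n_coef / (n * epsilon)
    coefs = [sum(cosine_basis(j, x) for x in samples) / n + laplace(rng, scale)
             for j in range(1, n_coef + 1)]

    def density(x):
        return 1.0 + sum(a * cosine_basis(j, x) for j, a in enumerate(coefs, start=1))

    return density, coefs
```

The estimate integrates to one by construction, but like any projection estimator it can dip below zero, so a practical version would typically project it back onto the set of densities.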
Citations: 0
Mean Residual Life Ageing Intensity Function
Pub Date : 2024-09-16 DOI: arxiv-2409.10456
Ashutosh Singh, Ishapathik Das, Asok Kumar Nanda, Sumen Sen
The ageing intensity function is a powerful analytical tool that provides valuable insights into the ageing process across diverse domains such as reliability engineering, actuarial science, and healthcare. Its applications continue to expand as researchers delve deeper into understanding the complex dynamics of ageing and its implications for society. One common approach to defining the ageing intensity function is through the hazard rate or failure rate function, extensively explored in the scholarly literature. Equally significant to the hazard rate function is the mean residual life function, which plays a crucial role in analyzing the ageing patterns exhibited by units or components. This article introduces the mean residual life ageing intensity (MRLAI) function to delve into component ageing behaviours across various distributions. Additionally, we scrutinize the closure properties of the MRLAI function across different reliability operations. Furthermore, a new order termed the mean residual life ageing intensity order is defined to analyze the ageing behaviour of a system, and the closure property of this order under various reliability operations is discussed.
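The abstract contrasts the hazard-based ageing intensity with one built on the mean residual life, but does not state the MRLAI formula. This Python sketch therefore computes only the two ingredients it names: the classical hazard-based ageing intensity $L(t) = -t\,h(t) / \log S(t)$ and the mean residual life $m(t) = \int_t^\infty S(u)\,du / S(t)$. The trapezoid rule and the truncation at `t + horizon` are simplifying assumptions, and the names are hypothetical.

```python
import math


def ageing_intensity(t, surv, hazard):
    """Hazard-based ageing intensity L(t) = -t h(t) / log S(t), for t > 0 and 0 < S(t) < 1."""
    return -t * hazard(t) / math.log(surv(t))


def mean_residual_life(surv, t, horizon, n=10_000):
    """m(t) = integral of S over [t, inf) divided by S(t); trapezoid rule up to t + horizon."""
    h = horizon / n
    area = 0.5 * (surv(t) + surv(t + horizon))
    area += sum(surv(t + i * h) for i in range(1, n))
    return area * h / surv(t)
```

For a Weibull lifetime with shape $k$, $L(t) = k$ for every $t$, which is why the ageing intensity is often read as a local shape parameter; for an exponential lifetime with rate $\lambda$, $m(t) = 1/\lambda$.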
Citations: 0
Towards a Unified Theory for Semiparametric Data Fusion with Individual-Level Data
Pub Date : 2024-09-16 DOI: arxiv-2409.09973
Ellen Graham (University of Washington), Marco Carone (University of Washington), Andrea Rotnitzky (University of Washington)
We address the goal of conducting inference about a smooth finite-dimensional parameter by utilizing individual-level data from various independent sources. Recent advancements have led to the development of a comprehensive theory capable of handling scenarios where different data sources align with, possibly distinct subsets of, conditional distributions of a single factorization of the joint target distribution. While this theory proves effective in many significant contexts, it falls short in certain common data fusion problems, such as two-sample instrumental variable analysis, settings that integrate data from epidemiological studies with diverse designs (e.g., prospective cohorts and retrospective case-control studies), and studies with variables prone to measurement error that are supplemented by validation studies. In this paper, we extend the aforementioned comprehensive theory to allow for the fusion of individual-level data from sources aligned with conditional distributions that do not correspond to a single factorization of the target distribution. Assuming conditional and marginal distribution alignments, we provide universal results that characterize the class of all influence functions of regular asymptotically linear estimators and the efficient influence function of any pathwise differentiable parameter, irrespective of the number of data sources, the specific parameter of interest, or the statistical model for the target distribution. This theory paves the way for machine-learning debiased, semiparametric efficient estimation.
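As a rough illustration of what "sources aligned with marginal and conditional distributions" and an influence-function-based standard error can look like, here is a toy Python sketch of the simplest case, which the abstract says earlier theory already covers (a single factorization p(x)p(y | x)). It assumes the target parameter is ψ = E[E[Y | X]] with a discrete X, that source A shares the target's marginal law of X, and that source B shares only the conditional law of Y given X while its own X distribution may differ. The names `fused_mean_estimate` and `simulate`, the simulated data, and the plug-in estimator are hypothetical choices for this example, not the authors' method; the paper itself addresses the harder case where sources do not fit a single factorization.

```python
import math
import random
from collections import Counter, defaultdict


def fused_mean_estimate(xs_a, xy_b):
    """Toy fused estimate of psi = E_A[ E_B[Y | X] ] for a discrete covariate X.

    Illustrative assumptions (not taken from the paper):
      * source A (xs_a: list of x values) shares the marginal law of X with the target;
      * source B (xy_b: list of (x, y) pairs) shares the conditional law of Y given X,
        but may have a different marginal law of X.
    Returns (estimate, standard_error); the SE comes from the asymptotically
    linear (influence-function) expansion of the plug-in estimator.
    """
    n_a, n_b = len(xs_a), len(xy_b)
    count_a = Counter(xs_a)
    count_b = Counter(x for x, _ in xy_b)
    missing = [x for x in count_a if count_b[x] == 0]
    if missing:
        raise ValueError(f"levels {missing} of X never observed in source B")

    # Per-level outcome means in B estimate m(x) = E[Y | X = x].
    sum_y_b = defaultdict(float)
    for x, y in xy_b:
        sum_y_b[x] += y
    m_b = {x: sum_y_b[x] / count_b[x] for x in count_a}
    p_a = {x: c / n_a for x, c in count_a.items()}

    # Plug-in estimate: psi_hat = sum_x p_A(x) * m_B(x).
    psi_hat = sum(p_a[x] * m_b[x] for x in p_a)

    # Influence-function contribution of source A: m(X) - psi.
    if_a = [m_b[x] - psi_hat for x in xs_a]
    # Contribution of source B: density ratio p_A(X) / q_B(X) times residual Y - m(X);
    # observations whose level never appears in A get weight zero.
    if_b = []
    for x, y in xy_b:
        weight = p_a.get(x, 0.0) / (count_b[x] / n_b)
        if_b.append(weight * (y - m_b[x]) if weight else 0.0)

    # Both contributions have empirical mean zero, so mean squares estimate variances;
    # Var(psi_hat) ~ Var(IF_A) / n_A + Var(IF_B) / n_B for independent sources.
    variance = (sum(v * v for v in if_a) / n_a) / n_a + (sum(v * v for v in if_b) / n_b) / n_b
    return psi_hat, math.sqrt(variance)


def simulate(seed=0, n_a=20_000, n_b=20_000):
    """Hypothetical simulation: X in {0, 1, 2}, Y | X ~ Normal(mu[X], 1)."""
    rng = random.Random(seed)
    levels, mu = [0, 1, 2], [1.0, 2.0, 4.0]
    xs_a = rng.choices(levels, weights=[0.5, 0.3, 0.2], k=n_a)  # target marginal of X
    xs_b = rng.choices(levels, weights=[0.2, 0.3, 0.5], k=n_b)  # shifted marginal in B
    xy_b = [(x, rng.gauss(mu[x], 1.0)) for x in xs_b]
    return xs_a, xy_b


if __name__ == "__main__":
    xs_a, xy_b = simulate()
    est, se = fused_mean_estimate(xs_a, xy_b)
    naive = sum(y for _, y in xy_b) / len(xy_b)
    print(f"fused estimate {est:.3f} (SE {se:.3f}), naive B-only mean {naive:.3f}")
```

Under these assumed settings the true value is ψ = 0.5·1 + 0.3·2 + 0.2·4 = 1.9, whereas averaging Y in source B alone targets 0.2·1 + 0.3·2 + 0.5·4 = 2.8. The fused estimate corrects for the shifted covariate distribution, and the density ratio p_A(x)/q_B(x) shows up in the source-B influence-function term. With a continuous X, the per-level means and frequencies would have to be replaced by flexible (for example, machine-learning) regression and density-ratio estimates, typically combined with cross-fitting; this sketch does not attempt that.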
Citations: 0