
arXiv - STAT - Statistics Theory: Latest Articles

Jackknife Empirical Likelihood Ratio Test for Cauchy Distribution
Pub Date : 2024-09-09 DOI: arxiv-2409.05764
Avhad Ganesh Vishnu, Ananya Lahiri, Sudheesh K. Kattumannil
Heavy-tailed distributions, such as the Cauchy distribution, are acknowledged for providing more accurate models for financial returns, as the normal distribution is deemed insufficient for capturing the significant fluctuations observed in real-world assets. Data sets characterized by outlier sensitivity are critically important in diverse areas, including finance, economics, telecommunications, and signal processing. This article addresses a goodness-of-fit test for the Cauchy distribution. The proposed test utilizes empirical likelihood methods, including the jackknife empirical likelihood (JEL) and the adjusted jackknife empirical likelihood (AJEL). Extensive Monte Carlo simulation studies are conducted to evaluate the finite-sample performance of the proposed test. The application of the proposed test is illustrated through the analysis of two real data sets.
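The abstract does not state the paper's test statistic, so the following is only a hedged sketch of the generic JEL recipe: form jackknife pseudo-values of a user-supplied estimator, then compute the empirical likelihood ratio statistic for their mean. Under the usual regularity conditions this statistic is compared with a chi-square(1) quantile (about 3.841 at the 5% level). The function names and the estimator passed in are hypothetical, and the AJEL adjustment is not shown.

```python
import math
from typing import Callable, Sequence


def jackknife_pseudo_values(data: Sequence[float],
                            estimator: Callable[[Sequence[float]], float]) -> list:
    """Jackknife pseudo-values V_i = n*T_n - (n-1)*T_{n-1}^{(-i)}."""
    xs = list(data)
    n = len(xs)
    full = estimator(xs)
    return [n * full - (n - 1) * estimator(xs[:i] + xs[i + 1:]) for i in range(n)]


def jel_statistic(data, estimator, theta0: float, iters: int = 200) -> float:
    """-2 log empirical likelihood ratio for H0: E[pseudo-value] = theta0."""
    z = [v - theta0 for v in jackknife_pseudo_values(data, estimator)]
    if max(z) <= 0 or min(z) >= 0:
        # theta0 lies outside the convex hull of the pseudo-values.
        return math.inf
    # The Lagrange multiplier must keep every 1 + lam * z_i strictly positive.
    lo, hi = -1.0 / max(z), -1.0 / min(z)
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        # g(lam) = sum z_i / (1 + lam z_i) is strictly decreasing in lam.
        g = sum(zi / (1.0 + lam * zi) for zi in z)
        if g > 0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)
```

With the sample mean as the estimator, the pseudo-values reduce to the observations themselves, so the statistic coincides with ordinary empirical likelihood for a mean; a real goodness-of-fit application would plug in a characterization-based estimator instead.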
Citations: 0
Parameter estimation for fractional stochastic heat equations: Berry-Esséen bounds in CLTs
Pub Date : 2024-09-09 DOI: arxiv-2409.05416
Soukaina Douissi, Fatimah Alshahrani
The aim of this work is to estimate the drift coefficient of a fractional heat equation driven by an additive space-time noise using the maximum likelihood estimator (MLE). In the first part of the paper, the first $N$ Fourier modes of the solution are observed continuously over a finite time interval $[0, T]$. Explicit upper bounds on the Wasserstein distance in the central limit theorem for the MLE are provided when $N \rightarrow \infty$ and/or $T \rightarrow \infty$. In the second part of the paper, the $N$ Fourier modes are observed on a uniform time grid $t_i = i\frac{T}{M}$, $i = 0, \ldots, M$, where $M$ is the number of time grid points. Consistency and asymptotic normality are studied as $T, M, N \rightarrow +\infty$, in addition to the rate of convergence in law in the CLT.
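As a simplified, hedged sketch that is not the paper's exact setting: assume the spectral approach in which the $k$-th Fourier mode is an independent Ornstein-Uhlenbeck process $du_k = -\theta \lambda_k u_k\,dt + \sigma\,dW_k$, with $\lambda_k$ the eigenvalues of the (fractional) operator, noise white in time, and a common $\sigma$ across modes. Under those assumptions the continuous-time MLE is $\hat\theta = -\sum_k \lambda_k \int_0^T u_k\,du_k \big/ \sum_k \lambda_k^2 \int_0^T u_k^2\,dt$; the code simulates $N$ modes on the grid $t_i = iT/M$ and evaluates a left-point discretization of that ratio. The eigenvalue choice $\lambda_k = k^{1.5}$ in the usage lines is hypothetical.

```python
import math
import random


def simulate_modes(theta, lambdas, sigma, T, M, rng):
    """Exact OU transitions for du_k = -theta*lambda_k*u_k dt + sigma dW_k, u_k(0) = 0."""
    dt = T / M
    paths = []
    for lam in lambdas:
        a = theta * lam
        decay = math.exp(-a * dt)
        # Standard deviation of the exact one-step OU transition.
        sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * a))
        u = [0.0]
        for _ in range(M):
            u.append(u[-1] * decay + sd * rng.gauss(0.0, 1.0))
        paths.append(u)
    return paths


def drift_mle_discrete(paths, lambdas, T):
    """Discretized MLE: -sum lam_k sum u_i (u_{i+1} - u_i) / (dt sum lam_k^2 sum u_i^2)."""
    num = 0.0
    den = 0.0
    for u, lam in zip(paths, lambdas):
        M = len(u) - 1
        dt = T / M
        num -= lam * sum(u[i] * (u[i + 1] - u[i]) for i in range(M))
        den += lam ** 2 * dt * sum(u[i] ** 2 for i in range(M))
    return num / den


rng = random.Random(0)
lambdas = [k ** 1.5 for k in range(1, 6)]  # hypothetical eigenvalues, N = 5 modes
paths = simulate_modes(1.0, lambdas, 1.0, 40.0, 40000, rng)
estimate = drift_mle_discrete(paths, lambdas, 40.0)
```

The grid step enters only through the discretized integrals, so a coarse grid (small $M$ relative to $T\lambda_N$) biases the estimate; that is the regime in which the joint limit $T, M, N \rightarrow +\infty$ studied in the paper matters.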
Citations: 0
On integer partitions and the Wilcoxon rank-sum statistic
Pub Date : 2024-09-09 DOI: arxiv-2409.05741
Andrew V. Sills
In the literature, derivations of exact null distributions of rank-sum statistics are often avoided in cases where one or more ties exist in the data. By deriving the null distribution in the no-ties case with the aid of classical $q$-series results of Euler and Rothe, we demonstrate how a natural generalization of the method may be employed to derive exact null distributions even when one or more ties are present in the data. It is suggested that this method could be implemented in a computer algebra system, or even a more primitive computer language, so that the normal approximation need not be employed for small sample sizes, where it is less likely to be accurate. Several algorithms for determining exact distributions of the rank-sum statistic (possibly with ties) have been given in the literature (see Streitberg and Röhmel (1986) and Marx et al. (2016)), but none seems as simple as the procedure discussed here, which amounts to multiplying out a certain polynomial, extracting coefficients, and finally dividing by a binomial coefficient.
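The "multiply out a polynomial, extract coefficients, divide by a binomial coefficient" procedure can be sketched as follows. This is an illustrative generating-function computation, not the author's $q$-series derivation: it expands $\prod_{i=1}^{N} (1 + t\,q^{2 r_i})$ over the pooled ranks $r_i$ (midranks allowed, doubled so that every exponent is an integer), takes the coefficient of $t^m$, and divides by $\binom{N}{m}$. Under the null hypothesis every size-$m$ subset of the pooled observations is equally likely to form the first sample, which is what the division encodes. The function name is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb


def rank_sum_null(ranks, m):
    """Exact null distribution of the rank sum W of a size-m sample.

    ranks: the N pooled ranks; midranks such as 2.5 are allowed.
    Returns a dict {w: P(W = w)} with exact Fraction values.
    """
    n_total = len(ranks)
    if not 0 <= m <= n_total:
        raise ValueError("need 0 <= m <= N")
    # Midranks are multiples of 1/2, so doubling makes every exponent an integer.
    doubled = [round(2 * r) for r in ranks]
    # poly[k][e] is the coefficient of t^k q^e in prod_i (1 + t q^{doubled_i}).
    poly = [defaultdict(int) for _ in range(m + 1)]
    poly[0][0] = 1
    for d in doubled:
        # Update k from high to low so that each factor contributes at most once.
        for k in range(m, 0, -1):
            for e, c in list(poly[k - 1].items()):
                poly[k][e + d] += c
    total = comb(n_total, m)
    return {Fraction(e, 2): Fraction(c, total) for e, c in sorted(poly[m].items())}
```

For example, `rank_sum_null([1, 2.5, 2.5, 4], 2)` gives probability $1/3$ to each of the rank sums $7/2$, $5$ and $13/2$; the result is exact and conditional on the observed tie pattern.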
Citations: 0
Empirical Bernstein in smooth Banach spaces
Pub Date : 2024-09-09 DOI: arxiv-2409.06060
Diego Martinez-Taboada, Aaditya Ramdas
Existing concentration bounds for bounded vector-valued random variables include extensions of the scalar Hoeffding and Bernstein inequalities. While the latter is typically tighter, it requires knowing a bound on the variance of the random variables. We derive a new vector-valued empirical Bernstein inequality, which uses an empirical estimator of the variance instead of the true variance. The bound holds in 2-smooth separable Banach spaces, which include finite-dimensional Euclidean spaces and separable Hilbert spaces. The resulting confidence sets are instantiated for both the batch setting (where the sample size is fixed) and the sequential setting (where the sample size is a stopping time). Asymptotically, the width of the confidence set exactly matches, in the leading term, that achieved by Bernstein's inequality. The method and supermartingale proof technique combine several tools of Pinelis (1994) and Waudby-Smith and Ramdas (2024).
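The paper's bound is vector-valued; as a hedged, scalar-only illustration of why empirical Bernstein bounds are attractive, the sketch below compares the one-sided Hoeffding radius with the classical scalar empirical Bernstein radius in the form given by Maurer and Pontil (2009) for i.i.d. observations in $[0, 1]$: $\sqrt{2 V_n \ln(2/\delta)/n} + 7\ln(2/\delta)/(3(n-1))$, where $V_n$ is the unbiased sample variance. The function names are hypothetical and this is not the paper's Banach-space construction.

```python
import math
import statistics


def hoeffding_radius(n: int, delta: float) -> float:
    """One-sided Hoeffding radius for i.i.d. variables in [0, 1]."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def empirical_bernstein_radius(xs, delta: float) -> float:
    """One-sided scalar empirical Bernstein radius for i.i.d. values in [0, 1]."""
    n = len(xs)
    if n < 2 or any(not 0.0 <= x <= 1.0 for x in xs):
        raise ValueError("need n >= 2 observations in [0, 1]")
    v = statistics.variance(xs)  # unbiased sample variance replaces the true variance
    log_term = math.log(2.0 / delta)
    return math.sqrt(2.0 * v * log_term / n) + 7.0 * log_term / (3.0 * (n - 1))


low_var = [0.5] * 10000
high_var = [0.0, 1.0] * 5000
```

For `low_var` the empirical Bernstein radius is far below Hoeffding's, while for the maximal-variance sample `high_var` it is slightly wider; the variance-adaptive bound pays off exactly when the data are less spread out than the worst case.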
Citations: 0
Markov Chain Variance Estimation: A Stochastic Approximation Approach
Pub Date : 2024-09-09 DOI: arxiv-2409.05733
Shubhada Agrawal, Prashanth L. A., Siva Theja Maguluri
We consider the problem of estimating the asymptotic variance of a function defined on a Markov chain, an important step for statistical inference of the stationary mean. We design the first recursive estimator that requires $O(1)$ computation at each step, does not require storing any historical samples or any prior knowledge of run-length, and has the optimal $O(\frac{1}{n})$ rate of convergence for the mean-squared error (MSE) with provable finite-sample guarantees. Here, $n$ refers to the total number of samples generated. The previously best-known rate of convergence in MSE was $O(\frac{\log n}{n})$, achieved by jackknifed estimators, which also lack these other desirable properties. Our estimator is based on linear stochastic approximation of an equivalent formulation of the asymptotic variance in terms of the solution of the Poisson equation. We generalize our estimator in several directions, including estimating the covariance matrix for vector-valued functions, estimating the stationary variance of a Markov chain, and approximately estimating the asymptotic variance in settings where the state space of the underlying Markov chain is large. We also show applications of our estimator in average-reward reinforcement learning (RL), where we use the asymptotic variance as a risk measure to model safety-critical applications. We design a temporal-difference-type algorithm tailored for policy evaluation in this context, considering both the tabular and linear function approximation settings. Our work paves the way for developing actor-critic style algorithms for variance-constrained RL.
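The Poisson-equation formulation mentioned above can be evaluated directly for a small finite-state chain whose transition matrix $P$ is known. The hedged sketch below computes $\sigma^2 = 2\,\pi(\bar f h) - \pi(\bar f^2)$, where $\bar f = f - \pi(f)$ and $h$ solves $h - Ph = \bar f$. It is a model-based illustration of the identity only, not the paper's recursive stochastic-approximation estimator, which works from samples without knowing $P$. Function names and the two-state example are hypothetical.

```python
def stationary_distribution(P, iters=2000):
    """Power iteration pi <- pi P for an ergodic, aperiodic finite chain."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def asymptotic_variance(P, f, iters=2000):
    """sigma^2 = 2 pi(fbar * h) - pi(fbar^2), with h solving h - P h = fbar."""
    n = len(P)
    pi = stationary_distribution(P, iters)
    mean = sum(p * fi for p, fi in zip(pi, f))
    fbar = [fi - mean for fi in f]
    # Fixed-point iteration h <- fbar + P h accumulates h = sum_k P^k fbar.
    h = [0.0] * n
    for _ in range(iters):
        h = [fbar[i] + sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]
    cross = sum(p * a * b for p, a, b in zip(pi, fbar, h))
    var = sum(p * a * a for p, a in zip(pi, fbar))
    return 2.0 * cross - var


a, b = 0.1, 0.2
P = [[1 - a, a], [b, 1 - b]]
f = [0.0, 1.0]  # indicator of the second state
sigma2 = asymptotic_variance(P, f)
```

For this two-state chain the closed form is $\mathrm{Var}_\pi(f)\,(1+\rho)/(1-\rho)$ with $\rho = 1 - a - b$, i.e. $\tfrac{2}{9}\cdot\tfrac{17}{3} = \tfrac{34}{27}$, which the sketch reproduces.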
Citations: 0
Empirical likelihood for generalized smoothly trimmed mean
Pub Date : 2024-09-09 DOI: arxiv-2409.05631
Elina Kresse, Emils Silins, Janis Valeinis
This paper introduces a new version of the smoothly trimmed mean with a more general family of weights, which can be used as an alternative to the classical trimmed mean. We derive its asymptotic variance and, to further investigate its properties, establish the empirical likelihood for the new estimator. As expected from previous theoretical investigations, our simulations show a clear advantage of the proposed estimator over the classical trimmed mean estimator. Moreover, the empirical likelihood method gives an additional advantage for data generated from contaminated models. For the classical trimmed mean, symmetric 10% or 20% trimming is generally recommended in practice. However, if the trimming is done close to data gaps, it can even lead to spurious results, as is known from the literature and verified by our simulations. Instead, for practical data examples, we choose the smoothing parameters by an optimality criterion that minimises the variance of the proposed estimators.
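The abstract does not give the paper's weight family, so the sketch below is only a hedged illustration of the general idea: it contrasts the classical symmetric trimmed mean with one hypothetical smooth weighting of the order statistics (zero weight in the outer $\alpha$ tails, a linear ramp up to full weight at $\beta$). Both the function names and the piecewise-linear weight shape are assumptions, not the authors' estimator.

```python
import math


def trimmed_mean(xs, alpha):
    """Classical symmetric trimmed mean: drop floor(alpha * n) points from each end."""
    s = sorted(xs)
    g = math.floor(alpha * len(s))
    kept = s[g:len(s) - g]
    return sum(kept) / len(kept)


def smooth_trimmed_mean(xs, alpha, beta):
    """Hypothetical smooth L-estimator with piecewise-linear weights on order statistics."""
    if not 0.0 <= alpha < beta <= 0.5:
        raise ValueError("need 0 <= alpha < beta <= 0.5")
    s = sorted(xs)
    n = len(s)
    num = den = 0.0
    for i, x in enumerate(s):
        u = (i + 0.5) / n        # plotting position of the i-th order statistic
        v = min(u, 1.0 - u)      # distance to the nearer tail
        w = min(1.0, max(0.0, (v - alpha) / (beta - alpha)))
        num += w * x
        den += w
    return num / den


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
classical = trimmed_mean(data, 0.1)
smooth = smooth_trimmed_mean(data, 0.1, 0.3)
```

Because the weights in this sketch vary continuously with $\alpha$ and $\beta$, small changes in the smoothing parameters move the estimate continuously, whereas the classical trimmed mean jumps whenever $\lfloor \alpha n \rfloor$ changes, which is one way trimming near a data gap can produce unstable results.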
Citations: 0
Efficient estimation with incomplete data via generalised ANOVA decomposition
Pub Date : 2024-09-09 DOI: arxiv-2409.05729
Thomas B. Berrett
We study the efficient estimation of a class of mean functionals in settings where a complete multivariate dataset is complemented by additional datasets recording subsets of the variables of interest. These datasets are allowed to have a general, in particular non-monotonic, structure. Our main contribution is to characterise the asymptotic minimal mean squared error for these problems and to introduce an estimator whose risk approximately matches this lower bound. We show that the efficient rescaled variance can be expressed as the minimal value of a quadratic optimisation problem over a function space, thus establishing a fundamental link between these estimation problems and the theory of generalised ANOVA decompositions. Our estimation procedure uses iterated nonparametric regression to mimic an approximate influence function derived through gradient descent. We prove that this estimator is approximately normally distributed, provide an estimator of its variance, and thus develop confidence intervals of asymptotically minimal width. Finally, we study a more direct estimator, which can be seen as a U-statistic with a data-dependent kernel, and show that it is also efficient under stronger regularity conditions.
Citations: 0
Statistical Mechanics of Min-Max Problems
Pub Date : 2024-09-09 DOI: arxiv-2409.06053
Yuma Ichikawa, Koji Hukushima
Min-max optimization problems, also known as saddle point problems, have attracted significant attention due to their applications in various fields, such as fair beamforming, generative adversarial networks (GANs), and adversarial learning. However, understanding the properties of these min-max problems has remained a substantial challenge. This study introduces a statistical mechanical formalism for analyzing the equilibrium values of min-max problems in the high-dimensional limit, while appropriately addressing the order of operations for min and max. As a first step, we apply this formalism to bilinear min-max games and simple GANs, deriving the relationship between the amount of training data and the generalization error and indicating the optimal ratio of fake to real data for effective learning. This formalism provides a groundwork for a deeper theoretical analysis of the equilibrium properties of various machine learning methods based on min-max problems and encourages the development of new algorithms and architectures.
Citations: 0
Common or specific source, features or scores; it is all a matter of information
Pub Date : 2024-09-09 DOI: arxiv-2409.05403
Aafko Boonstra, Ronald Meester, Klaas Slooten
We show that the incorporation of any new piece of information allows for improved decision making, in the sense that the expected costs of an optimal decision decrease (or, in boundary cases where no or not enough new information is incorporated, stay the same) whenever this is done by the appropriate update of the probabilities of the hypotheses. Versions of this result have been stated before. However, previous proofs rely on auxiliary constructions with proper scoring rules. We, instead, offer a direct and completely general proof by considering elementary properties of likelihood ratios only. We do point out the relation to proper scoring rules. We apply our results to contribute to the debates about the use of score-based/feature-based and common/specific source likelihood ratios. In the literature, these are often presented as different "LR-systems". We argue that deciding which LR to compute is simply a matter of the available information. There is no such thing as different "LR-systems"; there are only differences in the available information. In particular, despite claims to the contrary, scores can very well be used in forensic practice, and we illustrate this with an extensive example in a DNA kinship context.
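A toy two-hypothesis illustration of the main claim, with hypothetical numbers that do not come from the paper: compare the minimal expected cost of a decision taken on the prior alone with the minimal expected cost when the decision may depend on an observed piece of evidence $E$. For each outcome the posterior odds are the prior odds times the likelihood ratio $P(E = e \mid H_1)/P(E = e \mid H_2)$; the code works with joint probabilities, which is equivalent and avoids dividing by $P(E = e)$.

```python
def bayes_risk(p_h1, cost_wrong_h1, cost_wrong_h2):
    """Minimal expected cost with no evidence.

    cost_wrong_h1: cost of deciding H1 when H2 is true.
    cost_wrong_h2: cost of deciding H2 when H1 is true.
    """
    return min((1.0 - p_h1) * cost_wrong_h1, p_h1 * cost_wrong_h2)


def bayes_risk_with_evidence(p_h1, cost_wrong_h1, cost_wrong_h2, lik_h1, lik_h2):
    """Minimal expected cost when the decision may depend on the outcome of E.

    lik_h1[e], lik_h2[e]: P(E = e | H1) and P(E = e | H2).
    """
    total = 0.0
    for e in lik_h1:
        joint_h1 = p_h1 * lik_h1[e]           # P(H1, E = e)
        joint_h2 = (1.0 - p_h1) * lik_h2[e]   # P(H2, E = e)
        # Optimal decision for this outcome: the cheaper of the two actions.
        total += min(joint_h2 * cost_wrong_h1, joint_h1 * cost_wrong_h2)
    return total


lik1 = {"pos": 0.9, "neg": 0.1}
lik2 = {"pos": 0.2, "neg": 0.8}
risk_prior = bayes_risk(0.3, 1.0, 10.0)
risk_evidence = bayes_risk_with_evidence(0.3, 1.0, 10.0, lik1, lik2)
```

Here the expected cost drops from 0.7 to 0.44. In this finite setting the inequality holds in general because a sum of minima never exceeds the minimum of the sums; evidence whose likelihood ratio equals 1 for every outcome leaves the cost unchanged, matching the boundary case in the abstract.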
Citations: 0
A Continuous Generalization of Hypothesis Testing
Pub Date : 2024-09-09 DOI: arxiv-2409.05654
Nick W. Koning
Testing has developed into the fundamental statistical framework for falsifying hypotheses. Unfortunately, tests are binary in nature: a test either rejects a hypothesis or not. Such binary decisions do not reflect the reality of many scientific studies, which often aim to present the evidence against a hypothesis and do not necessarily intend to establish a definitive conclusion. To solve this, we propose the continuous generalization of a test, which we use to measure the evidence against a hypothesis. Such a continuous test can be interpreted as a non-randomized interpretation of the classical 'randomized test'. This offers the benefits of a randomized test, without the downsides of external randomization. Another interpretation is as a literal measure, which measures the amount of binary tests that reject the hypothesis. Our work also offers a new perspective on the $e$-value: the $e$-value is recovered as a continuous test with $\alpha \to 0$, or as an unbounded measure of the amount of rejections.
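The abstract's readings of a continuous test can be made concrete with a toy construction. The construction is not taken from the paper. It assumes an exactly uniform null p-value and builds φ_α(p) = min(1, α·e(p)) from the standard p-to-e calibrator e(p) = κp^(κ−1). The function names, κ = 0.5 and the grids are hypothetical choices. The sketch checks three things:

- the null mean of φ_α stays at or below α;
- φ(x) equals the share of threshold rules "reject iff u ≤ φ(x)" that reject, whereas an externally randomized test would pick one u and reject with probability φ(x);
- φ_α/α approaches e(p) as α shrinks.

```python
"""Hypothetical sketch: a [0, 1]-valued 'continuous test' and its link to e-values.

Assumptions (not from the paper): the null p-value is exactly Uniform(0, 1), and
the continuous test is phi_alpha(p) = min(1, alpha * e(p)) built from the
calibrator e(p) = kappa * p**(kappa - 1).
"""


def calibrator_e_value(p, kappa=0.5):
    """e(p) = kappa * p**(kappa - 1); its mean under Uniform(0, 1) is exactly 1."""
    return kappa * p ** (kappa - 1)


def continuous_test(p, alpha, kappa=0.5):
    """phi(p) in [0, 1]; under the null, E[phi] <= alpha * E[e(P)] = alpha."""
    return min(1.0, alpha * calibrator_e_value(p, kappa))


def share_of_binary_tests_rejecting(phi_value, n_thresholds=10_000):
    """Non-randomized reading of phi: the fraction of binary rules
    'reject iff u <= phi' over a midpoint grid of thresholds u in (0, 1).

    A randomized test would instead draw one u at random and thus reject
    with probability phi.
    """
    hits = sum(1 for i in range(n_thresholds) if (i + 0.5) / n_thresholds <= phi_value)
    return hits / n_thresholds


def null_mean(alpha, kappa=0.5, n_grid=100_000):
    """Midpoint-rule approximation of E[phi(P)] for P ~ Uniform(0, 1)."""
    total = sum(continuous_test((i + 0.5) / n_grid, alpha, kappa) for i in range(n_grid))
    return total / n_grid


if __name__ == "__main__":
    print(null_mean(0.05))  # stays at or below alpha = 0.05
    print(share_of_binary_tests_rejecting(0.3))
    for alpha in (0.5, 0.05, 0.001):
        # phi/alpha is capped at 1/alpha; it equals e(p) once alpha * e(p) <= 1
        print(alpha, continuous_test(0.01, alpha) / alpha, calibrator_e_value(0.01))
```

In this toy setting the null mean for α = 0.05 is 0.049375 in closed form, namely α − p*(1/κ − 1) with p* = (ακ)^(1/(1−κ)). The ratio φ_α/α is bounded by 1/α for every fixed α, so only the α → 0 limit gives back the unbounded e-value e(p).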
Citations: 0