arXiv - MATH - Statistics Theory: Latest Papers

Precision-based designs for sequential randomized experiments
Pub Date : 2024-05-06 DOI: arxiv-2405.03487
Mattias Nordin, Mårten Schultzberg
In this paper, we consider an experimental setting where units enter the experiment sequentially. Our goal is to form stopping rules which lead to estimators of treatment effects with a given precision. We propose a fixed-width confidence interval design (FWCID) where the experiment terminates once a pre-specified confidence interval width is achieved. We show that under this design, the difference-in-means estimator is a consistent estimator of the average treatment effect and standard confidence intervals have asymptotic guarantees of coverage and efficiency for several versions of the design. In addition, we propose a version of the design that we call fixed power design (FPD) where a given power is asymptotically guaranteed for a given treatment effect, without the need to specify the variances of the outcomes under treatment or control. In addition, this design also gives a consistent difference-in-means estimator with correct coverage of the corresponding standard confidence interval. We complement our theoretical findings with Monte Carlo simulations where we compare our proposed designs with standard designs in the sequential experiments literature, showing that our designs outperform these designs in several important aspects. We believe our results to be relevant for many experimental settings where units enter sequentially, such as in clinical trials, as well as in online A/B tests used by the tech and e-commerce industry.
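The stopping rule described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes Gaussian outcomes with unit variance, fair-coin assignment, a normal-approximation interval with z = 1.96, and a hypothetical minimum sample per arm (`min_per_arm`) before the width is checked. All function and parameter names are made up for illustration.

```python
import math
import random


def run_fixed_width_experiment(target_width, effect=0.5, z=1.96,
                               min_per_arm=30, max_units=1_000_000, seed=0):
    """Enroll units one at a time and stop once the CI width is small enough."""
    rng = random.Random(seed)
    # Welford accumulators per arm: [count, running mean, sum of squared deviations]
    stats = {0: [0, 0.0, 0.0], 1: [0, 0.0, 0.0]}
    for _ in range(max_units):
        arm = 1 if rng.random() < 0.5 else 0           # assumed: fair-coin randomization
        y = rng.gauss(effect if arm == 1 else 0.0, 1.0)  # assumed: Gaussian outcomes
        n, mean, m2 = stats[arm]
        n += 1
        delta = y - mean
        mean += delta / n
        m2 += delta * (y - mean)
        stats[arm] = [n, mean, m2]

        n0, n1 = stats[0][0], stats[1][0]
        if min(n0, n1) < min_per_arm:
            continue  # too few units for a stable variance estimate
        var0 = stats[0][2] / (n0 - 1)
        var1 = stats[1][2] / (n1 - 1)
        width = 2 * z * math.sqrt(var0 / n0 + var1 / n1)
        if width <= target_width:
            # Difference-in-means estimate at the stopping time
            return {"n": n0 + n1,
                    "estimate": stats[1][1] - stats[0][1],
                    "width": width}
    raise RuntimeError("max_units reached before the target width")


result = run_fixed_width_experiment(target_width=0.4)
print(result)
```

Note that the sample size is not fixed in advance: it is determined by the observed variances, which is the feature that distinguishes a precision-based design from a fixed-sample design.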
Citations: 0
Strang Splitting for Parametric Inference in Second-order Stochastic Differential Equations
Pub Date : 2024-05-06 DOI: arxiv-2405.03606
Predrag Pilipovic, Adeline Samson, Susanne Ditlevsen
We address parameter estimation in second-order stochastic differential equations (SDEs), prevalent in physics, biology, and ecology. A second-order SDE is converted to a first-order system by introducing an auxiliary velocity variable, raising two main challenges. First, the system is hypoelliptic since the noise affects only the velocity, making the Euler-Maruyama estimator ill-conditioned. To overcome that, we propose an estimator based on the Strang splitting scheme. Second, since the velocity is rarely observed, we adjust the estimator for partial observations. We present four estimators for complete and partial observations, using full likelihood or only velocity marginal likelihood. These estimators are intuitive, easy to implement, and computationally fast, and we prove their consistency and asymptotic normality. Our analysis demonstrates that using full likelihood with complete observations reduces the asymptotic variance of the diffusion estimator. With partial observations, the asymptotic variance increases due to information loss but remains unaffected by the likelihood choice. However, a numerical study on the Kramers oscillator reveals that using marginal likelihood for partial observations yields less biased estimators. We apply our approach to paleoclimate data from the Greenland ice core and fit it to the Kramers oscillator model, capturing transitions between metastable states reflecting observed climatic conditions during glacial eras.
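The conversion mentioned in this abstract can be written out in generic form. The notation below ($F$, $\Sigma$, $\varphi^{[1]}$, $\varphi^{[2]}$) is assumed for illustration and is not taken from the paper; the last line is the standard Strang composition of two sub-flows, without claiming which sub-flows the authors actually use.

```latex
% Generic second-order SDE driven by white noise \xi_t (assumed notation)
\ddot{X}_t = F(X_t, \dot{X}_t) + \Sigma\, \xi_t

% Equivalent first-order system with velocity V_t = \dot{X}_t;
% the noise enters only the V-equation, so the diffusion matrix is degenerate
\mathrm{d}X_t = V_t \,\mathrm{d}t, \qquad
\mathrm{d}V_t = F(X_t, V_t)\,\mathrm{d}t + \Sigma \,\mathrm{d}W_t

% Strang splitting over a step h: half step of one sub-flow,
% full step of the other, then another half step of the first
\Phi_h = \varphi^{[2]}_{h/2} \circ \varphi^{[1]}_{h} \circ \varphi^{[2]}_{h/2}
```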
Citations: 0
Stability of a Generalized Debiased Lasso with Applications to Resampling-Based Variable Selection
Pub Date : 2024-05-05 DOI: arxiv-2405.03063
Jingbo Liu
Suppose that we first apply the Lasso to a design matrix, and then update one of its columns. In general, the signs of the Lasso coefficients may change, and there is no closed-form expression for updating the Lasso solution exactly. In this work, we propose an approximate formula for updating a debiased Lasso coefficient. We provide general nonasymptotic error bounds in terms of the norms and correlations of a given design matrix's columns, and then prove asymptotic convergence results for the case of a random design matrix with i.i.d. sub-Gaussian row vectors and i.i.d. Gaussian noise. Notably, the approximate formula is asymptotically correct for most coordinates in the proportional growth regime, under the mild assumption that each row of the design matrix is sub-Gaussian with a covariance matrix having a bounded condition number. Our proof only requires certain concentration and anti-concentration properties to control various error terms and the number of sign changes. In contrast, rigorously establishing distributional limit properties (e.g. Gaussian limits for the debiased Lasso) under similarly general assumptions has been considered an open problem in the universality theory. As applications, we show that the approximate formula allows us to reduce the computational complexity of variable selection algorithms that require solving multiple Lasso problems, such as the conditional randomization test and a variant of the knockoff filter.
Citations: 0
Limiting Behavior of Maxima under Dependence
Pub Date : 2024-05-05 DOI: arxiv-2405.02833
Klaus Herrmann, Marius Hofert, Johanna G. Neslehova
Weak convergence of maxima of dependent sequences of identically distributed continuous random variables is studied under normalizing sequences arising as subsequences of the normalizing sequences from an associated iid sequence. This general framework allows one to derive several generalizations of the well-known Fisher-Tippett-Gnedenko theorem under conditions on the univariate marginal distribution and the dependence structure of the sequence. The limiting distributions are shown to be compositions of a generalized extreme value distribution and a distortion function which reflects the limiting behavior of the diagonal of the underlying copula. Uniform convergence rates for the weak convergence to the limiting distribution are also derived. Examples covering well-known dependence structures are provided. Several existing results, e.g. for exchangeable sequences or stationary time series, are embedded in the proposed framework.
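The role of the copula diagonal follows from a one-line identity, sketched here in assumed notation ($F$ for the common marginal, $C_n$ for the copula of the first $n$ variables, $\delta_n$ for its diagonal). This is standard copula algebra rather than a statement of the paper's theorems.

```latex
% Maximum of n identically distributed continuous variables with copula C_n
M_n = \max(X_1, \dots, X_n), \qquad
\mathbb{P}(M_n \le x) = C_n\bigl(F(x), \dots, F(x)\bigr) = \delta_n\bigl(F(x)\bigr)

% Independent case: the diagonal is a power, recovering the classical setting
\delta_n(u) = u^n \quad\Longrightarrow\quad \mathbb{P}(M_n \le x) = F(x)^n
```

Dependence therefore enters the distribution of the maximum only through $\delta_n$, which is consistent with the limit being a composition of an extreme value distribution with a distortion function.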
Citations: 0
Probabilistic cellular automata with local transition matrices: synchronization, ergodicity, and inference
Pub Date : 2024-05-05 DOI: arxiv-2405.02928
Erhan Bayraktar, Fei Lu, Mauro Maggioni, Ruoyu Wu, Sichen Yang
We introduce a new class of probabilistic cellular automata that are capable of exhibiting rich dynamics such as synchronization and ergodicity and can be easily inferred from data. The system is a finite-state locally interacting Markov chain on a circular graph. Each site's subsequent state is random, with a distribution determined by its neighborhood's empirical distribution multiplied by a local transition matrix. We establish sufficient and necessary conditions on the local transition matrix for synchronization and ergodicity. Also, we introduce novel least squares estimators for inferring the local transition matrix from various types of data, which may consist of either multiple trajectories, a long trajectory, or ensemble sequences without trajectory information. Under suitable identifiability conditions, we show the asymptotic normality of these estimators and provide non-asymptotic bounds for their accuracy.
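The update rule in this abstract (neighborhood empirical distribution multiplied by a local transition matrix) can be sketched in a few lines. The code below is an illustrative reading, not the authors' code: it assumes a synchronous update, a symmetric neighborhood of a given `radius` that includes the site itself, and a row-stochastic matrix `T`. These choices and all names are assumptions.

```python
import random


def pca_step(states, T, radius, rng):
    """One synchronous update of a probabilistic cellular automaton on a cycle.

    states: list of integer states in range(len(T))
    T:      row-stochastic local transition matrix (list of lists)
    radius: neighborhood half-width; the site itself is included (assumed)
    """
    n, k = len(states), len(T)
    size = 2 * radius + 1
    new_states = []
    for i in range(n):
        # Empirical distribution of states in the neighborhood of site i
        counts = [0] * k
        for offset in range(-radius, radius + 1):
            counts[states[(i + offset) % n]] += 1  # wrap around the circle
        empirical = [c / size for c in counts]
        # Next-state distribution: row vector `empirical` times matrix `T`
        dist = [sum(empirical[a] * T[a][b] for a in range(k)) for b in range(k)]
        new_states.append(rng.choices(range(k), weights=dist)[0])
    return new_states


rng = random.Random(1)
T = [[0.9, 0.1],
     [0.2, 0.8]]
config = [rng.randrange(2) for _ in range(20)]
for _ in range(5):
    config = pca_step(config, T, radius=1, rng=rng)
print(config)
```

Because the next-state distribution is linear in the empirical distribution, regressing observed next states on neighborhood frequencies is a natural route to a least squares estimate of `T`, which is the kind of estimator the abstract mentions.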
Citations: 0
Tuning parameter selection in econometrics
Pub Date : 2024-05-05 DOI: arxiv-2405.03021
Denis Chetverikov
I review some of the main methods for selecting tuning parameters in nonparametric and $\ell_1$-penalized estimation. For the nonparametric estimation, I consider the methods of Mallows, Stein, Lepski, cross-validation, penalization, and aggregation in the context of series estimation. For the $\ell_1$-penalized estimation, I consider the methods based on the theory of self-normalized moderate deviations, bootstrap, Stein's unbiased risk estimation, and cross-validation in the context of Lasso estimation. I explain the intuition behind each of the methods and discuss their comparative advantages. I also give some extensions.
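As a reminder of what the tuning parameter is in the $\ell_1$-penalized case, the Lasso and a $K$-fold cross-validation choice of $\lambda$ can be written as follows. The scaling of the loss and the notation ($I_k$ for the $k$-th fold, $\hat\beta_{-k}$ for the estimate computed without that fold, $\Lambda$ for the candidate grid) are assumptions; conventions differ across references and may differ from the review's.

```latex
% Lasso estimator for a given penalty level \lambda
\hat\beta(\lambda) = \arg\min_{b \in \mathbb{R}^p}
  \frac{1}{n} \sum_{i=1}^{n} \bigl(Y_i - X_i^\top b\bigr)^2 + \lambda \lVert b \rVert_1

% K-fold cross-validation: pick \lambda minimizing out-of-fold squared error
\hat\lambda_{\mathrm{CV}} = \arg\min_{\lambda \in \Lambda}
  \sum_{k=1}^{K} \sum_{i \in I_k} \bigl(Y_i - X_i^\top \hat\beta_{-k}(\lambda)\bigr)^2
```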
Citations: 0
Negative Probability
Pub Date : 2024-05-05 DOI: arxiv-2405.03043
Nick Polson, Vadim Sokolov
Negative probabilities arise primarily in quantum theory and computing. Bartlett provides a definition based on characteristic functions and extraordinary random variables. As Bartlett observes, negative probabilities must always be combined with positive probabilities to yield a valid probability distribution before any physical interpretation is admissible. Negative probabilities arise as mixing distributions of unobserved latent variables in Bayesian modeling. Our goal is to provide a link with dual densities and the class of scale mixtures of normal distributions. We provide an analysis of the classic half coin distribution and Feynman's negative probability examples. A number of examples of dual densities with negative mixing measures including the Linnik distribution, Wigner distribution and the stable distribution are provided. Finally, we conclude with directions for future research.
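The half coin can be checked numerically. The sketch below assumes the usual construction in which a half coin has probability generating function sqrt((1 + z) / 2), so that the sum of two independent half coins is a fair coin. It expands that function as a power series, shows that some coefficients are negative, and verifies that convolving the sequence with itself returns (1/2, 1/2, 0, 0, ...). Whether the paper uses exactly this parameterization is an assumption.

```python
import math


def half_coin_weights(n_terms):
    """First n_terms power-series coefficients of sqrt((1 + z) / 2)."""
    weights = []
    binom = 1.0  # generalized binomial coefficient C(1/2, k), starting at k = 0
    for k in range(n_terms):
        weights.append(binom / math.sqrt(2.0))
        binom *= (0.5 - k) / (k + 1)  # C(1/2, k+1) = C(1/2, k) * (1/2 - k) / (k + 1)
    return weights


def convolve_truncated(a, b):
    """Coefficients of the product series, truncated to len(a) terms."""
    n = len(a)
    return [sum(a[i] * b[m - i] for i in range(m + 1)) for m in range(n)]


w = half_coin_weights(8)
print([round(x, 4) for x in w])        # signed "probabilities" of the half coin
two_halves = convolve_truncated(w, w)  # distribution of the sum of two half coins
print([round(x, 10) for x in two_halves])
```

Truncating the series does not affect the check, because the coefficient of degree m in the product depends only on coefficients of degree at most m.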
Citations: 0
Unscented Trajectory Optimization
Pub Date : 2024-05-04 DOI: arxiv-2405.02753
I. M. Ross, R. J. Proulx, M. Karpenko
In a nutshell, unscented trajectory optimization is the generation of optimal trajectories through the use of an unscented transform. Although unscented trajectory optimization was introduced by the authors about a decade ago, it is reintroduced in this paper as a special instantiation of tychastic optimal control theory. Tychastic optimal control theory (from Tyche, the Greek goddess of chance) avoids the use of a Brownian motion and the resulting Itô calculus even though it uses random variables across the entire spectrum of a problem formulation. This approach circumvents the enormous technical and numerical challenges associated with stochastic trajectory optimization. Furthermore, it is shown how a tychastic optimal control problem that involves nonlinear transformations of the expectation operator can be quickly instantiated using an unscented transform. These nonlinear transformations are particularly useful in managing trajectory dispersions be it associated with path constraints or targeted values of final-time conditions. This paper also presents a systematic and rapid process for formulating and computing the most desirable tychastic trajectory using an unscented transform. Numerical examples are used to illustrate how unscented trajectory optimization may be used for risk reduction and mission recovery caused by uncertainties and failures.
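For readers unfamiliar with the unscented transform itself, here is a minimal scalar sketch of the classical sigma-point construction. It is not the trajectory optimization method of the paper. The spread parameter `kappa` and the function names are assumptions, and only the one-dimensional case is shown.

```python
import math


def unscented_transform_1d(f, mean, var, kappa=2.0):
    """Approximate the mean and variance of f(X) for scalar X with given mean/var.

    Uses three sigma points: the mean and mean +/- sqrt((1 + kappa) * var).
    """
    spread = math.sqrt((1.0 + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    w_center = kappa / (1.0 + kappa)
    w_side = 1.0 / (2.0 * (1.0 + kappa))
    weights = [w_center, w_side, w_side]

    # Push each sigma point through the nonlinearity, then re-average
    images = [f(p) for p in points]
    out_mean = sum(w * y for w, y in zip(weights, images))
    out_var = sum(w * (y - out_mean) ** 2 for w, y in zip(weights, images))
    return out_mean, out_var


# For f(x) = x^2 the transformed mean is exact: E[X^2] = mean^2 + var
m, v = unscented_transform_1d(lambda x: x * x, mean=1.0, var=4.0)
print(m, v)
```

The appeal for optimal control is that an expectation over an uncertain parameter is replaced by a weighted sum over a handful of deterministic points, so each point can be propagated through the dynamics as an ordinary trajectory.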
Citations: 0
Power-Enhanced Two-Sample Mean Tests for High-Dimensional Compositional Data with Application to Microbiome Data Analysis
Pub Date : 2024-05-04 DOI: arxiv-2405.02551
Danning Li, Lingzhou Xue, Haoyi Yang, Xiufan Yu
Testing differences in mean vectors is a fundamental task in the analysis of high-dimensional compositional data. Existing methods may suffer from low power if the underlying signal pattern is in a situation that does not favor the deployed test. In this work, we develop two-sample power-enhanced mean tests for high-dimensional compositional data based on the combination of $p$-values, which integrates strengths from two popular types of tests: the maximum-type test and the quadratic-type test. We provide rigorous theoretical guarantees on the proposed tests, showing accurate Type-I error rate control and enhanced testing power. Our method boosts the testing power towards a broader alternative space, which yields robust performance across a wide range of signal pattern settings. Our theory also contributes to the literature on power enhancement and Gaussian approximation for high-dimensional hypothesis testing. We demonstrate the performance of our method on both simulated data and real-world microbiome data, showing that our proposed approach improves the testing power substantially compared to existing methods.
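The abstract does not say which rule is used to combine the two $p$-values, so the sketch below uses Fisher's method purely as one classical example. It assumes the two $p$-values are independent under the null, which is an assumption of this illustration rather than something stated above. For two $p$-values, the statistic -2(ln p1 + ln p2) is chi-squared with 4 degrees of freedom, whose survival function has the closed form exp(-t/2)(1 + t/2).

```python
import math


def fisher_combine_two(p_max, p_quad):
    """Combine two independent p-values with Fisher's method (illustrative only).

    p_max:  p-value from a maximum-type test (hypothetical input)
    p_quad: p-value from a quadratic-type test (hypothetical input)
    """
    if not (0.0 < p_max <= 1.0 and 0.0 < p_quad <= 1.0):
        raise ValueError("p-values must lie in (0, 1]")
    log_product = math.log(p_max) + math.log(p_quad)
    # Chi-squared(4) survival function at t = -2 * log_product
    # simplifies to product * (1 - log(product))
    return math.exp(log_product) * (1.0 - log_product)


# A sparse signal might give a small max-type p-value and a large quadratic one
print(fisher_combine_two(0.001, 0.40))
# A dense signal might give the reverse
print(fisher_combine_two(0.40, 0.001))
```

The combined test can reject when either component test sees strong evidence, which is the intuition behind borrowing strength from both test types.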
Citations: 0
Grouping predictors via network-wide metrics
Pub Date : 2024-05-04 DOI: arxiv-2405.02715
Brandon Woosuk Park, Anand N. Vidyashankar, Tucker S. McElroy
When multitudes of features can plausibly be associated with a response, both privacy considerations and model parsimony suggest grouping them to increase the predictive power of a regression model. Specifically, the identification of groups of predictors significantly associated with the response variable eases further downstream analysis and decision-making. This paper proposes a new data analysis methodology that utilizes the high-dimensional predictor space to construct an implicit network with weighted edges to identify significant associations between the response and the predictors. Using a population model for groups of predictors defined via network-wide metrics, a new supervised grouping algorithm is proposed to determine the correct group, with probability tending to one as the sample size diverges to infinity. For this reason, we establish several theoretical properties of the estimates of network-wide metrics. A novel model-assisted bootstrap procedure that substantially decreases computational complexity is developed, facilitating the assessment of uncertainty in the estimates of network-wide metrics. The proposed methods account for several challenges that arise in the high-dimensional data setting, including (i) a large number of predictors, (ii) uncertainty regarding the true statistical model, and (iii) model selection variability. The performance of the proposed methods is demonstrated through numerical experiments, data from sports analytics, and breast cancer data.
Citations: 0