
arXiv - STAT - Statistics Theory: Latest Papers

Likelihood Geometry of the Squared Grassmannian
Pub Date: 2024-09-05 DOI: arxiv-2409.03730
Hannah Friedman
We study projection determinantal point processes and their connection to the squared Grassmannian. We prove that the log-likelihood function of this statistical model has $(n - 1)!/2$ critical points, all of which are real and positive, thereby settling a conjecture of Devriendt, Friedman, Reinke, and Sturmfels.
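The link between projection DPPs and the squared Grassmannian can be made concrete. Suppose the kernel is $K = VV^\top$ for an $n \times d$ matrix $V$ with orthonormal columns. Then the process always selects exactly $d$ of the $n$ items, and a $d$-subset $S$ has probability $\det(K_S) = \det(V_S)^2$. That quantity is the square of a Plücker coordinate of the column space of $V$.

The pure-standard-library Python sketch below computes these probabilities for a random subspace. The helper names `det`, `orthonormal_basis` and `projection_dpp_probabilities` are our own and do not come from the paper. By the Cauchy-Binet formula, the probabilities sum to $\det(V^\top V) = 1$.

```python
import itertools
import random


def det(m):
    """Determinant of a small square matrix via Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign of the determinant
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def orthonormal_basis(n, d, seed=0):
    """Rows of an n x d matrix V with orthonormal columns (Gram-Schmidt on Gaussian vectors)."""
    rng = random.Random(seed)
    cols = []
    for _ in range(d):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in cols:
            proj = sum(vi * ui for vi, ui in zip(v, u))
            v = [vi - proj * ui for vi, ui in zip(v, u)]
        norm = sum(vi * vi for vi in v) ** 0.5
        cols.append([vi / norm for vi in v])
    return [[cols[j][i] for j in range(d)] for i in range(n)]


def projection_dpp_probabilities(V):
    """Map each d-subset S of the rows of V to P(S) = det(V_S)^2, a squared Plucker coordinate."""
    n, d = len(V), len(V[0])
    return {S: det([V[i] for i in S]) ** 2 for S in itertools.combinations(range(n), d)}
```

For example, `projection_dpp_probabilities(orthonormal_basis(5, 2))` returns the probabilities of the 10 two-element subsets of five items. This sketch only illustrates the statistical model. It does not compute the critical points of the log-likelihood studied in the paper.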
Citations: 0
Error bounds of Median-of-means estimators with VC-dimension
Pub Date: 2024-09-05 DOI: arxiv-2409.03410
Yuxuan Wang, Yiming Chen, Hanchao Wang, Lixin Zhang
We obtain upper error bounds for robust estimators of the mean vector, using the median-of-means (MOM) method. The method is designed to handle heavy-tailed and contaminated data. It requires only a finite second moment, an assumption weaker than those of many other methods. It also relies on the VC dimension rather than the Rademacher complexity to measure statistical complexity. This allows us to apply MOM to covariance estimation without imposing conditions such as $L$-sub-Gaussianity or $L_{4}-L_{2}$ norm equivalence. In particular, we derive a new robust estimator, the MOM version of the halfspace depth, along with error bounds for mean estimation in any norm.
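Here is a minimal scalar sketch of the median-of-means idea in Python. It is our own illustration, not the paper's vector-valued or halfspace-depth estimator. The procedure has four steps:

1. Shuffle the sample.
2. Split it into $k$ blocks.
3. Average each block.
4. Report the median of the block means.

Each gross outlier can spoil at most one block mean. The median therefore stays near the bulk of the data as long as fewer than half of the blocks are contaminated.

```python
import random
import statistics


def median_of_means(xs, k, seed=0):
    """Median-of-means estimate of the mean of xs using k blocks of (nearly) equal size."""
    data = list(xs)
    if not 1 <= k <= len(data):
        raise ValueError("need 1 <= k <= len(xs)")
    random.Random(seed).shuffle(data)        # random partition into blocks
    blocks = [data[i::k] for i in range(k)]  # k disjoint blocks
    block_means = [sum(b) / len(b) for b in blocks]
    return statistics.median(block_means)    # robust aggregation of the block means
```

Consider 1000 standard normal draws plus five values of $10^6$. The sample mean is pulled into the thousands, while `median_of_means(sample, k=50)` stays close to 0. The number of blocks $k$ is a tuning choice: more blocks tolerate more outliers, at the cost of noisier block means.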
Citations: 0
The Geometry and Well-Posedness of Sparse Regularized Linear Regression
Pub Date: 2024-09-05 DOI: arxiv-2409.03461
Jasper Marijn Everink, Yiqiu Dong, Martin Skovgaard Andersen
In this work, we study the well-posedness of certain sparse regularized linear regression problems, i.e., the existence, uniqueness, and continuity of the solution map with respect to the data. We focus on regularization functions that are convex piecewise linear, i.e., whose epigraph is polyhedral. This includes total variation on graphs and polyhedral constraints. We provide a geometric framework for these functions based on their connection to polyhedral sets and apply it to the study of the well-posedness of the corresponding sparse regularized linear regression problem. In particular, we provide geometric conditions for well-posedness of the regression problem, compare these conditions to those for smooth regularization, and show the computational difficulty of verifying these conditions.
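Uniqueness can already fail for the lasso, whose regularizer $\|\beta\|_1$ is convex piecewise linear with a polyhedral epigraph. The toy example below is our own and does not come from the paper. It shows that when the design matrix has two identical columns, every nonnegative split of the single-column solution $c$ between the two coefficients attains the same minimal objective. The solution map is therefore not single-valued. The data are hypothetical.

```python
def lasso_objective(X, y, beta, lam):
    """0.5 * ||y - X beta||_2^2 + lam * ||beta||_1, with X given as a list of rows."""
    residuals = [yi - sum(xij * bj for xij, bj in zip(row, beta)) for row, yi in zip(X, y)]
    return 0.5 * sum(r * r for r in residuals) + lam * sum(abs(b) for b in beta)


# Hypothetical toy data: the design matrix has two identical columns.
x = [1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.5]
X = [[xi, xi] for xi in x]
lam = 1.0

# For the single column x, the lasso minimizer is c = (x^T y - lam) / (x^T x) when x^T y > lam.
c = (sum(a * b for a, b in zip(x, y)) - lam) / sum(a * a for a in x)

# Every split (t * c, (1 - t) * c) with 0 <= t <= 1 attains the same minimal objective value.
values = [lasso_objective(X, y, [t * c, (1 - t) * c], lam) for t in (0.0, 0.25, 0.5, 1.0)]
```

The paper's geometric conditions are aimed at detecting exactly this kind of degeneracy for general polyhedral regularizers. This toy case only shows that the failure is possible.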
Citations: 0
Convergence Rates for the Maximum A Posteriori Estimator in PDE-Regression Models with Random Design
Pub Date: 2024-09-05 DOI: arxiv-2409.03417
Maximilian Siebel
We consider the statistical inverse problem of recovering a parameter $\theta \in H^\alpha$ from data arising from the Gaussian regression problem \begin{equation*} Y = \mathscr{G}(\theta)(Z) + \varepsilon \end{equation*} with nonlinear forward map $\mathscr{G}: \mathbb{L}^2 \to \mathbb{L}^2$, random design points $Z$ and Gaussian noise $\varepsilon$. The estimation strategy is based on a least squares approach under $\Vert\cdot\Vert_{H^\alpha}$-constraints. We establish the existence of a least squares estimator $\hat{\theta}$ as a maximizer for a given functional under Lipschitz-type assumptions on the forward map $\mathscr{G}$. A general concentration result is shown, which is used to prove consistency and upper bounds for the prediction error. The corresponding rates of convergence reflect not only the smoothness of the parameter of interest but also the ill-posedness of the underlying inverse problem. We apply the general model to the Darcy problem, where the recovery of an unknown coefficient function $f$ of a PDE is of interest. For this example, we also provide corresponding rates of convergence for the prediction and estimation errors. Additionally, we briefly discuss the applicability of the general model to other problems.
Citations: 0
Bulk Spectra of Truncated Sample Covariance Matrices
Pub Date: 2024-09-04 DOI: arxiv-2409.02911
Subhroshekhar Ghosh, Soumendu Sundar Mukherjee, Himasish Talukdar
Determinantal point processes (DPPs), which originate from quantum and statistical physics, are known for modelling diversity. Recent research [Ghosh and Rigollet (2020)] has demonstrated that certain matrix-valued $U$-statistics (that are truncated versions of the usual sample covariance matrix) can effectively estimate parameters in the context of Gaussian DPPs and enhance dimension reduction techniques, outperforming standard methods like PCA in clustering applications. This paper explores the spectral properties of these matrix-valued $U$-statistics in the null setting of an isotropic design. These matrices may be represented as $X L X^\top$, where $X$ is a data matrix and $L$ is the Laplacian matrix of a random geometric graph associated with $X$. The main mathematically interesting twist here is that the matrix $L$ depends on $X$. We give complete descriptions of the bulk spectra of these matrix-valued $U$-statistics in terms of the Stieltjes transforms of their empirical spectral measures. The results and the techniques can in fact address a broader class of kernelised random matrices, connecting their limiting spectra to generalised Marčenko-Pastur laws and free probability.
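One way to see why $X L X^\top$ is a "truncated" sample covariance is through an identity. For any graph with adjacency matrix $A$ and Laplacian $L = D - A$, where the $x_i$ are the columns of $X$,

$$X L X^\top = \sum_{i<j} A_{ij}(x_i - x_j)(x_i - x_j)^\top.$$

For the complete graph, this equals $n(n-1)$ times the usual sample covariance. A random geometric graph keeps only pairs of nearby points, which is also why $L$ depends on $X$.

The Python sketch below checks both facts numerically. It assumes an indicator kernel, linking $i$ and $j$ when $\|x_i - x_j\| \le r$. It makes no attempt to match the paper's kernel or normalization, and all function names are hypothetical.

```python
import random


def random_points(n, p, seed=0):
    """n i.i.d. standard Gaussian points in R^p (an isotropic design)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]


def rgg_adjacency(points, radius):
    """Adjacency matrix of the random geometric graph: i ~ j iff ||x_i - x_j|| <= radius."""
    n = len(points)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            if dist2 <= radius ** 2:
                A[i][j] = A[j][i] = 1.0
    return A


def laplacian(A):
    """Graph Laplacian L = D - A."""
    n = len(A)
    return [[(sum(A[i]) if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]


def x_l_xt(points, L):
    """X L X^T, where X is the p x n matrix whose columns are the data points."""
    n, p = len(points), len(points[0])
    return [[sum(points[i][a] * L[i][j] * points[j][b] for i in range(n) for j in range(n))
             for b in range(p)] for a in range(p)]


def truncated_pair_sum(points, A):
    """Sum over linked pairs i < j of (x_i - x_j)(x_i - x_j)^T."""
    n, p = len(points), len(points[0])
    M = [[0.0] * p for _ in range(p)]
    for i in range(n):
        for j in range(i + 1, n):
            if A[i][j]:
                diff = [points[i][k] - points[j][k] for k in range(p)]
                for a in range(p):
                    for b in range(p):
                        M[a][b] += diff[a] * diff[b]
    return M
```

Passing `radius=float("inf")` links every pair, which recovers the untruncated (scaled) sample covariance.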
Citations: 0
Smoothed Robust Phase Retrieval
Pub Date: 2024-09-03 DOI: arxiv-2409.01570
Zhong Zheng, Lingzhou Xue
The phase retrieval problem in the presence of noise aims to recover the signal vector of interest from a set of quadratic measurements with infrequent but arbitrary corruptions, and it plays an important role in many scientific applications. However, the essential geometric structure of nonconvex robust phase retrieval based on the $\ell_1$-loss is largely unknown, which makes spurious local solutions hard to study even in the ideal noiseless setting. Its intrinsic nonsmooth nature also impacts the efficiency of optimization algorithms. This paper introduces smoothed robust phase retrieval (SRPR), based on a family of convolution-type smoothed loss functions. Theoretically, we prove that SRPR enjoys a benign geometric structure with high probability. (1) In the noiseless setting, SRPR has no spurious local solutions, and the target signals are global solutions. (2) Under infrequent but arbitrary corruptions, we characterize the stationary points of SRPR and prove its benign landscape; this is the first landscape analysis of phase retrieval with corruption in the literature. Moreover, we prove a local linear convergence rate for gradient descent applied to SRPR in the noiseless setting. Experiments on both simulated datasets and image recovery demonstrate the numerical performance of SRPR.
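Here is a minimal sketch of the smoothing idea, assuming the uniform kernel on $[-h, h]$. The paper considers a family of convolution-type kernels, which we do not reproduce. Convolving $|u|$ with this kernel gives

$$\ell_h(u) = \begin{cases} u^2/(2h) + h/2 & \text{if } |u| \le h, \\ |u| & \text{otherwise.} \end{cases}$$

This loss is continuously differentiable, with derivative $\mathrm{clip}(u/h, -1, 1)$.

The Python code runs plain gradient descent on $\frac{1}{n}\sum_i \ell_h\big(y_i - (a_i^\top x)^2\big)$, starting near the signal. The step size, smoothing width, initialization and corruption pattern (in the hypothetical `make_problem`) are illustrative assumptions, not the authors' settings. Because $x$ and $-x$ give identical measurements, the error is measured up to sign.

```python
import random


def smoothed_abs(u, h):
    """|u| convolved with the uniform density on [-h, h]: quadratic near 0, equal to |u| beyond h."""
    return u * u / (2.0 * h) + h / 2.0 if abs(u) <= h else abs(u)


def smoothed_abs_deriv(u, h):
    """Derivative of smoothed_abs, i.e. clip(u / h, -1, 1)."""
    return max(-1.0, min(1.0, u / h))


def srpr_loss_and_grad(x, A, y, h):
    """Mean smoothed l1 loss of the residuals y_i - (a_i^T x)^2 and its gradient in x."""
    n, d = len(A), len(x)
    loss, grad = 0.0, [0.0] * d
    for a, yi in zip(A, y):
        ax = sum(ak * xk for ak, xk in zip(a, x))
        r = yi - ax * ax
        loss += smoothed_abs(r, h)
        coef = smoothed_abs_deriv(r, h) * (-2.0 * ax)  # chain rule: dr/dx = -2 (a^T x) a
        for k in range(d):
            grad[k] += coef * a[k]
    return loss / n, [g / n for g in grad]


def gradient_descent(x0, A, y, h=1.0, step=0.02, iters=500):
    """Plain gradient descent on the smoothed loss, starting from x0."""
    x = list(x0)
    for _ in range(iters):
        _, g = srpr_loss_and_grad(x, A, y, h)
        x = [xk - step * gk for xk, gk in zip(x, g)]
    return x


def dist_mod_sign(x, z):
    """Distance up to the global sign ambiguity: min(||x - z||, ||x + z||)."""
    minus = sum((a - b) ** 2 for a, b in zip(x, z)) ** 0.5
    plus = sum((a + b) ** 2 for a, b in zip(x, z)) ** 0.5
    return min(minus, plus)


def make_problem(d=5, n=300, corrupt_every=30, corrupt_value=20.0, seed=0):
    """Hypothetical instance: unit-norm signal, Gaussian a_i, y_i = (a_i^T x)^2,
    with every corrupt_every-th measurement overwritten by corrupt_value."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = sum(v * v for v in x_true) ** 0.5
    x_true = [v / norm for v in x_true]
    A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [sum(ak * xk for ak, xk in zip(a, x_true)) ** 2 for a in A]
    for i in range(0, n, corrupt_every):
        y[i] = corrupt_value
    return x_true, A, y
```

With `make_problem()`, every 30th of the 300 measurements is overwritten by 20.0. Starting at distance 0.3 from the signal, `gradient_descent` should end much closer to $\pm x$, because each corrupted residual contributes only a bounded derivative.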
Citations: 0
Demystified: double robustness with nuisance parameters estimated at rate n-to-the-1/4
Pub Date: 2024-09-03 DOI: arxiv-2409.02320
Judith J. Lok
Have you also been wondering what is going on with double robustness and nuisance parameters estimated at rate $n^{1/4}$? It turns out that to understand this phenomenon one just needs the Mean Value Theorem (or a Taylor expansion) and some smoothness conditions. This note explains why, under some fairly simple conditions, as long as the nuisance parameter $\theta \in \mathbb{R}^k$ is estimated at rate $n^{1/4}$ or faster, two things hold. (1) The resulting variance of the estimator of the parameter of interest $\psi \in \mathbb{R}^d$ does not depend on how the nuisance parameter $\theta$ is estimated. (2) The sandwich estimator of the variance of $\hat{\psi}$ that ignores the estimation of $\theta$ is consistent.
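The mechanism can be seen from a generic Taylor-expansion argument for an estimating equation $\mathbb{P}_n U(\psi, \theta) = 0$. The notation and regularity assumptions are ours, and the note's exact conditions may differ. In particular, this generic form uses $\|\hat\theta - \theta_0\| = o_p(n^{-1/4})$.

Double robustness means the estimating equation stays unbiased when either component of the nuisance parameter is correct. Differentiating that property at the truth gives $E[\partial_\theta U(\psi_0, \theta_0)] = 0$, which removes the first-order effect of $\hat\theta$. The last display depends on $\theta$ only through $\theta_0$. This explains both claims: the asymptotic variance does not depend on how $\theta$ was estimated, and a sandwich estimator that treats $\hat\theta$ as known is consistent.

```latex
% Assumptions (ours): U smooth with bounded second derivatives, \hat\psi consistent,
% and E[\partial_\psi U(\psi_0,\theta_0)] invertible.
\[
  \mathbb{P}_n U(\hat\psi,\hat\theta) = 0,
  \qquad \mathbb{P}_n f := \frac{1}{n}\sum_{i=1}^{n} f(O_i).
\]
% Taylor expansion around the truth (\psi_0,\theta_0):
\[
  0 = \mathbb{P}_n U(\psi_0,\theta_0)
    + \mathbb{P}_n \partial_\psi U(\psi_0,\theta_0)\,(\hat\psi-\psi_0)
    + \mathbb{P}_n \partial_\theta U(\psi_0,\theta_0)\,(\hat\theta-\theta_0)
    + O_p\!\left(\|\hat\psi-\psi_0\|^2 + \|\hat\theta-\theta_0\|^2\right).
\]
% Double robustness gives E[\partial_\theta U(\psi_0,\theta_0)] = 0, so by the CLT
\[
  \mathbb{P}_n \partial_\theta U(\psi_0,\theta_0) = O_p(n^{-1/2})
  \quad\Longrightarrow\quad
  \mathbb{P}_n \partial_\theta U(\psi_0,\theta_0)\,(\hat\theta-\theta_0) = o_p(n^{-1/2}).
\]
% With \|\hat\theta-\theta_0\| = o_p(n^{-1/4}), the quadratic remainder in \theta is o_p(n^{-1/2}), hence
\[
  \sqrt{n}\,(\hat\psi-\psi_0)
  = -\left\{E\,\partial_\psi U(\psi_0,\theta_0)\right\}^{-1}
    \sqrt{n}\,\mathbb{P}_n U(\psi_0,\theta_0) + o_p(1).
\]
```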
Citations: 0
Deconvolution of repeated measurements corrupted by unknown noise
Pub Date: 2024-09-03 DOI: arxiv-2409.02014
Jérémie Capitao-Miniconi, Elisabeth Gassiat, Luc Lehéricy
Recent advances have demonstrated the possibility of solving the deconvolution problem without prior knowledge of the noise distribution. In this paper, we study the repeated measurements model, where information is derived from multiple measurements of $X$ perturbed independently by additive errors. Our contributions include establishing identifiability without any assumption on the noise except for coordinate independence. We propose an estimator of the density of the signal for which we provide rates of convergence, and prove that it reaches the minimax rate in the case where the support of the signal is compact. Additionally, we propose a model selection procedure for adaptive estimation. Numerical simulations demonstrate the effectiveness of our approach even with limited sample sizes.
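Repeated measurements help even at the level of second moments. Let $Y_1 = X + \varepsilon_1$ and $Y_2 = X + \varepsilon_2$, with $X$, $\varepsilon_1$ and $\varepsilon_2$ mutually independent. Then $\operatorname{Cov}(Y_1, Y_2) = \operatorname{Var}(X)$ whatever the noise distributions are, whereas $\operatorname{Var}(Y_1)$ overstates $\operatorname{Var}(X)$.

The Python sketch below checks this on simulated data. It is a simplified illustration under those independence assumptions, not the paper's density estimator. The distributions in `simulate_repeated_measurements` are hypothetical.

```python
import random


def sample_cov(u, v):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)


def simulate_repeated_measurements(n, seed=0):
    """Hypothetical model: X ~ N(1, 2^2); Y_j = X + e_j with e_1 ~ Exp(1), e_2 ~ Uniform(-3, 3)."""
    rng = random.Random(seed)
    xs = [rng.gauss(1.0, 2.0) for _ in range(n)]
    y1 = [x + rng.expovariate(1.0) for x in xs]      # skewed noise with nonzero mean
    y2 = [x + rng.uniform(-3.0, 3.0) for x in xs]    # bounded symmetric noise
    return y1, y2
```

Here $\operatorname{Var}(X) = 4$. With $10^5$ draws, the cross-covariance of `y1` and `y2` lands close to 4, while the variance of `y1` alone is close to $4 + 1 = 5$.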
Citations: 0
A sparse PAC-Bayesian approach for high-dimensional quantile prediction
Pub Date: 2024-09-03 DOI: arxiv-2409.01687
The Tien Mai
Quantile regression, a robust method for estimating conditional quantiles, has advanced significantly in fields such as econometrics, statistics, and machine learning. In high-dimensional settings, where the number of covariates exceeds the sample size, penalized methods like the lasso have been developed to address sparsity challenges. Bayesian methods, initially connected to quantile regression via the asymmetric Laplace likelihood, have also evolved, though issues with posterior variance have led to new approaches, including pseudo/score likelihoods. This paper presents a novel probabilistic machine learning approach for high-dimensional quantile prediction. It uses a pseudo-Bayesian framework with a scaled Student-t prior, and Langevin Monte Carlo for efficient computation. The method comes with strong theoretical guarantees: PAC-Bayes bounds establish non-asymptotic oracle inequalities, showing minimax-optimal prediction error and adaptability to unknown sparsity. Its effectiveness is validated through simulations and real-world data, where it performs competitively against established frequentist and Bayesian techniques.
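The following Python sketch illustrates the two computational ingredients named above, under assumptions of our own:

- The pseudo-likelihood is $\exp\{-\lambda \sum_i \rho_\tau(y_i - x_i^\top\beta)\}$, with the check loss $\rho_\tau(u) = u(\tau - \mathbf{1}\{u < 0\})$.
- The prior is $\pi(\beta) \propto \prod_j (s^2 + \beta_j^2)^{-2}$, a scaled Student-t with 3 degrees of freedom.
- Sampling uses the unadjusted Langevin update $\beta \leftarrow \beta + \eta \nabla \log \hat\pi(\beta) + \sqrt{2\eta}\,\xi$, with a subgradient in place of the gradient of the nonsmooth check loss.

The values of $\lambda$, $s$ and $\eta$ are illustrative, not the paper's tuning.

```python
import math
import random


def pinball_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def langevin_quantile(X, y, tau, lam=1.0, s=0.3, step=2e-3, burn_in=1000, n_keep=2000, seed=0):
    """Unadjusted Langevin sampler for the pseudo-posterior
    exp(-lam * sum_i rho_tau(y_i - x_i^T beta)) * prod_j (s^2 + beta_j^2)^(-2).
    Returns the average of the kept draws, an estimate of the pseudo-posterior mean."""
    rng = random.Random(seed)
    p = len(X[0])
    beta = [0.0] * p
    total = [0.0] * p
    noise_scale = math.sqrt(2.0 * step)
    for it in range(burn_in + n_keep):
        grad = [-4.0 * b / (s * s + b * b) for b in beta]  # gradient of the log prior
        for row, yi in zip(X, y):
            r = yi - sum(xij * bj for xij, bj in zip(row, beta))
            # (sub)gradient of -lam * rho_tau(r) with respect to beta is w * x_i
            w = lam * (tau - (1.0 if r < 0 else 0.0))
            for j in range(p):
                grad[j] += w * row[j]
        beta = [b + step * g + noise_scale * rng.gauss(0.0, 1.0) for b, g in zip(beta, grad)]
        if it >= burn_in:
            total = [t + b for t, b in zip(total, beta)]
    return [t / n_keep for t in total]
```

Consider 120 observations from $y = 2x_1 - 1.5x_2 + \text{noise}$ in $p = 5$ dimensions, with $\tau = 0.5$. The averaged draws returned by `langevin_quantile` should then lie near $(2, -1.5, 0, 0, 0)$.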
Citations: 0
Personalized and uncertainty-aware coronary hemodynamics simulations: From Bayesian estimation to improved multi-fidelity uncertainty quantification
Pub Date: 2024-09-03 DOI: arxiv-2409.02247
Karthik Menon, Andrea Zanoni, Owais Khan, Gianluca Geraci, Koen Nieman, Daniele E. Schiavazzi, Alison L. Marsden
Simulations of coronary hemodynamics have improved non-invasive clinical risk stratification and treatment outcomes for coronary artery disease, compared to relying on anatomical imaging alone. However, simulations typically use empirical approaches to distribute total coronary flow amongst the arteries in the coronary tree. This ignores patient variability, the presence of disease, and other clinical factors. Further, uncertainty in the clinical data often remains unaccounted for in the modeling pipeline. We present an end-to-end uncertainty-aware pipeline to (1) personalize coronary flow simulations by incorporating branch-specific coronary flows as well as cardiac function; and (2) predict clinical and biomechanical quantities of interest with improved precision, while accounting for uncertainty in the clinical data. We assimilate patient-specific measurements of myocardial blood flow from CT myocardial perfusion imaging to estimate branch-specific coronary flows. We use adaptive Markov Chain Monte Carlo sampling to estimate the joint posterior distributions of model parameters with simulated noise in the clinical data. Additionally, we determine the posterior predictive distribution for relevant quantities of interest using a new approach combining multi-fidelity Monte Carlo estimation with non-linear, data-driven dimensionality reduction. Our framework recapitulates clinically measured cardiac function as well as branch-specific coronary flows under measurement uncertainty. We substantially shrink the confidence intervals for estimated quantities of interest compared to single-fidelity and state-of-the-art multi-fidelity Monte Carlo methods. This is especially true for quantities that showed limited correlation between the low- and high-fidelity model predictions. Moreover, the proposed estimators are significantly cheaper to compute for a specified confidence level or variance.
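The core of multi-fidelity Monte Carlo can be illustrated with its simplest two-model form, a control variate:

$$\hat\mu = \bar f_{\mathrm{hi}}^{(N)} + \alpha\big(\bar f_{\mathrm{lo}}^{(M)} - \bar f_{\mathrm{lo}}^{(N)}\big).$$

Here the cheap low-fidelity model is evaluated on $M \ge N$ inputs, the first $N$ of which are shared with the expensive model.

The Python sketch below implements only that basic estimator. It estimates $\alpha$ from the paired samples, which introduces a small bias. It omits the paper's data-driven dimensionality reduction, and the toy models `hi_model` and `lo_model` are hypothetical stand-ins for the hemodynamics simulations.

```python
import math
import random


def hi_model(z):
    """Hypothetical expensive model."""
    return math.exp(z)


def lo_model(z):
    """Hypothetical cheap surrogate: second-order Taylor polynomial of exp around 0."""
    return 1.0 + z + z * z / 2.0


def two_fidelity_estimate(f_hi, f_lo, n_hi, m_lo, seed=0):
    """Control-variate estimate of E[f_hi(Z)], Z ~ Uniform[0, 1), from n_hi high-fidelity
    and m_lo >= n_hi low-fidelity evaluations (the first n_hi inputs are shared)."""
    if not 2 <= n_hi <= m_lo:
        raise ValueError("need 2 <= n_hi <= m_lo")
    rng = random.Random(seed)
    z = [rng.random() for _ in range(m_lo)]
    hi = [f_hi(zi) for zi in z[:n_hi]]
    lo = [f_lo(zi) for zi in z]
    mean_hi = sum(hi) / n_hi
    mean_lo_n = sum(lo[:n_hi]) / n_hi
    mean_lo_m = sum(lo) / m_lo
    # alpha = Cov(f_hi, f_lo) / Var(f_lo), estimated from the n_hi paired evaluations.
    cov = sum((h - mean_hi) * (g - mean_lo_n) for h, g in zip(hi, lo[:n_hi])) / (n_hi - 1)
    var_lo = sum((g - mean_lo_n) ** 2 for g in lo[:n_hi]) / (n_hi - 1)
    alpha = cov / var_lo
    # Correct the high-fidelity mean by the low-fidelity discrepancy between M and N samples.
    return mean_hi + alpha * (mean_lo_m - mean_lo_n)
```

Take 50 high-fidelity and 20,000 low-fidelity evaluations, with $Z \sim \mathrm{Uniform}[0,1)$. Then `two_fidelity_estimate(hi_model, lo_model, 50, 20_000)` should be close to $e - 1$. The more strongly the two models are correlated, the more variance the correction term removes.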
与仅依靠解剖成像相比,冠状动脉血流动力学模拟改善了冠状动脉疾病的无创临床风险分级和治疗效果。然而,模拟通常使用经验方法在冠状动脉树中分配冠状动脉总流量。这忽略了患者的可变性、疾病的存在以及其他临床因素。此外,在建模过程中,临床数据的不确定性往往没有考虑在内。我们提出了一种端到端不确定性感知管道,以便:(1) 通过纳入特定冠状动脉分支的血流以及心脏功能,对冠状动脉血流进行个性化模拟;(2) 在考虑临床数据不确定性的同时,以更高的精度预测临床和生物力学相关量。我们从 CT 心肌灌注成像中同化了特定患者的心肌血流测量数据,以估算特定分支的冠状动脉血流。我们使用自适应马尔可夫链蒙特卡洛抽样来估计模型参数的联合后验分布,并模拟临床数据中的噪声。此外,我们还采用一种新方法,将多保真度蒙特卡罗估计与非线性、数据驱动的降维相结合,确定了相关感兴趣量的后验预测分布。我们的框架重现了临床测量的心脏功能,以及在测量不确定性条件下的特异性冠状动脉血流。与单保真度蒙特卡罗方法和最先进的多保真度蒙特卡罗方法相比,我们大幅缩小了相关估计量的置信区间。这对于低保真和高保真模型预测之间相关性有限的量来说尤其如此。此外,对于指定的置信度或方差,所提出的估计器的计算成本明显更低。
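Standard multi-fidelity Monte Carlo gains little when low- and high-fidelity predictions are weakly correlated, which is the case the abstract singles out. The minimal sketch below shows why, using the textbook two-model control-variate estimator. It is not the authors' estimator, which additionally uses non-linear, data-driven dimensionality reduction. `f_high`, `f_low`, `mfmc_estimate` and `variance_ratio` are hypothetical names, the two toy models are assumed stand-ins for an expensive and a cheap simulation, and the input is assumed to be standard normal.

```python
import numpy as np


def f_high(x):
    # Hypothetical "high-fidelity" model; stands in for an expensive simulation.
    return np.exp(0.7 * x) + 0.1 * np.sin(5.0 * x)


def f_low(x):
    # Hypothetical "low-fidelity" model: a cheap quadratic approximation of f_high.
    return 1.0 + 0.7 * x + 0.245 * x**2


def mfmc_estimate(f_hi, f_lo, n_hi, n_lo, rng):
    """Two-fidelity control-variate estimate of E[f_hi(X)] with X ~ N(0, 1).

    The high-fidelity model runs on the first n_hi inputs only; the
    low-fidelity model runs on all n_lo inputs (n_lo > n_hi).
    """
    if n_lo <= n_hi:
        raise ValueError("need more low-fidelity than high-fidelity samples")
    x = rng.standard_normal(n_lo)
    y_hi = f_hi(x[:n_hi])
    y_lo = f_lo(x)
    # Weight alpha = Cov(hi, lo) / Var(lo), estimated from the shared inputs.
    # Estimating it from the same samples adds a small bias; fine for illustration.
    c = np.cov(y_hi, y_lo[:n_hi])
    alpha = c[0, 1] / c[1, 1]
    # Correct the high-fidelity mean by the low-fidelity discrepancy.
    return y_hi.mean() + alpha * (y_lo.mean() - y_lo[:n_hi].mean())


def variance_ratio(rho, n_hi, n_lo):
    """Var(MFMC) / Var(plain MC on n_hi samples), for the optimal alpha."""
    return 1.0 - (1.0 - n_hi / n_lo) * rho**2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_hi, n_lo, reps = 20, 400, 500
    mc = [f_high(rng.standard_normal(n_hi)).mean() for _ in range(reps)]
    mf = [mfmc_estimate(f_high, f_low, n_hi, n_lo, rng) for _ in range(reps)]
    print(f"plain MC variance: {np.var(mc):.5f}")
    print(f"MFMC variance:     {np.var(mf):.5f}")
    # Theoretical ratios for a weakly and a strongly correlated cheap model.
    print(f"ratio at rho=0.4:  {variance_ratio(0.4, n_hi, n_lo):.3f}")
    print(f"ratio at rho=0.95: {variance_ratio(0.95, n_hi, n_lo):.3f}")
```

With the optimal weight, the variance relative to plain Monte Carlo on n_hi high-fidelity runs is 1 - (1 - n_hi/n_lo) * rho^2. At rho = 0.4, even 20 times as many cheap runs remove only about 15% of the variance, which is the regime where the abstract reports its largest improvement over existing multi-fidelity estimators.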
Citations: 0
Journal: arXiv - STAT - Statistics Theory
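Book学术 provides a free academic search service that helps scholars in China and abroad find Chinese- and English-language literature, and aims to offer a convenient, high-quality experience.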
Copyright © 2023 Book学术 All rights reserved.
Beijing Public Security Network Registration No. 11010802042870; Beijing ICP Filing No. 2023020795-1