Information and Inference-A Journal of the Ima: Latest Articles

Robust and resource efficient identification of shallow neural networks by fewest samples

IF 1.6 | Zone 4 (Mathematics) | Q2 MATHEMATICS, APPLIED

Information and Inference-A Journal of the Ima

Pub Date : 2020-10-01 DOI: 10.1093/imaiai/iaaa036

Massimo Fornasier;Jan Vybíral;Ingrid Daubechies

We address the structure identification and the uniform approximation of sums of ridge functions $f(x)=\sum_{i=1}^m g_i(\langle a_i,x\rangle)$ on ${\mathbb{R}}^d$, representing a general form of a shallow feed-forward neural network, from a small number of query samples. Higher order differentiation, as used in our constructive approximations, of sums of ridge functions or of their compositions, as in deeper neural networks, yields a natural connection between neural network weight identification and tensor product decomposition identification. In the case of the shallowest feed-forward neural network, second-order differentiation and tensors of order two (i.e., matrices) suffice as we prove in this paper. We use two sampling schemes to perform approximate differentiation: active sampling, where the sampling points are universal, actively and randomly designed, and passive sampling, where sampling points were preselected at random from a distribution with known density. Based on multiple gathered approximated first- and second-order differentials, our general approximation strategy is developed as a sequence of algorithms to perform individual sub-tasks. We first perform an active subspace search by approximating the span of the weight vectors $a_1,\dots,a_m$. Then we use a straightforward substitution, which reduces the dimensionality of the problem from $d$ to $m$. The core of the construction is then the stable and efficient approximation of weights expressed in terms of rank-$1$ matrices $a_i \otimes a_i$, realized by formulating their individual identification as a suitable nonlinear program. We prove the successful identification by this program of weight vectors being close to orthonormal and we also show how we can constructively reduce to this case by a whitening procedure, without loss of any generality. We finally discuss the implementation and the performance of the proposed algorithmic pipeline with extensive numerical experiments, which illustrate and confirm the theoretical results.
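
The active subspace search described above can be illustrated with a minimal sketch. Since $\nabla f(x)=\sum_i g_i'(\langle a_i,x\rangle)\,a_i$, every gradient of $f$ lies in the span of the weight vectors, so stacking approximate gradients and taking the dominant left singular vectors estimates that span. The sketch assumes plain coordinate-wise central differences and standard Gaussian query points, which is not the paper's sample-efficient scheme; the function name `estimate_weight_span`, the test network `f`, and all parameter values are hypothetical.

```python
import numpy as np

def estimate_weight_span(f, d, m, n_points=20, eps=1e-5, seed=0):
    """Estimate an orthonormal basis of span{a_1, ..., a_m} from point queries of f."""
    rng = np.random.default_rng(seed)
    grads = np.empty((d, n_points))
    for k in range(n_points):
        x = rng.standard_normal(d)          # assumed sampling distribution
        for j in range(d):
            e = np.zeros(d)
            e[j] = eps
            # central difference approximates the j-th partial derivative
            grads[j, k] = (f(x + e) - f(x - e)) / (2.0 * eps)
    # every exact gradient lies in span{a_i}; keep the m dominant left singular vectors
    U, _, _ = np.linalg.svd(grads, full_matrices=False)
    return U[:, :m]

# Hypothetical test network with d = 5 inputs and m = 2 neurons (rows are a_1, a_2).
A = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0, 0.0]])

def f(x):
    return np.tanh(A[0] @ x) + np.sin(A[1] @ x)

U = estimate_weight_span(f, d=5, m=2)
Q, _ = np.linalg.qr(A.T)                     # orthonormal basis of the true span
span_error = np.linalg.norm(U @ U.T - Q @ Q.T)
```

Comparing the orthogonal projectors `U @ U.T` and `Q @ Q.T` avoids any dependence on the particular basis chosen for the subspace.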

Citations: 12

Sensitivity of ℓ1 minimization to parameter choice

IF 1.6 | Zone 4 (Mathematics) | Q2 MATHEMATICS, APPLIED

Information and Inference-A Journal of the Ima

Pub Date : 2020-10-01 DOI: 10.1093/imaiai/iaaa014

Aaron Berk;Yaniv Plan;Özgür Yilmaz

The use of generalized Lasso is a common technique for recovery of structured high-dimensional signals. There are three common formulations of generalized Lasso; each program has a governing parameter whose optimal value depends on properties of the data. At this optimal value, compressed sensing theory explains why Lasso programs recover structured high-dimensional signals with minimax order-optimal error. Unfortunately in practice, the optimal choice is generally unknown and must be estimated. Thus, we investigate stability of each of the three Lasso programs with respect to its governing parameter. Our goal is to aid the practitioner in answering the following question: given real data, which Lasso program should be used? We take a step towards answering this by analysing the case where the measurement matrix is identity (the so-called proximal denoising setup) and we use $\ell_{1}$ regularization. For each Lasso program, we specify settings in which that program is provably unstable with respect to its governing parameter. We support our analysis with detailed numerical simulations. For example, there are settings where a 0.1% underestimate of a Lasso parameter can increase the error significantly and a 50% underestimate can cause the error to increase by a factor of $10^{9}$.
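
To make the proximal denoising setup concrete, here is a minimal sketch of one of the formulations: the unconstrained, $\ell_1$-penalized program $\min_x \tfrac12\|x-y\|_2^2+\lambda\|x\|_1$, whose minimizer is entrywise soft-thresholding. The signal, the noisy observation, and the parameter values are hypothetical, and the sketch does not reproduce the instability regimes analysed in the paper; it only shows how the error depends on the governing parameter $\lambda$.

```python
def soft_threshold(y, lam):
    """Closed-form minimizer of 0.5*||x - y||_2^2 + lam*||x||_1, applied entrywise."""
    out = []
    for v in y:
        if v > lam:
            out.append(v - lam)
        elif v < -lam:
            out.append(v + lam)
        else:
            out.append(0.0)
    return out

def squared_error(x, x_true):
    """Squared Euclidean distance between an estimate and the true signal."""
    return sum((a - b) ** 2 for a, b in zip(x, x_true))

# Hypothetical sparse signal and one fixed noisy observation of it.
x_true = [4.0, 0.0, 0.0, 0.0, -3.0, 0.0]
y = [4.3, -0.4, 0.2, 0.5, -3.2, -0.1]

# Sweep the governing parameter and report the recovery error for each value.
for lam in (0.05, 0.5, 1.0):
    x_hat = soft_threshold(y, lam)
    print(lam, round(squared_error(x_hat, x_true), 4))
```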

Citations: 19

Super-resolution of near-colliding point sources

IF 1.6 | Zone 4 (Mathematics) | Q2 MATHEMATICS, APPLIED

Information and Inference-A Journal of the Ima

Pub Date : 2020-10-01 DOI: 10.1093/imaiai/iaaa005

Dmitry Batenkov;Gil Goldman;Yosef Yomdin

We consider the problem of stable recovery of sparse signals of the form $$F(x)=\sum_{j=1}^d a_j\delta(x-x_j),\quad x_j\in\mathbb{R},\;a_j\in\mathbb{C},$$ from their spectral measurements, known in a bandwidth $\varOmega$ with absolute error not exceeding $\epsilon>0$. We consider the case when at most $p\leqslant d$ nodes $\{x_j\}$ of $F$ form a cluster whose extent is smaller than the Rayleigh limit ${1\over\varOmega}$, while the rest of the nodes is well separated. Provided that $\epsilon \lessapprox \operatorname{SRF}^{-2p+1}$, where $\operatorname{SRF}=(\varOmega\varDelta)^{-1}$ and $\varDelta$ is the minimal separation between the nodes, we show that the minimax error rate for reconstruction of the cluster nodes is of order ${1\over\varOmega}\operatorname{SRF}^{2p-1}\epsilon$, while for recovering the corresponding amplitudes $\{a_j\}$ the rate is of the order $\operatorname{SRF}^{2p-1}\epsilon$. Moreover, the corresponding minimax rates for the recovery of the non-clustered nodes and amplitudes are ${\epsilon\over\varOmega}$ and $\epsilon$, respectively. These results suggest that stable super-resolution is possible in much more general situations than previously thought. Our numerical experiments show that the well-known matrix pencil method achieves the above accuracy bounds.
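
The matrix pencil method mentioned at the end of the abstract can be sketched as follows. This is a noiseless illustration under an assumed measurement convention, $s_k=\sum_j a_j e^{\mathrm{i}kx_j}$ for $k=0,\dots,N-1$ with nodes in $(-\pi,\pi]$, which may differ from the paper's normalization; the function name `matrix_pencil_nodes` and the node and amplitude values are hypothetical.

```python
import numpy as np

def matrix_pencil_nodes(samples, d):
    """Recover nodes x_j from samples s_k = sum_j a_j * exp(1j * k * x_j), k = 0..N-1."""
    s = np.asarray(samples, dtype=complex)
    N = len(s)
    L = N // 2
    # Hankel matrix H[i, j] = s[i + j]; H0 and H1 are its shifted column blocks.
    H = np.array([[s[i + j] for j in range(L + 1)] for i in range(N - L)])
    H0, H1 = H[:, :-1], H[:, 1:]
    # Rank-d truncated SVD of H0 reduces the pencil (H1, H0) to a d x d matrix.
    U, sv, Vh = np.linalg.svd(H0, full_matrices=False)
    U, sv, Vh = U[:, :d], sv[:d], Vh[:d, :]
    M = (U.conj().T @ H1 @ Vh.conj().T) / sv[:, None]
    z = np.linalg.eigvals(M)          # eigenvalues approximate exp(1j * x_j)
    return np.sort(np.angle(z))

# Hypothetical signal: two near-colliding nodes plus one well-separated node.
nodes = np.array([0.50, 0.60, 2.00])
amps = np.array([1.0, 2.0, 1.5])
k = np.arange(24)
samples = (amps[None, :] * np.exp(1j * k[:, None] * nodes[None, :])).sum(axis=1)
recovered = matrix_pencil_nodes(samples, d=3)
```

Truncating the SVD to rank `d` rather than calling a generic pseudoinverse keeps round-off-level singular values from being inverted.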

Citations: 48

Low-rank matrix completion and denoising under Poisson noise

IF 1.6 | Zone 4 (Mathematics) | Q2 MATHEMATICS, APPLIED

Information and Inference-A Journal of the Ima

Pub Date : 2020-10-01 DOI: 10.1093/imaiai/iaaa020

Andrew D McRae;Mark A Davenport

This paper considers the problem of estimating a low-rank matrix from the observation of all or a subset of its entries in the presence of Poisson noise. When we observe all entries, this is a problem of matrix denoising; when we observe only a subset of the entries, this is a problem of matrix completion. In both cases, we exploit an assumption that the underlying matrix is low-rank. Specifically, we analyse several estimators, including a constrained nuclear-norm minimization program, nuclear-norm regularized least squares and a non-convex constrained low-rank optimization problem. We show that for all three estimators, with high probability, we have an upper error bound (in the Frobenius norm error metric) that depends on the matrix rank, the fraction of the elements observed and the maximal row and column sums of the true matrix. We furthermore show that the above results are minimax optimal (within a universal constant) in classes of matrices with low-rank and bounded row and column sums. We also extend these results to handle the case of matrix multinomial denoising and completion.
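
For the fully observed (denoising) case, nuclear-norm regularized least squares in its plain form, $\min_X \tfrac12\|X-Y\|_F^2+\lambda\|X\|_*$, has a closed-form solution: soft-threshold the singular values of $Y$. The sketch below shows only this unconstrained version; any additional constraints used by the paper's estimators are not modelled, and the rate matrix `M` and the value `lam=3.0` are hypothetical choices.

```python
import numpy as np

def singular_value_threshold(Y, lam):
    """Minimizer of 0.5*||X - Y||_F^2 + lam*||X||_* via singular value soft-thresholding."""
    U, s, Vh = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)   # shrink each singular value towards zero
    return (U * s_shrunk) @ Vh            # rescale columns of U, then recombine

rng = np.random.default_rng(0)
M = np.outer([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0])   # hypothetical rank-one Poisson rates
Y = rng.poisson(M).astype(float)                      # fully observed noisy counts
X_hat = singular_value_threshold(Y, lam=3.0)
```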

Citations: 12

Mutual information for low-rank even-order symmetric tensor estimation

IF 1.6 | Zone 4 (Mathematics) | Q2 MATHEMATICS, APPLIED

Information and Inference-A Journal of the Ima

Pub Date : 2020-09-24 DOI: 10.1093/imaiai/iaaa022

Clément Luneau, Jean Barbier, N. Macris

We consider a statistical model for finite-rank symmetric tensor factorization and prove a single-letter variational expression for its asymptotic mutual information when the tensor is of even order. The proof applies the adaptive interpolation method originally invented for rank-one factorization. Here we show how to extend the adaptive interpolation to finite-rank and even-order tensors. This requires new nontrivial ideas with respect to the current analysis in the literature. We also underline where the proof falls short when dealing with odd-order tensors.

Citations: 14

Two-sample statistics based on anisotropic kernels.

IF 1.6 | Zone 4 (Mathematics) | Q2 MATHEMATICS, APPLIED

Information and Inference-A Journal of the Ima

Pub Date : 2020-09-01 Epub Date: 2019-12-10 DOI: 10.1093/imaiai/iaz018

Xiuyuan Cheng, Alexander Cloninger, Ronald R Coifman

The paper introduces a new kernel-based Maximum Mean Discrepancy (MMD) statistic for measuring the distance between two distributions given finitely many multivariate samples. When the distributions are locally low-dimensional, the proposed test can be made more powerful to distinguish certain alternatives by incorporating local covariance matrices and constructing an anisotropic kernel. The kernel matrix is asymmetric; it computes the affinity between [Formula: see text] data points and a set of [Formula: see text] reference points, where [Formula: see text] can be drastically smaller than [Formula: see text]. While the proposed statistic can be viewed as a special class of Reproducing Kernel Hilbert Space MMD, the consistency of the test is proved, under mild assumptions of the kernel, as long as [Formula: see text], and a finite-sample lower bound of the testing power is obtained. Applications to flow cytometry and diffusion MRI datasets are demonstrated, which motivate the proposed approach to compare distributions.
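
A rough sketch of a reference-point statistic of this kind is given below. The formulas are elided in the abstract, so the kernel and normalization here are assumptions: a Gaussian affinity between each reference point and each data point, shaped by a local covariance matrix attached to the reference point, and a statistic equal to the mean squared difference of the two samples' average affinities. The names `anisotropic_affinity` and `reference_statistic` and all data are hypothetical.

```python
import numpy as np

def anisotropic_affinity(refs, covs, pts):
    """Affinity matrix with entries exp(-0.5 * (x - r)^T C_r^{-1} (x - r)), shape (n_refs, n_pts)."""
    out = np.empty((len(refs), len(pts)))
    for k, (r, C) in enumerate(zip(refs, covs)):
        diff = pts - r                          # (n_pts, dim) offsets from the reference point
        sol = np.linalg.solve(C, diff.T).T      # applies C_r^{-1} to every offset
        out[k] = np.exp(-0.5 * np.sum(diff * sol, axis=1))
    return out

def reference_statistic(refs, covs, X, Y):
    """Mean over reference points of the squared gap between average affinities of X and Y."""
    mean_x = anisotropic_affinity(refs, covs, X).mean(axis=1)
    mean_y = anisotropic_affinity(refs, covs, Y).mean(axis=1)
    return float(np.mean((mean_x - mean_y) ** 2))

# Hypothetical data: two reference points with different local covariances.
refs = np.array([[0.0, 0.0], [1.0, 1.0]])
covs = [np.array([[1.0, 0.0], [0.0, 0.1]]), np.eye(2)]
X = np.array([[0.0, 0.0], [0.5, 0.1], [1.0, 0.9]])
Y = X + np.array([0.0, 0.5])                    # shifted copy of X
stat = reference_statistic(refs, covs, X, Y)
```

The affinity matrix is rectangular (reference points by data points), which is why the kernel matrix in this construction is not symmetric.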

Citations: 16

Sparse confidence sets for normal mean models

IF 1.6 | Zone 4 (Mathematics) | Q2 MATHEMATICS, APPLIED

Information and Inference-A Journal of the Ima

Pub Date : 2020-08-17 DOI: 10.1093/imaiai/iaad003

Y. Ning, Guang Cheng

In this paper, we propose a new framework to construct confidence sets for a $d$-dimensional unknown sparse parameter ${\boldsymbol \theta}$ under the normal mean model ${\boldsymbol X}\sim N({\boldsymbol \theta},\sigma^{2}\mathbf{I})$. A key feature of the proposed confidence set is its capability to account for the sparsity of ${\boldsymbol \theta}$, thus named as sparse confidence set. This is in sharp contrast with the classical methods, such as the Bonferroni confidence intervals and other resampling-based procedures, where the sparsity of ${\boldsymbol \theta}$ is often ignored. Specifically, we require the desired sparse confidence set to satisfy the following two conditions: (i) uniformly over the parameter space, the coverage probability for ${\boldsymbol \theta}$ is above a pre-specified level; (ii) there exists a random subset $S$ of $\{1,...,d\}$ such that $S$ guarantees the pre-specified true negative rate for detecting non-zero $\theta_{j}$’s. To exploit the sparsity of ${\boldsymbol \theta}$, we allow the confidence interval for $\theta_{j}$ to degenerate to a single point 0 for any $j\notin S$. Under this new framework, we first consider whether there exist sparse confidence sets that satisfy the above two conditions. To address this question, we establish a non-asymptotic minimax lower bound for the non-coverage probability over a suitable class of sparse confidence sets. The lower bound deciphers the role of sparsity and minimum signal-to-noise ratio (SNR) in the construction of sparse confidence sets. Furthermore, under suitable conditions on the SNR, a two-stage procedure is proposed to construct a sparse confidence set. To evaluate the optimality, the proposed sparse confidence set is shown to attain a minimax lower bound of some properly defined risk function up to a constant factor. Finally, we develop an adaptive procedure to the unknown sparsity. Numerical studies are conducted to verify the theoretical results.
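
The shape of such a set, with intervals on a selected subset $S$ and the single point 0 elsewhere, can be illustrated with a naive two-stage sketch. This is not the paper's procedure and carries none of its guarantees: the selection threshold $\sigma\sqrt{2\log d}$ and the Bonferroni-style correction over $|S|$ are assumptions chosen only for illustration, and the function name `sparse_confidence_set` is hypothetical.

```python
import math
from statistics import NormalDist

def sparse_confidence_set(x, sigma, alpha=0.05):
    """Naive two-stage sketch: select coordinates, then build intervals only on the selection."""
    d = len(x)
    # Stage 1 (assumed rule): keep coordinates exceeding the universal threshold.
    thresh = sigma * math.sqrt(2.0 * math.log(d))
    S = [j for j, v in enumerate(x) if abs(v) > thresh]
    # Coordinates outside S get the degenerate interval {0}.
    intervals = [(0.0, 0.0)] * d
    if S:
        # Stage 2 (assumed rule): Bonferroni-corrected normal intervals over S only.
        z = NormalDist().inv_cdf(1.0 - alpha / (2.0 * len(S)))
        for j in S:
            intervals[j] = (x[j] - z * sigma, x[j] + z * sigma)
    return S, intervals

# Hypothetical observation with one strong coordinate.
S, intervals = sparse_confidence_set([5.0, 0.1, -0.2, 0.3], sigma=1.0)
```

Because the correction runs over $|S|$ rather than $d$, the non-degenerate intervals are shorter than plain Bonferroni intervals whenever the selected set is small.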

Citations: 2

Strong uniform consistency with rates for kernel density estimators with general kernels on manifolds

IF 1.6 | Zone 4 (Mathematics) | Q2 MATHEMATICS, APPLIED

Information and Inference-A Journal of the Ima

Pub Date : 2020-07-13 DOI: 10.1093/IMAIAI/IAAB014

Hau‐Tieng Wu, Nan Wu

When analyzing modern machine learning algorithms, we may need to handle kernel density estimation (KDE) with intricate kernels that are not designed by the user and might even be irregular and asymmetric. To handle this emerging challenge, we provide a strong uniform consistency result with the $L^\infty$ convergence rate for KDE on Riemannian manifolds with Riemann integrable kernels (in the ambient Euclidean space). We also provide an $L^1$ consistency result for kernel density estimation on Riemannian manifolds with Lebesgue integrable kernels. The isotropic kernels considered in this paper are different from the kernels in the Vapnik–Chervonenkis class that are frequently considered in the statistics community. We illustrate the difference when we apply them to estimate the probability density function. Moreover, we elaborate on the delicate difference between designing the kernel on the intrinsic manifold and on the ambient Euclidean space, both of which might be encountered in practice. Finally, we prove the necessary and sufficient condition for an isotropic kernel to be Riemann integrable on a submanifold in the Euclidean space.
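
As a small illustration of KDE with a kernel defined in the ambient space, the sketch below estimates the uniform density on the unit circle (a one-dimensional manifold in $\mathbb{R}^2$) with a Gaussian kernel of the Euclidean chord distance, normalized by $h^{\dim}$ with the intrinsic dimension $\dim=1$. The Gaussian kernel, the bandwidth, and the equispaced points standing in for a random sample are assumptions; the paper treats far more general kernels.

```python
import math

def kde_on_manifold(x, samples, h, dim):
    """KDE at x with a Gaussian kernel of ambient distance, normalized in intrinsic dimension dim."""
    norm = (math.sqrt(2.0 * math.pi) * h) ** dim
    total = 0.0
    for p in samples:
        sq = sum((a - b) ** 2 for a, b in zip(x, p))   # squared ambient (chord) distance
        total += math.exp(-sq / (2.0 * h * h))
    return total / (len(samples) * norm)

# Equispaced points on the unit circle as a deterministic stand-in for a uniform sample.
n = 2000
circle = [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)) for i in range(n)]
estimate = kde_on_manifold((1.0, 0.0), circle, h=0.1, dim=1)
true_density = 1.0 / (2.0 * math.pi)   # uniform density with respect to arc length
```

Normalizing with the ambient dimension 2 instead of the intrinsic dimension 1 would rescale the estimate by a factor depending on `h`, which is one reason the distinction between intrinsic and ambient design matters.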

Citations: 6

A dimensionality reduction technique for unconstrained global optimization of functions with low effective dimensionality

IF 1.6 | Zone 4 (Mathematics) | Q2 MATHEMATICS, APPLIED

Information and Inference-A Journal of the Ima

Pub Date : 2020-03-21 DOI: 10.1093/IMAIAI/IAAB011

C. Cartis, Adilet Otemissov

We investigate the unconstrained global optimization of functions with low effective dimensionality, which are constant along certain (unknown) linear subspaces. Extending the technique of random subspace embeddings in Wang et al. (2016, J. Artificial Intelligence Res., 55, 361–387), we study a generic Random Embeddings for Global Optimization (REGO) framework that is compatible with any global minimization algorithm. Instead of the original, potentially large-scale optimization problem, within REGO, a Gaussian random, low-dimensional problem with bound constraints is formulated and solved in a reduced space. We provide novel probabilistic bounds for the success of REGO in solving the original, low effective-dimensionality problem, which show its independence of the (potentially large) ambient dimension and its precise dependence on the dimensions of the effective and embedding subspaces. These results significantly improve existing theoretical analyses by providing the exact distribution of a reduced minimizer and its Euclidean norm and by the general assumptions required on the problem. We validate our theoretical findings by extensive numerical testing of REGO with three types of global optimization solvers, illustrating the improved scalability of REGO compared with the full-dimensional application of the respective solvers.
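
The reduced problem can be sketched as follows: draw a Gaussian matrix $A$, minimize $y\mapsto f(Ay)$ over a low-dimensional box, and map the result back to the original space. The abstract states that the framework works with any global solver; the compass search below is only a simple stand-in and is not one of the solvers tested in the paper. The objective `f`, the embedding dimension `k`, and the box half-width `delta` are hypothetical choices.

```python
import numpy as np

def compass_search(g, y0, delta, step=1.0, tol=1e-10, max_iter=10_000):
    """Simple derivative-free stand-in solver on the box [-delta, delta]^k."""
    y = np.array(y0, dtype=float)
    best = g(y)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(y)):
            for sign in (1.0, -1.0):
                trial = y.copy()
                trial[i] = np.clip(trial[i] + sign * step, -delta, delta)
                val = g(trial)
                if val < best:                 # accept any strict improvement
                    y, best, improved = trial, val, True
        if not improved:
            step *= 0.5                        # refine the mesh when no move helps
    return y, best

def rego(f, A, delta):
    """Minimize the reduced problem y -> f(A @ y) and map the result back to R^D."""
    y, val = compass_search(lambda y: f(A @ y), np.zeros(A.shape[1]), delta)
    return A @ y, val

D, k = 100, 2                                  # ambient and embedding dimensions

def f(x):
    # Hypothetical objective: varies only along the direction e_0 + e_1.
    return (x[0] + x[1] - 1.0) ** 2

rng = np.random.default_rng(0)
A = rng.standard_normal((D, k))                # Gaussian random embedding
x_star, f_star = rego(f, A, delta=10.0)
```

The solver only ever sees a `k`-dimensional problem, so its cost per iteration does not grow with the search dimension beyond the matrix-vector product `A @ y`.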

Citations: 12

The information complexity of learning tasks, their structure and their distance

IF 1.6 Zone 4 (Mathematics) Q2 MATHEMATICS, APPLIED

Information and Inference-A Journal of the Ima

Pub Date : 2020-03-01 DOI: 10.1093/imaiai/iaaa033

Alessandro Achille;Giovanni Paolini;Glen Mbeng;Stefano Soatto

We introduce an asymmetric distance in the space of learning tasks and a framework to compute their complexity. These concepts are foundational for the practice of transfer learning, whereby a parametric model is pre-trained for a task, and then fine-tuned for another. The framework we develop is non-asymptotic, captures the finite nature of the training dataset and allows distinguishing learning from memorization. It encompasses, as special cases, classical notions from Kolmogorov complexity and Shannon and Fisher information. However, unlike some of those frameworks, it can be applied to large-scale models and real-world datasets. Our framework is the first to measure complexity in a way that accounts for the effect of the optimization scheme, which is critical in deep learning.


Citations: 40
