
Latest publications: Information and Inference: A Journal of the IMA

Wavelet invariants for statistically robust multi-reference alignment
IF 1.6 | CAS Q4 (Mathematics) | JCR Q2 (Mathematics, Applied) | Pub Date: 2021-12-01 | Epub Date: 2020-08-13 | DOI: 10.1093/imaiai/iaaa016
Matthew Hirn, Anna Little

We propose a nonlinear, wavelet-based signal representation that is translation invariant and robust to both additive noise and random dilations. Motivated by the multi-reference alignment problem and generalizations thereof, we analyze the statistical properties of this representation given a large number of independent corruptions of a target signal. We prove the nonlinear wavelet-based representation uniquely defines the power spectrum but allows for an unbiasing procedure that cannot be directly applied to the power spectrum. After unbiasing the representation to remove the effects of the additive noise and random dilations, we recover an approximation of the power spectrum by solving a convex optimization problem, and thus reduce to a phase retrieval problem. Extensive numerical experiments demonstrate the statistical robustness of this approximation procedure.
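The translation invariance that motivates working with spectral representations can be checked directly: the power spectrum of a discrete signal is unchanged by circular shifts. A minimal numpy sketch (illustrative only, not the authors' wavelet-based construction, which is additionally robust to noise and dilations):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)          # target signal
shifted = np.roll(x, 7)              # translated (circularly shifted) copy

def power_spectrum(s):
    # squared modulus of the DFT; invariant to circular translation
    return np.abs(np.fft.fft(s)) ** 2

assert np.allclose(power_spectrum(x), power_spectrum(shifted))
```

The plain power spectrum, however, is not robust to random dilations, which is precisely the gap the wavelet invariants above are designed to close.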

Information and Inference: A Journal of the IMA, 10(4), 1287-1351. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8782248/pdf/nihms-1726636.pdf
Citations: 0
Erratum to: Subspace clustering using ensembles of K-subspaces
IF 1.6 | CAS Q4 (Mathematics) | JCR Q2 (Mathematics, Applied) | Pub Date: 2021-10-15 | DOI: 10.1093/imaiai/iaab026
J. Lipor, D. Hong, Yan Shuo Tan, L. Balzano
Citations: 1
Estimating location parameters in sample-heterogeneous distributions
IF 1.6 | CAS Q4 (Mathematics) | JCR Q2 (Mathematics, Applied) | Pub Date: 2021-06-03 | DOI: 10.1093/IMAIAI/IAAB013
Ankit Pensia, Varun Jog, Po-Ling Loh
Estimating the mean of a probability distribution using i.i.d. samples is a classical problem in statistics, wherein finite-sample optimal estimators are sought under various distributional assumptions. In this paper, we consider the problem of mean estimation when independent samples are drawn from $d$-dimensional non-identical distributions possessing a common mean. When the distributions are radially symmetric and unimodal, we propose a novel estimator, which is a hybrid of the modal interval, shorth and median estimators, and whose performance adapts to the level of heterogeneity in the data. We show that our estimator is near-optimal when data are i.i.d. and when the fraction of "low-noise" distributions is as small as $\Omega\left(\frac{d \log n}{n}\right)$, where $n$ is the number of samples. We also derive minimax lower bounds on the expected error of any estimator that is agnostic to the scales of individual data points. Finally, we extend our theory to linear regression. In both the mean estimation and regression settings, we present computationally feasible versions of our estimators that run in time polynomial in the number of data points.
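One of the classical building blocks named above, the shorth, is simple to state: the mean of the shortest interval containing half of the samples. A toy numpy sketch (this is only one ingredient, not the authors' hybrid estimator), run on data with a common mean but heterogeneous noise levels:

```python
import numpy as np

def shorth(x):
    """Mean of the shortest interval containing half of the samples."""
    x = np.sort(x)
    h = len(x) // 2
    widths = x[h:] - x[: len(x) - h]     # width of each window of h+1 points
    i = int(np.argmin(widths))
    return x[i : i + h + 1].mean()

rng = np.random.default_rng(0)
low = rng.normal(0.0, 0.1, size=600)     # "low-noise" samples, mean 0
high = rng.normal(0.0, 10.0, size=400)   # "high-noise" samples, same mean
est = shorth(np.concatenate([low, high]))
assert abs(est) < 0.5                    # locks onto the tight low-noise cluster
```

Because the shortest covering interval concentrates on the tightest cluster, the estimate is driven by the low-noise samples, which is the kind of adaptivity to heterogeneity the abstract describes.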
Citations: 3
Compressive learning with privacy guarantees
IF 1.6 | CAS Q4 (Mathematics) | JCR Q2 (Mathematics, Applied) | Pub Date: 2021-05-15 | DOI: 10.1093/IMAIAI/IAAB005
Antoine Chatalic, V. Schellekens, F. Houssiau, Y. de Montjoye, L. Jacques, R. Gribonval
This work addresses the problem of learning from large collections of data with privacy guarantees. The compressive learning framework proposes to deal with the large scale of datasets by compressing them into a single vector of generalized random moments, called a sketch vector, from which the learning task is then performed. We provide sharp bounds on the so-called sensitivity of this sketching mechanism. This allows us to leverage standard techniques to ensure differential privacy (a well-established formalism for defining and quantifying the privacy of a random mechanism) by adding Laplace or Gaussian noise to the sketch. We combine these standard mechanisms with a new feature subsampling mechanism, which reduces the computational cost without damaging privacy. The overall framework is applied to the tasks of Gaussian modeling, k-means clustering and principal component analysis, for which sharp privacy bounds are derived. Empirically, the quality (for subsequent learning) of the compressed representation produced by our mechanism is strongly related to the induced noise level, for which we give analytical expressions.
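The Laplace mechanism on a bounded-moment sketch can be sketched in a few lines. Here the generalized moments are taken to be random Fourier features, and the noise scale is calibrated to a crude L1 sensitivity bound; this is an illustrative simplification, not the paper's exact mechanism or its sharp bounds:

```python
import numpy as np

def dp_sketch(X, W, eps, rng):
    """Average random Fourier moments of X, then add Laplace noise.

    Each feature lies in [-1, 1], so replacing one of the n records moves
    every coordinate of the average by at most 2/n; summing over the 2m
    coordinates bounds the L1 sensitivity by 4m/n.
    """
    n = len(X)
    proj = X @ W.T                                          # (n, m) projections
    feats = np.concatenate([np.cos(proj), np.sin(proj)], axis=1)
    sketch = feats.mean(axis=0)
    sensitivity = 2.0 * feats.shape[1] / n
    return sketch + rng.laplace(scale=sensitivity / eps, size=sketch.shape)

rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 5))                        # dataset
W = rng.standard_normal((20, 5))                            # random frequencies
noisy = dp_sketch(X, W, eps=1.0, rng=rng)
clean = np.concatenate([np.cos(X @ W.T), np.sin(X @ W.T)], axis=1).mean(axis=0)
assert np.max(np.abs(noisy - clean)) < 0.2                  # noise scale is small for large n
```

The key point the abstract makes is that the noise needed for privacy shrinks with the dataset size n, since the sensitivity of an averaged sketch does.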
Citations: 13
Double robust semi-supervised inference for the mean: selection bias under MAR labeling with decaying overlap
IF 1.6 | CAS Q4 (Mathematics) | JCR Q2 (Mathematics, Applied) | Pub Date: 2021-04-14 | DOI: 10.1093/imaiai/iaad021
Yuqian Zhang, Abhishek Chakrabortty, Jelena Bradic
Semi-supervised (SS) inference has received much attention in recent years. Apart from a moderate-sized labeled data set, $\mathcal{L}$, the SS setting is characterized by an additional, much larger, unlabeled data set, $\mathcal{U}$. The setting $|\mathcal{U}| \gg |\mathcal{L}|$ makes SS inference unique and different from standard missing data problems, owing to a natural violation of the so-called 'positivity' or 'overlap' assumption. However, most of the SS literature implicitly assumes $\mathcal{L}$ and $\mathcal{U}$ to be equally distributed, i.e., no selection bias in the labeling. The inferential challenges of missing-at-random type labeling allowing for selection bias are inevitably exacerbated by the decaying nature of the propensity score (PS). We address this gap for a prototype problem, the estimation of the response's mean. We propose a double robust SS mean estimator and give a complete characterization of its asymptotic properties. The proposed estimator is consistent as long as either the outcome or the PS model is correctly specified. When both models are correctly specified, we provide inference results with a non-standard consistency rate that depends on the smaller size $|\mathcal{L}|$. The results are also extended to causal inference with imbalanced treatment groups. Further, we provide several novel choices of models and estimators of the decaying PS, including a novel offset logistic model and a stratified labeling model. We present their properties under both high- and low-dimensional settings. These may be of independent interest. Lastly, we present extensive simulations and also a real data application.
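The classical doubly robust (augmented IPW) mean estimator that this line of work builds on combines an outcome model m(X) with a propensity model π(X), and is consistent if either one is correct. A minimal simulation under MAR labeling (the standard textbook form, not the paper's decaying-overlap estimator):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
X = rng.standard_normal(n)
Y = 2.0 + X + rng.standard_normal(n)        # E[Y] = 2 is the target
pi = 1.0 / (1.0 + np.exp(-X))               # labeling propensity: depends on X only (MAR)
R = rng.random(n) < pi                      # R = 1 iff the label Y is observed

m_hat = 2.0 + X                             # outcome model (correctly specified here)
mu_dr = np.mean(m_hat + R / pi * (Y - m_hat))
assert abs(mu_dr - 2.0) < 0.05
```

In the SS setting above, the difficulty is that π(X) decays as $|\mathcal{U}|$ grows, so the inverse weights R/π blow up; handling that regime is precisely the paper's contribution.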
Citations: 5
Topological information retrieval with dilation-invariant bottleneck comparative measures
IF 1.6 | CAS Q4 (Mathematics) | JCR Q2 (Mathematics, Applied) | Pub Date: 2021-04-04 | DOI: 10.1093/imaiai/iaad022
Athanasios Vlontzos, Yueqi Cao, Luca Schmidtke, Bernhard Kainz, Anthea Monod
Appropriately representing elements in a database so that queries may be accurately matched is a central task in information retrieval; recently, this has been achieved by embedding the graphical structure of the database into a manifold in a hierarchy-preserving manner using a variety of metrics. Persistent homology is a tool commonly used in topological data analysis that is able to rigorously characterize a database in terms of both its hierarchy and connectivity structure. Computing persistent homology on a variety of embedded datasets reveals that some commonly used embeddings fail to preserve the connectivity. We show that those embeddings which successfully retain the database topology coincide in persistent homology by introducing two dilation-invariant comparative measures to capture this effect: in particular, they address the issue of metric distortion on manifolds. We provide an algorithm for their computation that exhibits greatly reduced time complexity over existing methods. We use these measures to perform the first instance of topology-based information retrieval and demonstrate its increased performance over the standard bottleneck distance for persistent homology. We showcase our approach on databases of different data varieties including text, videos and medical images.
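The dilation issue that these comparative measures address is easy to see in degree-zero persistence, where, for a Vietoris-Rips filtration, the death times are exactly the edge lengths of a minimum spanning tree: scaling the point cloud by c scales the whole diagram by c, so a raw bottleneck comparison is not dilation invariant. A small numpy sketch using Prim's algorithm (illustrative only, not the paper's algorithm):

```python
import numpy as np

def h0_deaths(points):
    """Death times of 0-dim Vietoris-Rips persistence = MST edge lengths."""
    n = len(points)
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    dist = D[0].copy()                      # distance from each point to the tree
    deaths = []
    for _ in range(n - 1):                  # Prim's algorithm
        j = int(np.argmin(np.where(in_tree, np.inf, dist)))
        deaths.append(dist[j])
        in_tree[j] = True
        dist = np.minimum(dist, D[j])
    return np.sort(deaths)

pts = np.random.default_rng(1).standard_normal((30, 2))
# dilating the data dilates the diagram by the same factor
assert np.allclose(h0_deaths(3.0 * pts), 3.0 * h0_deaths(pts))
```

A dilation-invariant comparison must therefore quotient out this common scale factor before matching diagrams, which is what the measures introduced above do.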
Citations: 2
Multi-scale vector quantization with reconstruction trees
IF 1.6 | CAS Q4 (Mathematics) | JCR Q2 (Mathematics, Applied) | Pub Date: 2021-02-01 | DOI: 10.1093/imaiai/iaaa004
Enrico Cecini;Ernesto De Vito;Lorenzo Rosasco
We propose and study a multi-scale approach to vector quantization (VQ). We develop an algorithm, dubbed reconstruction trees, inspired by decision trees. Here the objective is parsimonious reconstruction of unsupervised data, rather than classification. In contrast to more standard VQ methods, such as $k$-means, the proposed approach leverages a family of given partitions to quickly explore the data in a coarse-to-fine multi-scale fashion. Our main technical contribution is an analysis of the expected distortion achieved by the proposed algorithm when the data are assumed to be sampled from a fixed unknown distribution. In this context, we derive both asymptotic and finite-sample results under suitable regularity assumptions on the distribution. As a special case, we consider the setting where the data-generating distribution is supported on a compact Riemannian submanifold. Tools from differential geometry and concentration of measure are useful in our analysis.
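A toy version of the coarse-to-fine idea: recursively split cells of a given partition family (here, median splits along alternating axes) and represent each cell by its mean. By the variance decomposition, refining a partition can only decrease the empirical distortion. This is a simplified stand-in for the reconstruction trees analyzed above, not the paper's algorithm:

```python
import numpy as np

def quantize(X, depth):
    """Piecewise-constant reconstruction of X from a median-split tree."""
    out = np.empty_like(X)

    def build(idx, d, axis):
        if d == depth or len(idx) < 2:
            out[idx] = X[idx].mean(axis=0)      # cell representative = cell mean
            return
        med = np.median(X[idx, axis])
        left, right = idx[X[idx, axis] <= med], idx[X[idx, axis] > med]
        if len(left) == 0 or len(right) == 0:
            out[idx] = X[idx].mean(axis=0)
            return
        build(left, d + 1, (axis + 1) % X.shape[1])
        build(right, d + 1, (axis + 1) % X.shape[1])

    build(np.arange(len(X)), 0, 0)
    return out

X = np.random.default_rng(0).standard_normal((1000, 2))
mse = lambda d: np.mean((X - quantize(X, d)) ** 2)
assert mse(5) <= mse(2) <= mse(0)               # finer trees never increase distortion
```

The statistical question the paper answers is how deep such a tree should grow, as a function of sample size, so that the distortion on new draws from the distribution (not just the training sample) is controlled.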
Information and Inference: A Journal of the IMA, 10(3), 955-986.
Citations: 0
Freeness over the diagonal and outliers detection in deformed random matrices with a variance profile
IF 1.6 | CAS Q4 (Mathematics) | JCR Q2 (Mathematics, Applied) | Pub Date: 2021-02-01 | DOI: 10.1093/imaiai/iaaa012
Jérémie Bigot;Camille Male
We study the eigenvalue distribution of a Gaussian unitary ensemble (GUE) matrix with a variance profile that is perturbed by an additive random matrix that may possess spikes. Our approach is guided by Voiculescu's notion of freeness with amalgamation over the diagonal and by the notion of a deterministic equivalent. This allows us to derive a fixed point equation to approximate the spectral distribution of certain deformed GUE matrices with a variance profile, and to characterize the location of potential outliers in such models in a non-asymptotic setting. We also consider the singular value distribution of a rectangular Gaussian random matrix with a variance profile in a similar setting of additive perturbation. We discuss the application of this approach to the study of low-rank matrix denoising models in the presence of heteroscedastic noise, that is, when the amount of variance in the observed data matrix may change from entry to entry. Numerical experiments are used to illustrate our results. Keywords: deformed random matrix, variance profile, outlier detection, free probability, freeness with amalgamation, operator-valued Stieltjes transform, Gaussian spiked model, low-rank model. 2000 Mathematics Subject Classification: 62G05, 62H12.
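The outlier phenomenon behind this analysis can be reproduced numerically even in the simplest setting: a flat variance profile and a real symmetric (GOE-like) ensemble, where the bulk edge sits near 2 under the normalization below. A sufficiently strong rank-one additive spike detaches one eigenvalue from the bulk. This is a hedged sketch of the phenomenon, not the paper's deterministic-equivalent computation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
A = rng.standard_normal((n, n))
H = (A + A.T) / np.sqrt(2 * n)             # symmetric, semicircle bulk on [-2, 2]
v = np.ones(n) / np.sqrt(n)                # unit spike direction
theta = 5.0                                # spike strength (> 1, so it detaches)
evals = np.linalg.eigvalsh(H + theta * np.outer(v, v))
assert evals[-1] > 4.5                     # outlier near theta + 1/theta = 5.2
assert evals[-2] < 2.3                     # rest of the spectrum stays near the bulk
```

With a genuine variance profile the bulk is no longer the semicircle, and locating the outliers requires solving the fixed point equation described above instead of the closed-form θ + 1/θ rule.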
Information and Inference: A Journal of the IMA, 10(3), 863-919.
Citations: 0
Nonlinear generalization of the monotone single index model
IF 1.6 | CAS Q4 (Mathematics) | JCR Q2 (Mathematics, Applied) | Pub Date: 2021-02-01 | DOI: 10.1093/imaiai/iaaa013
Željko Kereta;Timo Klock;Valeriya Naumova
The single index model is a powerful yet simple model, widely used in statistics, machine learning and other scientific fields. It models the regression function as $g(\langle a, x \rangle)$, where $a$ is an unknown index vector and $x$ are the features. This paper deals with a nonlinear generalization of this framework to allow for a regressor that uses multiple index vectors, adapting to local changes in the responses. To do so, we exploit the conditional distribution over function-driven partitions and use linear regression to locally estimate index vectors. We then regress by applying a $k$-nearest-neighbor-type estimator that uses a localized proxy of the geodesic metric. We present theoretical guarantees for estimation of local index vectors and out-of-sample prediction, and demonstrate the performance of our method with experiments on synthetic and real-world data sets, comparing it with state-of-the-art methods.
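For the classical single index model with Gaussian features, even plain least squares recovers the index direction: by Stein's lemma the OLS coefficient is proportional to $a$ when the link is monotone. That is the intuition behind the local linear regressions used above; a quick sketch of the base case:

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.array([3.0, 4.0]) / 5.0             # unknown unit index vector
X = rng.standard_normal((5000, 2))         # Gaussian features
y = np.tanh(X @ a)                         # monotone link g

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a_hat = coef / np.linalg.norm(coef)
assert a_hat @ a > 0.99                    # direction recovered up to sampling noise
```

The paper's generalization repeats this kind of linear fit locally, on data-driven partitions, so that the index vector may vary across regions of the feature space.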
Information and Inference: A Journal of the IMA, 10(3), 987-1029.
Citations: 3
Approximate separability of symmetrically penalized least squares in high dimensions: characterization and consequences
IF 1.6 | CAS Q4 (Mathematics) | JCR Q2 (Mathematics, Applied) | Pub Date: 2021-02-01 | DOI: 10.1093/imaiai/iaaa037
Michael Celentano
We show that the high-dimensional behavior of symmetrically penalized least squares with a possibly non-separable, symmetric, convex penalty in both (i) the Gaussian sequence model and (ii) the linear model with uncorrelated Gaussian designs nearly agrees with the behavior of least squares with an appropriately chosen separable penalty in these same models. This agreement is established by finite-sample concentration inequalities which precisely characterize the behavior of symmetrically penalized least squares in both models via a comparison to a simple scalar statistical model. The concentration inequalities are novel in their precision and generality. Our results help clarify the role that non-separability can play in high-dimensional M-estimation. In particular, if the empirical distribution of the coordinates of the parameter is known, exactly or approximately, there are at most limited advantages to using non-separable, symmetric penalties over separable ones. In contrast, if the empirical distribution of the coordinates of the parameter is unknown, we argue that non-separable, symmetric penalties automatically implement an adaptive procedure, which we characterize. We also provide a partial converse which characterizes the adaptive procedures that can be implemented in this way.
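A concrete sense in which separable penalties are simpler: their proximal operators decompose coordinate-wise. For the L1 penalty this is soft-thresholding, whereas a non-separable symmetric penalty such as SLOPE couples the coordinates through sorting. A minimal sketch of the separable case:

```python
import numpy as np

def prox_l1(v, t):
    """prox of the separable penalty t*||.||_1: soft-thresholding,
    applied to each coordinate independently."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

v = np.array([3.0, -0.5, 1.5])
assert np.allclose(prox_l1(v, 1.0), [2.0, 0.0, 0.5])
```

The result above says that, in these Gaussian models, an estimator built from a possibly non-separable symmetric penalty behaves almost like one built from such a coordinate-wise prox with an appropriately chosen scalar penalty.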
{"title":"Approximate separability of symmetrically penalized least squares in high dimensions: characterization and consequences","authors":"Michael Celentano","doi":"10.1093/imaiai/iaaa037","DOIUrl":"https://doi.org/10.1093/imaiai/iaaa037","url":null,"abstract":"We show that the high-dimensional behavior of symmetrically penalized least squares with a possibly non-separable, symmetric, convex penalty in both (i) the Gaussian sequence model and (ii) the linear model with uncorrelated Gaussian designs nearly agrees with the behavior of least squares with an appropriately chosen separable penalty in these same models. This agreement is established by finite-sample concentration inequalities which precisely characterize the behavior of symmetrically penalized least squares in both models via a comparison to a simple scalar statistical model. The concentration inequalities are novel in their precision and generality. Our results help clarify the role non-separability can play in high-dimensional M-estimation. In particular, if the empirical distribution of the coordinates of the parameter is known—exactly or approximately—there are at most limited advantages to using non-separable, symmetric penalties over separable ones. In contrast, if the empirical distribution of the coordinates of the parameter is unknown, we argue that non-separable, symmetric penalties automatically implement an adaptive procedure, which we characterize. We also provide a partial converse which characterizes the adaptive procedures which can be implemented in this way.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":"10 3","pages":"1105-1165"},"PeriodicalIF":1.6,"publicationDate":"2021-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1093/imaiai/iaaa037","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50347113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
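The canonical separable penalty referenced in the abstract above is the ℓ1 norm: in the Gaussian sequence model, penalized least squares with a separable penalty decouples into independent scalar problems, and for ℓ1 the scalar solution is soft-thresholding. A minimal sketch (the sparse mean, noise level, and threshold λ are arbitrary illustrative choices, not values from the paper):

```python
import numpy as np

def soft_threshold(y, lam):
    # Coordinate-wise solution of the separable problem
    #   argmin_x 0.5 * (y_i - x_i)^2 + lam * |x_i|
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

rng = np.random.default_rng(1)
theta = np.concatenate([np.full(5, 4.0), np.zeros(95)])   # sparse mean vector
y = theta + rng.standard_normal(100)                      # Gaussian sequence model
theta_hat = soft_threshold(y, lam=1.5)

# The separable estimator denoises: its MSE beats the raw observations'
print(np.mean((theta_hat - theta) ** 2) < np.mean((y - theta) ** 2))
```

A non-separable symmetric penalty (e.g. a sorted-ℓ1/SLOPE-type norm) couples the coordinates; the paper's result says its high-dimensional behavior is nearly matched by some separable penalty of this scalar form when the empirical distribution of the coordinates is known.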