
Information and Inference: A Journal of the IMA — Latest Publications

OUP accepted manuscript
IF 1.6 | Mathematics (CAS Region 4) | Q2 MATHEMATICS, APPLIED | Pub Date: 2022-01-01 | DOI: 10.1093/imaiai/iaab028
Citations: 0
Third-order moment varieties of linear non-Gaussian graphical models
IF 1.6 | Mathematics (CAS Region 4) | Q2 MATHEMATICS, APPLIED | Pub Date: 2021-12-20 | DOI: 10.1093/imaiai/iaad007
Carlos Améndola, M. Drton, Alexandros Grosdos, R. Homs, Elina Robeva
In this paper, we study linear non-Gaussian graphical models from the perspective of algebraic statistics. These are acyclic causal models in which each variable is a linear combination of its direct causes and independent noise. The underlying directed causal graph can be identified uniquely via the set of second and third-order moments of all random vectors that lie in the corresponding model. Our focus is on finding the algebraic relations among these moments for a given graph. We show that when the graph is a polytree, these relations form a toric ideal. We construct explicit trek-matrices associated to 2-treks and 3-treks in the graph. Their entries are covariances and third-order moments and their $2$-minors define our model set-theoretically. Furthermore, we prove that their 2-minors also generate the vanishing ideal of the model. Finally, we describe the polytopes of third-order moments and the ideals for models with hidden variables.
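For the smallest polytree, a single edge $X_1 \to X_2$ with edge weight $\lambda$, the vanishing of a 2-minor in covariances and third moments can be checked numerically. The sketch below is my illustration (not the authors' code), using centered exponential noise to make the distribution non-Gaussian and skewed:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two-node polytree X1 -> X2 with edge weight lam and non-Gaussian
# (centered exponential) noise; lam is an arbitrary illustrative value.
lam = 0.7
e1 = rng.exponential(1.0, n) - 1.0   # skewed, mean-zero noise
e2 = rng.exponential(1.0, n) - 1.0
x1 = e1
x2 = lam * x1 + e2

# Sample second and third moments.
s11 = np.mean(x1 * x1)           # cov(X1, X1)
s12 = np.mean(x1 * x2)           # cov(X1, X2)
m111 = np.mean(x1 ** 3)          # E[X1^3]
m112 = np.mean(x1 * x1 * x2)     # E[X1^2 X2]

# Model relations: s12 = lam*s11 and m112 = lam*m111, so the 2x2 minor
# of [[s11, m111], [s12, m112]] vanishes on the model.
minor = s11 * m112 - s12 * m111
print(abs(minor))  # close to 0, up to sampling error
```

Here the relation $m_{112} = \lambda m_{111}$ follows because $\varepsilon_2$ is independent of $X_1$ with mean zero, mirroring the trek-matrix minors of the paper in the simplest possible case.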
Citations: 3
From the simplex to the sphere: faster constrained optimization using the Hadamard parametrization
IF 1.6 | Mathematics (CAS Region 4) | Q2 MATHEMATICS, APPLIED | Pub Date: 2021-12-10 | DOI: 10.1093/imaiai/iaad017
Qiuwei Li, Daniel McKenzie, W. Yin
The standard simplex in $\mathbb{R}^{n}$, also known as the probability simplex, is the set of nonnegative vectors whose entries sum to 1. It frequently appears as a constraint in optimization problems arising in machine learning, statistics, data science, operations research and beyond. We convert the standard simplex to the unit sphere and thus transform the corresponding constrained optimization problem into an optimization problem on a simple, smooth manifold. We show that Karush-Kuhn-Tucker points and strict-saddle points of the minimization problem on the standard simplex all correspond to those of the transformed problem, and vice versa, so solving one problem is equivalent to solving the other. We then propose several simple, efficient and projection-free algorithms that exploit the manifold structure. The equivalence and the proposed algorithms extend to optimization problems with unit-simplex, weighted-probability-simplex or $\ell_{1}$-norm-sphere constraints. Numerical comparisons between the new algorithms and existing ones show the advantages of the new approach. Open-source code is available at https://github.com/DanielMckenzie/HadRGD.
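The Hadamard parametrization behind this construction is the substitution $x = y \circ y$: points on the unit sphere map onto the simplex, since the squared entries are nonnegative and sum to 1. The following minimal sketch runs Riemannian gradient descent on the sphere for a linear objective $f(x) = \langle c, x\rangle$, whose simplex minimizer is the vertex with the smallest $c_i$; the step size, iteration count and normalization retraction are my choices, not the HadRGD implementation:

```python
import numpy as np

def rgd_on_sphere(c, iters=2000, step=0.05, seed=0):
    """Minimize f(y∘y) = sum_i c_i y_i^2 over the unit sphere."""
    rng = np.random.default_rng(seed)
    y = rng.normal(size=c.size)
    y /= np.linalg.norm(y)
    for _ in range(iters):
        g = 2.0 * y * c              # Euclidean gradient of f(y∘y)
        g -= (g @ y) * y             # project onto the tangent space at y
        y = y - step * g
        y /= np.linalg.norm(y)       # retract back to the sphere
    return y * y                     # map back to the simplex: x = y∘y

c = np.array([0.3, 0.1, 0.5])
x = rgd_on_sphere(c)
print(np.round(x, 3))  # mass concentrates on index 1, the smallest entry of c
```

Note there is no projection onto the simplex anywhere: feasibility is automatic from the parametrization, which is the point of the approach.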
Citations: 11
Wavelet invariants for statistically robust multi-reference alignment
IF 1.6 | Mathematics (CAS Region 4) | Q2 MATHEMATICS, APPLIED | Pub Date: 2021-12-01 (Epub 2020-08-13) | DOI: 10.1093/imaiai/iaaa016
Matthew Hirn, Anna Little

We propose a nonlinear, wavelet-based signal representation that is translation invariant and robust to both additive noise and random dilations. Motivated by the multi-reference alignment problem and generalizations thereof, we analyze the statistical properties of this representation given a large number of independent corruptions of a target signal. We prove the nonlinear wavelet-based representation uniquely defines the power spectrum but allows for an unbiasing procedure that cannot be directly applied to the power spectrum. After unbiasing the representation to remove the effects of the additive noise and random dilations, we recover an approximation of the power spectrum by solving a convex optimization problem, and thus reduce to a phase retrieval problem. Extensive numerical experiments demonstrate the statistical robustness of this approximation procedure.
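The starting point is that the power spectrum is translation invariant but biased by additive noise; averaging over many corrupted copies and subtracting the noise floor recovers it approximately. A minimal numerical check of both facts (my illustration with an assumed known noise level; the paper's actual invariants are wavelet based and also handle random dilations):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma, copies = 128, 0.5, 2000
x = rng.normal(size=n)

# Translation invariance: a circular shift changes the DFT only by a phase,
# so the power spectrum is unchanged.
ps = np.abs(np.fft.fft(x)) ** 2
ps_shifted = np.abs(np.fft.fft(np.roll(x, 17))) ** 2
print(np.max(np.abs(ps - ps_shifted)))  # zero up to floating point error

# Unbiasing additive noise: E|fft(x + noise)|^2 = |fft(x)|^2 + n*sigma^2
# per frequency, so averaging noisy power spectra and subtracting the
# noise floor approximately recovers ps.
noisy = x + sigma * rng.normal(size=(copies, n))
est = np.mean(np.abs(np.fft.fft(noisy, axis=1)) ** 2, axis=0) - n * sigma**2
print(np.mean(np.abs(est - ps)))  # small relative to typical ps values
```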

Citations: 0
Erratum to: Subspace clustering using ensembles of K-subspaces
IF 1.6 | Mathematics (CAS Region 4) | Q2 MATHEMATICS, APPLIED | Pub Date: 2021-10-15 | DOI: 10.1093/imaiai/iaab026
J. Lipor, D. Hong, Yan Shuo Tan, L. Balzano
Citations: 1
Estimating location parameters in sample-heterogeneous distributions
IF 1.6 | Mathematics (CAS Region 4) | Q2 MATHEMATICS, APPLIED | Pub Date: 2021-06-03 | DOI: 10.1093/IMAIAI/IAAB013
Ankit Pensia, Varun Jog, Po-Ling Loh
Estimating the mean of a probability distribution using i.i.d. samples is a classical problem in statistics, wherein finite-sample optimal estimators are sought under various distributional assumptions. In this paper, we consider the problem of mean estimation when independent samples are drawn from $d$-dimensional non-identical distributions possessing a common mean. When the distributions are radially symmetric and unimodal, we propose a novel estimator, which is a hybrid of the modal interval, shorth and median estimators, and whose performance adapts to the level of heterogeneity in the data. We show that our estimator is near-optimal when data are i.i.d. and when the fraction of “low-noise” distributions is as small as $\Omega\left(\frac{d\log n}{n}\right)$, where $n$ is the number of samples. We also derive minimax lower bounds on the expected error of any estimator that is agnostic to the scales of individual data points. Finally, we extend our theory to linear regression. In both the mean estimation and regression settings, we present computationally feasible versions of our estimators that run in time polynomial in the number of data points.
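To see why adapting to heterogeneity pays off, consider samples that share a common mean but have wildly different scales, with only a small low-noise fraction. The crude illustration below uses the sample median as a stand-in for a scale-adaptive estimator (the paper's hybrid modal-interval/shorth/median estimator is more refined); the mixture parameters are my choices:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
true_mean = 3.0

# 10% "low-noise" samples (scale 0.01), 90% very noisy samples (scale 50).
scales = np.where(rng.random(n) < 0.1, 0.01, 50.0)
x = true_mean + scales * rng.normal(size=n)

# The sample mean is dominated by the noisy samples; the median snaps onto
# the tight cluster of low-noise samples around the common mean.
err_mean = abs(x.mean() - true_mean)
err_median = abs(np.median(x) - true_mean)
print(err_mean, err_median)
```

The scale-agnostic setting in the abstract is exactly this situation: the estimator is not told which samples are low-noise, yet order statistics let it exploit them anyway.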
Citations: 3
Compressive learning with privacy guarantees
IF 1.6 | Mathematics (CAS Region 4) | Q2 MATHEMATICS, APPLIED | Pub Date: 2021-05-15 | DOI: 10.1093/IMAIAI/IAAB005
Antoine Chatalic, V. Schellekens, F. Houssiau, Y. de Montjoye, L. Jacques, R. Gribonval
This work addresses the problem of learning from large collections of data with privacy guarantees. The compressive learning framework proposes to deal with the large scale of datasets by compressing them into a single vector of generalized random moments, called a sketch vector, from which the learning task is then performed. We provide sharp bounds on the so-called sensitivity of this sketching mechanism. This allows us to leverage standard techniques to ensure differential privacy—a well-established formalism for defining and quantifying the privacy of a random mechanism—by adding Laplace or Gaussian noise to the sketch. We combine these standard mechanisms with a new feature subsampling mechanism, which reduces the computational cost without damaging privacy. The overall framework is applied to the tasks of Gaussian modeling, k-means clustering and principal component analysis, for which sharp privacy bounds are derived. Empirically, the quality (for subsequent learning) of the compressed representation produced by our mechanism is strongly related to the induced noise level, for which we give analytical expressions.
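The sensitivity-then-noise recipe can be illustrated on a toy random-features sketch. Everything below (the cosine/sine feature map, the sensitivity bound, and all parameters) is an assumed minimal setup in the spirit of the framework, not the paper's exact mechanism:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, m = 5000, 2, 40
data = rng.normal(size=(n, d))

W = rng.normal(size=(d, m))  # random frequencies for the feature map

def sketch(X):
    # Empirical generalized moments: means of features bounded in [-1, 1].
    Z = np.concatenate([np.cos(X @ W), np.sin(X @ W)], axis=1)
    return Z.mean(axis=0)

s = sketch(data)

# Replacing one record changes each averaged feature by at most 2/n, so the
# L1 sensitivity of the 2m-dimensional sketch is at most 2 * (2m) / n.
# The Laplace mechanism calibrates noise to sensitivity / epsilon.
eps = 1.0
sensitivity = 2 * (2 * m) / n
noisy_s = s + rng.laplace(scale=sensitivity / eps, size=s.size)

print(np.max(np.abs(noisy_s - s)))  # noise on the order of sensitivity/eps
```

Because the sketch is a single short vector, the noise needed for privacy is independent of the dataset size beyond the $1/n$ sensitivity factor, which is the appeal of privatizing the sketch rather than the data.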
Citations: 13
Double robust semi-supervised inference for the mean: selection bias under MAR labeling with decaying overlap
IF 1.6 | Mathematics (CAS Region 4) | Q2 MATHEMATICS, APPLIED | Pub Date: 2021-04-14 | DOI: 10.1093/imaiai/iaad021
Yuqian Zhang, Abhishek Chakrabortty, Jelena Bradic
Semi-supervised (SS) inference has received much attention in recent years. Apart from a moderate-sized labeled dataset, $\mathcal{L}$, the SS setting is characterized by an additional, much larger unlabeled dataset, $\mathcal{U}$. The setting $|\mathcal{U}| \gg |\mathcal{L}|$ makes SS inference unique and different from standard missing-data problems, owing to a natural violation of the so-called ‘positivity’ or ‘overlap’ assumption. However, most of the SS literature implicitly assumes $\mathcal{L}$ and $\mathcal{U}$ to be equally distributed, i.e., no selection bias in the labeling. Inferential challenges under missing-at-random labeling that allows for selection bias are inevitably exacerbated by the decaying nature of the propensity score (PS). We address this gap for a prototype problem, the estimation of the response’s mean. We propose a double robust SS mean estimator and give a complete characterization of its asymptotic properties. The proposed estimator is consistent as long as either the outcome or the PS model is correctly specified. When both models are correctly specified, we provide inference results with a non-standard consistency rate that depends on the smaller size $|\mathcal{L}|$. The results are also extended to causal inference with imbalanced treatment groups. Further, we provide several novel choices of models and estimators of the decaying PS, including a novel offset logistic model and a stratified labeling model, and we present their properties under both high- and low-dimensional settings; these may be of independent interest. Lastly, we present extensive simulations and a real-data application.
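The double robustness property is visible already in the classical AIPW form of the mean estimator, sketched below on synthetic MAR-labeled data. This toy setup (with a well-behaved, non-decaying PS and oracle nuisance models) is my illustration of the general mechanism, not the paper's decaying-overlap construction:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20000
x = rng.normal(size=n)
y = 2.0 + x + rng.normal(size=n)       # true mean of Y is 2.0

# MAR labeling: the chance of being labeled depends on x, inducing
# selection bias in the labeled subsample.
pi = 1.0 / (1.0 + np.exp(-(x - 1.0)))
r = rng.random(n) < pi                 # labeled indicator R

mhat = lambda x: 2.0 + x               # outcome model (here: correct)
pihat = pi                             # propensity model (here: correct)

# AIPW / double robust mean estimate: consistent if either mhat or pihat
# is correctly specified.
est = np.mean(mhat(x) + r * (y - mhat(x)) / pihat)

# Naive labeled-only average is biased upward, since high-x units are
# labeled more often.
naive = y[r].mean()
print(est, naive)
```

The paper's harder regime is when $\pi$ decays toward zero as $n$ grows (so $|\mathcal{L}| \ll |\mathcal{U}|$), which breaks the standard overlap assumption implicit above.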
Citations: 5
Topological information retrieval with dilation-invariant bottleneck comparative measures
IF 1.6 | Mathematics (CAS Region 4) | Q2 MATHEMATICS, APPLIED | Pub Date: 2021-04-04 | DOI: 10.1093/imaiai/iaad022
Athanasios Vlontzos, Yueqi Cao, Luca Schmidtke, Bernhard Kainz, Anthea Monod
Appropriately representing elements in a database so that queries may be accurately matched is a central task in information retrieval; recently, this has been achieved by embedding the graphical structure of the database into a manifold in a hierarchy-preserving manner using a variety of metrics. Persistent homology is a tool commonly used in topological data analysis that is able to rigorously characterize a database in terms of both its hierarchy and connectivity structure. Computing persistent homology on a variety of embedded datasets reveals that some commonly used embeddings fail to preserve the connectivity. We show that those embeddings which successfully retain the database topology coincide in persistent homology by introducing two dilation-invariant comparative measures to capture this effect: in particular, they address the issue of metric distortion on manifolds. We provide an algorithm for their computation that exhibits greatly reduced time complexity over existing methods. We use these measures to perform the first instance of topology-based information retrieval and demonstrate its increased performance over the standard bottleneck distance for persistent homology. We showcase our approach on databases of different data varieties including text, videos and medical images.
Citations: 2
Multi-scale vector quantization with reconstruction trees
IF 1.6 | Mathematics (CAS Region 4) | Q2 MATHEMATICS, APPLIED | Pub Date: 2021-02-01 | DOI: 10.1093/imaiai/iaaa004
Enrico Cecini, Ernesto De Vito, Lorenzo Rosasco
We propose and study a multi-scale approach to vector quantization (VQ). We develop an algorithm, dubbed reconstruction trees, inspired by decision trees. Here the objective is parsimonious reconstruction of unsupervised data, rather than classification. In contrast to more standard VQ methods, such as $k$-means, the proposed approach leverages a family of given partitions to explore the data quickly in a coarse-to-fine multi-scale fashion. Our main technical contribution is an analysis of the expected distortion achieved by the proposed algorithm when the data are assumed to be sampled from a fixed unknown distribution. In this context, we derive both asymptotic and finite-sample results under suitable regularity assumptions on the distribution. As a special case, we consider the setting where the data-generating distribution is supported on a compact Riemannian submanifold. Tools from differential geometry and concentration of measure are useful in our analysis.
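A minimal sketch of the coarse-to-fine idea on 1-D data, assuming a dyadic partition family and a per-cell squared-error threshold (all illustrative choices of mine, not the authors' algorithm): each cell is represented by its mean and is split only while its reconstruction error remains too large.

```python
import numpy as np

def reconstruction_tree(x, lo, hi, tol, depth=0, max_depth=10):
    """Recursively refine [lo, hi) until each cell's SSE is below tol."""
    pts = x[(x >= lo) & (x < hi)]
    if pts.size == 0:
        return []                              # empty cells produce no leaf
    center = pts.mean()
    err = np.sum((pts - center) ** 2)          # within-cell squared error
    if err <= tol or depth >= max_depth:
        return [(lo, hi, center)]              # stop: keep this cell
    mid = (lo + hi) / 2.0                      # dyadic split
    return (reconstruction_tree(x, lo, mid, tol, depth + 1, max_depth)
            + reconstruction_tree(x, mid, hi, tol, depth + 1, max_depth))

rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(-2, 0.1, 500), rng.normal(2, 0.1, 500)])
cells = reconstruction_tree(x, -4.0, 4.0, tol=1.0)

# Quantize each point to its cell's center and measure the distortion.
codes = np.empty_like(x)
for lo, hi, c in cells:
    codes[(x >= lo) & (x < hi)] = c
distortion = np.mean((x - codes) ** 2)
print(len(cells), distortion)
```

The tree refines only where the data demand it, so well-separated clusters end up covered by a few tight cells rather than a uniform fine grid, which is the parsimony the abstract refers to.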
Citations: 0