
2020 Information Theory and Applications Workshop (ITA): Latest Publications

Gaussian Multiple and Random Access in the Finite Blocklength Regime
Pub Date: 2020-01-12 DOI: 10.1109/ITA50056.2020.9244955
Recep Can Yavas, V. Kostina, M. Effros
This paper presents finite-blocklength achievability bounds for the Gaussian multiple access channel (MAC) and random access channel (RAC) under average-error and maximal-power constraints. Using random codewords uniformly distributed on a sphere and a maximum likelihood decoder, the derived MAC bound on each transmitter's rate matches the MolavianJazi-Laneman bound (2015) in its first- and second-order terms, improving the remaining terms to $\frac{1}{2}\frac{\log n}{n} + O\left(\frac{1}{n}\right)$ bits per channel use. The result then extends to a RAC model in which neither the encoders nor the decoder knows which of $K$ possible transmitters are active. In the proposed rateless coding strategy, decoding occurs at a time $n_t$ that depends on the decoder's estimate $t$ of the number of active transmitters $k$. Single-bit feedback from the decoder to all encoders at each potential decoding time $n_i$, $i \le t$, informs the encoders when to stop transmitting. For this RAC model, the proposed code achieves the same first-, second-, and third-order performance as the best known result for the Gaussian MAC in operation.
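To make the timing of the rateless scheme concrete, here is a minimal Python sketch of the single-bit feedback loop only. The decoding times $n_i$, the received-symbol buffer, and the `estimate_active` routine are illustrative placeholders, not the paper's construction (which also involves sphere-uniform codewords and ML decoding):

```python
# Sketch of the single-bit-feedback timing in the rateless scheme; the
# decoding times n_1 <= n_2 <= ... and `estimate_active` are stand-ins.

def rateless_feedback(decoding_times, estimate_active, received):
    """At each potential decoding time n_i the decoder broadcasts one bit:
    0 = keep transmitting, 1 = stop (decoder decodes now, at time n_t)."""
    for i, n_i in enumerate(decoding_times, start=1):
        t = estimate_active(received[:n_i])  # estimate of # active senders
        if t <= i:                           # estimate says decode at n_t
            return n_i                       # feedback bit 1 sent here
        # feedback bit 0: encoders continue past n_i
    return decoding_times[-1]                # fall back to the final time
```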
Citations: 5
Communication-Efficient and Byzantine-Robust Distributed Learning
Pub Date: 2019-11-21 DOI: 10.1109/ITA50056.2020.9245017
Avishek Ghosh, R. Maity, S. Kadhe, A. Mazumdar, K. Ramchandran
We develop a communication-efficient distributed learning algorithm that is robust against Byzantine worker machines. We propose and analyze a distributed gradient-descent algorithm that performs simple thresholding based on gradient norms to mitigate Byzantine failures. We show that the (statistical) error rate of our algorithm matches that of [YCKB18], which uses more complicated schemes (like coordinate-wise median or trimmed mean), and is thus optimal. Furthermore, for communication efficiency, we consider a generic class of δ-approximate compressors from [KRSJ19] that encompasses sign-based compressors and top-k sparsification. Our algorithm uses compressed gradients and gradient norms for aggregation and Byzantine removal, respectively. We establish the statistical error rate of the algorithm for arbitrary (convex or non-convex) smooth loss functions. We show that, in the regime where the compression factor δ is constant and the dimension of the parameter space is fixed, the rate of convergence is not affected by the compression operation, and hence we effectively get the compression for free. Moreover, we extend the compressed gradient descent algorithm with error feedback proposed in [KRSJ19] to the distributed setting. We have experimentally validated our results, observing good convergence for both convex (least-squares regression) and non-convex (neural network training) problems.
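As a rough illustration of the two ingredients (norm-based Byzantine filtering and δ-approximate compression), here is a Python sketch; the scaled-sign compressor and the simple trim rule are stand-ins, not the paper's exact scheme or tuning:

```python
import numpy as np

def worker_message(g):
    """Each worker sends a scaled-sign compressed gradient plus its norm."""
    scale = np.linalg.norm(g, ord=1) / g.size   # scaled-sign compressor
    return scale * np.sign(g), np.linalg.norm(g)

def robust_aggregate(messages, num_byzantine):
    """Drop the messages with the largest reported norms, average the rest."""
    comp, norms = zip(*messages)
    keep = np.argsort(norms)[: len(messages) - num_byzantine]
    return np.mean([comp[i] for i in keep], axis=0)

rng = np.random.default_rng(0)
honest = [worker_message(rng.normal(1.0, 0.1, size=5)) for _ in range(8)]
byzantine = [worker_message(rng.normal(0.0, 10.0, size=5)) for _ in range(2)]
update = robust_aggregate(honest + byzantine, num_byzantine=2)
```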
Citations: 13
Stochastic Iterative Hard Thresholding for Low-Tucker-Rank Tensor Recovery
Pub Date: 2019-09-23 DOI: 10.1109/ITA50056.2020.9244965
Rachel Grotheer, S. Li, A. Ma, D. Needell, Jing Qin
Low-rank tensor recovery problems have been widely studied in many signal processing and machine learning applications. Tensor rank is typically defined with respect to a particular tensor decomposition; Tucker decomposition, in particular, is one of the most popular. In recent years, researchers have developed many state-of-the-art algorithms to address the problem of low-Tucker-rank tensor recovery. Motivated by the favorable properties of stochastic algorithms, such as stochastic gradient descent and stochastic iterative hard thresholding, we extend the stochastic iterative hard thresholding algorithm from vectors to tensors in order to recover a low-Tucker-rank tensor from its linear measurements. We have also developed a linear convergence analysis for the proposed method and conducted a series of experiments with both synthetic and real data to illustrate its performance.
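The core iteration is easy to state: a stochastic gradient step on the least-squares measurement loss followed by projection back onto low-Tucker-rank tensors. Below is a minimal numpy sketch under that reading, with truncated HOSVD standing in for the projection (it is only quasi-optimal) and placeholder names throughout:

```python
import numpy as np

def unfold(X, mode):
    """Mode-k unfolding: move the mode to the front and flatten the rest."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def tucker_truncate(X, ranks):
    """Project (quasi-optimally) onto low Tucker rank via truncated HOSVD."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = X
    for mode, U in enumerate(factors):   # core = X x_k U_k^T over all modes
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    for mode, U in enumerate(factors):   # map the small core back up
        core = np.moveaxis(np.tensordot(U, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core

def siht_step(X, A_batch, y_batch, step, ranks):
    """One stochastic IHT step on sampled measurements: A_batch has shape
    (b, *X.shape), y_batch has shape (b,)."""
    resid = np.tensordot(A_batch, X, axes=X.ndim) - y_batch
    grad = np.tensordot(resid, A_batch, axes=1)   # sum_i resid_i * A_i
    return tucker_truncate(X - step * grad, ranks)
```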
Citations: 3
Optimal Learning of Joint Alignments with a Faulty Oracle
Pub Date: 2019-09-21 DOI: 10.1109/ITA50056.2020.9244966
Kasper Green Larsen, M. Mitzenmacher, Charalampos E. Tsourakakis
We consider the following problem, which is useful in applications such as joint image and shape alignment. The goal is to recover $n$ discrete variables $g_i \in \{0, \dots, k-1\}$ (up to some global offset) given noisy observations of a set of their pairwise differences $\{(g_i - g_j) \bmod k\}$; specifically, with probability $\frac{1}{k} + \delta$ for some δ > 0 one obtains the correct answer, and with the remaining probability one obtains a uniformly random incorrect answer. We consider a learning-based formulation in which one can perform a query to observe a pairwise difference, and the goal is to perform as few queries as possible while obtaining the exact joint alignment. We provide an easy-to-implement, time-efficient algorithm that performs $O\left(\frac{n \lg n}{k \delta^2}\right)$ queries and recovers the joint alignment with high probability. We also show that our algorithm is optimal by proving a general lower bound that holds for all non-adaptive algorithms. Our work significantly improves recent work by Chen and Candès [CC16], who view the problem as a constrained principal component analysis problem that can be solved using the power method. Specifically, our approach is simpler in both the algorithm and the analysis, and provides additional insights into the problem structure.
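To see the observation model in code, here is a toy Python sketch of the naive repetition baseline: query each difference $(g_i - g_0) \bmod k$ several times through the faulty oracle and take a majority vote. The paper's algorithm reaches the stated query complexity with a more careful strategy; this only illustrates the oracle:

```python
import numpy as np

rng = np.random.default_rng(1)

def oracle(i, j, g, k, delta):
    """Returns (g_i - g_j) mod k with prob. 1/k + delta, else a uniformly
    random wrong answer."""
    truth = (g[i] - g[j]) % k
    if rng.random() < 1.0 / k + delta:
        return truth
    return (truth + 1 + rng.integers(k - 1)) % k   # uniform over wrong values

def align(n, k, delta, g, repeats):
    est = np.zeros(n, dtype=int)                   # pin g_0 = 0 (global offset)
    for i in range(1, n):
        votes = [oracle(i, 0, g, k, delta) for _ in range(repeats)]
        est[i] = np.bincount(votes, minlength=k).argmax()
    return est

g = rng.integers(5, size=30)
print((align(30, 5, 0.3, g, repeats=60) - g) % 5)  # constant array => aligned
```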
Citations: 3
Generalized List Decoding
Pub Date: 2019-09-10 DOI: 10.4230/LIPIcs.ITCS.2020.51
Yihan Zhang, Amitalok J. Budkuley, S. Jaggi
This paper concerns itself with the question of list decoding for general adversarial channels, e.g., bit-flip (XOR) channels, erasure channels, AND (Z-) channels, OR (ℤ-) channels, real adder channels, noisy typewriter channels, etc. We precisely characterize when exponential-sized (or positive-rate) $(L-1)$-list decodable codes (where the list size $L$ is a universal constant) exist for such channels. Our criterion essentially asserts that: for any given general adversarial channel, it is possible to construct positive-rate $(L-1)$-list decodable codes if and only if the set of completely positive tensors of order $L$ with admissible marginals is not entirely contained in the order-$L$ confusability set associated to the channel. The sufficiency is shown via random code construction (combined with expurgation or time-sharing). The necessity is shown by: 1. extracting approximately equicoupled subcodes (a generalization of equidistant codes) from any sequence of "large" codes using the hypergraph Ramsey theorem, and 2. significantly extending the classic Plotkin bound in coding theory to list decoding for general channels using duality between the completely positive tensor cone and the copositive tensor cone. In the proof, we also obtain a new fact regarding asymmetry of joint distributions, which may be of independent interest. Other results include: 1. list decoding capacity with asymptotically large $L$ for general adversarial channels; 2. a tight list-size bound for most constant-composition codes (a generalization of constant-weight codes); 3. rederivation and demystification of Blinovsky's [9] characterization of the list-decoding Plotkin points (the threshold at which large codes are impossible) for bit-flip channels; 4. evaluation of general bounds ([43]) for unique decoding in the error-correction code setting.
Citations: 15
Differentially Private Algorithms for Learning Mixtures of Separated Gaussians
Pub Date: 2019-09-09 DOI: 10.1109/ITA50056.2020.9244945
Gautam Kamath, Or Sheffet, Vikrant Singhal, Jonathan Ullman
Learning the parameters of Gaussian mixture models is a fundamental and widely studied problem with numerous applications. In this work, we give new algorithms for learning the parameters of a high-dimensional, well-separated Gaussian mixture model subject to the strong constraint of differential privacy. In particular, we give a differentially private analogue of the algorithm of Achlioptas and McSherry. Our algorithm has two key properties not achieved by prior work: (1) the algorithm's sample complexity matches that of the corresponding non-private algorithm up to lower-order terms over a wide range of parameters; (2) the algorithm does not require strong a priori bounds on the parameters of the mixture components.
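For readers unfamiliar with the privacy constraint: differentially private estimators of this kind are typically built from noise-addition primitives such as the Gaussian mechanism. The sketch below shows that standard primitive for a clipped mean; it is background machinery with the usual noise calibration, not the paper's GMM algorithm:

```python
import numpy as np

def private_mean(samples, R, eps, delta, rng=np.random.default_rng()):
    """(eps, delta)-DP mean via the Gaussian mechanism. Samples are clipped
    to an l2 ball of radius R, so the mean has l2 sensitivity 2R/n."""
    n, d = samples.shape
    norms = np.linalg.norm(samples, axis=1, keepdims=True)
    clipped = samples * np.minimum(1.0, R / np.maximum(norms, 1e-12))
    sensitivity = 2.0 * R / n
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return clipped.mean(axis=0) + rng.normal(0.0, sigma, size=d)
```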
Citations: 43
Universal Bayes Consistency in Metric Spaces
Pub Date: 2019-06-24 DOI: 10.1109/ITA50056.2020.9244988
Steve Hanneke, A. Kontorovich, Sivan Sabato, Roi Weiss
We show that a recently proposed 1-nearest-neighbor-based multiclass learning algorithm is universally strongly Bayes consistent in all metric spaces where such Bayes consistency is possible, making it an "optimistically universal" Bayes-consistent learner. This is the first learning algorithm known to enjoy this property; by comparison, k-NN and its variants are not generally universally Bayes consistent, except under additional structural assumptions, such as an inner product, a norm, finite doubling dimension, or a Besicovitch-type property. The metric spaces in which universal Bayes consistency is possible are the "essentially separable" ones — a new notion that we define, which is more general than standard separability. The existence of metric spaces that are not essentially separable is independent of the ZFC axioms of set theory. We prove that essential separability exactly characterizes the existence of a universal Bayes-consistent learner for the given metric space. In particular, this yields the first impossibility result for universal Bayes consistency. Taken together, these positive and negative results resolve the open problems posed in Kontorovich, Sabato, Weiss (2017).
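As background, the base rule under study is the 1-nearest-neighbor rule, which requires nothing beyond a metric. Here is a tiny Python sketch with a pluggable distance, for intuition only; the paper's learner is a sample-compression variant of 1-NN, not this plain rule:

```python
# Plain 1-nearest-neighbor prediction over an arbitrary metric space.

def one_nn_predict(x, train, metric):
    """train: list of (point, label) pairs; metric: any distance function."""
    _, label = min(train, key=lambda pl: metric(x, pl[0]))
    return label

# Example with a non-Euclidean metric on strings (Hamming distance).
ham = lambda a, b: sum(c1 != c2 for c1, c2 in zip(a, b))
data = [("0000", "A"), ("1110", "B"), ("1011", "B")]
print(one_nn_predict("1111", data, ham))   # -> "B"
```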
Citations: 45
Safe Testing
Pub Date: 2019-06-18 DOI: 10.1109/ITA50056.2020.9244948
P. Grünwald, R. D. Heide, Wouter M. Koolen
We present a new theory of hypothesis testing. The main concept is the S-value, a notion of evidence which, unlike p-values, allows for effortlessly combining evidence from several tests, even in the common scenario where the decision to perform a new test depends on the previous test outcome: safe tests based on S-values generally preserve Type-I error guarantees under such ‘optional continuation’. S-values exist for completely general testing problems with composite nulls and alternatives. Their prime interpretation is in terms of gambling or investing, each S-value corresponding to a particular investment. Surprisingly, optimal "GROW" S-values, which lead to fastest capital growth, are fully characterized by the joint information projection (JIPr) between the sets of all Bayes marginal distributions on $\mathcal{H}_0$ and $\mathcal{H}_1$. Thus, optimal S-values also have an interpretation as Bayes factors, with priors given by the JIPr. We illustrate the theory using two classical testing scenarios: the one-sample t-test and the 2 × 2 contingency table. In the t-test setting, GROW S-values correspond to adopting the right Haar prior on the variance, as in Jeffreys' Bayesian t-test. However, unlike Jeffreys', the default safe t-test puts a discrete 2-point prior on the effect size, leading to better behaviour in terms of statistical power. Sharing Fisherian, Neymanian and Jeffreys-Bayesian interpretations, S-values and safe tests may provide a methodology acceptable to adherents of all three schools.
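The optional-continuation guarantee is easy to demonstrate with the simplest kind of S-value, a likelihood ratio: under the null, each batch's S-value has expectation at most 1, so the running product is a fair gamble and Ville's inequality bounds the Type-I error. A minimal Python sketch (the hypotheses and batch sizes here are arbitrary illustrative choices, not the paper's GROW construction):

```python
import numpy as np

def s_value(batch):
    """Likelihood ratio Q1/Q0 for H0: Bernoulli(1/2) vs H1: Bernoulli(3/4);
    under H0 its expectation is exactly 1, so it is a valid S-value."""
    ones = batch.sum()
    return (0.75 ** ones * 0.25 ** (len(batch) - ones)) / 0.5 ** len(batch)

def safe_test(batches, alpha=0.05):
    product = 1.0
    for batch in batches:          # after each batch we may stop or continue:
        product *= s_value(batch)  # multiplying S-values stays safe
        if product >= 1.0 / alpha: # Ville/Markov: P_H0(ever reject) <= alpha
            return "reject H0", product
    return "no rejection", product

rng = np.random.default_rng(2)
batches = [rng.integers(0, 2, size=20) for _ in range(10)]   # data from H0
print(safe_test(batches))
```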
Citations: 140
Active Embedding Search via Noisy Paired Comparisons
Pub Date: 2019-05-10 DOI: 10.1109/ITA50056.2020.9244936
Gregory H. Canal, A. Massimino, M. Davenport, C. Rozell
Task: for a given user, estimate a preference vector $w \in \mathbb{R}^d$ in a similarity embedding of items.
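The reason paired comparisons are informative about $w$ is that, in the idealized noiseless model where the user always prefers the closer item, each comparison is a halfspace constraint: $\|w-p\|^2 < \|w-q\|^2$ iff $2(q-p)\cdot w < \|q\|^2 - \|p\|^2$. A small Python sketch of that reduction (the noiseless-preference model is an assumption made here purely for illustration):

```python
import numpy as np

def comparison_halfspace(p, q):
    """'User prefers item p to item q' <=> a . w < b for the returned (a, b)."""
    a = 2.0 * (q - p)                    # normal vector of the halfspace
    b = np.dot(q, q) - np.dot(p, p)      # offset
    return a, b

w_true = np.array([0.2, -0.1])
p, q = np.array([0.0, 0.0]), np.array([1.0, 1.0])
a, b = comparison_halfspace(p, q)
print(np.dot(a, w_true) < b)             # True: the user at w_true prefers p
```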
Citations: 16
Minimum Uncertainty Based Detection of Adversaries in Deep Neural Networks
Pub Date: 2019-04-05 DOI: 10.1109/ITA50056.2020.9244964
Fatemeh Sheikholeslami, Swayambhoo Jain, G. Giannakis
Despite their unprecedented performance in various domains, the use of Deep Neural Networks (DNNs) in safety-critical environments is severely limited in the presence of even small adversarial perturbations. The present work develops a randomized approach to detecting such perturbations based on minimum uncertainty metrics that rely on sampling at the hidden layers during the DNN inference stage. Inspired by Bayesian approaches to uncertainty estimation, the sampling probabilities are designed for effective detection of adversarially corrupted inputs. Being modular, the novel detector can be conveniently employed with any pre-trained DNN at no extra training overhead. Selecting which units to sample per hidden layer entails quantifying the DNN's output uncertainty, where the overall uncertainty is expressed in terms of its layer-wise components, which also promotes scalability. Sampling probabilities are then sought by minimizing uncertainty measures layer by layer, leading to a novel convex optimization problem that admits an exact solver with superlinear convergence rate. By simplifying the objective function, low-complexity approximate solvers are also developed. In addition to valuable insights, these approximations link the novel approach with state-of-the-art randomized adversarial detectors. The effectiveness of the novel detectors relative to competing alternatives is highlighted through extensive tests for various types of adversarial attacks with variable levels of strength.
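A generic Python sketch in the spirit of the detector: run several stochastic forward passes that randomly sample (mask) hidden units, and flag inputs whose predictions vary too much. The paper optimizes the sampling probabilities per layer; the fixed `keep_prob` below is a crude stand-in, so this is background intuition rather than the paper's method:

```python
import numpy as np

def stochastic_forward(x, weights, keep_prob, rng):
    """One forward pass through a ReLU MLP with Bernoulli unit sampling."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(0.0, W @ h)
        h = h * (rng.random(h.shape) < keep_prob)   # sample hidden units
    logits = weights[-1] @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()

def uncertainty_score(x, weights, passes=32, keep_prob=0.9,
                      rng=np.random.default_rng(3)):
    """High variance across sampled passes flags a suspicious input."""
    probs = np.stack([stochastic_forward(x, weights, keep_prob, rng)
                      for _ in range(passes)])
    return probs.var(axis=0).sum()
```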
Citations: 21