Gaussian Multiple and Random Access in the Finite Blocklength Regime
Pub Date: 2020-01-12 | DOI: 10.1109/ITA50056.2020.9244955
Recep Can Yavas, V. Kostina, M. Effros
This paper presents finite-blocklength achievability bounds for the Gaussian multiple access channel (MAC) and random access channel (RAC) under average-error and maximal-power constraints. Using random codewords uniformly distributed on a sphere and a maximum-likelihood decoder, the derived MAC bound on each transmitter's rate matches the MolavianJazi-Laneman bound (2015) in its first- and second-order terms, improving the remaining terms to $\frac{1}{2}\frac{\log n}{n} + O\left(\frac{1}{n}\right)$ bits per channel use. The result then extends to a RAC model in which neither the encoders nor the decoder knows which of $K$ possible transmitters are active. In the proposed rateless coding strategy, decoding occurs at a time $n_t$ that depends on the decoder's estimate $t$ of the number of active transmitters $k$. Single-bit feedback from the decoder to all encoders at each potential decoding time $n_i$, $i \le t$, informs the encoders when to stop transmitting. For this RAC model, the proposed code achieves the same first-, second-, and third-order performance as the best known result for the Gaussian MAC in operation.
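For intuition, here is a minimal numpy sketch (function and parameter names are illustrative, not from the paper) of the codeword distribution used in the achievability argument: normalizing i.i.d. Gaussian vectors yields points uniform on the sphere, which are then scaled so every codeword meets the maximal-power constraint with equality.

```python
import numpy as np

def spherical_codebook(M, n, P, seed=0):
    """M codewords of blocklength n, uniform on the sphere ||x||^2 = n*P."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((M, n))                 # isotropic Gaussian directions
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # normalize onto the unit sphere
    return np.sqrt(n * P) * X                       # scale to power P per symbol

C = spherical_codebook(M=16, n=128, P=1.0)
assert np.allclose((C ** 2).sum(axis=1), 128.0)     # every codeword uses full power nP
```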
{"title":"Gaussian Multiple and Random Access in the Finite Blocklength Regime","authors":"Recep Can Yavas, V. Kostina, M. Effros","doi":"10.1109/ITA50056.2020.9244955","DOIUrl":"https://doi.org/10.1109/ITA50056.2020.9244955","url":null,"abstract":"This paper presents finite-blocklength achievability bounds for the Gaussian multiple access channel (MAC) and random access channel (RAC) under average-error and maximal-power constraints. Using random codewords uniformly distributed on a sphere and a maximum likelihood decoder, the derived MAC bound on each transmitter's rate matches the MolavianJazi-Laneman bound (2015) in its first- and second- order terms, improving the remaining terms to $frac{1}{2}frac{{log n}}{n} + Oleft( {frac{1}{n}} right)$ bits per channel use. The result then extends to a RAC model in which neither the encoders nor the decoder knows which of K possible transmitters are active. In the proposed rateless coding strategy, decoding occurs at a time nt that depends on the decoder's estimate t of the number of active transmitters k. Single-bit feedback from the decoder to all encoders at each potential decoding time ni, i ≤ t, informs the encoders when to stop transmitting. For this RAC model, the proposed code achieves the same first-, second-, and third-order performance as the best known result for the Gaussian MAC in operation.","PeriodicalId":137257,"journal":{"name":"2020 Information Theory and Applications Workshop (ITA)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115318531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Communication-Efficient and Byzantine-Robust Distributed Learning
Pub Date: 2019-11-21 | DOI: 10.1109/ITA50056.2020.9245017
Avishek Ghosh, R. Maity, S. Kadhe, A. Mazumdar, K. Ramchandran
We develop a communication-efficient distributed learning algorithm that is robust against Byzantine worker machines. We propose and analyze a distributed gradient-descent algorithm that performs simple thresholding based on gradient norms to mitigate Byzantine failures. We show that the (statistical) error rate of our algorithm matches that of [YCKB18], which uses more complicated schemes (such as coordinate-wise median or trimmed mean), and is thus optimal. Furthermore, for communication efficiency, we consider a generic class of δ-approximate compressors from [KRSJ19] that encompasses sign-based compressors and top-k sparsification. Our algorithm uses compressed gradients for aggregation and gradient norms for Byzantine removal. We establish the statistical error rate of the algorithm for arbitrary (convex or non-convex) smooth loss functions. We show that, in the regime where the compression factor δ is constant and the dimension of the parameter space is fixed, the rate of convergence is unaffected by the compression operation, so we effectively get the compression for free. Moreover, we extend the compressed gradient-descent algorithm with error feedback proposed in [KRSJ19] to the distributed setting. We experimentally validate our results, showing good convergence for convex (least-squares regression) and non-convex (neural network training) problems.
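A minimal sketch of the gradient-norm thresholding idea (names and the drop-the-largest rule below are illustrative assumptions; the paper's exact thresholding and compression pipeline may differ): the server discards the assumed Byzantine fraction of worker gradients with the largest norms, bounding the influence any single corrupted worker can exert, and averages the rest.

```python
import numpy as np

def norm_filtered_mean(grads, beta):
    """grads: (m, d) array of worker gradients; beta: assumed Byzantine fraction."""
    m = grads.shape[0]
    norms = np.linalg.norm(grads, axis=1)
    keep = np.argsort(norms)[: m - int(np.ceil(beta * m))]  # drop the largest norms
    return grads[keep].mean(axis=0)

rng = np.random.default_rng(1)
honest = rng.normal(0.0, 0.1, size=(18, 5)) + 1.0   # gradients near the true mean
byzantine = 100.0 * rng.standard_normal((2, 5))     # adversarial outliers
agg = norm_filtered_mean(np.vstack([honest, byzantine]), beta=0.1)
```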
{"title":"Communication-Efficient and Byzantine-Robust Distributed Learning","authors":"Avishek Ghosh, R. Maity, S. Kadhe, A. Mazumdar, K. Ramchandran","doi":"10.1109/ITA50056.2020.9245017","DOIUrl":"https://doi.org/10.1109/ITA50056.2020.9245017","url":null,"abstract":"We develop a communication-efficient distributed learning algorithm that is robust against Byzantine worker machines. We propose and analyze a distributed gradient-descent algorithm that performs a simple thresholding based on gradient norms to mitigate Byzantine failures. We show the (statistical) error-rate of our algorithm matches that of [YCKB18], which uses more complicated schemes (like coordinate-wise median or trimmed mean) and thus optimal. Furthermore, for communication efficiency, we consider a generic class of δ-approximate compressors from [KRSJ19] that encompasses sign-based compressors and top-k sparsification. Our algorithm uses compressed gradients and gradient norms for aggregation and Byzantine removal respectively. We establish the statistical error rate of the algorithm for arbitrary (convex or non-convex) smooth loss function. We show that, in the regime when the compression factor δ is constant and the dimension of the parameter space is fixed, the rate of convergence is not affected by the compression operation, and hence we effectively get the compression for free. Moreover, we extend the compressed gradient descent algorithm with error feedback proposed in [KRSJ19] for the distributed setting. We have experimentally validated our results and shown good performance in convergence for convex (least-square regression) and non-convex (neural network training) problems.","PeriodicalId":137257,"journal":{"name":"2020 Information Theory and Applications Workshop (ITA)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2019-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116313228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stochastic Iterative Hard Thresholding for Low-Tucker-Rank Tensor Recovery
Pub Date: 2019-09-23 | DOI: 10.1109/ITA50056.2020.9244965
Rachel Grotheer, S. Li, A. Ma, D. Needell, Jing Qin
Low-rank tensor recovery problems have been widely studied in many signal processing and machine learning applications. Tensor rank is typically defined with respect to a particular tensor decomposition; the Tucker decomposition is among the most popular. In recent years, researchers have developed many state-of-the-art algorithms for low-Tucker-rank tensor recovery. Motivated by the favorable properties of stochastic algorithms such as stochastic gradient descent and stochastic iterative hard thresholding, we extend the stochastic iterative hard thresholding algorithm from vectors to tensors in order to recover a low-Tucker-rank tensor from its linear measurements. We also develop a linear convergence analysis for the proposed method and conduct a series of experiments with both synthetic and real data to illustrate its performance.
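The following toy numpy sketch illustrates the flavor of the extension: one Kaczmarz-normalized stochastic gradient step on a linear measurement model, followed by an approximate projection onto low Tucker rank via sequentially truncated HOSVD. All names, the step normalization, and the choice of projection are illustrative assumptions; the paper's exact rules may differ.

```python
import numpy as np

def unfold(T, mode):
    """Matricize T along `mode`."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Mode-`mode` product of T with matrix M of shape (J, T.shape[mode])."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=(1, 0)), 0, mode)

def tucker_truncate(T, ranks):
    """Approximate projection onto {Tucker rank <= ranks} via truncated HOSVD."""
    for k, r in enumerate(ranks):
        U = np.linalg.svd(unfold(T, k), full_matrices=False)[0][:, :r]
        T = mode_product(T, U @ U.T, k)   # project mode k onto its top-r subspace
    return T

def sto_iht(A, y, ranks, step=1.0, iters=3000, seed=0):
    """StoIHT sketch: recover X from y_i = <A_i, X> (A: (m, *shape), y: (m,))."""
    rng = np.random.default_rng(seed)
    X = np.zeros(A.shape[1:])
    for _ in range(iters):
        i = rng.integers(A.shape[0])                # sample one measurement
        r = np.vdot(A[i], X) - y[i]                 # residual of that measurement
        X = X - step * r * A[i] / np.vdot(A[i], A[i])  # normalized gradient step
        X = tucker_truncate(X, ranks)               # hard-threshold the Tucker rank
    return X
```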
{"title":"Stochastic Iterative Hard Thresholding for Low-Tucker-Rank Tensor Recovery","authors":"Rachel Grotheer, S. Li, A. Ma, D. Needell, Jing Qin","doi":"10.1109/ITA50056.2020.9244965","DOIUrl":"https://doi.org/10.1109/ITA50056.2020.9244965","url":null,"abstract":"Low-rank tensor recovery problems have been widely studied in many signal processing and machine learning applications. Tensor rank is typically defined under certain tensor decomposition. In particular, Tucker decomposition is known as one of the most popular tensor decompositions. In recent years, researchers have developed many state-of-the-art algorithms to address the problem of low-Tucker-rank tensor recovery. Motivated by the favorable properties of the stochastic algorithms, such as stochastic gradient descent and stochastic iterative hard thresholding, we aim to extend the stochastic iterative hard thresholding algorithm from vectors to tensors in order to address the problem of recovering a low-Tucker-rank tensor from its linear measurements. We have also developed linear convergence analysis for the proposed method and conducted a series of experiments with both synthetic and real data to illustrate the performance of the proposed method.","PeriodicalId":137257,"journal":{"name":"2020 Information Theory and Applications Workshop (ITA)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130380724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimal Learning of Joint Alignments with a Faulty Oracle
Pub Date: 2019-09-21 | DOI: 10.1109/ITA50056.2020.9244966
Kasper Green Larsen, M. Mitzenmacher, Charalampos E. Tsourakakis
We consider the following problem, which is useful in applications such as joint image and shape alignment. The goal is to recover $n$ discrete variables $g_i \in \{0, \ldots, k-1\}$ (up to some global offset) given noisy observations of a set of their pairwise differences $\{(g_i - g_j) \bmod k\}$; specifically, with probability $\frac{1}{k} + \delta$ for some $\delta > 0$ one obtains the correct answer, and with the remaining probability one obtains a uniformly random incorrect answer. We consider a learning-based formulation in which one can perform a query to observe a pairwise difference, and the goal is to perform as few queries as possible while obtaining the exact joint alignment. We provide an easy-to-implement, time-efficient algorithm that performs $O\left(\frac{n \lg n}{k \delta^2}\right)$ queries and recovers the joint alignment with high probability. We also show that our algorithm is optimal by proving a general lower bound that holds for all non-adaptive algorithms. Our work significantly improves on recent work by Chen and Candès [CC16], who view the problem as a constrained principal components analysis problem that can be solved using the power method. Specifically, our approach is simpler in both the algorithm and the analysis, and provides additional insights into the problem structure.
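As a toy illustration of why repeated queries suffice (a plausible non-adaptive strategy consistent with the abstract, not necessarily the paper's exact algorithm): query each pair (i, 0) repeatedly and take a plurality vote, which recovers each offset with high probability once each pair is queried on the order of $\lg n / (k\delta^2)$ times.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(2)
n, k, delta, reps = 50, 5, 0.2, 60
g = rng.integers(k, size=n)                 # hidden ground-truth variables

def query(i, j):
    """Faulty oracle: correct difference w.p. 1/k + delta, else a uniform wrong answer."""
    truth = (g[i] - g[j]) % k
    if rng.random() < 1.0 / k + delta:
        return truth
    wrong = rng.integers(k - 1)             # uniformly random *incorrect* answer
    return wrong if wrong < truth else wrong + 1

est = np.zeros(n, dtype=int)                # recover g up to the global offset g[0]
for i in range(1, n):
    votes = Counter(query(i, 0) for _ in range(reps))
    est[i] = votes.most_common(1)[0][0]     # plurality vote over repeated queries
print(np.array_equal(est, (g - g[0]) % k))  # True with high probability
```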
{"title":"Optimal Learning of Joint Alignments with a Faulty Oracle","authors":"Kasper Green Larsen, M. Mitzenmacher, Charalampos E. Tsourakakis","doi":"10.1109/ITA50056.2020.9244966","DOIUrl":"https://doi.org/10.1109/ITA50056.2020.9244966","url":null,"abstract":"We consider the following problem, which is useful in applications such as joint image and shape alignment. The goal is to recover n discrete variables gi ∈ {0,...,k − 1} (up to some global offset) given noisy observations of a set of their pairwise differences {(gi − gj) mod k}; specifically, with probability $frac{1}{k} + delta $ for some δ > 0 one obtains the correct answer, and with the remaining probability one obtains a uniformly random incorrect answer. We consider a learning-based formulation where one can perform a query to observe a pairwise difference, and the goal is to perform as few queries as possible while obtaining the exact joint alignment. We provide an easy-to-implement, time efficient algorithm that performs $Oleft( {frac{{nlg n}}{{k{delta ^2}}}} right)$ queries, and recovers the joint alignment with high probability. We also show that our algorithm is optimal by proving a general lower bound that holds for all non-adaptive algorithms. Our work improves significantly recent work by Chen and Candés [CC16], who view the problem as a constrained principal components analysis problem that can be solved using the power method. Specifically, our approach is simpler both in the algorithm and the analysis, and provides additional insights into the problem structure.","PeriodicalId":137257,"journal":{"name":"2020 Information Theory and Applications Workshop (ITA)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2019-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113982627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generalized List Decoding
Pub Date: 2019-09-10 | DOI: 10.4230/LIPIcs.ITCS.2020.51
Yihan Zhang, Amitalok J. Budkuley, S. Jaggi
This paper concerns itself with the question of list decoding for general adversarial channels, e.g., bit-flip (XOR) channels, erasure channels, AND (Z-) channels, OR (reverse Z-) channels, real adder channels, noisy typewriter channels, etc. We precisely characterize when exponential-sized (or positive-rate) $(L-1)$-list decodable codes (where the list size $L$ is a universal constant) exist for such channels. Our criterion essentially asserts the following: for any given general adversarial channel, it is possible to construct positive-rate $(L-1)$-list decodable codes if and only if the set of completely positive tensors of order $L$ with admissible marginals is not entirely contained in the order-$L$ confusability set associated to the channel. The sufficiency is shown via random code construction (combined with expurgation or time-sharing). The necessity is shown by
1. extracting approximately equicoupled subcodes (a generalization of equidistant codes) from any sequence of "large" codes using the hypergraph Ramsey theorem, and
2. significantly extending the classic Plotkin bound in coding theory to list decoding for general channels, using duality between the completely positive tensor cone and the copositive tensor cone.
In the proof, we also obtain a new fact regarding asymmetry of joint distributions, which may be of independent interest. Other results include:
1. list decoding capacity with asymptotically large $L$ for general adversarial channels;
2. a tight list-size bound for most constant-composition codes (a generalization of constant-weight codes);
3. a rederivation and demystification of Blinovsky's [9] characterization of the list decoding Plotkin points (the threshold at which large codes are impossible) for bit-flip channels;
4. an evaluation of general bounds ([43]) for unique decoding in the error-correction code setting.
{"title":"Generalized List Decoding","authors":"Yihan Zhang, Amitalok J. Budkuley, S. Jaggi","doi":"10.4230/LIPIcs.ITCS.2020.51","DOIUrl":"https://doi.org/10.4230/LIPIcs.ITCS.2020.51","url":null,"abstract":"This paper concerns itself with the question of list decoding for general adversarial channels, e.g., bit-flip (XOR) channels, erasure channels, AND (Z-) channels, OR (ℤ-) channels, real adder channels, noisy typewriter channels, etc. We precisely characterize when exponential-sized (or positive rate) (L − 1)-list decodable codes (where the list size L is a universal constant) exist for such channels. Our criterion essentially asserts that:For any given general adversarial channel, it is possible to construct positive rate (L − 1)-list decodable codes if and only if the set of completely positive tensors of order-L with admissible marginals is not entirely contained in the order-L confusability set associated to the channel.The sufficiency is shown via random code construction (combined with expurgation or time-sharing). The necessity is shown by1. extracting approximately equicoupled subcodes (generalization of equidistant codes) from any sequence of \"large\" codes using hypergraph Ramsey’s theorem, and2. significantly extending the classic Plotkin bound in coding theory to list decoding for general channels using duality between the completely positive tensor cone and the copositive tensor cone.In the proof, we also obtain a new fact regarding asymmetry of joint distributions, which may be of independent interest.Other results include1 List decoding capacity with asymptotically large L for general adversarial channels;2 A tight list size bound for most constant composition codes (generalization of constant weight codes);3 Rederivation and demystification of Blinovsky’s [9] characterization of the list decoding Plotkin points (threshold at which large codes are impossible) for bit-flip channels;4 Evaluation of general bounds ([43]) for unique decoding in the error correction code setting.","PeriodicalId":137257,"journal":{"name":"2020 Information Theory and Applications Workshop (ITA)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2019-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115503772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Differentially Private Algorithms for Learning Mixtures of Separated Gaussians
Pub Date: 2019-09-09 | DOI: 10.1109/ITA50056.2020.9244945
Gautam Kamath, Or Sheffet, Vikrant Singhal, Jonathan Ullman
Learning the parameters of Gaussian mixture models is a fundamental and widely studied problem with numerous applications. In this work, we give new algorithms for learning the parameters of a high-dimensional, well-separated Gaussian mixture model subject to the strong constraint of differential privacy. In particular, we give a differentially private analogue of the algorithm of Achlioptas and McSherry. Our algorithm has two key properties not achieved by prior work: (1) its sample complexity matches that of the corresponding non-private algorithm up to lower-order terms in a wide range of parameters; (2) it does not require strong a priori bounds on the parameters of the mixture components.
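For context, here is a minimal sketch of the basic Gaussian-mechanism primitive on which private estimators of this kind are built (the paper's full algorithm, a private analogue of Achlioptas-McSherry, is considerably more involved): release a clipped empirical mean with noise calibrated in the standard way, which is valid for ε ≤ 1. The function name and clipping rule are illustrative assumptions.

```python
import numpy as np

def private_mean(X, R, eps, delta, seed=5):
    """(eps, delta)-DP mean of rows of X, clipped to the L2 ball of radius R."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    norms = np.maximum(np.linalg.norm(X, axis=1), R)
    Xc = X * (R / norms)[:, None]                    # clip rows to norm at most R
    sens = 2 * R / n                                 # L2 sensitivity of the clipped mean
    sigma = sens * np.sqrt(2 * np.log(1.25 / delta)) / eps  # standard calibration
    return Xc.mean(axis=0) + rng.normal(0.0, sigma, size=d)
```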
{"title":"Differentially Private Algorithms for Learning Mixtures of Separated Gaussians","authors":"Gautam Kamath, Or Sheffet, Vikrant Singhal, Jonathan Ullman","doi":"10.1109/ITA50056.2020.9244945","DOIUrl":"https://doi.org/10.1109/ITA50056.2020.9244945","url":null,"abstract":"Learning the parameters of Gaussian mixture models is a fundamental and widely studied problem with numerous applications. In this work, we give new algorithms for learning the parameters of a high-dimensional, well separated, Gaussian mixture model subject to the strong constraint of differential privacy. In particular, we give a differentially private analogue of the algorithm of Achlioptas and McSherry. Our algorithm has two key properties not achieved by prior work: (1) The algorithm’s sample complexity matches that of the corresponding non-private algorithm up to lower order terms in a wide range of parameters. (2) The algorithm does not require strong a priori bounds on the parameters of the mixture components.","PeriodicalId":137257,"journal":{"name":"2020 Information Theory and Applications Workshop (ITA)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2019-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123440943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Universal Bayes Consistency in Metric Spaces
Pub Date: 2019-06-24 | DOI: 10.1109/ITA50056.2020.9244988
Steve Hanneke, A. Kontorovich, Sivan Sabato, Roi Weiss
We show that a recently proposed 1-nearest-neighbor-based multiclass learning algorithm is universally strongly Bayes consistent in all metric spaces where such Bayes consistency is possible, making it an "optimistically universal" Bayes-consistent learner. This is the first learning algorithm known to enjoy this property; by comparison, k-NN and its variants are not generally universally Bayes consistent, except under additional structural assumptions, such as an inner product, a norm, finite doubling dimension, or a Besicovitch-type property. The metric spaces in which universal Bayes consistency is possible are the "essentially separable" ones — a new notion that we define, which is more general than standard separability. The existence of metric spaces that are not essentially separable is independent of the ZFC axioms of set theory. We prove that essential separability exactly characterizes the existence of a universal Bayes-consistent learner for the given metric space. In particular, this yields the first impossibility result for universal Bayes consistency. Taken together, these positive and negative results resolve the open problems posed in Kontorovich, Sabato, Weiss (2017).
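For orientation only, here is plain 1-NN in an arbitrary metric space; the paper's learner is a compression-based refinement of this rule (from Kontorovich, Sabato, Weiss (2017)), and vanilla 1-NN by itself is not universally Bayes consistent.

```python
def one_nn(train, metric):
    """train: list of (point, label) pairs; metric: a distance function on points."""
    def predict(x):
        # Label of the training point nearest to x under the given metric.
        return min(train, key=lambda pl: metric(pl[0], x))[1]
    return predict

# Usage with the Chebyshev (sup) metric on R^2:
predict = one_nn([((0.0, 0.0), 0), ((1.0, 1.0), 1)],
                 metric=lambda a, b: max(abs(a[0] - b[0]), abs(a[1] - b[1])))
print(predict((0.9, 0.8)))  # -> 1
```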
{"title":"Universal Bayes Consistency in Metric Spaces","authors":"Steve Hanneke, A. Kontorovich, Sivan Sabato, Roi Weiss","doi":"10.1109/ITA50056.2020.9244988","DOIUrl":"https://doi.org/10.1109/ITA50056.2020.9244988","url":null,"abstract":"We show that a recently proposed 1-nearest-neighbor-based multiclass learning algorithm is universally strongly Bayes consistent in all metric spaces where such Bayes consistency is possible, making it an \"optimistically universal\" Bayes-consistent learner. This is the first learning algorithm known to enjoy this property; by comparison, k-NN and its variants are not generally universally Bayes consistent, except under additional structural assumptions, such as an inner product, a norm, finite doubling dimension, or a Besicovitch-type property.The metric spaces in which universal Bayes consistency is possible are the \"essentially separable\" ones — a new notion that we define, which is more general than standard separability. The existence of metric spaces that are not essentially separable is independent of the ZFC axioms of set theory. We prove that essential separability exactly characterizes the existence of a universal Bayes-consistent learner for the given metric space. In particular, this yields the first impossibility result for universal Bayes consistency.Taken together, these positive and negative results resolve the open problems posed in Kontorovich, Sabato, Weiss (2017).","PeriodicalId":137257,"journal":{"name":"2020 Information Theory and Applications Workshop (ITA)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2019-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116844895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Safe Testing
Pub Date: 2019-06-18 | DOI: 10.1109/ITA50056.2020.9244948
P. Grünwald, R. D. Heide, Wouter M. Koolen
We present a new theory of hypothesis testing. The main concept is the S-value, a notion of evidence which, unlike the p-value, allows for effortlessly combining evidence from several tests, even in the common scenario where the decision to perform a new test depends on the previous test outcome: safe tests based on S-values generally preserve Type-I error guarantees under such 'optional continuation'. S-values exist for completely general testing problems with composite nulls and alternatives. Their prime interpretation is in terms of gambling or investing, each S-value corresponding to a particular investment. Surprisingly, optimal "GROW" S-values, which lead to the fastest capital growth, are fully characterized by the joint information projection (JIPr) between the sets of all Bayes marginal distributions on $\mathcal{H}_0$ and $\mathcal{H}_1$. Thus, optimal S-values also have an interpretation as Bayes factors, with priors given by the JIPr. We illustrate the theory using two classical testing scenarios: the one-sample t-test and the 2 × 2 contingency table. In the t-test setting, GROW S-values correspond to adopting the right Haar prior on the variance, as in Jeffreys' Bayesian t-test. However, unlike Jeffreys', the default safe t-test puts a discrete two-point prior on the effect size, leading to better behaviour in terms of statistical power. Sharing Fisherian, Neymanian, and Jeffreys-Bayesian interpretations, S-values and safe tests may provide a methodology acceptable to adherents of all three schools.
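A worked toy example of the key property (illustrative; the Bernoulli likelihood-ratio S-value below is our own choice, not one of the paper's designs): any nonnegative statistic with expectation at most 1 under the null is an S-value, products of S-values across optionally continued tests remain S-values, and Markov's inequality turns the rejection threshold 1/α into a Type-I error guarantee regardless of the stopping rule.

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, theta1, alpha = 0.5, 0.7, 0.05

def s_value(x, n):
    """Likelihood ratio of n Bernoulli trials with x successes: H1(0.7) vs H0(0.5).
    Under H0 its expectation is exactly 1, so it is a valid S-value."""
    return (theta1 / theta0) ** x * ((1 - theta1) / (1 - theta0)) ** (n - x)

S, n = 1.0, 20
for _ in range(5):                       # optional continuation: keep testing
    x = rng.binomial(n, theta1)          # data generated under H1 here
    S *= s_value(x, n)                   # multiply evidence across tests
    if S >= 1.0 / alpha:                 # P_H0(S >= 1/alpha) <= alpha by Markov
        print(f"reject H0, S = {S:.1f}")
        break
```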
{"title":"Safe Testing","authors":"P. Grünwald, R. D. Heide, Wouter M. Koolen","doi":"10.1109/ITA50056.2020.9244948","DOIUrl":"https://doi.org/10.1109/ITA50056.2020.9244948","url":null,"abstract":"We present a new theory of hypothesis testing. The main concept is the s-value, a notion of evidence which, unlike p-values, allows for effortlessly combining evidence from several tests, even in the common scenario where the decision to perform a new test depends on the previous test outcome: safe tests based on s-values generally preserve Type-I error guarantees under such ‘optional continuation’. S-values exist for completely general testing problems with composite null and alternatives. Their prime interpretation is in terms of gambling or investing, each S-value corresponding to a particular investment. Surprisingly, optimal \"GROW\" S-values, which lead to fastest capital growth, are fully characterized by the joint information projection (JIPr) between the set of all Bayes marginal distributions on ${mathcal{H}_0}$ and ${mathcal{H}_1}$. Thus, optimal s-values also have an interpretation as Bayes factors, with priors given by the JIPr. We illustrate the theory using two classical testing scenarios: the one-sample t-test and the 2 × 2-contingency table. In the t-test setting, GROW S-values correspond to adopting the right Haar prior on the variance, like in Jeffreys’ Bayesian t-test. However, unlike Jeffreys’, the default safe t-test puts a discrete 2-point prior on the effect size, leading to better behaviour in terms of statistical power. Sharing Fisherian, Neymanian and Jeffreys-Bayesian interpretations, S-values and safe tests may provide a methodology acceptable to adherents of all three schools.","PeriodicalId":137257,"journal":{"name":"2020 Information Theory and Applications Workshop (ITA)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2019-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129105981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Active Embedding Search via Noisy Paired Comparisons
Pub Date: 2019-05-10 | DOI: 10.1109/ITA50056.2020.9244936
Gregory H. Canal, A. Massimino, M. Davenport, C. Rozell
Task: for a given user, estimate a preference vector $w \in \mathbb{R}^d$ in a similarity embedding of items.
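A minimal sketch of how such an estimate could be formed (the logistic response model and the estimator here are assumptions for illustration, not the paper's active-search method): expanding the squared distances shows each paired comparison "is w closer to item a or item b?" is linear in w, so w is recoverable by logistic-regression maximum likelihood.

```python
import numpy as np

rng = np.random.default_rng(4)
d, m = 5, 400
w_true = rng.standard_normal(d)                       # hidden user preference point
Xa, Xb = rng.standard_normal((m, d)), rng.standard_normal((m, d))

# ||w - x_b||^2 - ||w - x_a||^2 = 2<w, x_a - x_b> + ||x_b||^2 - ||x_a||^2,
# so under a logistic response model each comparison is a linear observation of w.
z = 2 * (Xa - Xb)
c = (Xb ** 2).sum(1) - (Xa ** 2).sum(1)
y = (rng.random(m) < 1 / (1 + np.exp(-(z @ w_true + c)))).astype(float)

w = np.zeros(d)
for _ in range(2000):                                 # gradient ascent on the MLE
    w += 0.3 * z.T @ (y - 1 / (1 + np.exp(-(z @ w + c)))) / m
print(np.linalg.norm(w - w_true))                     # small residual error
```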
{"title":"Active Embedding Search via Noisy Paired Comparisons","authors":"Gregory H. Canal, A. Massimino, M. Davenport, C. Rozell","doi":"10.1109/ITA50056.2020.9244936","DOIUrl":"https://doi.org/10.1109/ITA50056.2020.9244936","url":null,"abstract":"Task : for a given user, estimate a preference vector $w in {mathbb{R}^d}$ in a similarity embedding of items","PeriodicalId":137257,"journal":{"name":"2020 Information Theory and Applications Workshop (ITA)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2019-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117314969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Minimum Uncertainty Based Detection of Adversaries in Deep Neural Networks
Pub Date: 2019-04-05 | DOI: 10.1109/ITA50056.2020.9244964
Fatemeh Sheikholeslami, Swayambhoo Jain, G. Giannakis
Despite their unprecedented performance in various domains, the use of Deep Neural Networks (DNNs) in safety-critical environments is severely limited in the presence of even small adversarial perturbations. The present work develops a randomized approach to detecting such perturbations based on minimum-uncertainty metrics that rely on sampling at the hidden layers during the DNN inference stage. Inspired by Bayesian approaches to uncertainty estimation, the sampling probabilities are designed for effective detection of adversarially corrupted inputs. Being modular, the novel detector can be conveniently employed by any pre-trained DNN with no extra training overhead. Selecting which units to sample per hidden layer entails quantifying the DNN's output uncertainty, where the overall uncertainty is expressed in terms of its layer-wise components, which also promotes scalability. Sampling probabilities are then sought by minimizing uncertainty measures layer by layer, leading to a novel convex optimization problem that admits an exact solver with a superlinear convergence rate. By simplifying the objective function, low-complexity approximate solvers are also developed. In addition to valuable insights, these approximations link the novel approach with state-of-the-art randomized adversarial detectors. The effectiveness of the novel detectors relative to competing alternatives is highlighted through extensive tests for various types of adversarial attacks with variable levels of strength.
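As a toy stand-in for the layer-wise design step (the objective below is an assumed variance-style surrogate, not the paper's exact formulation), minimizing the sum of v_i / p_i over per-unit sampling probabilities under a budget constraint is convex and admits a water-filling solution with p_i proportional to sqrt(v_i), capped at 1.

```python
import numpy as np

def sampling_probs(v, budget, iters=100):
    """v: per-unit uncertainty scores (>= 0); budget: target expected #units sampled.
    Minimizes sum_i v_i / p_i subject to sum_i p_i = budget, 0 < p_i <= 1."""
    v = np.asarray(v, dtype=float)
    p = np.zeros_like(v)
    for _ in range(iters):                 # water-filling with the cap p_i <= 1
        capped = p >= 1.0                  # units pinned at probability 1
        s = np.sqrt(v[~capped])
        p[~capped] = np.minimum(1.0, (budget - capped.sum()) * s / s.sum())
    return p

print(sampling_probs([4.0, 1.0, 1.0, 0.25], budget=2.0))
# -> approximately [0.889, 0.444, 0.444, 0.222], i.e. p_i proportional to sqrt(v_i)
```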
{"title":"Minimum Uncertainty Based Detection of Adversaries in Deep Neural Networks","authors":"Fatemeh Sheikholeslami, Swayambhoo Jain, G. Giannakis","doi":"10.1109/ITA50056.2020.9244964","DOIUrl":"https://doi.org/10.1109/ITA50056.2020.9244964","url":null,"abstract":"Despite their unprecedented performance in various domains, utilization of Deep Neural Networks (DNNs) in safety-critical environments is severely limited in the presence of even small adversarial perturbations. The present work develops a randomized approach to detecting such perturbations based on minimum uncertainty metrics that rely on sampling at the hidden layers during the DNN inference stage. Inspired by Bayesian approaches to uncertainty estimation, the sampling probabilities are designed for effective detection of the adversarially corrupted inputs. Being modular, the novel detector of adversaries can be conveniently employed by any pre-trained DNN at no extra training overhead. Selecting which units to sample per hidden layer entails quantifying the amount of DNN output uncertainty, where the overall uncertainty is expressed in terms of its layer-wise components - what also promotes scalability. Sampling probabilities are then sought by minimizing uncertainty measures layer-by-layer, leading to a novel convex optimization problem that admits an exact solver with superlinear convergence rate. By simplifying the objective function, low-complexity approximate solvers are also developed. In addition to valuable insights, these approximations link the novel approach with state-of-the-art randomized adversarial detectors. The effectiveness of the novel detectors in the context of competing alternatives is highlighted through extensive tests for various types of adversarial attacks with variable levels of strength.","PeriodicalId":137257,"journal":{"name":"2020 Information Theory and Applications Workshop (ITA)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2019-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129204119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}