A matrix M: A × X → {−1,1} corresponds to the following learning problem: An unknown element x ∈ X is chosen uniformly at random. A learner tries to learn x from a stream of samples, (a_1, b_1), (a_2, b_2), …, where for every i, a_i ∈ A is chosen uniformly at random and b_i = M(a_i, x). Assume that k, l, r are such that any submatrix of M with at least 2^{−k}·|A| rows and at least 2^{−l}·|X| columns has bias at most 2^{−r}. We show that any learning algorithm for the learning problem corresponding to M requires either a memory of size at least Ω(k·l), or at least 2^{Ω(r)} samples. The result holds even if the learner has an exponentially small success probability (of 2^{−Ω(r)}). In particular, this shows that for a large class of learning problems, any learning algorithm requires either a memory of size at least Ω((log|X|)·(log|A|)) or an exponential number of samples, achieving a tight Ω((log|X|)·(log|A|)) lower bound on the size of the memory, rather than the bound of Ω(min{(log|X|)^2, (log|A|)^2}) obtained in previous works by Raz [FOCS’17] and Moshkovitz and Moshkovitz [ITCS’18]. Moreover, our result implies all previous memory-samples lower bounds, as well as a number of new applications. Our proof builds on the work of Raz [FOCS’17] that gave a general technique for proving memory-samples lower bounds.
{"title":"Extractor-based time-space lower bounds for learning","authors":"Sumegha Garg, R. Raz, Avishay Tal","doi":"10.1145/3188745.3188962","DOIUrl":"https://doi.org/10.1145/3188745.3188962","url":null,"abstract":"A matrix M: A × X → {−1,1} corresponds to the following learning problem: An unknown element x ∈ X is chosen uniformly at random. A learner tries to learn x from a stream of samples, (a1, b1), (a2, b2) …, where for every i, ai ∈ A is chosen uniformly at random and bi = M(ai,x). Assume that k, l, r are such that any submatrix of M of at least 2−k · |A| rows and at least 2−l · |X| columns, has a bias of at most 2−r. We show that any learning algorithm for the learning problem corresponding to M requires either a memory of size at least Ω(k · l ), or at least 2Ω(r) samples. The result holds even if the learner has an exponentially small success probability (of 2−Ω(r)). In particular, this shows that for a large class of learning problems, any learning algorithm requires either a memory of size at least Ω((log|X|) · (log|A|)) or an exponential number of samples, achieving a tight Ω((log|X|) · (log|A|)) lower bound on the size of the memory, rather than a bound of Ω(min{(log|X|)2,(log|A|)2}) obtained in previous works by Raz [FOCS’17] and Moshkovitz and Moshkovitz [ITCS’18]. Moreover, our result implies all previous memory-samples lower bounds, as well as a number of new applications. Our proof builds on the work of Raz [FOCS’17] that gave a general technique for proving memory samples lower bounds.","PeriodicalId":20593,"journal":{"name":"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82337508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An important result in discrepancy due to Banaszczyk states that for any set of n vectors in ℝ^m of ℓ_2 norm at most 1 and any convex body K in ℝ^m of Gaussian measure at least half, there exists a ±1 combination of these vectors which lies in 5K. This result implies the best known bounds for several problems in discrepancy. Banaszczyk’s proof of this result is non-constructive, and an open problem has been to give an efficient algorithm to find such a ±1 combination of the vectors. In this paper, we resolve this question and give an efficient randomized algorithm to find a ±1 combination of the vectors which lies in cK for an absolute constant c > 0. This leads to new efficient algorithms for several problems in discrepancy theory.
{"title":"The Gram-Schmidt walk: a cure for the Banaszczyk blues","authors":"N. Bansal, D. Dadush, S. Garg, Shachar Lovett","doi":"10.1145/3188745.3188850","DOIUrl":"https://doi.org/10.1145/3188745.3188850","url":null,"abstract":"An important result in discrepancy due to Banaszczyk states that for any set of n vectors in ℝm of ℓ2 norm at most 1 and any convex body K in ℝm of Gaussian measure at least half, there exists a ± 1 combination of these vectors which lies in 5K. This result implies the best known bounds for several problems in discrepancy. Banaszczyk’s proof of this result is non-constructive and an open problem has been to give an efficient algorithm to find such a ± 1 combination of the vectors. In this paper, we resolve this question and give an efficient randomized algorithm to find a ± 1 combination of the vectors which lies in cK for c>0 an absolute constant. This leads to new efficient algorithms for several problems in discrepancy theory.","PeriodicalId":20593,"journal":{"name":"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing","volume":"50 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75072356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We study the problem of approximating the number of k-cliques in a graph when given query access to the graph. We consider the standard query model for general graphs via (1) degree queries, (2) neighbor queries and (3) pair queries. Let n denote the number of vertices in the graph, m the number of edges, and C_k the number of k-cliques. We design an algorithm that outputs a (1+ε)-approximation (with high probability) for C_k, whose expected query complexity and running time are O(n/C_k^{1/k} + m^{k/2}/C_k) · poly(log n, 1/ε, k). Hence, the complexity of the algorithm is sublinear in the size of the graph for C_k = ω(m^{k/2−1}). Furthermore, we prove a lower bound showing that the query complexity of our algorithm is essentially optimal (up to the dependence on log n, 1/ε and k). The previous results in this vein are by Feige (SICOMP 06) and by Goldreich and Ron (RSA 08) for edge counting (k=2) and by Eden et al. (FOCS 2015) for triangle counting (k=3). Our result matches the complexities of these results. The previous result by Eden et al. hinges on a certain amortization technique that works only for triangle counting and does not generalize to larger cliques. We obtain a general algorithm that works for any k ≥ 3 by designing a procedure that samples each k-clique incident to a given set S of vertices with approximately equal probability. The primary difficulty is in finding cliques incident to purely high-degree vertices, since random sampling within neighbors has a low success probability. This is achieved by an algorithm that samples uniformly random high-degree vertices and a careful tradeoff between estimating cliques incident purely to high-degree vertices and those that include a low-degree vertex.
{"title":"On approximating the number of k-cliques in sublinear time","authors":"T. Eden, D. Ron, C. Seshadhri","doi":"10.1145/3188745.3188810","DOIUrl":"https://doi.org/10.1145/3188745.3188810","url":null,"abstract":"We study the problem of approximating the number of k-cliques in a graph when given query access to the graph. We consider the standard query model for general graphs via (1) degree queries, (2) neighbor queries and (3) pair queries. Let n denote the number of vertices in the graph, m the number of edges, and Ck the number of k-cliques. We design an algorithm that outputs a (1+ε)-approximation (with high probability) for Ck, whose expected query complexity and running time are O(n/Ck1/k+mk/2/Ck )(logn, 1/ε,k). Hence, the complexity of the algorithm is sublinear in the size of the graph for Ck = ω(mk/2−1). Furthermore, we prove a lower bound showing that the query complexity of our algorithm is essentially optimal (up to the dependence on logn, 1/ε and k). The previous results in this vein are by Feige (SICOMP 06) and by Goldreich and Ron (RSA 08) for edge counting (k=2) and by Eden et al. (FOCS 2015) for triangle counting (k=3). Our result matches the complexities of these results. The previous result by Eden et al. hinges on a certain amortization technique that works only for triangle counting, and does not generalize for larger cliques. We obtain a general algorithm that works for any k≥ 3 by designing a procedure that samples each k-clique incident to a given set S of vertices with approximately equal probability. The primary difficulty is in finding cliques incident to purely high-degree vertices, since random sampling within neighbors has a low success probability. This is achieved by an algorithm that samples uniform random high degree vertices and a careful tradeoff between estimating cliques incident purely to high-degree vertices and those that include a low-degree vertex.","PeriodicalId":20593,"journal":{"name":"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing","volume":"100 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80376908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we introduce a general framework for fine-grained reductions of approximate counting problems to their decision versions. (Thus we use an oracle that decides whether any witness exists to multiplicatively approximate the number of witnesses with minimal overhead.) This mirrors a foundational result of Sipser (STOC 1983) and Stockmeyer (SICOMP 1985) in the polynomial-time setting, and a similar result of Müller (IWPEC 2006) in the FPT setting. Using our framework, we obtain such reductions for some of the most important problems in fine-grained complexity: the Orthogonal Vectors problem, 3SUM, and the Negative-Weight Triangle problem (which is closely related to All-Pairs Shortest Path). While all these problems have simple algorithms over which it is conjectured that no polynomial improvement is possible, our reductions would remain interesting even if these conjectures were proved; they have only polylogarithmic overhead, and can therefore be applied to subpolynomial improvements such as the n^3/exp(Θ(√(log n)))-time algorithm for the Negative-Weight Triangle problem due to Williams (STOC 2014). Our framework is also general enough to apply to versions of the problems for which more efficient algorithms are known. For example, the Orthogonal Vectors problem over GF(m)^d for constant m can be solved in time n·poly(d) by a result of Williams and Yu (SODA 2014); our result implies that we can approximately count the number of orthogonal pairs with essentially the same running time. We also provide a fine-grained reduction from approximate #SAT to SAT. Suppose the Strong Exponential Time Hypothesis (SETH) is false, so that for some 1 < c < 2 and all k there is an O(c^n)-time algorithm for #k-SAT. Then we prove that for all k, there is an O((c+o(1))^n)-time algorithm for approximate #k-SAT. In particular, our result implies that the Exponential Time Hypothesis (ETH) is equivalent to the seemingly-weaker statement that there is no algorithm to approximate #3-SAT to within a factor of 1+ε in time 2^{o(n)}/ε^2 (taking ε > 0 as part of the input). A full version of this paper containing detailed proofs is available at https://arxiv.org/abs/1707.04609.
{"title":"Fine-grained reductions from approximate counting to decision","authors":"Holger Dell, John Lapinskas","doi":"10.1145/3188745.3188920","DOIUrl":"https://doi.org/10.1145/3188745.3188920","url":null,"abstract":"In this paper, we introduce a general framework for fine-grained reductions of approximate counting problems to their decision versions. (Thus we use an oracle that decides whether any witness exists to multiplicatively approximate the number of witnesses with minimal overhead.) This mirrors a foundational result of Sipser (STOC 1983) and Stockmeyer (SICOMP 1985) in the polynomial-time setting, and a similar result of Müller (IWPEC 2006) in the FPT setting. Using our framework, we obtain such reductions for some of the most important problems in fine-grained complexity: the Orthogonal Vectors problem, 3SUM, and the Negative-Weight Triangle problem (which is closely related to All-Pairs Shortest Path). While all these problems have simple algorithms over which it is conjectured that no polynomial improvement is possible, our reductions would remain interesting even if these conjectures were proved; they have only polylogarithmic overhead, and can therefore be applied to subpolynomial improvements such as the n3/exp(Θ(√logn))-time algorithm for the Negative-Weight Triangle problem due to Williams (STOC 2014). Our framework is also general enough to apply to versions of the problems for which more efficient algorithms are known. For example, the Orthogonal Vectors problem over GF(m)d for constant m can be solved in time n·poly(d) by a result of Williams and Yu (SODA 2014); our result implies that we can approximately count the number of orthogonal pairs with essentially the same running time. We also provide a fine-grained reduction from approximate #SAT to SAT. Suppose the Strong Exponential Time Hypothesis (SETH) is false, so that for some 1<c<2 and all k there is an O(cn)-time algorithm for #k-SAT. Then we prove that for all k, there is an O((c+o(1))n)-time algorithm for approximate #k-SAT. In particular, our result implies that the Exponential Time Hypothesis (ETH) is equivalent to the seemingly-weaker statement that there is no algorithm to approximate #3-SAT to within a factor of 1+ε in time 2o(n)/ε2 (taking ε > 0 as part of the input). A full version of this paper containing detailed proofs is available at https://arxiv.org/abs/1707.04609.","PeriodicalId":20593,"journal":{"name":"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88518714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
For over a decade now we have been witnessing the success of massive parallel computation (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is the fact that these frameworks are able to accurately capture the nature of large-scale computation. In particular, compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises in this context, though, is: can we leverage this additional power to obtain even faster parallel algorithms? A prominent example here is the maximum matching problem—one of the most classic graph problems. It is well known that in the PRAM model one can compute a 2-approximate maximum matching in O(log n) rounds. However, the exact complexity of this problem in the MPC framework is still far from understood. Lattanzi et al. (SPAA 2011) showed that if each machine has n^{1+Ω(1)} memory, this problem can also be solved 2-approximately in a constant number of rounds. These techniques, as well as the approaches developed in the follow-up work, seem to get stuck in a fundamental way at roughly O(log n) rounds once we enter the (at most) near-linear memory regime. It is thus entirely possible that in this regime, which captures in particular the case of sparse graph computations, the best MPC round complexity matches what one can already get in the PRAM model, without the need to take advantage of the extra local computation power. In this paper, we finally refute that possibility. That is, we break the above O(log n) round complexity bound even in the case of slightly sublinear memory per machine. In fact, our improvement here is almost exponential: we are able to deliver a (2+ε)-approximate maximum matching, for any fixed constant ε > 0, in O((log log n)^2) rounds. To establish our result we need to deviate from the previous work in two important ways that are crucial for exploiting the power of the MPC model, as compared to the PRAM model. Firstly, we use vertex-based graph partitioning, instead of the edge-based approaches that were utilized so far. Secondly, we develop a technique of round compression. This technique enables one to take a (distributed) algorithm that computes an O(1)-approximation of maximum matching in O(log n) independent PRAM phases and implement a super-constant number of these phases in only a constant number of MPC rounds.
{"title":"Round compression for parallel matching algorithms","authors":"A. Czumaj, Jakub Lacki, A. Madry, Slobodan Mitrovic, Krzysztof Onak, P. Sankowski","doi":"10.1145/3188745.3188764","DOIUrl":"https://doi.org/10.1145/3188745.3188764","url":null,"abstract":"For over a decade now we have been witnessing the success of massive parallel computation (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is the fact that these frameworks are able to accurately capture the nature of large-scale computation. In particular, compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises in this context is though: can we leverage this additional power to obtain even faster parallel algorithms? A prominent example here is the maximum matching problem—one of the most classic graph problems. It is well known that in the PRAM model one can compute a 2-approximate maximum matching in O(logn) rounds. However, the exact complexity of this problem in the MPC framework is still far from understood. Lattanzi et al. (SPAA 2011) showed that if each machine has n1+Ω(1) memory, this problem can also be solved 2-approximately in a constant number of rounds. These techniques, as well as the approaches developed in the follow up work, seem though to get stuck in a fundamental way at roughly O(logn) rounds once we enter the (at most) near-linear memory regime. It is thus entirely possible that in this regime, which captures in particular the case of sparse graph computations, the best MPC round complexity matches what one can already get in the PRAM model, without the need to take advantage of the extra local computation power. In this paper, we finally refute that possibility. That is, we break the above O(logn) round complexity bound even in the case of slightly sublinear memory per machine. In fact, our improvement here is almost exponential: we are able to deliver a (2+є)-approximate maximum matching, for any fixed constant є>0, in O((loglogn)2) rounds. To establish our result we need to deviate from the previous work in two important ways that are crucial for exploiting the power of the MPC model, as compared to the PRAM model. Firstly, we use vertex–based graph partitioning, instead of the edge–based approaches that were utilized so far. Secondly, we develop a technique of round compression. This technique enables one to take a (distributed) algorithm that computes an O(1)-approximation of maximum matching in O(logn) independent PRAM phases and implement a super-constant number of these phases in only a constant number of MPC rounds.","PeriodicalId":20593,"journal":{"name":"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing","volume":"90 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80398881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We study the efficient learnability of geometric concept classes — specifically, low-degree polynomial threshold functions (PTFs) and intersections of halfspaces — when a fraction of the training data is adversarially corrupted. We give the first polynomial-time PAC learning algorithms for these concept classes with dimension-independent error guarantees in the presence of nasty noise under the Gaussian distribution. In the nasty noise model, an omniscient adversary can arbitrarily corrupt a small fraction of both the unlabeled data points and their labels. This model generalizes well-studied noise models, including the malicious noise model and the agnostic (adversarial label noise) model. Prior to our work, the only concept class for which efficient malicious learning algorithms were known was the class of origin-centered halfspaces. At the core of our results is an efficient algorithm to approximate the low-degree Chow parameters of any bounded function in the presence of nasty noise. Our robust approximation algorithm for the Chow parameters provides near-optimal error guarantees for a range of distribution families satisfying mild concentration bounds and moment conditions. At the technical level, this algorithm employs an iterative “spectral” technique for outlier detection and removal inspired by recent work in robust unsupervised learning, which makes essential use of low-degree multivariate polynomials. Our robust learning algorithm for low-degree PTFs provides dimension-independent error guarantees for a class of tame distributions, including Gaussians and, more generally, any log-concave distribution with (approximately) known low-degree moments. For linear threshold functions (LTFs) under the Gaussian distribution, using a refinement of the localization technique, we give a polynomial-time algorithm that achieves a near-optimal error of O(ε), where ε is the noise rate. Our robust learning algorithm for intersections of halfspaces proceeds by projecting down to an appropriate low-dimensional subspace. Its correctness makes essential use of a novel robust inverse independence lemma that is of independent interest.
{"title":"Learning geometric concepts with nasty noise","authors":"Ilias Diakonikolas, D. Kane, Alistair Stewart","doi":"10.1145/3188745.3188754","DOIUrl":"https://doi.org/10.1145/3188745.3188754","url":null,"abstract":"We study the efficient learnability of geometric concept classes — specifically, low-degree polynomial threshold functions (PTFs) and intersections of halfspaces — when a fraction of the training data is adversarially corrupted. We give the first polynomial-time PAC learning algorithms for these concept classes with dimension-independent error guarantees in the presence of nasty noise under the Gaussian distribution. In the nasty noise model, an omniscient adversary can arbitrarily corrupt a small fraction of both the unlabeled data points and their labels. This model generalizes well-studied noise models, including the malicious noise model and the agnostic (adversarial label noise) model. Prior to our work, the only concept class for which efficient malicious learning algorithms were known was the class of origin-centered halfspaces. At the core of our results is an efficient algorithm to approximate the low-degree Chow-parameters of any bounded function in the presence of nasty noise. Our robust approximation algorithm for the Chow parameters provides near-optimal error guarantees for a range of distribution families satisfying mild concentration bounds and moment conditions. At the technical level, this algorithm employs an iterative “spectral” technique for outlier detection and removal inspired by recent work in robust unsupervised learning, which makes essential use of low-degree multivariate polynomials. Our robust learning algorithm for low-degree PTFs provides dimension-independent error guarantees for a class of tame distributions, including Gaussians and, more generally, any logconcave distribution with (approximately) known low-degree moments. For LTFs under the Gaussian distribution, using a refinement of the localization technique, we give a polynomial-time algorithm that achieves a near-optimal error of O(є), where є is the noise rate. Our robust learning algorithm for intersections of halfspaces proceeds by projecting down to an appropriate low-dimensional subspace. Its correctness makes essential use of a novel robust inverse independence lemma that is of independent interest.","PeriodicalId":20593,"journal":{"name":"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing","volume":"317 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80119317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the school choice market, where scarce public school seats are assigned to students, a key issue is how to reassign seats that are vacated after an initial round of centralized assignment. Every year around 10% of students assigned a seat in the NYC public high school system eventually do not use it, and their vacated seats can be reassigned. Practical solutions to the reassignment problem must be simple to implement, truthful and efficient. I propose and axiomatically justify a class of reassignment mechanisms, the Permuted Lottery Deferred Acceptance (PLDA) mechanisms, which generalize the commonly used Deferred Acceptance (DA) school choice mechanism to a two-round setting and retain its desirable incentive and efficiency properties. I also provide guidance to school districts as to how to choose the appropriate mechanism in this class for their setting. Centralized admissions are typically conducted in a single round using Deferred Acceptance, with a lottery used to break ties in each school’s prioritization of students. Our proposed PLDA mechanisms reassign vacated seats using a second round of DA with a lottery based on a suitable permutation of the first-round lottery numbers. I demonstrate that under a natural order condition on aggregate student demand for schools, the second-round tie-breaking lottery can be correlated arbitrarily with that of the first round without affecting allocative welfare. I also show how the identifying characteristic of PLDA mechanisms, their permutation, can be chosen to control reallocation. In practice, seats vacated after the initial round are reassigned using decentralized waitlists that create significant student movement after the start of the school year, which is costly for both students and schools. I show that reversing the lottery order between rounds minimizes reassignment among all PLDA mechanisms, allowing us to alleviate costly student movement between schools without affecting the efficiency of the final allocation. In a setting without school priorities, I also characterize PLDA mechanisms as the class of mechanisms that provide students with a guarantee at their first-round assignment, respect school priorities, and are strategy-proof, constrained Pareto efficient, and satisfy some mild symmetry properties. Finally, I provide simulations of the performance of different PLDA mechanisms in the presence of school priorities. All simulated PLDAs have similar allocative efficiency, while the PLDA based on reversing the tie-breaking lottery between rounds minimizes the number of reassigned students. These results support our theoretical findings. This is based on joint work with Itai Feigenbaum, Yash Kanoria, and Jay Sethuraman.
{"title":"Dynamic matching in school choice: efficient seat reassignment after late cancellations (invited talk)","authors":"Irene Lo","doi":"10.2139/ssrn.2993375","DOIUrl":"https://doi.org/10.2139/ssrn.2993375","url":null,"abstract":"In the school choice market, where scarce public school seats are assigned to students, a key issue is how to reassign seats that are vacated after an initial round of centralized assignment. Every year around 10% of students assigned a seat in the NYC public high school system eventually do not use it, and their vacated seats can be reassigned. Practical solutions to the reassignment problem must be simple to implement, truthful and efficient. I propose and axiomatically justify a class of reassignment mechanisms, the Per- muted Lottery Deferred Acceptance (PLDA) mechanisms, which generalize the commonly used Deferred Acceptance (DA) school choice mechanism to a two-round setting and retain its desirable in- centive and efficiency properties. I also provide guidance to school districts as to how to choose the appropriate mechanism in this class for their setting. Centralized admissions are typically conducted in a single round using Deferred Acceptance, with a lottery used to break ties in each school’s prioritization of students. Our proposed PLDA mechanisms reassign vacated seats using a second round of DA with a lottery based on a suitable permutation of the first-round lottery numbers. I demonstrate that under a natural order condition on aggregate student demand for schools, the second-round tie-breaking lottery can be correlated arbitrarily with that of the first round without affecting allocative welfare. I also show how the identifying char- acteristic of PLDA mechanisms, their permutation, can be chosen to control reallocation. vacated after the initial round are reassigned using decentralized waitlists that create significant student movement after the start of the school year, which is costly for both students and schools. I show that reversing the lottery order between rounds minimizes reassignment among all PLDA mechanisms, allowing us to alleviate costly student movement between schools without affecting the ef- ficiency of the final allocation. In a setting without school priorities, I also characterize PLDA mechanisms as the class of mechanisms that provide students with a guarantee at their first-round assign- ment, respect school priorities, and are strategy-proof, constrained Pareto efficient, and satisfy some mild symmetry properties. Finally, I provide simulations of the performance of different PLDA mecha- nisms in the presence of school priorities. All simulated PLDAs have similar allocative efficiency, while the PLDA based on reversing the tie-breaking lottery between rounds minimizes the number of reassigned students. These results support our theoretical findings. 
This is based on joint work with Itai Feigenbaum, Yash Kanoria, and Jay Sethuraman.","PeriodicalId":20593,"journal":{"name":"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88865165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
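The first-round mechanism described above, student-proposing Deferred Acceptance with lottery tie-breaking, can be sketched directly; a PLDA would rerun it on vacated seats with a permuted lottery. The data layout below (preference lists, priority classes, lottery numbers) is our illustrative assumption.

```python
def deferred_acceptance(prefs, capacity, priority, lottery):
    """Student-proposing DA with lottery tie-breaking. prefs[s] is s's
    ranked school list; priority[c][s] is school c's priority class for s
    (lower = better); ties within a class are broken by lottery number.
    A PLDA (sketch only) would rerun this with a permuted lottery."""
    next_choice = {s: 0 for s in prefs}
    held = {c: [] for c in capacity}
    free = [s for s in prefs if prefs[s]]
    while free:
        s = free.pop()
        if next_choice[s] >= len(prefs[s]):
            continue                        # s has exhausted their list
        c = prefs[s][next_choice[s]]
        next_choice[s] += 1
        held[c].append(s)
        held[c].sort(key=lambda t: (priority[c][t], lottery[t]))
        if len(held[c]) > capacity[c]:
            free.append(held[c].pop())      # reject the lowest-ranked holder
    return {c: list(held[c]) for c in held}

prefs = {"s1": ["A", "B"], "s2": ["A", "B"], "s3": ["A"]}
capacity = {"A": 1, "B": 1}
priority = {"A": {"s1": 0, "s2": 0, "s3": 0}, "B": {"s1": 0, "s2": 0, "s3": 0}}
lottery = {"s1": 0.7, "s2": 0.2, "s3": 0.9}
print(deferred_acceptance(prefs, capacity, priority, lottery))
# s2 wins A on lottery; s1 falls to B; s3 ends unassigned.
```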
We consider a population of n agents which communicate with each other in a decentralized manner, through random pairwise interactions. One or more agents in the population may act as authoritative sources of information, and the objective of the remaining agents is to obtain information from or about these source agents. We study two basic tasks: broadcasting, in which the agents are to learn the bit-state of an authoritative source which is present in the population, and source detection, in which the agents are required to decide whether at least one source agent is present in the population. We focus on designing protocols which meet two natural conditions: (1) universality, i.e., independence of population size, and (2) rapid convergence to a correct global state after a reconfiguration, such as a change in the state of a source agent. Our main positive result is to show that both of these constraints can be met. For both the broadcasting problem and the source detection problem, we obtain solutions with an expected convergence time of O(log n), from any starting configuration. The solution to broadcasting is exact, which means that all agents reach the state broadcast by the source, while the solution to source detection admits one-sided error on an ε-fraction of the population (which is unavoidable for this problem). Both protocols are easy to implement in practice and are self-stabilizing, in the sense that the stated bounds on convergence time hold starting from any possible initial configuration of the system. Our protocols exploit the properties of self-organizing oscillatory dynamics. On the hardness side, our main structural insight is to prove that any protocol which meets the constraints of universality and of rapid convergence after reconfiguration must display a form of non-stationary behavior (of which oscillatory dynamics are an example). We also observe that the periodicity of the oscillatory behavior of the protocol, when present, must necessarily depend on the number #X of source agents present in the population. For instance, our protocols inherently rely on the emergence of a signal passing through the population, whose period is Θ(log(n/#X)) rounds for most starting configurations. The design of phase clocks with tunable frequency may be of independent interest, notably in modeling biological networks.
{"title":"Universal protocols for information dissemination using emergent signals","authors":"Bartłomiej Dudek, A. Kosowski","doi":"10.1145/3188745.3188818","DOIUrl":"https://doi.org/10.1145/3188745.3188818","url":null,"abstract":"We consider a population of n agents which communicate with each other in a decentralized manner, through random pairwise interactions. One or more agents in the population may act as authoritative sources of information, and the objective of the remaining agents is to obtain information from or about these source agents. We study two basic tasks: broadcasting, in which the agents are to learn the bit-state of an authoritative source which is present in the population, and source detection, in which the agents are required to decide if at least one source agent is present in the population or not. We focus on designing protocols which meet two natural conditions: (1) universality, i.e., independence of population size, and (2) rapid convergence to a correct global state after a reconfiguration, such as a change in the state of a source agent. Our main positive result is to show that both of these constraints can be met. For both the broadcasting problem and the source detection problem, we obtain solutions with an expected convergence time of O(logn), from any starting configuration. The solution to broadcasting is exact, which means that all agents reach the state broadcast by the source, while the solution to source detection admits one-sided error on a ε-fraction of the population (which is unavoidable for this problem). Both protocols are easy to implement in practice and are self-stabilizing, in the sense that the stated bounds on convergence time hold starting from any possible initial configuration of the system. Our protocols exploit the properties of self-organizing oscillatory dynamics. On the hardness side, our main structural insight is to prove that any protocol which meets the constraints of universality and of rapid convergence after reconfiguration must display a form of non-stationary behavior (of which oscillatory dynamics are an example). We also observe that the periodicity of the oscillatory behavior of the protocol, when present, must necessarily depend on the number #X of source agents present in the population. For instance, our protocols inherently rely on the emergence of a signal passing through the population, whose period is Θ(log(n/#X)) rounds for most starting configurations. The design of phase clocks with tunable frequency may be of independent interest, notably in modeling biological networks.","PeriodicalId":20593,"journal":{"name":"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing","volume":"55 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82037867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The individualization-refinement paradigm provides a strong toolbox for testing isomorphism of two graphs, and indeed, the currently fastest implementations of isomorphism solvers all follow this approach. While these solvers are fast in practice, from a theoretical point of view no general lower bounds on the worst-case complexity of these tools are known. In fact, it is an open question what the running time of individualization-refinement algorithms is; for all we know, some of the algorithms could have polynomial running time. In this work we give a negative answer to this question and construct a family of graphs on which algorithms based on the individualization-refinement paradigm require exponential time. Contrary to a previous construction of Miyazaki, which only applies to a specific implementation within the individualization-refinement framework, our construction is immune to changing the cell selector, the refinement operator, the invariant that is used, or adding various heuristic invariants to the algorithm. In fact, our graphs also provide exponential lower bounds in the case when the k-dimensional Weisfeiler-Leman algorithm is used to replace the 1-dimensional Weisfeiler-Leman algorithm (often called color refinement) that is normally used. Finally, the arguments even work when the entire automorphism group of the inputs is initially provided to the algorithm. The arguments apply to isomorphism testing algorithms as well as canonization algorithms within the framework.
{"title":"An exponential lower bound for individualization-refinement algorithms for graph isomorphism","authors":"Daniel Neuen, Pascal Schweitzer","doi":"10.1145/3188745.3188900","DOIUrl":"https://doi.org/10.1145/3188745.3188900","url":null,"abstract":"The individualization-refinement paradigm provides a strong toolbox for testing isomorphism of two graphs and indeed, the currently fastest implementations of isomorphism solvers all follow this approach. While these solvers are fast in practice, from a theoretical point of view, no general lower bounds concerning the worst case complexity of these tools are known. In fact, it is an open question what the running time of individualization-refinement algorithms is. For all we know some of the algorithms could have polynomial running time. In this work we give a negative answer to this question and construct a family of graphs on which algorithms based on the individualization-refinement paradigm require exponential time. Contrary to a previous construction of Miyazaki, that only applies to a specific implementation within the individualization-refinement framework, our construction is immune to changing the cell selector, the refinement operator, the invariant that is used, or adding various heuristic invariants to the algorithm. In fact, our graphs also provide exponential lower bounds in the case when the k-dimensional Weisfeiler-Leman algorithm is used to replace the the 1-dimensional Weisfeiler-Leman algorithm (often called color refinement) that is normally used. Finally, the arguments even work when the entire automorphism group of the inputs is initially provided to the algorithm. The arguments apply to isomorphism testing algorithms as well as canonization algorithms within the framework.","PeriodicalId":20593,"journal":{"name":"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing","volume":"69 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83892992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We construct near-optimal linear decision trees for a variety of decision problems in combinatorics and discrete geometry. For example, for any constant k, we construct linear decision trees that solve the k-SUM problem on n elements using O(n log^2 n) linear queries. Moreover, the queries we use are comparison queries, which compare the sums of two k-subsets; when viewed as linear queries, comparison queries are 2k-sparse and have only {−1,0,1} coefficients. We give similar constructions for sorting sumsets A+B and for solving the SUBSET-SUM problem, both with optimal number of queries, up to poly-logarithmic terms. Our constructions are based on the notion of “inference dimension”, recently introduced by the authors in the context of active classification with comparison queries. This can be viewed as another contribution to the fruitful link between machine learning and discrete geometry, which goes back to the discovery of the VC dimension.
{"title":"Near-optimal linear decision trees for k-SUM and related problems","authors":"D. Kane, Shachar Lovett, S. Moran","doi":"10.1145/3188745.3188770","DOIUrl":"https://doi.org/10.1145/3188745.3188770","url":null,"abstract":"We construct near optimal linear decision trees for a variety of decision problems in combinatorics and discrete geometry. For example, for any constant k, we construct linear decision trees that solve the k-SUM problem on n elements using O(n log2 n) linear queries. Moreover, the queries we use are comparison queries, which compare the sums of two k-subsets; when viewed as linear queries, comparison queries are 2k-sparse and have only {−1,0,1} coefficients. We give similar constructions for sorting sumsets A+B and for solving the SUBSET-SUM problem, both with optimal number of queries, up to poly-logarithmic terms. Our constructions are based on the notion of “inference dimension”, recently introduced by the authors in the context of active classification with comparison queries. This can be viewed as another contribution to the fruitful link between machine learning and discrete geometry, which goes back to the discovery of the VC dimension.","PeriodicalId":20593,"journal":{"name":"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing","volume":"70 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75587361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}