Nai-Hui Chia, A. Gilyén, Tongyang Li, Han-Hsuan Lin, Ewin Tang, C. Wang
We present an algorithmic framework for quantum-inspired classical algorithms on close-to-low-rank matrices, generalizing the series of results started by Tang’s breakthrough quantum-inspired algorithm for recommendation systems [STOC’19]. Motivated by quantum linear algebra algorithms and the quantum singular value transformation (SVT) framework of Gilyén et al. [STOC’19], we develop classical algorithms for SVT that run in time independent of input dimension, under suitable quantum-inspired sampling assumptions. Our results give compelling evidence that in the corresponding QRAM data structure input model, quantum SVT does not yield exponential quantum speedups. Since the quantum SVT framework generalizes essentially all known techniques for quantum linear algebra, our results, combined with sampling lemmas from previous work, suffice to generalize all prior results about dequantizing quantum machine learning algorithms. In particular, our classical SVT framework recovers and often improves the dequantization results on recommendation systems, principal component analysis, supervised clustering, support vector machines, low-rank regression, and semidefinite program solving. We also give additional dequantization results on low-rank Hamiltonian simulation and discriminant analysis. Our improvements come from identifying the key feature of the quantum-inspired input model that is at the core of all prior quantum-inspired results: ℓ2-norm sampling can approximate matrix products in time independent of their dimension. We reduce all our main results to this fact, making our exposition concise, self-contained, and intuitive.
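The abstract reduces everything to one fact: ℓ2-norm sampling can approximate matrix products. The sketch below shows the standard length-squared sampling estimator for a product A·B. It assumes NumPy and dense in-memory matrices, and the function name `approx_matmul` is hypothetical. It illustrates the principle only and is not the authors' data-structure-based sampling access or their SVT algorithms.

```python
import numpy as np


def approx_matmul(A: np.ndarray, B: np.ndarray, s: int, rng: np.random.Generator) -> np.ndarray:
    """Estimate A @ B from s length-squared (ell_2-norm) samples of the inner index."""
    # Sampling distribution over inner indices k: p_k = ||A[:, k]||^2 / ||A||_F^2.
    col_norms_sq = np.sum(A * A, axis=0)
    p = col_norms_sq / col_norms_sq.sum()
    # Draw s indices i.i.d. from p; zero-norm columns have p_k = 0 and are never drawn.
    idx = rng.choice(A.shape[1], size=s, p=p)
    # Each rescaled outer product A[:, k] B[k, :] / p_k is an unbiased estimate of A @ B;
    # the result is the average of s such estimates.
    weights = 1.0 / (s * p[idx])
    return (A[:, idx] * weights) @ B[idx, :]
```

With this choice of p, the expected squared Frobenius error is at most ‖A‖_F²‖B‖_F²/s, so the number of samples depends on the desired accuracy rather than on the inner dimension. Here the distribution p is computed explicitly in time proportional to the size of A. In the quantum-inspired input model, by contrast, such sampling access is assumed to be provided by a data structure.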
Title: Sampling-based Sublinear Low-rank Matrix Arithmetic Framework for Dequantizing Quantum Machine Learning | Journal of the ACM, pp. 1-72 | Pub Date: 2022-08-10 | DOI: https://doi.org/10.1145/3549524
Borzoo Bonakdarpour, P. Fraigniaud, S. Rajsbaum, D. Rosenblueth, Corentin Travers
Runtime verification is a lightweight method for monitoring the formal specification of a system during its execution. It has recently been shown that a given state predicate can be monitored consistently by a set of crash-prone asynchronous distributed monitors observing the system only if each monitor can emit verdicts taken from a large enough finite set. We revisit this impossibility result in the concrete context of linear temporal logic (LTL) semantics for runtime verification, that is, when the correctness of the system is specified by an LTL formula on its execution traces. First, we show that monitors synthesized based on the 4-valued semantics of LTL (RV-LTL) may result in inconsistent distributed monitoring, even for some simple LTL formulas. More generally, given any LTL formula φ, we relate the number of different verdicts required by the monitors for consistently monitoring φ to a specific structural characteristic of φ called its alternation number. Specifically, we show that, for every k ≥ 0, there is an LTL formula φ with alternation number k that cannot be verified at runtime by distributed monitors emitting verdicts from a set of cardinality smaller than k + 1. On the positive side, we define a family of logics, called distributed LTL (abbreviated as DLTL), parameterized by k ≥ 0, which refines RV-LTL by incorporating 2k + 4 truth values. Our main contribution is to show that, for every k ≥ 0, every LTL formula φ with alternation number k can be consistently monitored by distributed monitors, each running an automaton based on a (2⌈k/2⌉ + 4)-valued logic taken from the DLTL family.
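The two bounds stated in the abstract can be compared directly. The lower bound says that, for some formula with alternation number k, fewer than k + 1 verdicts do not suffice. The upper bound says that a (2⌈k/2⌉ + 4)-valued logic always suffices. The small sketch below does this arithmetic. The function name `verdict_bounds` is hypothetical. It shows only that the two bounds differ by 3 or 4, depending on the parity of k.

```python
import math


def verdict_bounds(k: int) -> tuple[int, int]:
    """(lower, upper) verdict-set sizes for alternation number k, as stated in the abstract."""
    # Lower bound (worst case over formulas): some formula with alternation
    # number k cannot be monitored with fewer than k + 1 verdicts.
    lower = k + 1
    # Upper bound (every formula): a (2*ceil(k/2) + 4)-valued logic suffices; this is
    # the DLTL member with parameter ceil(k/2), which has 2*ceil(k/2) + 4 truth values.
    upper = 2 * math.ceil(k / 2) + 4
    return lower, upper


for k in range(5):
    low, up = verdict_bounds(k)
    print(f"k={k}: at least {low} verdicts needed in the worst case, {up} always suffice")
```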
Title: Decentralized Asynchronous Crash-resilient Runtime Verification | Journal of the ACM, pp. 1-31 | Pub Date: 2022-08-10 | DOI: https://doi.org/10.1145/3550483
Pub Date: 2022-08-09 | DOI: 10.48550/arXiv.2208.04931
Nicola Cotumaccio, G. D’Agostino, A. Policriti, N. Prezza
The states of a finite-state automaton 𝒩 can be identified with collections of words in the prefix closure of the regular language accepted by 𝒩. But words can be ordered, and among the many possible orders a very natural one is the co-lexicographic order. Such naturalness stems from the fact that it suggests a transfer of the order from words to the automaton’s states. This suggestion is, in fact, concrete, and in a number of articles, automata admitting a total co-lexicographic (co-lex for brevity) ordering of states have been proposed and studied. This class of ordered automata, known as Wheeler automata, turned out to require just a constant number of bits per transition to be represented and to enable regular expression matching queries in constant time per matched character. Unfortunately, not all automata can be totally ordered as previously outlined. In the present work, we lay out a new theory showing that all automata can always be partially ordered, and that an intrinsic measure of their complexity can be defined and effectively determined, namely, the minimum width p of one of their admissible co-lex partial orders, dubbed here the automaton’s co-lex width. We first show that this new measure captures at once the complexity of several seemingly unrelated hard problems on automata. Any NFA of co-lex width p: (i) has an equivalent powerset DFA whose size is exponential in p rather than (as a classic analysis shows) in the NFA’s size; (ii) can be encoded using just Θ(log p) bits per transition; (iii) admits a linear-space data structure solving regular expression matching queries in time proportional to p² per matched character. Some consequences of this new parameterization of automata are that PSPACE-hard problems such as NFA equivalence are FPT in p, and quadratic lower bounds for the regular expression matching problem do not hold for sufficiently small p.
Having established that the co-lex width of an automaton is a fundamental complexity measure, we proceed by (i) determining its computational complexity and (ii) extending this notion from automata to regular languages by studying their smallest-width accepting NFAs and DFAs. In this work we focus on the deterministic case and prove that a canonical minimum-width DFA accepting a language ℒ, dubbed the Hasse automaton ℋ of ℒ, can be exhibited. ℋ provides, in a precise sense, the best possible way to (partially) order the states of any DFA accepting ℒ, as long as we want to maintain an operational link with the (co-lexicographic) order of ℒ’s prefixes. Finally, we explore the relationship between two conflicting objectives: minimizing the width and minimizing the number of states of a DFA. In this context, we provide an analogue of the Myhill-Nerode Theorem for co-lexicographically ordered regular languages.
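To make the "transfer of the order from words to states" concrete, the sketch below defines the co-lex order on words. This is lexicographic order on the reversed words, so comparison starts from the last character. It then collects, up to a length bound, the words that reach each state of a small DFA and checks whether one state's words all precede another's. Everything here is a hypothetical, length-bounded illustration: the helper names are invented for this example, and the code is not the paper's algorithm for computing co-lex width or the Hasse automaton.

```python
from itertools import product


def colex_key(word: str) -> str:
    """Sort key for the co-lexicographic order: compare words from the right."""
    # Co-lex order is lexicographic order on reversed words, so a proper
    # suffix precedes any longer word that ends with it.
    return word[::-1]


def words_reaching(delta: dict, start, alphabet: str, max_len: int) -> dict:
    """Map each state to the words of length <= max_len that lead to it from start."""
    reach: dict = {}
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            state = start
            for c in letters:
                state = delta.get((state, c))  # None means "no transition"
                if state is None:
                    break
            if state is not None:
                reach.setdefault(state, []).append("".join(letters))
    return reach


def precedes(words_u: list, words_v: list) -> bool:
    """True if every word in words_u is co-lex smaller than every word in words_v."""
    return colex_key(max(words_u, key=colex_key)) < colex_key(min(words_v, key=colex_key))
```

For example, in the DFA over {a, b} with `delta = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 1, (1, "b"): 2, (2, "a"): 1, (2, "b"): 2}`, state 0 is reached only by the empty word, state 1 by words ending in a, and state 2 by words ending in b. The bounded check therefore orders the states 0 < 1 < 2, a total order of the kind that Wheeler automata admit.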
Title: Co-lexicographically Ordering Automata and Regular Languages - Part I | Journal of the ACM, pp. 1-73 | Pub Date: 2022-08-09 | DOI: https://doi.org/10.48550/arXiv.2208.04931
We introduce the abstract notion of a chain, which is a sequence of n points in the plane, ordered by x-coordinates, so that the edge between any two consecutive points is unavoidable as far as triangulations are concerned. A general theory of the structural properties of chains is developed, alongside a general understanding of their number of triangulations. We also describe an intriguing new and concrete configuration, which we call the Koch chain due to its similarities to the Koch curve. A specific construction based on Koch chains is then shown to have Ω(9.08ⁿ) triangulations. This is a significant improvement over the previous and long-standing lower bound of Ω(8.65ⁿ) for the maximum number of triangulations of planar point sets.
Title: Chains, Koch Chains, and Point Sets with Many Triangulations | Authors: Daniel Rutschmann, Manuel Wettstein | Journal of the ACM, pp. 1-26 | Pub Date: 2022-03-15 | DOI: https://doi.org/10.1145/3585535
We study the problem of chasing convex bodies online: given a sequence of convex bodies the algorithm must respond with points in an online fashion (i.e., is chosen before is revealed). The objecti...
Title: Chasing Convex Bodies with Linear Competitive Ratio | Authors: C. J. Argue, Anupam Gupta, Ziye Tang, Guru Guruganesh | Journal of the ACM | Pub Date: 2021-08-12 | DOI: https://doi.org/10.1145/3450349
Pseudorandom functions (PRFs) are one of the foundational concepts in theoretical computer science, with numerous applications in complexity theory and cryptography. In this work, we study the secu...
Title: How to Construct Quantum Random Functions | Authors: Mark Zhandry | Journal of the ACM, pp. 1-43 | Pub Date: 2021-08-12 | DOI: https://doi.org/10.1145/3450745
Ho-Lin Chen, David Doty, Wyatt Reeves, D. Soloveichik
Understanding the algorithmic behaviors that are in principle realizable in a chemical system is necessary for a rigorous understanding of the design principles of biological regulatory networks. Further, advances in synthetic biology herald the time when we will be able to rationally engineer complex chemical systems and when idealized formal models will become blueprints for engineering. Coupled chemical interactions in a well-mixed solution are commonly formalized as chemical reaction networks (CRNs). However, despite the widespread use of CRNs in the natural sciences, the range of computational behaviors exhibited by CRNs is not well understood. Here, we study the following problem: What functions f : ℝᵏ → ℝ can be computed by a CRN, in which the CRN eventually produces the correct amount of the “output” molecule, no matter the rate at which reactions proceed? This captures a previously unexplored but very natural class of computations: For example, the reaction X₁ + X₂ → Y can be thought to compute the function y = min(x₁, x₂). Such a CRN is robust in the sense that it is correct whether its evolution is governed by the standard model of mass-action kinetics, alternatives such as Hill-function or Michaelis-Menten kinetics, or other arbitrary models of chemistry that respect the (fundamentally digital) stoichiometric constraints (what are the reactants and products?). We develop a reachability relation based on a broad notion of “what could happen” if reaction rates can vary arbitrarily over time. Using reachability, we define stable computation analogously to probability-1 computation in distributed computing and connect it with a seemingly stronger notion of rate-independent computation based on convergence in the limit t → ∞ under a wide class of generalized rate laws.
Besides the direct mapping of a concentration to a nonnegative analog value, we also consider the “dual-rail representation” that can represent negative values as the difference of two concentrations and allows the composition of CRN modules. We prove that a function is rate-independently computable if and only if it is piecewise linear (with rational coefficients) and continuous (dual-rail representation), or non-negative with discontinuities occurring only when some inputs switch from zero to positive (direct representation). The many contexts where continuous piecewise linear functions are powerful targets for implementation, combined with the systematic construction we develop for computing these functions, demonstrate the potential of rate-independent chemical computation.
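The min example above can be checked with a toy simulation. The sketch below reflects assumptions made for this illustration, not the authors' construction or their reachability relation. It replaces mass-action or any other kinetics with a sequence of reaction firings of arbitrary (random) extent, then runs the reaction to completion as a stand-in for the limit t → ∞. The function name `simulate_min_crn` is hypothetical.

```python
import random
from typing import Optional


def simulate_min_crn(x1: float, x2: float, steps: int = 1000, seed: Optional[int] = None) -> float:
    """Toy run of the single reaction X1 + X2 -> Y under an arbitrary rate schedule."""
    rng = random.Random(seed)
    X1, X2, Y = float(x1), float(x2), 0.0
    for _ in range(steps):
        # Stoichiometry: one unit of reaction extent consumes one X1 and one X2 and
        # produces one Y, so the extent can never exceed the scarcer reactant.
        # The random fraction stands in for "whatever rate the kinetics give".
        extent = rng.random() * min(X1, X2)
        X1, X2, Y = X1 - extent, X2 - extent, Y + extent
    # Run to completion: the reaction can only stop once one reactant is exhausted.
    # Since X1 - X2 stays equal to x1 - x2 throughout, Y ends at min(x1, x2).
    extent = min(X1, X2)
    X1, X2, Y = X1 - extent, X2 - extent, Y + extent
    return Y
```

Whatever sequence of extents the random generator picks, the terminal amount of Y is the same. This is the sense in which the output is independent of the rates.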
Title: Rate-independent Computation in Continuous Chemical Reaction Networks | Journal of the ACM, pp. 1-61 | Pub Date: 2021-07-29 | DOI: https://doi.org/10.1145/3590776
We give a domain-theoretic semantics to a statistical programming language, using the plain old category of dcpos, in contrast to some more sophisticated recent proposals. Remarkably, our monad of minimal valuations is commutative, which allows for program transformations that permute the order of independent random draws, as one would expect. A similar property is not known for Jones and Plotkin’s monad of continuous valuations. Instead of working with true real numbers, we work with exact real arithmetic, providing a bridge towards possible implementations. (Implementations themselves are not addressed here.) Rather remarkably, we show that restricting ourselves to minimal valuations does not restrict us much: all measures on the real line can be modeled by minimal valuations on the domain $\mathbf{I}\mathbb{R}_\bot$ of exact real arithmetic. We give three operational semantics for our language, and we show that they are all adequate with respect to the denotational semantics. We also work through many examples to demonstrate that our semantics computes exactly as one would expect, and to debunk the myth that a semantics based on continuous maps would not be expressive enough to encode, for instance, measures with non-compact support using only measures with compact support, or measures via non-continuous density functions. Our examples also include some useful, non-trivial cases of distributions on higher-order objects.
Title: A Domain-Theoretic Approach to Statistical Programming Languages | Authors: J. Goubault-Larrecq, Xiaodong Jia, Clément Théron | Journal of the ACM | Pub Date: 2021-06-30 | DOI: https://doi.org/10.1145/3611660
Classically, data interpolation with a parametrized model class is possible as long as the number of parameters is larger than the number of equations to be satisfied. A puzzling phenomenon in deep learning is that models are trained with many more parameters than this classical theory would suggest. We propose a partial theoretical explanation for this phenomenon. We prove that for a broad class of data distributions and model classes, overparametrization is necessary if one wants to interpolate the data smoothly. Namely, we show that smooth interpolation requires d times more parameters than mere interpolation, where d is the ambient data dimension. We prove this universal law of robustness for any smoothly parametrized function class with polynomial size weights, and any covariate distribution satisfying isoperimetry (or a mixture thereof). In the case of two-layer neural networks and Gaussian covariates, this law was conjectured in prior work by Bubeck, Li, and Nagaraj. We also give an interpretation of our result as an improved generalization bound for model classes consisting of smooth functions.
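For intuition, here is the quantitative shape of the law as we understand the paper's main theorem. It is stated informally here, suppressing constants, logarithmic factors, the dependence on the label-noise level, and the precise conditions (such as fitting the data below the noise level). The theorem lower-bounds the Lipschitz constant of any model f with p parameters that fits n noisy data points in dimension d:

```latex
\operatorname{Lip}(f) \gtrsim \sqrt{\frac{n\,d}{p}}
\quad\Longrightarrow\quad
\bigl(\operatorname{Lip}(f) = O(1) \ \text{requires}\ p \gtrsim n\,d\bigr)
```

Mere interpolation needs only on the order of n parameters, so the extra factor of d in the second relation corresponds to the "d times more parameters" in the abstract.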
Title: A Universal Law of Robustness via Isoperimetry | Authors: Sébastien Bubeck, Mark Sellke | Journal of the ACM, pp. 1-18 | Pub Date: 2021-05-26 | DOI: https://doi.org/10.1145/3578580