Passport: Improving Automated Formal Verification Using Identifiers
Alex Sanchez-Stern, Emily First, Timothy Zhou, Zhanna Kaufman, Yuriy Brun, Talia Ringer
Pub Date: 2023-06-26. DOI: https://dl.acm.org/doi/10.1145/3593374

Formally verifying system properties is one of the most effective ways of improving system quality, but its high manual effort requirements often render it prohibitively expensive. Tools that automate formal verification by learning from proof corpora to synthesize proofs have just begun to show their promise. These tools are effective because of the richness of the data the proof corpora contain. This richness comes from the stylistic conventions followed by communities of proof developers, together with the powerful logical systems beneath proof assistants. However, this richness remains underexploited, with most work thus far focusing on architecture rather than on how to make the most of the proof data. This article systematically explores how to most effectively exploit one aspect of that proof data: identifiers.
We develop the Passport approach, a method for enriching the predictive Coq model used by an existing proof-synthesis tool with three new encoding mechanisms for identifiers: category vocabulary indexing, subword sequence modeling, and path elaboration. We evaluate our approach’s enrichment effect on three existing base tools: ASTactic, Tac, and Tok. In head-to-head comparisons, Passport automatically proves 29% more theorems than the best-performing of these base tools. Combining the three tools enhanced by the Passport approach automatically proves 38% more theorems than combining the three base tools. Finally, together, these base tools and their enhanced versions prove 45% more theorems than the combined base tools. Overall, our findings suggest that modeling identifiers can play a significant role in improving proof synthesis, leading to higher-quality software.
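To make the subword mechanism concrete, here is a minimal sketch of greedy longest-match subword segmentation of a Coq identifier. The vocabulary, the function name, and the greedy strategy are illustrative assumptions for exposition, not Passport's actual encoder:

```python
# Illustrative sketch (not Passport's actual code): greedy longest-match
# subword segmentation of an identifier against a small assumed vocabulary.
SUBWORD_VOCAB = {"rev", "app", "length", "list", "nat", "_"}  # hypothetical

def subword_encode(identifier: str, vocab=SUBWORD_VOCAB) -> list[str]:
    """Split an identifier like 'rev_app_length' into known subwords,
    falling back to single characters for out-of-vocabulary spans."""
    pieces, i = [], 0
    while i < len(identifier):
        # Try the longest vocabulary entry that matches at position i.
        for j in range(len(identifier), i, -1):
            if identifier[i:j] in vocab:
                pieces.append(identifier[i:j])
                i = j
                break
        else:
            pieces.append(identifier[i])  # unknown character: emit as-is
            i += 1
    return pieces

print(subword_encode("rev_app_length"))  # ['rev', '_', 'app', '_', 'length']
```

The point of such an encoding is that an identifier never seen during training still decomposes into subwords the model has seen, so its name carries signal rather than mapping to a single unknown token.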
{"title":"Passport: Improving Automated Formal Verification Using Identifiers","authors":"Alex Sanchez-Stern, Emily First, Timothy Zhou, Zhanna Kaufman, Yuriy Brun, Talia Ringer","doi":"https://dl.acm.org/doi/10.1145/3593374","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3593374","url":null,"abstract":"<p>Formally verifying system properties is one of the most effective ways of improving system quality, but its high manual effort requirements often render it prohibitively expensive. Tools that automate formal verification by learning from proof corpora to synthesize proofs have just begun to show their promise. These tools are effective because of the richness of the data the proof corpora contain. This richness comes from the stylistic conventions followed by communities of proof developers, together with the powerful logical systems beneath proof assistants. However, this richness remains underexploited, with most work thus far focusing on architecture rather than on how to make the most of the proof data. This article systematically explores how to most effectively exploit one aspect of that proof data: identifiers.</p><p>We develop the Passport approach, a method for enriching the predictive Coq model used by an existing proof-synthesis tool with three new encoding mechanisms for identifiers: category vocabulary indexing, subword sequence modeling, and path elaboration. We evaluate our approach’s enrichment effect on three existing base tools: ASTactic, Tac, and Tok. In head-to-head comparisons, Passport automatically proves 29% more theorems than the best-performing of these base tools. Combining the three tools enhanced by the Passport approach automatically proves 38% more theorems than combining the three base tools. Finally, together, these base tools and their enhanced versions prove 45% more theorems than the combined base tools. Overall, our findings suggest that modeling identifiers can play a significant role in improving proof synthesis, leading to higher-quality software.</p>","PeriodicalId":50939,"journal":{"name":"ACM Transactions on Programming Languages and Systems","volume":"264 10","pages":""},"PeriodicalIF":1.3,"publicationDate":"2023-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Side-channel Elimination via Partial Control-flow Linearization
Luigi Soares, Michael Canesche, Fernando Magno Quintão Pereira
Pub Date: 2023-06-26. DOI: https://dl.acm.org/doi/10.1145/3594736
Partial control-flow linearization is a code transformation conceived to maximize work performed in vectorized programs. In this article, we find a new service for it. We show that partial control-flow linearization protects programs against timing attacks. This transformation is sound: Given an instance of its public inputs, the partially linearized program always runs the same sequence of instructions, regardless of secret inputs. Incidentally, if the original program is publicly safe, then accesses to the data cache will be data oblivious in the transformed code. The transformation is optimal: Every branch that depends on some secret data is linearized; no branch that depends on only public data is linearized. Therefore, the transformation preserves loops that depend exclusively on public information. If every branch that leaves a loop depends on secret data, then the transformed program will not terminate. Our transformation extends previous work in non-trivial ways. It handles C constructs such as “goto,” “break,” “switch,” and “continue,” which are absent in the FaCT domain-specific language (2018). Like Constantine (2021), our transformation ensures operation invariance but without requiring profiling information. Additionally, in contrast to SC-Eliminator (2018) and Lif (2021), it handles programs containing loops whose trip count is not known at compilation time.
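To illustrate the rewrite, here is a hand-worked sketch of turning a secret-dependent branch into straight-line code with a branchless select. It is written in Python for brevity (the paper operates on C programs at the compiler level), and the function names are hypothetical:

```python
# Before: a branch on secret data leaks timing.
def lookup_leaky(secret_bit: int, a: int, b: int) -> int:
    if secret_bit:          # secret-dependent branch: executed path differs
        return a
    return b

# After: the secret-dependent branch is linearized into straight-line code;
# both values are computed and one is selected arithmetically, so the same
# instruction sequence runs regardless of the secret.
def lookup_linearized(secret_bit: int, a: int, b: int) -> int:
    mask = -(secret_bit & 1)           # all-ones if secret_bit is 1, else 0
    return (a & mask) | (b & ~mask)    # branchless select

# A branch on *public* data would be left untouched, which is what makes
# the transformation "partial".
```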
{"title":"Side-channel Elimination via Partial Control-flow Linearization","authors":"Luigi Soares, Michael Canesche, Fernando Magno Quintão Pereira","doi":"https://dl.acm.org/doi/10.1145/3594736","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3594736","url":null,"abstract":"<p>Partial control-flow linearization is a code transformation conceived to maximize work performed in vectorized programs. In this article, we find a new service for it. We show that partial control-flow linearization protects programs against timing attacks. This transformation is sound: Given an instance of its public inputs, the partially linearized program always runs the same sequence of instructions, regardless of secret inputs. Incidentally, if the original program is publicly safe, then accesses to the data cache will be data oblivious in the transformed code. The transformation is optimal: Every branch that depends on some secret data is linearized; no branch that depends on only public data is linearized. Therefore, the transformation preserves loops that depend exclusively on public information. If every branch that leaves a loop depends on secret data, then the transformed program will not terminate. Our transformation extends previous work in non-trivial ways. It handles C constructs such as “goto,” “break,” “switch,” and “continue,” which are absent in the FaCT domain-specific language (2018). Like Constantine (2021), our transformation ensures operation invariance but without requiring profiling information. Additionally, in contrast to SC-Eliminator (2018) and Lif (2021), it handles programs containing loops whose trip count is not known at compilation time.</p>","PeriodicalId":50939,"journal":{"name":"ACM Transactions on Programming Languages and Systems","volume":"258 8","pages":""},"PeriodicalIF":1.3,"publicationDate":"2023-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimization-Aware Compiler-Level Event Profiling
Matteo Basso, Aleksandar Prokopec, Andrea Rosà, Walter Binder
Pub Date: 2023-06-26. DOI: https://dl.acm.org/doi/10.1145/3591473
Tracking specific events in a program’s execution, such as object allocation or lock acquisition, is at the heart of dynamic analysis. Despite the apparent simplicity of this task, quantifying these events is challenging due to the presence of compiler optimizations. Profiling perturbs the optimizations that the compiler would normally do—a profiled program usually behaves differently than the original one.
In this article, we propose a novel technique for quantifying compiler-internal events in the optimized code, reducing the profiling perturbation on compiler optimizations. Our technique achieves this by instrumenting the program from within the compiler, and by delaying the instrumentation until the point in the compilation pipeline after which no subsequent optimizations can remove the events. We propose two different implementation strategies of our technique based on path-profiling, and a modification to the standard path-profiling algorithm that facilitates the use of the proposed strategies in a modern just-in-time (JIT) compiler. We use our technique to analyze the behaviour of the optimizations in Graal, a state-of-the-art compiler for the Java Virtual Machine, identifying the reasons behind a performance improvement of a specific optimization, and the causes behind an unexpected slowdown of another. Finally, our evaluation results show that the two proposed implementations result in a significantly lower execution-time overhead w.r.t. a naive implementation.
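For readers unfamiliar with path profiling, the standard algorithm (due to Ball and Larus) that the proposed strategies build on assigns constant increments to CFG edges so that every acyclic path sums to a distinct index. A toy sketch on a diamond-shaped CFG, with all names illustrative:

```python
# Minimal Ball-Larus-style path numbering on a diamond CFG:
#        entry
#        /    \
#       A      B
#        \    /
#         exit
# Edge increments are chosen so that each entry->exit path sums to a
# unique index; the profiled program only adds a constant per edge and
# bumps one counter at exit.
EDGE_INCREMENT = {("entry", "A"): 0, ("entry", "B"): 1,
                  ("A", "exit"): 0, ("B", "exit"): 0}

path_counts = [0, 0]  # one slot per distinct acyclic path

def run_path(path):
    """Simulate executing a CFG path with path-profiling instrumentation."""
    r = 0
    for src, dst in zip(path, path[1:]):
        r += EDGE_INCREMENT[(src, dst)]  # constant increment per edge
    path_counts[r] += 1                  # single counter bump at exit

run_path(["entry", "A", "exit"])
run_path(["entry", "B", "exit"])
run_path(["entry", "B", "exit"])
print(path_counts)  # [1, 2]
```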
{"title":"Optimization-Aware Compiler-Level Event Profiling","authors":"Matteo Basso, Aleksandar Prokopec, Andrea Rosà, Walter Binder","doi":"https://dl.acm.org/doi/10.1145/3591473","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3591473","url":null,"abstract":"<p>Tracking specific events in a program’s execution, such as object allocation or lock acquisition, is at the heart of dynamic analysis. Despite the apparent simplicity of this task, quantifying these events is challenging due to the presence of compiler optimizations. Profiling perturbs the optimizations that the compiler would normally do—a profiled program usually behaves differently than the original one.</p><p>In this article, we propose a novel technique for quantifying compiler-internal events in the optimized code, reducing the profiling perturbation on compiler optimizations. Our technique achieves this by instrumenting the program from within the compiler, and by delaying the instrumentation until the point in the compilation pipeline after which no subsequent optimizations can remove the events. We propose two different implementation strategies of our technique based on path-profiling, and a modification to the standard path-profiling algorithm that facilitates the use of the proposed strategies in a modern <b>just-in-time (JIT)</b> compiler. We use our technique to analyze the behaviour of the optimizations in Graal, a state-of-the-art compiler for the Java Virtual Machine, identifying the reasons behind a performance improvement of a specific optimization, and the causes behind an unexpected slowdown of another. Finally, our evaluation results show that the two proposed implementations result in a significantly lower execution-time overhead w.r.t. a naive implementation.</p>","PeriodicalId":50939,"journal":{"name":"ACM Transactions on Programming Languages and Systems","volume":"261 8","pages":""},"PeriodicalIF":1.3,"publicationDate":"2023-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Contextual Linear Types for Differential Privacy
Matías Toro, David Darais, Chike Abuah, Joseph P. Near, Damián Árquez, Federico Olmedo, Éric Tanter
Pub Date: 2023-05-17. DOI: https://dl.acm.org/doi/10.1145/3589207
Language support for differentially private programming is both crucial and delicate. While elaborate program logics can be very expressive, type-system-based approaches using linear types tend to be more lightweight and amenable to automatic checking and inference, especially in the presence of higher-order programming. Since the seminal design of Fuzz, which is restricted to ϵ-differential privacy, significant progress has been made to support more advanced variants of differential privacy, such as (ϵ, δ)-differential privacy. However, supporting these advanced privacy variants while also fully supporting higher-order programming has proven challenging. We present Jazz, a language and type system that uses linear types and latent contextual effects to support both advanced variants of differential privacy and higher-order programming. Latent contextual effects allow delaying the payment of effects for connectives such as products, sums, and functions, yielding advantages in precision of the analysis, in annotation burden upon elimination, and in modularity. We formalize the core of Jazz, prove it sound for privacy via a logical relation for metric preservation, and illustrate its expressive power through a number of case studies drawn from the recent differential privacy literature.
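As a rough intuition for the bookkeeping such type systems perform statically, here is a toy dynamic sketch of sensitivity-scaled ϵ accounting with the Laplace mechanism. Jazz tracks this information in linear types at compile time; the helper names and the clipping bound below are assumptions for illustration only:

```python
import random

# Toy dynamic analogue of Fuzz/Jazz-style accounting: a k-sensitive
# function scales the noise that an epsilon-DP release must add.

def laplace_release(value: float, sensitivity: float, epsilon: float) -> float:
    """Release value with Laplace noise of scale sensitivity/epsilon
    (sampled as a difference of two exponentials), giving epsilon-DP
    for this single query."""
    scale = sensitivity / epsilon
    return value + random.expovariate(1 / scale) - random.expovariate(1 / scale)

# count has sensitivity 1: adding or removing one record changes it by
# at most 1.
def private_count(records: list, epsilon: float) -> float:
    return laplace_release(len(records), sensitivity=1.0, epsilon=epsilon)

# A sum of values clipped to [0, 10] has sensitivity 10; the same epsilon
# therefore requires proportionally more noise.
def private_clipped_sum(values: list, epsilon: float) -> float:
    clipped = sum(min(max(v, 0.0), 10.0) for v in values)
    return laplace_release(clipped, sensitivity=10.0, epsilon=epsilon)

print(private_count([1, 2, 3], epsilon=0.5))
```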
{"title":"Contextual Linear Types for Differential Privacy","authors":"Matías Toro, David Darais, Chike Abuah, Joseph P. Near, Damián Árquez, Federico Olmedo, Éric Tanter","doi":"https://dl.acm.org/doi/10.1145/3589207","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3589207","url":null,"abstract":"<p>Language support for differentially private programming is both crucial and delicate. While elaborate program logics can be very expressive, type-system-based approaches using linear types tend to be more lightweight and amenable to automatic checking and inference, and in particular in the presence of higher-order programming. Since the seminal design of <span>Fuzz</span>, which is restricted to ϵ-differential privacy in its original design, significant progress has been made to support more advanced variants of differential privacy, like (ϵ, <i>δ</i>)-differential privacy. However, supporting these advanced privacy variants while also supporting higher-order programming in full has proven to be challenging. We present <span>Jazz</span>, a language and type system that uses linear types and latent contextual effects to support both advanced variants of differential privacy and higher-order programming. Latent contextual effects allow delaying the payment of effects for connectives such as products, sums, and functions, yielding advantages in terms of precision of the analysis and annotation burden upon elimination, as well as modularity. We formalize the core of <span>Jazz</span>, prove it sound for privacy via a logical relation for metric preservation, and illustrate its expressive power through a number of case studies drawn from the recent differential privacy literature.</p>","PeriodicalId":50939,"journal":{"name":"ACM Transactions on Programming Languages and Systems","volume":"261 5","pages":""},"PeriodicalIF":1.3,"publicationDate":"2023-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A First-order Logic with Frames
Adithya Murali, Lucas Peña, Christof Löding, P. Madhusudan
Pub Date: 2023-05-15. DOI: https://dl.acm.org/doi/10.1145/3583057
We propose a novel logic, Frame Logic (FL), that extends first-order logic and recursive definitions with a construct Sp(·) that captures the implicit supports of formulas—the precise subset of the universe upon which their meaning depends. Using such supports, we formulate proof rules that facilitate elegant frame reasoning when the underlying model undergoes change. We show that the logic is expressive by capturing several data structures, and we exhibit a translation from a precise fragment of separation logic to frame logic. Finally, we design a program logic based on frame logic for reasoning about programs that dynamically update heaps; it facilitates local specifications and frame reasoning, and consists of both localized proof rules and rules that derive the weakest tightest preconditions in frame logic.
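The frame-reasoning principle that supports enable can be paraphrased roughly as follows; this reconstruction is ours, inferred from the description above rather than quoted from the paper:

```latex
% Rough paraphrase (our reconstruction, not the paper's exact rule):
% if command C only modifies locations outside the support of phi,
% then phi is preserved across C.
\[
\frac{\{\psi\}\; C\; \{\psi'\} \qquad
      \mathit{Mod}(C) \cap Sp(\varphi) = \emptyset}
     {\{\psi \wedge \varphi\}\; C\; \{\psi' \wedge \varphi\}}
\]
```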
{"title":"A First-order Logic with Frames","authors":"Adithya Murali, Lucas Peña, Christof Löding, P. Madhusudan","doi":"https://dl.acm.org/doi/10.1145/3583057","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3583057","url":null,"abstract":"<p>We propose a novel logic, <i>Frame Logic</i> (FL), that extends first-order logic and recursive definitions with a construct <i>Sp</i>(·) that captures the <i>implicit supports</i> of formulas—the precise subset of the universe upon which their meaning depends. Using such supports, we formulate proof rules that facilitate frame reasoning elegantly when the underlying model undergoes change. We show that the logic is expressive by capturing several data-structures and also exhibit a translation from a <i>precise</i> fragment of separation logic to frame logic. Finally, we design a program logic based on frame logic for reasoning with programs that dynamically update heaps that facilitates local specifications and frame reasoning. This program logic consists of both localized proof rules as well as rules that derive the weakest tightest preconditions in frame logic.</p>","PeriodicalId":50939,"journal":{"name":"ACM Transactions on Programming Languages and Systems","volume":"262 11","pages":""},"PeriodicalIF":1.3,"publicationDate":"2023-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Derivative-based Parser Generator for Visibly Pushdown Grammars
Xiaodong Jia, Ashish Kumar, Gang Tan
Pub Date: 2023-05-15. DOI: https://dl.acm.org/doi/10.1145/3591472
In this article, we present a derivative-based, functional recognizer and parser generator for visibly pushdown grammars. The generated parser accepts ambiguous grammars and produces a parse forest containing all valid parse trees for an input string in linear time; each parse tree in the forest can then also be extracted in linear time. To allow more flexible forms of visibly pushdown grammars, we additionally present a translator that soundly converts a tagged CFG to a visibly pushdown grammar; parse trees of the tagged CFG are then produced by running the semantic actions embedded in the parse trees of the translated grammar. We compare the performance of the parser with popular parsing tools, including ANTLR, GNU Bison, and several widely used hand-crafted parsers. The correctness and the time complexity of the core parsing algorithm are formally verified in the proof assistant Coq.
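For readers new to derivative-based parsing, the underlying idea is Brzozowski's derivative: the derivative of a language with respect to a character c is the set of suffixes that complete a match beginning with c. Below is the classic regular-expression version as a short sketch; the paper's contribution is the non-trivial lift of this idea to visibly pushdown grammars:

```python
# Classic Brzozowski derivatives for regular expressions. Regexes are
# nested tuples: ("empty",), ("eps",), ("chr", c), ("alt", r, s),
# ("seq", r, s), ("star", r).

def nullable(r):
    """Does r match the empty string?"""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("chr", "empty"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # seq

def deriv(r, c):
    """The derivative of r w.r.t. c: matches w iff r matches c + w."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return ("empty",)
    if tag == "chr":
        return ("eps",) if r[1] == c else ("empty",)
    if tag == "alt":
        return ("alt", deriv(r[1], c), deriv(r[2], c))
    if tag == "star":
        return ("seq", deriv(r[1], c), r)
    # seq: derive the head; if the head is nullable, the tail may start.
    d = ("seq", deriv(r[1], c), r[2])
    return ("alt", d, deriv(r[2], c)) if nullable(r[1]) else d

def matches(r, s):
    for c in s:
        r = deriv(r, c)
    return nullable(r)

ab_star = ("star", ("seq", ("chr", "a"), ("chr", "b")))  # (ab)*
print(matches(ab_star, "abab"), matches(ab_star, "aba"))  # True False
```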
{"title":"A Derivative-based Parser Generator for Visibly Pushdown Grammars","authors":"Xiaodong Jia, Ashish Kumar, Gang Tan","doi":"https://dl.acm.org/doi/10.1145/3591472","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3591472","url":null,"abstract":"<p>In this article, we present a derivative-based, functional recognizer and parser generator for visibly pushdown grammars. The generated parser accepts ambiguous grammars and produces a parse forest containing all valid parse trees for an input string in linear time. Each parse tree in the forest can then be extracted also in linear time. Besides the parser generator, to allow more flexible forms of the visibly pushdown grammars, we also present a translator that converts a tagged CFG to a visibly pushdown grammar in a sound way, and the parse trees of the tagged CFG are further produced by running the semantic actions embedded in the parse trees of the translated visibly pushdown grammar. The performance of the parser is compared with popular parsing tools, including ANTLR, GNU Bison, and other popular hand-crafted parsers. The correctness and the time complexity of the core parsing algorithm are formally verified in the proof assistant Coq.</p>","PeriodicalId":50939,"journal":{"name":"ACM Transactions on Programming Languages and Systems","volume":"264 2","pages":""},"PeriodicalIF":1.3,"publicationDate":"2023-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multiple Input Parsing and Lexical Analysis
E. Scott, A. Johnstone, R. Walsh
Pub Date: 2023-05-03. DOI: 10.1145/3594734

This article introduces two new approaches in the areas of lexical analysis and context-free parsing. We present an extension, MGLL, of generalised parsing which allows multiple input strings to be parsed together efficiently, and we present an enhanced approach to lexical analysis which exploits this multiple parsing capability. The work provides new power to formal language specification and disambiguation, and brings new techniques into the historically well-studied areas of lexical and syntax analysis. It encompasses character-level parsing at one extreme and the classical LEX/YACC style division at the other, allowing the advantages of both approaches.
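One way to see why parsing multiple input strings together is useful: under ambiguous lexical rules, a single character string can tokenize in several ways, and rather than forcing the lexer to commit, all candidate token strings can be handed to the parser at once. The toy sketch below merely enumerates the candidates; MGLL's point is to parse them together efficiently, sharing work across strings. The token definitions are hypothetical:

```python
# Toy illustration: under ambiguous lexical rules, one character string
# yields several token strings, which a multiple-input parser can take
# together and let the grammar disambiguate.
TOKEN_DEFS = [("LE", "<="), ("LT", "<"), ("EQ", "="), ("ID", "x"), ("ID", "y")]

def all_tokenizations(s):
    """Yield every way of splitting s into tokens from TOKEN_DEFS."""
    if not s:
        yield []
        return
    for name, lexeme in TOKEN_DEFS:
        if s.startswith(lexeme):
            for rest in all_tokenizations(s[len(lexeme):]):
                yield [(name, lexeme)] + rest

# "x<=y" tokenizes as ID LE ID but also as ID LT EQ ID.
for toks in all_tokenizations("x<=y"):
    print([name for name, _ in toks])
```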
{"title":"Multiple Input Parsing and Lexical Analysis","authors":"E. Scott, A. Johnstone, R. Walsh","doi":"10.1145/3594734","DOIUrl":"https://doi.org/10.1145/3594734","url":null,"abstract":"This article introduces two new approaches in the areas of lexical analysis and context-free parsing. We present an extension, MGLL, of generalised parsing which allows multiple input strings to be parsed together efficiently, and we present an enhanced approach to lexical analysis which exploits this multiple parsing capability. The work provides new power to formal language specification and disambiguation, and brings new techniques into the historically well-studied areas of lexical and syntax analysis. It encompasses character-level parsing at one extreme and the classical LEX/YACC style division at the other, allowing the advantages of both approaches.","PeriodicalId":50939,"journal":{"name":"ACM Transactions on Programming Languages and Systems","volume":"45 1","pages":"1 - 44"},"PeriodicalIF":1.3,"publicationDate":"2023-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46875570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}