Making Formulog Fast: An Argument for Unconventional Datalog Evaluation (Extended Version)
Aaron Bembenek (University of Melbourne), Michael Greenberg (Stevens Institute of Technology), Stephen Chong (Harvard University)
arXiv - CS - Programming Languages, 2024-08-26
https://doi.org/arxiv-2408.14017
Abstract
By combining Datalog, SMT solving, and functional programming, the language
Formulog provides an appealing mix of features for implementing SMT-based
static analyses (e.g., refinement type checking, symbolic execution) in a
natural, declarative way. At the same time, the performance of its custom
Datalog solver can be an impediment to using Formulog beyond prototyping -- a
common problem for Datalog variants that aspire to solve large problem
instances. In this work we speed up Formulog evaluation, with surprising
results: while 2.2x speedups are obtained by using the conventional techniques
for high-performance Datalog (e.g., compilation, specialized data structures),
the big wins come from abandoning the central assumption in modern performant
Datalog engines, semi-naive Datalog evaluation. In its place, we develop eager
evaluation, a concurrent Datalog evaluation algorithm that explores the logical
inference space via a depth-first traversal order. In practice, eager
evaluation leads to an advantageous distribution of Formulog's SMT workload to
external SMT solvers and improved SMT solving times: our eager evaluation
extensions to the Formulog interpreter and Soufflé's code generator achieve
mean 5.2x and 7.6x speedups, respectively, over the optimized code generated by
off-the-shelf Soufflé on SMT-heavy Formulog benchmarks.
Using compilation and eager evaluation, Formulog implementations of
refinement type checking, bottom-up pointer analysis, and symbolic execution
achieve speedups on 20 out of 23 benchmarks over previously published,
hand-tuned analyses written in F#, Java, and C++, providing strong evidence
that Formulog can be the basis of a realistic platform for SMT-based static
analysis. Moreover, our experience adds nuance to the conventional wisdom that
semi-naive evaluation is the one-size-fits-all best Datalog evaluation
algorithm for static analysis workloads.
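
To make the contrast between the two evaluation orders concrete, the following is a minimal single-threaded sketch in Python, not the paper's algorithm or implementation. It evaluates the standard transitive-closure program (path(X,Y) :- edge(X,Y). path(X,Z) :- path(X,Y), edge(Y,Z).) in two ways: in batched rounds driven by a delta set, as in semi-naive evaluation, and with a last-in, first-out worklist that follows each newly derived fact's consequences immediately, which gives a depth-first traversal of the inference space. The function names semi_naive_tc and eager_tc are hypothetical, and the sketch leaves out concurrency and SMT solving entirely.

```python
def build_successors(edges):
    """Index the edge relation by source node for quick joins."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    return succ


def semi_naive_tc(edges):
    """Round-based evaluation: each round joins only the facts that were
    new in the previous round (the delta) against the edge relation."""
    succ = build_successors(edges)
    path = set(edges)    # all path facts derived so far
    delta = set(edges)   # facts that were new in the previous round
    rounds = 0
    while delta:
        rounds += 1
        derived = {(a, c) for (a, b) in delta for c in succ.get(b, ())}
        delta = derived - path   # keep only genuinely new facts
        path |= delta
    return path, rounds


def eager_tc(edges):
    """Depth-first evaluation (illustrative): as soon as a fact is derived,
    its immediate consequences are pushed and explored before other
    pending work, using a LIFO worklist."""
    succ = build_successors(edges)
    path = set()
    order = []                        # order in which facts are derived
    stack = list(reversed(edges))     # base facts seed the worklist
    while stack:
        fact = stack.pop()
        if fact in path:
            continue                  # already derived; nothing new to do
        path.add(fact)
        order.append(fact)
        a, b = fact
        for c in succ.get(b, ()):
            stack.append((a, c))      # consequence of the new fact
    return path, order


if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (3, 4)]
    batch_facts, rounds = semi_naive_tc(edges)
    eager_facts, order = eager_tc(edges)
    print(sorted(batch_facts) == sorted(eager_facts), rounds)
    print(order)
```

Both functions compute the same set of facts; they differ only in the order in which facts are discovered. On the chain above, the round-based version derives all length-two paths before any length-three path, while the worklist version derives every path starting at node 1 before it considers the next base fact.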