Fast Graph Simplification for Interleaved-Dyck Reachability
Authors: Yuanbo Li, Qirun Zhang, Thomas Reps
Journal: ACM Transactions on Programming Languages and Systems
Publication date: 2022-05-27
DOI: https://dl.acm.org/doi/full/10.1145/3492428
Citation count: 0
Abstract
Many program-analysis problems can be formulated as graph-reachability problems. Interleaved Dyck language reachability (InterDyck-reachability) is a fundamental framework to express a wide variety of program-analysis problems over edge-labeled graphs. The InterDyck language represents an intersection of multiple matched-parenthesis languages (i.e., Dyck languages). In practice, program analyses typically leverage one Dyck language to achieve context-sensitivity, and other Dyck languages to model data dependencies, such as field-sensitivity and pointer references/dereferences. In the ideal case, an InterDyck-reachability framework should model multiple Dyck languages simultaneously.
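To make the definition concrete, here is a minimal sketch of a membership test for an interleaved Dyck language over a sequence of edge labels. It assumes each Dyck language uses its own disjoint alphabet, so a word is accepted when its projection onto each alphabet is balanced. The label names (`(1`, `[f`, and so on) and the function names are illustrative, not taken from the article.

```python
def in_dyck(word, pairs):
    """Check that `word`, projected onto the symbols of `pairs`, is balanced.

    `pairs` maps each opening symbol to its closing symbol. Symbols that
    belong to other Dyck languages are skipped (this is the projection).
    """
    closers = {close: open_ for open_, close in pairs.items()}
    stack = []
    for sym in word:
        if sym in pairs:
            stack.append(sym)
        elif sym in closers:
            # A closing symbol must match the most recent unmatched opener.
            if not stack or stack.pop() != closers[sym]:
                return False
    return not stack


def in_interdyck(word, languages):
    """A word is accepted if it is balanced with respect to every language."""
    return all(in_dyck(word, pairs) for pairs in languages)


# Hypothetical alphabets: parentheses for calls/returns, brackets for fields.
PARENS = {"(1": ")1", "(2": ")2"}
BRACKETS = {"[f": "]f", "[g": "]g"}

interleaved = ["(1", "[f", ")1", "]f"]   # not nested, but each projection is balanced
mismatched = ["(1", "[f", ")2", "]f"]    # ")2" does not match "(1"

print(in_interdyck(interleaved, [PARENS, BRACKETS]))
print(in_interdyck(mismatched, [PARENS, BRACKETS]))
```

Checking a single given word is easy, as the sketch shows; the difficulty lies in deciding whether some path between two nodes of a graph spells such a word.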
Unfortunately, precise InterDyck-reachability is undecidable, so any practical solution must over-approximate the exact answer. Many approaches have been proposed in the literature to over-approximate the InterDyck-reachability formulation. This article offers a new perspective on improving both the precision and the scalability of InterDyck-reachability: we aim to simplify the underlying input graph G. Our key insight is that an edge that does not contribute to any InterDyck-path can be safely eliminated from G. Our technique is orthogonal to the InterDyck-reachability formulation and can serve as a pre-processing step for any over-approximating approach to InterDyck-reachability. We have applied our graph-simplification algorithm to pre-process the graphs from a recent InterDyck-reachability-based taint analysis for Android. Our evaluation of three popular InterDyck-reachability algorithms yields promising results: our graph-simplification method improves both the scalability and precision of all three algorithms, sometimes dramatically.
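The idea of removing edges that cannot lie on any InterDyck-path can be illustrated with a deliberately crude sketch. It is not the article's algorithm. It applies only a simple necessary condition: an edge labeled with an opening symbol is kept only if some edge with the matching closing symbol starts at a node reachable from it, and symmetrically for closing edges. The edge representation `(source, label, target)` and all names are assumptions made for this example.

```python
from collections import defaultdict


def reachable_from(start, succ):
    """Nodes reachable from `start` (including itself), ignoring labels."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in succ[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def prune_unmatched_edges(edges, pairs):
    """Drop edges whose symbol can never be matched along any path.

    `edges` is an iterable of (source, label, target) triples and `pairs`
    maps opening labels to closing labels (all Dyck languages merged).
    Pruning repeats until no more edges are removed, because deleting an
    edge can make other edges unmatched.
    """
    closers = {close: open_ for open_, close in pairs.items()}
    edges = set(edges)
    while True:
        succ = defaultdict(set)
        for u, _, v in edges:
            succ[u].add(v)
        keep = set()
        for u, label, v in edges:
            if label in pairs:
                # Opening edge: a matching closing edge must start downstream of v.
                reach = reachable_from(v, succ)
                ok = any(l == pairs[label] and x in reach for x, l, _ in edges)
            elif label in closers:
                # Closing edge: a matching opening edge must end upstream of u.
                ok = any(l == closers[label] and u in reachable_from(b, succ)
                         for _, l, b in edges)
            else:
                ok = True  # labels outside `pairs` are treated as neutral
            if ok:
                keep.add((u, label, v))
        if keep == edges:
            return keep
        edges = keep


pairs = {"(1": ")1", "(2": ")2", "[f": "]f"}
edges = [("a", "(1", "b"), ("b", ")1", "c"), ("c", "(2", "d"), ("d", "[f", "e")]
print(sorted(prune_unmatched_edges(edges, pairs)))
```

Because every edge on an InterDyck-path satisfies this condition, the pruning never removes a contributing edge, but it also keeps many edges that a more precise simplification would eliminate.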
About the journal:
ACM Transactions on Programming Languages and Systems (TOPLAS) is the premier journal for reporting recent research advances in the areas of programming languages, and systems to assist the task of programming. Papers can be either theoretical or experimental in style, but in either case, they must contain innovative and novel content that advances the state of the art of programming languages and systems. We also invite strictly experimental papers that compare existing approaches, as well as tutorial and survey papers. The scope of TOPLAS includes, but is not limited to, the following subjects:
language design for sequential and parallel programming
programming language implementation
programming language semantics
compilers and interpreters
runtime systems for program execution
storage allocation and garbage collection
languages and methods for writing program specifications
languages and methods for secure and reliable programs
testing and verification of programs