{"title":"分布式静态分析中的全弹性:以书法分析为例","authors":"D. Garbervetsky, Edgardo Zoppi, B. Livshits","doi":"10.1145/3106237.3106261","DOIUrl":null,"url":null,"abstract":"In this paper we present the design and implementation of a distributed, whole-program static analysis framework that is designed to scale with the size of the input. Our approach is based on the actor programming model and is deployed in the cloud. Our reliance on a cloud cluster provides a degree of elasticity for CPU, memory, and storage resources. To demonstrate the potential of our technique, we show how a typical call graph analysis can be implemented in a distributed setting. The vision that motivates this work is that every large-scale software repository such as GitHub, BitBucket, or Visual Studio Online will be able to perform static analysis on a large scale. We experimentally validate our implementation of the distributed call graph analysis using a combination of both synthetic and real benchmarks. To show scalability, we demonstrate how the analysis presented in this paper is able to handle inputs that are almost 10 million lines of code (LOC) in size, without running out of memory. Our results show that the analysis scales well in terms of memory pressure independently of the input size, as we add more virtual machines (VMs). As the number of worker VMs increases, we observe that the analysis time generally improves as well. Lastly, we demonstrate that querying the results can be performed with a median latency of 15 ms.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"133 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":"{\"title\":\"Toward full elasticity in distributed static analysis: the case of callgraph analysis\",\"authors\":\"D. Garbervetsky, Edgardo Zoppi, B. Livshits\",\"doi\":\"10.1145/3106237.3106261\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper we present the design and implementation of a distributed, whole-program static analysis framework that is designed to scale with the size of the input. Our approach is based on the actor programming model and is deployed in the cloud. Our reliance on a cloud cluster provides a degree of elasticity for CPU, memory, and storage resources. To demonstrate the potential of our technique, we show how a typical call graph analysis can be implemented in a distributed setting. The vision that motivates this work is that every large-scale software repository such as GitHub, BitBucket, or Visual Studio Online will be able to perform static analysis on a large scale. We experimentally validate our implementation of the distributed call graph analysis using a combination of both synthetic and real benchmarks. To show scalability, we demonstrate how the analysis presented in this paper is able to handle inputs that are almost 10 million lines of code (LOC) in size, without running out of memory. Our results show that the analysis scales well in terms of memory pressure independently of the input size, as we add more virtual machines (VMs). As the number of worker VMs increases, we observe that the analysis time generally improves as well. Lastly, we demonstrate that querying the results can be performed with a median latency of 15 ms.\",\"PeriodicalId\":313494,\"journal\":{\"name\":\"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering\",\"volume\":\"133 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-08-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"21\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3106237.3106261\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3106237.3106261","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 21
摘要
在本文中,我们提出了一个分布式、全程序静态分析框架的设计和实现,该框架被设计为随输入的大小而扩展。我们的方法基于参与者编程模型,并部署在云中。我们对云集群的依赖为CPU、内存和存储资源提供了一定程度的弹性。为了演示我们技术的潜力,我们将展示如何在分布式设置中实现典型的调用图分析。激励这项工作的愿景是,每个大型软件存储库(如GitHub、BitBucket或Visual Studio Online)都能够大规模地执行静态分析。我们通过实验验证了分布式调用图分析的实现,使用了合成基准和真实基准的组合。为了展示可伸缩性,我们演示了本文中提供的分析如何能够处理大小接近1000万行代码(LOC)的输入,而不会耗尽内存。我们的结果表明,当我们添加更多虚拟机(vm)时,就内存压力而言,分析的伸缩性与输入大小无关。随着工作虚拟机数量的增加,我们观察到分析时间通常也会提高。最后,我们证明查询结果可以在中位延迟为15 ms的情况下执行。
Toward full elasticity in distributed static analysis: the case of callgraph analysis
In this paper we present the design and implementation of a distributed, whole-program static analysis framework that is designed to scale with the size of the input. Our approach is based on the actor programming model and is deployed in the cloud. Our reliance on a cloud cluster provides a degree of elasticity for CPU, memory, and storage resources. To demonstrate the potential of our technique, we show how a typical call graph analysis can be implemented in a distributed setting. The vision that motivates this work is that every large-scale software repository such as GitHub, BitBucket, or Visual Studio Online will be able to perform static analysis on a large scale. We experimentally validate our implementation of the distributed call graph analysis using a combination of both synthetic and real benchmarks. To show scalability, we demonstrate how the analysis presented in this paper is able to handle inputs that are almost 10 million lines of code (LOC) in size, without running out of memory. Our results show that the analysis scales well in terms of memory pressure independently of the input size, as we add more virtual machines (VMs). As the number of worker VMs increases, we observe that the analysis time generally improves as well. Lastly, we demonstrate that querying the results can be performed with a median latency of 15 ms.