Bayesian specification learning for finding API usage errors

Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. Pub Date: 2017-08-21. DOI: 10.1145/3106237.3106284

V. Murali, Swarat Chaudhuri, C. Jermaine


Citations: 46

Abstract

We present a Bayesian framework for learning probabilistic specifications from large, unstructured code corpora, and then using these specifications to statically detect anomalous, hence likely buggy, program behavior. Our key insight is to build a statistical model that correlates all specifications hidden inside a corpus with the syntax and observed behavior of programs that implement these specifications. During the analysis of a particular program, this model is conditioned into a posterior distribution that prioritizes specifications that are relevant to the program. The problem of finding anomalies is now framed quantitatively, as a problem of computing a distance between a "reference distribution" over program behaviors that our model expects from the program, and the distribution over behaviors that the program actually produces. We implement our ideas in a system, called Salento, for finding anomalous API usage in Android programs. Salento learns specifications using a combination of a topic model and a neural network model. Our encouraging experimental results show that the system can automatically discover subtle errors in Android applications in the wild, and has high precision and recall compared to competing probabilistic approaches.
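The distance-based framing above can be illustrated with a small sketch. It is not Salento's implementation: the abstract does not name the distance measure on this page, so Kullback-Leibler divergence, its direction, the function names, and the toy behavior strings below are all assumptions chosen for illustration. The reference distribution stands in for what a learned model would expect, and the observed distribution for what a program actually produces.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions given as dicts: behavior -> probability.

    eps guards against division by zero when q assigns no mass to a behavior
    that p produces (an assumption of this sketch, not taken from the paper).
    """
    total = 0.0
    for behavior, pk in p.items():
        if pk > 0.0:
            qk = q.get(behavior, 0.0)
            total += pk * math.log(pk / max(qk, eps))
    return total

def anomaly_score(reference, observed):
    """Hypothetical anomaly score: how far the observed behaviors are from the reference."""
    return kl_divergence(observed, reference)

# Toy distributions over API call sequences (hypothetical data).
reference = {"open;read;close": 0.90, "open;close": 0.09, "open;read": 0.01}
typical = {"open;read;close": 0.85, "open;close": 0.15}
buggy = {"open;read": 1.0}  # never closes the resource

typical_score = anomaly_score(reference, typical)
buggy_score = anomaly_score(reference, buggy)
```

In this toy setup the program that never calls `close` scores much higher than the typical one, which is the kind of quantitative ranking the abstract describes.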


