{"title":"Bayesian specification learning for finding API usage errors","authors":"V. Murali, Swarat Chaudhuri, C. Jermaine","doi":"10.1145/3106237.3106284","DOIUrl":null,"url":null,"abstract":"We present a Bayesian framework for learning probabilistic specifications from large, unstructured code corpora, and then using these specifications to statically detect anomalous, hence likely buggy, program behavior. Our key insight is to build a statistical model that correlates all specifications hidden inside a corpus with the syntax and observed behavior of programs that implement these specifications. During the analysis of a particular program, this model is conditioned into a posterior distribution that prioritizes specifications that are relevant to the program. The problem of finding anomalies is now framed quantitatively, as a problem of computing a distance between a \"reference distribution\" over program behaviors that our model expects from the program, and the distribution over behaviors that the program actually produces. We implement our ideas in a system, called Salento, for finding anomalous API usage in Android programs. Salento learns specifications using a combination of a topic model and a neural network model. Our encouraging experimental results show that the system can automatically discover subtle errors in Android applications in the wild, and has high precision and recall compared to competing probabilistic approaches.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"08 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"46","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3106237.3106284","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 46
Abstract
We present a Bayesian framework for learning probabilistic specifications from large, unstructured code corpora, and then using these specifications to statically detect anomalous, hence likely buggy, program behavior. Our key insight is to build a statistical model that correlates all specifications hidden inside a corpus with the syntax and observed behavior of programs that implement these specifications. During the analysis of a particular program, this model is conditioned into a posterior distribution that prioritizes specifications that are relevant to the program. The problem of finding anomalies is now framed quantitatively, as a problem of computing a distance between a "reference distribution" over program behaviors that our model expects from the program, and the distribution over behaviors that the program actually produces. We implement our ideas in a system, called Salento, for finding anomalous API usage in Android programs. Salento learns specifications using a combination of a topic model and a neural network model. Our encouraging experimental results show that the system can automatically discover subtle errors in Android applications in the wild, and has high precision and recall compared to competing probabilistic approaches.