{"title":"Error mining: Bug detection through comparison with large code databases","authors":"Alexander Breckel","doi":"10.1109/MSR.2012.6224278","DOIUrl":null,"url":null,"abstract":"Bugs are hard to find. Static analysis tools are capable of systematically detecting predefined sets of errors, but extending them to find new error types requires a deep understanding of the underlying programming language. Manual reviews on the other hand, while being able to reveal more individual errors, require much more time. We present a new approach to automatically detect bugs through comparison with a large code database. The source file is analyzed for similar but slightly different code fragments in the database. Frequent occurrences of common differences indicate a potential bug that can be fixed by applying the modification back to the original source file. In this paper, we give an overview of the resulting algorithm and some important implementation details. We further evaluate the circumstances under which good detection rates can be achieved. The results demonstrate that consistently high detection rates of up to 50% are possible for certain error types across different programming languages.","PeriodicalId":383774,"journal":{"name":"2012 9th IEEE Working Conference on Mining Software Repositories (MSR)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 9th IEEE Working Conference on Mining Software Repositories (MSR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MSR.2012.6224278","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7
Abstract
Bugs are hard to find. Static analysis tools are capable of systematically detecting predefined sets of errors, but extending them to find new error types requires a deep understanding of the underlying programming language. Manual reviews on the other hand, while being able to reveal more individual errors, require much more time. We present a new approach to automatically detect bugs through comparison with a large code database. The source file is analyzed for similar but slightly different code fragments in the database. Frequent occurrences of common differences indicate a potential bug that can be fixed by applying the modification back to the original source file. In this paper, we give an overview of the resulting algorithm and some important implementation details. We further evaluate the circumstances under which good detection rates can be achieved. The results demonstrate that consistently high detection rates of up to 50% are possible for certain error types across different programming languages.