CodeHow: Effective Code Search Based on API Understanding and Extended Boolean Model (E)

2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE); Pub Date: 2015-11-09; DOI: 10.1109/ASE.2015.42

Fei Lv, Hongyu Zhang, Jian-Guang Lou, Shaowei Wang, D. Zhang, Jianjun Zhao


Citations: 218

Abstract



Over the years of software development, a vast amount of source code has been accumulated. Many code search tools have been proposed to help programmers reuse previously written code by performing free-text queries over a large-scale codebase. Our experience shows that the accuracy of these code search tools is often unsatisfactory. One major reason is that existing tools lack the ability to understand queries. In this paper, we propose CodeHow, a code search technique that can recognize the potential APIs that a user query refers to. Having understood the potentially relevant APIs, CodeHow expands the query with the APIs and performs code retrieval by applying the Extended Boolean model, which considers the impact of both text similarity and potential APIs on code search. We deploy the backend of CodeHow as a Microsoft Azure service and implement the front-end as a Visual Studio extension. We evaluate CodeHow on a large-scale codebase consisting of 26K C# projects downloaded from GitHub. The experimental results show that when only the top-ranked result is inspected, CodeHow achieves a precision score of 0.794 (i.e., 79.4% of the first returned results are relevant code snippets). The results also show that CodeHow outperforms conventional code search tools. Furthermore, we perform a controlled experiment and a survey of Microsoft developers. The results confirm the usefulness and effectiveness of CodeHow in programming practice.
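To make the retrieval step more concrete, the following is a minimal Python sketch of the general p-norm Extended Boolean model, in which term weights lie in [0, 1] and AND/OR are scored softly rather than as strict Boolean operators. The abstract does not give CodeHow's exact formula, so the way text and API evidence are combined here (`snippet_score`), the function names, and the sample weights are all assumptions for illustration only.

```python
def or_score(weights, p=2.0):
    """p-norm OR: ((w1^p + ... + wn^p) / n)^(1/p), each weight in [0, 1]."""
    if not weights:
        raise ValueError("weights must be non-empty")
    return (sum(w ** p for w in weights) / len(weights)) ** (1.0 / p)


def and_score(weights, p=2.0):
    """p-norm AND: 1 - (((1-w1)^p + ... + (1-wn)^p) / n)^(1/p)."""
    if not weights:
        raise ValueError("weights must be non-empty")
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / len(weights)) ** (1.0 / p)


def snippet_score(text_weights, api_weights, p=2.0):
    """Hypothetical combination (an assumption, not taken from the paper):
    (all text terms AND-ed) OR (all expanded API terms AND-ed)."""
    text_part = and_score(text_weights, p)
    api_part = and_score(api_weights, p)
    return or_score([text_part, api_part], p)


# Two made-up snippets with identical text similarity to the query.
# Snippet A also calls an API that the query was expanded with; B does not.
score_a = snippet_score(text_weights=[0.6, 0.4], api_weights=[0.9])
score_b = snippet_score(text_weights=[0.6, 0.4], api_weights=[0.0])
ranking = sorted([("A", score_a), ("B", score_b)], key=lambda x: x[1], reverse=True)
```

Under these assumed weights, snippet A ranks above snippet B because the API match raises its score even though the text similarity is the same, which is the kind of effect the abstract attributes to considering both text similarity and potential APIs.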
