类型和查询扩展的使用是否有助于改进大规模代码搜索?

Otávio Augusto Lazzarini Lemos, A. C. D. Paula, Hitesh Sajnani, C. Lopes
{"title":"类型和查询扩展的使用是否有助于改进大规模代码搜索?","authors":"Otávio Augusto Lazzarini Lemos, A. C. D. Paula, Hitesh Sajnani, C. Lopes","doi":"10.1109/SCAM.2015.7335400","DOIUrl":null,"url":null,"abstract":"With the open source code movement, code search with the intent of reuse has become increasingly popular. So much so that researchers have been calling it the new facet of software reuse. Although code search differs from general-purpose document search in essential ways, most tools still rely mainly on keywords matched against source code text. Recently, researchers have proposed more sophisticated ways to perform code search, such as including interface definitions in the queries (e.g., return and parameter types of the desired function, along with keywords; called here Interface-Driven Code Search - IDCS). However, to the best of our knowledge, there are few empirical studies that compare traditional keyword-based code search (KBCS) with more advanced approaches such as IDCS. In this paper we describe an experiment that compares the effectiveness of KBCS with IDCS in the task of large-scale code search of auxiliary functions implemented in Java. We also measure the impact of query expansion based on types and WordNet on both approaches. Our experiment involved 36 subjects that produced real-world queries for 16 different auxiliary functions and a repository with more than 2,000,000 Java methods. Results show that the use of types can improve recall and the number of relevant functions returned (#RFR) when combined with query expansion (~30% improvement in recall, and ~43% improvement in #RFR). However, a more detailed analysis suggests that in some situations it is best to use keywords only, in particular when these are sufficient to semantically define the desired function.","PeriodicalId":192232,"journal":{"name":"2015 IEEE 15th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":"{\"title\":\"Can the use of types and query expansion help improve large-scale code search?\",\"authors\":\"Otávio Augusto Lazzarini Lemos, A. C. D. Paula, Hitesh Sajnani, C. Lopes\",\"doi\":\"10.1109/SCAM.2015.7335400\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the open source code movement, code search with the intent of reuse has become increasingly popular. So much so that researchers have been calling it the new facet of software reuse. Although code search differs from general-purpose document search in essential ways, most tools still rely mainly on keywords matched against source code text. Recently, researchers have proposed more sophisticated ways to perform code search, such as including interface definitions in the queries (e.g., return and parameter types of the desired function, along with keywords; called here Interface-Driven Code Search - IDCS). However, to the best of our knowledge, there are few empirical studies that compare traditional keyword-based code search (KBCS) with more advanced approaches such as IDCS. In this paper we describe an experiment that compares the effectiveness of KBCS with IDCS in the task of large-scale code search of auxiliary functions implemented in Java. We also measure the impact of query expansion based on types and WordNet on both approaches. Our experiment involved 36 subjects that produced real-world queries for 16 different auxiliary functions and a repository with more than 2,000,000 Java methods. Results show that the use of types can improve recall and the number of relevant functions returned (#RFR) when combined with query expansion (~30% improvement in recall, and ~43% improvement in #RFR). However, a more detailed analysis suggests that in some situations it is best to use keywords only, in particular when these are sufficient to semantically define the desired function.\",\"PeriodicalId\":192232,\"journal\":{\"name\":\"2015 IEEE 15th International Working Conference on Source Code Analysis and Manipulation (SCAM)\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-11-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"21\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 IEEE 15th International Working Conference on Source Code Analysis and Manipulation (SCAM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SCAM.2015.7335400\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE 15th International Working Conference on Source Code Analysis and Manipulation (SCAM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SCAM.2015.7335400","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 21

摘要

随着开源代码运动的兴起,以重用为目的的代码搜索变得越来越流行。以至于研究人员将其称为软件重用的新方面。尽管代码搜索在本质上不同于通用文档搜索,但大多数工具仍然主要依赖与源代码文本匹配的关键字。最近,研究人员提出了更复杂的方法来执行代码搜索,例如在查询中包含接口定义(例如,所需函数的返回和参数类型,以及关键字;这里称为接口驱动代码搜索(IDCS)。然而,据我们所知,很少有实证研究将传统的基于关键字的代码搜索(KBCS)与更先进的方法(如IDCS)进行比较。在本文中,我们描述了一个实验,比较了KBCS和IDCS在Java实现的辅助函数的大规模代码搜索任务中的有效性。我们还测量了基于类型和WordNet的查询扩展对这两种方法的影响。我们的实验涉及36个主题,这些主题为16个不同的辅助函数和一个包含超过2,000,000个Java方法的存储库生成了实际查询。结果表明,当与查询扩展相结合时,类型的使用可以提高召回率和返回的相关函数的数量(#RFR)(召回率提高~30%,#RFR提高~43%)。然而,更详细的分析表明,在某些情况下,最好只使用关键字,特别是当这些关键字足以在语义上定义所需的功能时。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Can the use of types and query expansion help improve large-scale code search?
With the open source code movement, code search with the intent of reuse has become increasingly popular. So much so that researchers have been calling it the new facet of software reuse. Although code search differs from general-purpose document search in essential ways, most tools still rely mainly on keywords matched against source code text. Recently, researchers have proposed more sophisticated ways to perform code search, such as including interface definitions in the queries (e.g., return and parameter types of the desired function, along with keywords; called here Interface-Driven Code Search - IDCS). However, to the best of our knowledge, there are few empirical studies that compare traditional keyword-based code search (KBCS) with more advanced approaches such as IDCS. In this paper we describe an experiment that compares the effectiveness of KBCS with IDCS in the task of large-scale code search of auxiliary functions implemented in Java. We also measure the impact of query expansion based on types and WordNet on both approaches. Our experiment involved 36 subjects that produced real-world queries for 16 different auxiliary functions and a repository with more than 2,000,000 Java methods. Results show that the use of types can improve recall and the number of relevant functions returned (#RFR) when combined with query expansion (~30% improvement in recall, and ~43% improvement in #RFR). However, a more detailed analysis suggests that in some situations it is best to use keywords only, in particular when these are sufficient to semantically define the desired function.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
FaultBuster: An automatic code smell refactoring toolset On the comprehension of code clone visualizations: A controlled study using eye tracking The impact of cross-distribution bug duplicates, empirical study on Debian and Ubuntu The use of C++ exception handling constructs: A comprehensive study CodeMetropolis: Eclipse over the city of source code
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1