Improving software text retrieval using conceptual knowledge in source code

Zeqi Lin, Yanzhen Zou, Junfeng Zhao, Bing Xie
{"title":"Improving software text retrieval using conceptual knowledge in source code","authors":"Zeqi Lin, Yanzhen Zou, Junfeng Zhao, Bing Xie","doi":"10.1109/ASE.2017.8115625","DOIUrl":null,"url":null,"abstract":"A large software project usually has lots of various textual learning resources about its API, such as tutorials, mailing lists, user forums, etc. Text retrieval technology allows developers to search these API learning resources for related documents using free-text queries, but it suffers from the lexical gap between search queries and documents. In this paper, we propose a novel approach for improving the retrieval of API learning resources through leveraging software-specific conceptual knowledge in software source code. The basic idea behind this approach is that the semantic relatedness between queries and documents could be measured according to software-specific concepts involved in them, and software source code contains a large amount of software-specific conceptual knowledge. In detail, firstly we extract an API graph from software source code and use it as software-specific conceptual knowledge. Then we discover API entities involved in queries and documents, and infer semantic document relatedness through analyzing structural relationships between these API entities. We evaluate our approach in three popular open source software projects. Comparing to the state-of-the-art text retrieval approaches, our approach lead to at least 13.77% improvement with respect to mean average precision (MAP).","PeriodicalId":382876,"journal":{"name":"2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASE.2017.8115625","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 21

Abstract

A large software project usually has lots of various textual learning resources about its API, such as tutorials, mailing lists, user forums, etc. Text retrieval technology allows developers to search these API learning resources for related documents using free-text queries, but it suffers from the lexical gap between search queries and documents. In this paper, we propose a novel approach for improving the retrieval of API learning resources through leveraging software-specific conceptual knowledge in software source code. The basic idea behind this approach is that the semantic relatedness between queries and documents could be measured according to software-specific concepts involved in them, and software source code contains a large amount of software-specific conceptual knowledge. In detail, firstly we extract an API graph from software source code and use it as software-specific conceptual knowledge. Then we discover API entities involved in queries and documents, and infer semantic document relatedness through analyzing structural relationships between these API entities. We evaluate our approach in three popular open source software projects. Comparing to the state-of-the-art text retrieval approaches, our approach lead to at least 13.77% improvement with respect to mean average precision (MAP).
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用源代码中的概念知识改进软件文本检索
大型软件项目通常有许多关于其API的文本学习资源,如教程、邮件列表、用户论坛等。文本检索技术允许开发人员使用自由文本查询在这些API学习资源中搜索相关文档,但它受到搜索查询和文档之间的词汇差距的困扰。在本文中,我们提出了一种新的方法,通过利用软件源代码中特定于软件的概念知识来改进API学习资源的检索。这种方法背后的基本思想是,查询和文档之间的语义相关性可以根据其中涉及的特定于软件的概念来度量,而软件源代码包含大量特定于软件的概念知识。首先,从软件源代码中提取API图,并将其作为特定于软件的概念知识。然后发现查询和文档中涉及的API实体,并通过分析这些API实体之间的结构关系推断语义文档相关性。我们在三个流行的开源软件项目中评估了我们的方法。与最先进的文本检索方法相比,我们的方法在平均精度(MAP)方面至少提高了13.77%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
TiQi: A natural language interface for querying software project data A comprehensive study on real world concurrency bugs in Node.js Managing software evolution through semantic history slicing Software performance self-adaptation through efficient model predictive control Privacy-aware data-intensive applications
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1