Code search engines for the next generation

Journal of Systems and Software | Impact Factor: 3.7 | Zone 2 (Computer Science) | JCR Q1 (Computer Science, Software Engineering) | Pub Date: 2024-05-06 | DOI: 10.1016/j.jss.2024.112065
Marcus Kessel, Colin Atkinson
Citations: 0

Abstract

Given the abundance of software in open source repositories, code search engines are increasingly turning to “big data” technologies such as natural language processing and machine learning to deliver more useful search results. However, like the syntax-based approaches traditionally used to analyze and compare code in the first generation of code search engines, big data technologies are essentially static analysis processes. When dynamic properties of software, such as run-time behavior (i.e., semantics) and performance, are among the search criteria, the exclusive use of static algorithms has a significant negative impact on the precision and recall of the search results as well as other key usability factors such as ranking quality. Therefore, to address these weaknesses and provide a more reliable and usable service, the next generation of code search engines needs to complement static code analysis techniques with equally large-scale dynamic analysis techniques based on executing and observing the code. In this paper we describe a new software platform specifically developed to achieve this by simplifying and largely automating the dynamic analysis (i.e., observation) of code at a large scale. We show how this platform can combine dynamically observed properties of code modules with static properties to improve the quality and usability of code search results.
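
To make the idea of complementing static relevance with observed behavior concrete, here is a minimal Python sketch. It is not the platform described in the paper: the `Candidate` type, the `static_score` field, the `observe` and `rank` helpers, and the three toy functions are all hypothetical names invented for illustration. The sketch assumes a search query that carries a few input/output examples, executes each candidate against them, and ranks by observed behavior first and static relevance second.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

# One input/output example: (positional arguments, expected result).
Test = Tuple[Tuple[Any, ...], Any]


@dataclass
class Candidate:
    """A code module returned by a static (text-based) search. Hypothetical type."""
    name: str
    func: Callable[..., Any]
    static_score: float  # assumed keyword/text relevance in [0, 1]


def observe(candidate: Candidate, tests: Sequence[Test]) -> float:
    """Execute the candidate on every test and return the fraction that match."""
    if not tests:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if candidate.func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash is treated as a behavioral mismatch
    return passed / len(tests)


def rank(candidates: Sequence[Candidate], tests: Sequence[Test]) -> List[Tuple[str, float]]:
    """Order candidates by observed behavior, breaking ties by static relevance."""
    scored = [(c, observe(c, tests)) for c in candidates]
    scored.sort(key=lambda pair: (pair[1], pair[0].static_score), reverse=True)
    return [(c.name, dynamic) for c, dynamic in scored]


def gcd_product(a: int, b: int) -> int:
    return a * b  # named like a GCD, but computes a product


def gcd_fast(a: int, b: int) -> int:
    return b % a  # right on some inputs, raises ZeroDivisionError when a == 0


def euclid(a: int, b: int) -> int:
    while b:
        a, b = b, a % b
    return a


# Static scores are made up: the two mislabelled functions "look" most relevant.
candidates = [
    Candidate("gcd_product", gcd_product, 0.9),
    Candidate("gcd_fast", gcd_fast, 0.8),
    Candidate("euclid", euclid, 0.4),
]

# Examples a user might attach to the query "greatest common divisor".
tests: List[Test] = [((12, 18), 6), ((7, 5), 1), ((0, 4), 4)]

results = rank(candidates, tests)
for name, dynamic_score in results:
    print(f"{name}: {dynamic_score:.2f}")
```

In this toy setup a purely static ranking would put `gcd_product` first because of its name, whereas the observed pass rates move `euclid` to the top. A real system would also need sandboxing, timeouts, and adapters for differing signatures, none of which are shown here.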

Source journal
Journal of Systems and Software (category: Engineering and Technology, Computer Science: Theory and Methods)
CiteScore: 8.60
Self-citation rate: 5.70%
Articles published: 193
Review time: 16 weeks
Journal description: The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes
The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.
Latest articles from this journal
FSECAM: A contextual thematic approach for linking feature to multi-level software architectural components
Exploring emergent microservice evolution in elastic deployment environments
An empirical study of AI techniques in mobile applications
Information needs in bug reports for web applications
Development and benchmarking of multilingual code clone detector