Specialized model initialization and architecture optimization for few-shot code search

IF 3.8 · Zone 2, Computer Science · JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS · Information and Software Technology, Vol. 177, Article 107571 · Pub Date: 2024-09-04 · DOI: 10.1016/j.infsof.2024.107571
Fan Zhang, Qiang Wu, Manman Peng, Yuanyuan Shen
Citations: 0

Abstract


Context:

Code search aims to find relevant code snippets from a codebase given a natural language query. It not only boosts developer efficiency but also improves the performance of tasks such as code generation and program repair, thus becoming one of the crucial tasks in software engineering.
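To make the task concrete, the sketch below treats code search as ranking: the query and every snippet are mapped to vectors, and snippets are returned in order of cosine similarity to the query. It is a minimal illustration under stated assumptions, not SMIAO itself: `embed` is a hypothetical bag-of-tokens stand-in for a neural encoder such as GraphCodeBERT, and `search`, `cosine`, and the three-snippet codebase are illustrative names and data.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in encoder: bag of lowercase alphanumeric tokens.

    A real code search model would use a neural encoder (e.g. GraphCodeBERT)
    to produce dense vectors; this keeps the example dependency-free.
    Underscores split identifiers, so `sort_list` yields "sort" and "list".
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, codebase: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Return the top-k (score, snippet) pairs, highest similarity first."""
    q = embed(query)
    scored = [(cosine(q, embed(code)), code) for code in codebase]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


codebase = [
    "def read_file(path): return open(path).read()",
    "def sort_list(xs): return sorted(xs)",
    "def add(a, b): return a + b",
]
for score, snippet in search("sort list items", codebase, k=2):
    print(f"{score:.3f}  {snippet}")
```

A learned encoder replaces token overlap with semantic similarity, which is what lets a query match code that shares no identifiers with it; the ranking step stays the same.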

Objective:

However, recent works are mainly designed for mainstream programming languages with abundant training data. We aim to address the challenges of code search for domain-specific programming languages with limited training data by proposing a novel two-stage, few-shot code search framework named SMIAO.

Method:

SMIAO includes a specialized model initialization and an architecture optimization stage. In the first stage, we first quantitatively identify a mainstream programming language’s dataset that is semantically closest to a target few-shot programming language. Then, we enrich the dataset with hard samples and train an Adapter-GraphCodeBERT model to obtain well-initialized parameters. In the second stage, we first design a search space for the initialized Adapter-GraphCodeBERT model. Then, we employ neural architecture search to optimize the Adapter modules’ positions and quantities in the GraphCodeBERT layers, tailoring the model to real-world few-shot code search tasks.
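The abstract does not state which measure is used to pick the closest mainstream-language dataset, or how hard samples are defined. The sketch below shows one plausible reading, purely as an assumption: each dataset is summarized by the centroid of its embedding vectors, and the candidate whose centroid has the highest cosine similarity to the target's centroid is chosen; a hard sample for query i is taken to be the non-matching snippet most similar to it (a hard negative). The function names, the toy 2-D vectors, and the language keys are hypothetical.

```python
import math

Vector = list[float]


def centroid(vectors: list[Vector]) -> Vector:
    """Mean embedding of a dataset (one vector per code snippet)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_source_language(target: list[Vector],
                            candidates: dict[str, list[Vector]]) -> str:
    """Pick the candidate dataset whose centroid is most similar to the target's.

    Assumption: centroid cosine similarity as the "semantic closeness" measure.
    """
    t = centroid(target)
    return max(candidates, key=lambda lang: cosine(t, centroid(candidates[lang])))


def mine_hard_negatives(query_vecs: list[Vector], code_vecs: list[Vector]) -> list[int]:
    """query_vecs[i] is paired with code_vecs[i]. For each query, return the index
    of the most similar non-matching snippet, i.e. the negative that is hardest
    to tell apart from the true match (ties go to the lowest index)."""
    hard = []
    for i, q in enumerate(query_vecs):
        others = (j for j in range(len(code_vecs)) if j != i)
        hard.append(max(others, key=lambda j: cosine(q, code_vecs[j])))
    return hard


# Toy 2-D embeddings; in practice these would come from a code encoder.
target = [[1.0, 0.1], [0.9, 0.0]]  # few-shot target-language snippets
candidates = {"python": [[0.0, 1.0], [0.1, 0.9]],
              "java": [[1.0, 0.2], [0.8, 0.1]]}
print(closest_source_language(target, candidates))

queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
codes = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.8]]
print(mine_hard_negatives(queries, codes))
```

Training on such near-miss pairs is a common way to force a retrieval model to learn finer distinctions than random negatives would; whether SMIAO mines them this way is not stated in the abstract.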

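Adapter modules are commonly implemented as small bottleneck networks added to a frozen pre-trained layer: a down-projection, a nonlinearity, an up-projection, and a residual connection, h + W_up·ReLU(W_down·h). The sketch below assumes that standard design (the abstract does not specify the adapter internals of Adapter-GraphCodeBERT) and omits biases and layer normalization for brevity. Stacking a variable number of adapters at one layer mirrors the "quantity" that the second stage searches over. `BottleneckAdapter` and `apply_adapters` are illustrative names, not the authors' code, and the 64-dimensional bottleneck is an assumed value.

```python
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


class BottleneckAdapter:
    """Residual bottleneck adapter: h -> h + W_up @ relu(W_down @ h).

    Assumption: the standard bottleneck design; biases and layer norm omitted.
    """

    def __init__(self, hidden: int, bottleneck: int, seed: int = 0):
        rng = random.Random(seed)
        # Small random down-projection: hidden -> bottleneck.
        self.w_down: Matrix = [[rng.gauss(0.0, 0.02) for _ in range(hidden)]
                               for _ in range(bottleneck)]
        # Zero-initialized up-projection, so a fresh adapter is an exact
        # identity map and does not disturb the pre-trained layer's output.
        self.w_up: Matrix = [[0.0] * bottleneck for _ in range(hidden)]

    def __call__(self, h: Vector) -> Vector:
        delta = matvec(self.w_up, relu(matvec(self.w_down, h)))
        return [a + b for a, b in zip(h, delta)]

    def num_params(self) -> int:
        """Trainable parameters; the backbone layer itself stays frozen."""
        return sum(len(r) for r in self.w_down) + sum(len(r) for r in self.w_up)


def apply_adapters(h: Vector, adapters: list[BottleneckAdapter]) -> Vector:
    """Apply the adapters placed at one layer in sequence (zero or more)."""
    for adapter in adapters:
        h = adapter(h)
    return h


# GraphCodeBERT uses hidden size 768; the bottleneck width of 64 is assumed.
print(BottleneckAdapter(hidden=768, bottleneck=64).num_params())  # 98304
```

Only the adapter weights are trained, which keeps the number of tunable parameters small; that property is what makes adapters attractive when target-language training data is scarce.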
Results:

We conduct experiments on a publicly available dataset to demonstrate the effectiveness and rationality of SMIAO. The experimental results show that SMIAO outperforms other state-of-the-art baselines.

Conclusion:

Using mainstream languages’ datasets to initialize Adapter-GraphCodeBERT models, followed by adjusting the quantities and positions of Adapter modules within the GraphCodeBERT layers by neural architecture search, can effectively improve the performance of few-shot code search tasks.
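The architecture-optimization step decides where adapters go and how many each layer gets. The sketch below encodes a candidate architecture as a tuple giving the number of adapters at each GraphCodeBERT layer and compares two simple strategies under an optional total-adapter budget. It is an illustration under assumptions: the abstract does not say which NAS algorithm or search-space bounds SMIAO uses, so plain random search stands in for it, the 0 to 2 adapters per layer range is assumed, and `toy_proxy` is a hypothetical scoring function. In practice each score would come from fine-tuning the initialized Adapter-GraphCodeBERT with that placement on the few-shot data and measuring validation retrieval quality, which is why exhaustively trying all 531,441 placements for 12 layers is impractical.

```python
import itertools
import random
from typing import Callable, Optional

# A candidate architecture: number of adapters inserted at each layer.
Config = tuple[int, ...]
Evaluator = Callable[[Config], float]


def within_budget(cfg: Config, max_adapters: Optional[int]) -> bool:
    return max_adapters is None or sum(cfg) <= max_adapters


def random_search(evaluate: Evaluator, num_layers: int = 12,
                  choices: tuple[int, ...] = (0, 1, 2), trials: int = 100,
                  max_adapters: Optional[int] = None, seed: int = 0):
    """Sample configs uniformly at random; keep the best-scoring one."""
    rng = random.Random(seed)
    best_cfg: Optional[Config] = None
    best_score = float("-inf")
    for _ in range(trials):
        cfg = tuple(rng.choice(choices) for _ in range(num_layers))
        if not within_budget(cfg, max_adapters):
            continue
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def exhaustive_search(evaluate: Evaluator, num_layers: int,
                      choices: tuple[int, ...] = (0, 1, 2),
                      max_adapters: Optional[int] = None):
    """Enumerate all len(choices) ** num_layers configs (small spaces only)."""
    best_cfg: Optional[Config] = None
    best_score = float("-inf")
    for cfg in itertools.product(choices, repeat=num_layers):
        if within_budget(cfg, max_adapters):
            score = evaluate(cfg)
            if score > best_score:
                best_cfg, best_score = cfg, score
    return best_cfg, best_score


def toy_proxy(cfg: Config) -> float:
    """Hypothetical score: favors adapters in upper layers, penalizes count."""
    return sum(layer * n for layer, n in enumerate(cfg)) - 1.5 * sum(cfg)


print(3 ** 12)  # size of the 12-layer space with 0, 1 or 2 adapters per layer
print(exhaustive_search(toy_proxy, num_layers=4, max_adapters=4))
print(random_search(toy_proxy, num_layers=12, trials=200, max_adapters=12))
```

Random search is only a baseline; gradient-based or evolutionary NAS methods explore the same kind of space more sample-efficiently, and the search space definition (layers, per-layer choices, budget) can be reused unchanged with any of them.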

Source journal
Information and Software Technology
Category: Engineering and Technology / Computer Science: Software Engineering
CiteScore: 9.10
Self-citation rate: 7.70%
Articles published: 164
Review time: 9.6 weeks
Journal description: Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "negative" results and much more. Read the Guide for authors for more information. The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.