{"title":"针对少量代码搜索的专用模型初始化和架构优化","authors":"Fan Zhang , Qiang Wu , Manman Peng , Yuanyuan Shen","doi":"10.1016/j.infsof.2024.107571","DOIUrl":null,"url":null,"abstract":"<div><h3>Context:</h3><p>Code search aims to find relevant code snippets from a codebase given a natural language query. It not only boosts developer efficiency but also improves the performance of tasks such as code generation and program repair, thus becoming one of the crucial tasks in software engineering.</p></div><div><h3>Objective:</h3><p>However, recent works are mainly designed for mainstream programming languages with abundant training data. We aim to address the challenges of code search for domain-specific programming languages with limited training data by proposing a novel two-stage, few-shot code search framework named SMIAO.</p></div><div><h3>Method:</h3><p>SMIAO includes a specialized model initialization and an architecture optimization stage. In the first stage, we first quantitatively identify a mainstream programming language’s dataset that is semantically closest to a target few-shot programming language. Then, we enrich the dataset with hard samples and train an Adapter-GraphCodeBERT model to obtain well-initialized parameters. In the second stage, we first design a search space for the initialized Adapter-GraphCodeBERT model. Then, we employ neural architecture search to optimize the Adapter modules’ positions and quantities in the GraphCodeBERT layers, tailoring for real-world few-shot code search tasks.</p></div><div><h3>Results:</h3><p>We conduct experiments on a publicly available dataset to demonstrate the effectiveness and rationality of SMIAO. The experimental results show that SMIAO outperforms other state-of-the-art baselines.</p></div><div><h3>Conclusion:</h3><p>Using mainstream languages’ datasets to initialize Adapter-GraphCodeBERT models, followed by adjusting the quantities and positions of Adapter modules within the GraphCodeBERT layers by neural architecture search, can effectively improve the performance of few-shot code search tasks.</p></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"177 ","pages":"Article 107571"},"PeriodicalIF":3.8000,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0950584924001769/pdfft?md5=42c9abafebc31bfce0fe9d0923669722&pid=1-s2.0-S0950584924001769-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Specialized model initialization and architecture optimization for few-shot code search\",\"authors\":\"Fan Zhang , Qiang Wu , Manman Peng , Yuanyuan Shen\",\"doi\":\"10.1016/j.infsof.2024.107571\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Context:</h3><p>Code search aims to find relevant code snippets from a codebase given a natural language query. It not only boosts developer efficiency but also improves the performance of tasks such as code generation and program repair, thus becoming one of the crucial tasks in software engineering.</p></div><div><h3>Objective:</h3><p>However, recent works are mainly designed for mainstream programming languages with abundant training data. We aim to address the challenges of code search for domain-specific programming languages with limited training data by proposing a novel two-stage, few-shot code search framework named SMIAO.</p></div><div><h3>Method:</h3><p>SMIAO includes a specialized model initialization and an architecture optimization stage. In the first stage, we first quantitatively identify a mainstream programming language’s dataset that is semantically closest to a target few-shot programming language. Then, we enrich the dataset with hard samples and train an Adapter-GraphCodeBERT model to obtain well-initialized parameters. In the second stage, we first design a search space for the initialized Adapter-GraphCodeBERT model. Then, we employ neural architecture search to optimize the Adapter modules’ positions and quantities in the GraphCodeBERT layers, tailoring for real-world few-shot code search tasks.</p></div><div><h3>Results:</h3><p>We conduct experiments on a publicly available dataset to demonstrate the effectiveness and rationality of SMIAO. The experimental results show that SMIAO outperforms other state-of-the-art baselines.</p></div><div><h3>Conclusion:</h3><p>Using mainstream languages’ datasets to initialize Adapter-GraphCodeBERT models, followed by adjusting the quantities and positions of Adapter modules within the GraphCodeBERT layers by neural architecture search, can effectively improve the performance of few-shot code search tasks.</p></div>\",\"PeriodicalId\":54983,\"journal\":{\"name\":\"Information and Software Technology\",\"volume\":\"177 \",\"pages\":\"Article 107571\"},\"PeriodicalIF\":3.8000,\"publicationDate\":\"2024-09-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S0950584924001769/pdfft?md5=42c9abafebc31bfce0fe9d0923669722&pid=1-s2.0-S0950584924001769-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information and Software Technology\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0950584924001769\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information and Software Technology","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950584924001769","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Specialized model initialization and architecture optimization for few-shot code search
Context:
Code search aims to find relevant code snippets from a codebase given a natural language query. It not only boosts developer efficiency but also improves the performance of tasks such as code generation and program repair, thus becoming one of the crucial tasks in software engineering.
Objective:
However, recent works are mainly designed for mainstream programming languages with abundant training data. We aim to address the challenges of code search for domain-specific programming languages with limited training data by proposing a novel two-stage, few-shot code search framework named SMIAO.
Method:
SMIAO includes a specialized model initialization and an architecture optimization stage. In the first stage, we first quantitatively identify a mainstream programming language’s dataset that is semantically closest to a target few-shot programming language. Then, we enrich the dataset with hard samples and train an Adapter-GraphCodeBERT model to obtain well-initialized parameters. In the second stage, we first design a search space for the initialized Adapter-GraphCodeBERT model. Then, we employ neural architecture search to optimize the Adapter modules’ positions and quantities in the GraphCodeBERT layers, tailoring for real-world few-shot code search tasks.
Results:
We conduct experiments on a publicly available dataset to demonstrate the effectiveness and rationality of SMIAO. The experimental results show that SMIAO outperforms other state-of-the-art baselines.
Conclusion:
Using mainstream languages’ datasets to initialize Adapter-GraphCodeBERT models, followed by adjusting the quantities and positions of Adapter modules within the GraphCodeBERT layers by neural architecture search, can effectively improve the performance of few-shot code search tasks.
期刊介绍:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal''s scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics,
• Software processes,
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premiere outlet for systematic literature studies in software engineering.