用于代码自动补全的大型语言模型：系统性文献综述

IF 4.1 2区计算机科学 Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Computer Standards & Interfaces Pub Date : 2024-08-18 DOI:10.1016/j.csi.2024.103917

Rasha Ahmad Husein , Hala Aburajouh , Cagatay Catal

{"title":"用于代码自动补全的大型语言模型：系统性文献综述","authors":"Rasha Ahmad Husein , Hala Aburajouh , Cagatay Catal","doi":"10.1016/j.csi.2024.103917","DOIUrl":null,"url":null,"abstract":"<div><p>Code completion serves as a fundamental aspect of modern software development, improving developers' coding processes. Integrating code completion tools into an Integrated Development Environment (IDE) or code editor enhances the coding process and boosts productivity by reducing errors and speeding up code writing while reducing cognitive load. This is achieved by predicting subsequent tokens, such as keywords, variable names, types, function names, operators, and more. Different techniques can achieve code completion, and recent research has focused on Deep Learning methods, particularly Large Language Models (LLMs) utilizing Transformer algorithms. While several research papers have focused on the use of LLMs for code completion, these studies are fragmented, and there is no systematic overview of the use of LLMs for code completion. Therefore, we aimed to perform a Systematic Literature Review (SLR) study to investigate how LLMs have been applied for code completion so far. We have formulated several research questions to address how LLMs have been integrated for code completion-related tasks and to assess the efficacy of these LLMs in the context of code completion. To achieve this, we retrieved 244 papers from scientific databases using auto-search and specific keywords, finally selecting 23 primary studies based on an SLR methodology for in-depth analysis. This SLR study categorizes the granularity levels of code completion achieved by utilizing LLMs in IDEs, explores the existing issues in current code completion systems, how LLMs address these challenges, and the pre-training and fine-tuning methods employed. Additionally, this study identifies open research problems and outlines future research directions. Our analysis reveals that LLMs significantly enhance code completion performance across several programming languages and contexts, and their capability to predict relevant code snippets based on context and partial input boosts developer productivity substantially.</p></div>","PeriodicalId":50635,"journal":{"name":"Computer Standards & Interfaces","volume":"92 ","pages":"Article 103917"},"PeriodicalIF":4.1000,"publicationDate":"2024-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0920548924000862/pdfft?md5=8314175bcf57b70427dd5c869ea42978&pid=1-s2.0-S0920548924000862-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Large language models for code completion: A systematic literature review\",\"authors\":\"Rasha Ahmad Husein , Hala Aburajouh , Cagatay Catal\",\"doi\":\"10.1016/j.csi.2024.103917\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Code completion serves as a fundamental aspect of modern software development, improving developers' coding processes. Integrating code completion tools into an Integrated Development Environment (IDE) or code editor enhances the coding process and boosts productivity by reducing errors and speeding up code writing while reducing cognitive load. This is achieved by predicting subsequent tokens, such as keywords, variable names, types, function names, operators, and more. Different techniques can achieve code completion, and recent research has focused on Deep Learning methods, particularly Large Language Models (LLMs) utilizing Transformer algorithms. While several research papers have focused on the use of LLMs for code completion, these studies are fragmented, and there is no systematic overview of the use of LLMs for code completion. Therefore, we aimed to perform a Systematic Literature Review (SLR) study to investigate how LLMs have been applied for code completion so far. We have formulated several research questions to address how LLMs have been integrated for code completion-related tasks and to assess the efficacy of these LLMs in the context of code completion. To achieve this, we retrieved 244 papers from scientific databases using auto-search and specific keywords, finally selecting 23 primary studies based on an SLR methodology for in-depth analysis. This SLR study categorizes the granularity levels of code completion achieved by utilizing LLMs in IDEs, explores the existing issues in current code completion systems, how LLMs address these challenges, and the pre-training and fine-tuning methods employed. Additionally, this study identifies open research problems and outlines future research directions. Our analysis reveals that LLMs significantly enhance code completion performance across several programming languages and contexts, and their capability to predict relevant code snippets based on context and partial input boosts developer productivity substantially.</p></div>\",\"PeriodicalId\":50635,\"journal\":{\"name\":\"Computer Standards & Interfaces\",\"volume\":\"92 \",\"pages\":\"Article 103917\"},\"PeriodicalIF\":4.1000,\"publicationDate\":\"2024-08-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S0920548924000862/pdfft?md5=8314175bcf57b70427dd5c869ea42978&pid=1-s2.0-S0920548924000862-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Standards & Interfaces\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0920548924000862\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Standards & Interfaces","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0920548924000862","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

摘要

代码自动补全是现代软件开发的一个基本方面，可改善开发人员的编码过程。将代码自动补全工具集成到集成开发环境（IDE）或代码编辑器中，可以减少错误，加快代码编写速度，同时减轻认知负担，从而增强编码过程，提高生产率。这是通过预测后续标记（如关键字、变量名、类型、函数名、运算符等）来实现的。不同的技术可以实现代码自动补全，最近的研究主要集中在深度学习方法上，特别是利用变换器算法的大型语言模型（LLM）。虽然已有多篇研究论文关注使用 LLMs 进行代码补全，但这些研究并不完整，也没有关于使用 LLMs 进行代码补全的系统概述。因此，我们旨在开展一项系统性文献综述（SLR）研究，调查迄今为止如何将 LLMs 用于代码补全。我们提出了几个研究问题，以探讨如何将 LLMs 集成到代码补全相关任务中，并评估这些 LLMs 在代码补全中的功效。为此，我们使用自动搜索和特定关键词从科学数据库中检索了 244 篇论文，最后根据 SLR 方法选择了 23 项主要研究进行深入分析。本 SLR 研究对集成开发环境中利用 LLM 实现的代码自动完成的粒度进行了分类，探讨了当前代码自动完成系统中存在的问题、LLM 如何应对这些挑战，以及所采用的预培训和微调方法。此外，本研究还确定了有待解决的研究问题，并概述了未来的研究方向。我们的分析表明，在多种编程语言和上下文中，LLM 都能显著提高代码自动补全性能，而且它们能够根据上下文和部分输入预测相关代码片段，从而大幅提高开发人员的工作效率。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Large language models for code completion: A systematic literature review

Code completion serves as a fundamental aspect of modern software development, improving developers' coding processes. Integrating code completion tools into an Integrated Development Environment (IDE) or code editor enhances the coding process and boosts productivity by reducing errors and speeding up code writing while reducing cognitive load. This is achieved by predicting subsequent tokens, such as keywords, variable names, types, function names, operators, and more. Different techniques can achieve code completion, and recent research has focused on Deep Learning methods, particularly Large Language Models (LLMs) utilizing Transformer algorithms. While several research papers have focused on the use of LLMs for code completion, these studies are fragmented, and there is no systematic overview of the use of LLMs for code completion. Therefore, we aimed to perform a Systematic Literature Review (SLR) study to investigate how LLMs have been applied for code completion so far. We have formulated several research questions to address how LLMs have been integrated for code completion-related tasks and to assess the efficacy of these LLMs in the context of code completion. To achieve this, we retrieved 244 papers from scientific databases using auto-search and specific keywords, finally selecting 23 primary studies based on an SLR methodology for in-depth analysis. This SLR study categorizes the granularity levels of code completion achieved by utilizing LLMs in IDEs, explores the existing issues in current code completion systems, how LLMs address these challenges, and the pre-training and fine-tuning methods employed. Additionally, this study identifies open research problems and outlines future research directions. Our analysis reveals that LLMs significantly enhance code completion performance across several programming languages and contexts, and their capability to predict relevant code snippets based on context and partial input boosts developer productivity substantially.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Computer Standards & Interfaces 工程技术-计算机：软件工程

CiteScore

11.90

自引率

16.00%

发文量

审稿时长

6 months

期刊介绍： The quality of software, well-defined interfaces (hardware and software), the process of digitalisation, and accepted standards in these fields are essential for building and exploiting complex computing, communication, multimedia and measuring systems. Standards can simplify the design and construction of individual hardware and software components and help to ensure satisfactory interworking. Computer Standards & Interfaces is an international journal dealing specifically with these topics. The journal • Provides information about activities and progress on the definition of computer standards, software quality, interfaces and methods, at national, European and international levels • Publishes critical comments on standards and standards activities • Disseminates user''s experiences and case studies in the application and exploitation of established or emerging standards, interfaces and methods • Offers a forum for discussion on actual projects, standards, interfaces and methods by recognised experts • Stimulates relevant research by providing a specialised refereed medium.