Automated Commit Message Generation With Large Language Models: An Empirical Study and Beyond

IF 5.6; Zone 1 (Computer Science); JCR Q1 (Computer Science, Software Engineering); IEEE Transactions on Software Engineering; Pub Date: 2024-10-10; DOI: 10.1109/TSE.2024.3478317

Pengyu Xue;Linhao Wu;Zhongxing Yu;Zhi Jin;Zhen Yang;Xinyi Li;Zhenyu Yang;Yue Tan

IEEE Transactions on Software Engineering, vol. 50, no. 12, pp. 3208-3224. https://ieeexplore.ieee.org/document/10713474/

Abstract

Commit Message Generation (CMG) approaches aim to automatically generate commit messages based on given code diffs, which facilitate collaboration among developers and play a critical role in Open-Source Software (OSS). Very recently, Large Language Models (LLMs) have been applied in diverse code-related tasks owing to their powerful generality. Yet, in the CMG field, few studies systematically explored their effectiveness. This paper conducts the first comprehensive experiment to investigate how far we have been in applying LLM to generate high-quality commit messages and how to go further beyond in this field. Motivated by a pilot analysis, we first construct a multi-lingual high-quality CMG test set following practitioners’ criteria. Afterward, we re-evaluate diverse CMG approaches and make comparisons with recent LLMs. To delve deeper into LLMs’ ability, we further propose four manual metrics following the practice of OSS, including Accuracy, Integrity, Readability, and Applicability for assessment. Results reveal that LLMs have outperformed existing CMG approaches overall, and different LLMs carry different advantages, where GPT-3.5 performs best. To further boost LLMs’ performance in the CMG task, we propose an Efficient Retrieval-based In-Context Learning (ICL) framework, namely ERICommiter, which leverages a two-step filtering to accelerate the retrieval efficiency and introduces semantic/lexical-based retrieval algorithm to construct the ICL examples, thereby guiding the generation of high-quality commit messages with LLMs. Extensive experiments demonstrate the substantial performance improvement of ERICommiter on various LLMs across different programming languages. Meanwhile, ERICommiter also significantly reduces the retrieval time while keeping almost the same performance. Our research contributes to the understanding of LLMs’ capabilities in the CMG field and provides valuable insights for practitioners seeking to leverage these tools in their workflows.
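
To make the idea of retrieval-based in-context learning concrete, the following is a minimal sketch in Python. It is not ERICommiter itself: the abstract does not specify the filtering steps or the similarity measures, so the token-overlap pre-filter, the Jaccard ranking, the prompt wording, and all function names (`tokenize`, `coarse_filter`, `retrieve`, `build_prompt`) are assumptions chosen only for illustration. The sketch shows the general pattern of narrowing a corpus of (diff, message) pairs cheaply, ranking the survivors by lexical similarity to the query diff, and placing the top matches in the prompt as examples.

```python
import re
from typing import List, Set, Tuple

# A retrieval example is a (diff, commit message) pair.
Example = Tuple[str, str]


def tokenize(text: str) -> Set[str]:
    """Split text into a set of lowercase identifier-like tokens."""
    return set(re.findall(r"[A-Za-z_]\w*", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Lexical similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def coarse_filter(query_tokens: Set[str], corpus: List[Example],
                  min_shared: int = 1) -> List[Example]:
    """Hypothetical cheap pre-filter: keep examples sharing enough tokens.

    This stands in for a filtering stage that shrinks the candidate set
    before ranking; the paper's actual filters are not given in the abstract.
    """
    return [ex for ex in corpus
            if len(query_tokens & tokenize(ex[0])) >= min_shared]


def retrieve(query_diff: str, corpus: List[Example], k: int = 2) -> List[Example]:
    """Return the k examples whose diffs are lexically closest to the query."""
    q = tokenize(query_diff)
    candidates = coarse_filter(q, corpus)
    ranked = sorted(candidates,
                    key=lambda ex: jaccard(q, tokenize(ex[0])),
                    reverse=True)
    return ranked[:k]


def build_prompt(query_diff: str, examples: List[Example]) -> str:
    """Assemble an in-context prompt: instruction, examples, then the query."""
    parts = ["Write a concise commit message for the final diff."]
    for diff, msg in examples:
        parts.append(f"Diff:\n{diff}\nCommit message: {msg}")
    parts.append(f"Diff:\n{query_diff}\nCommit message:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    corpus = [
        ("- return a + b\n+ return a - b", "Fix subtraction in calc"),
        ("- timeout = 10\n+ timeout = 30", "Increase request timeout"),
        ("+ import logging", "Add logging import"),
    ]
    query = "- timeout = 30\n+ timeout = 60"
    print(build_prompt(query, retrieve(query, corpus)))
```

The resulting prompt string would then be sent to an LLM of choice; a semantic variant could replace `jaccard` with a similarity over embedding vectors, which is likewise an assumption rather than a detail taken from the abstract.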

Source journal

IEEE Transactions on Software Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore

9.70

Self-citation rate

10.80%

Publication volume

724

Review time

6 months

Journal description: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.