Constructing meaningful code changes via graph transformer

Impact factor: 1.5 | Zone 4 (Computer Science) | JCR Q3 (Computer Science, Software Engineering) | IET Software | Pub Date: 2023-01-21 | DOI: 10.1049/sfw2.12097

Shikai Guo, Mengxuan Li, Xin Ge, Hui Li, Rong Chen, Tingting Li

IET Software, volume 17, issue 2, pages 154-167. Full text: https://onlinelibrary.wiley.com/doi/10.1049/sfw2.12097

Citations: 0

Abstract

The rapid development of Open-Source Software (OSS) has created significant demand for code changes to maintain it. Code changes often show symptoms of poor design and implementation choices, which makes it much harder for code reviewers to verify their correctness and soundness. Researchers have investigated how to learn meaningful code changes so that developers can anticipate the changes that reviewers may suggest for submitted code. However, two main limitations remain: the difficulty of capturing long-range dependencies in source code, and the lack of its syntactic structural information. To address these limitations, a novel method is proposed, named Graph Transformer for learning meaningful Code Transformations (GTCT), which gives developers quick, preliminary feedback when they submit code changes and can thereby improve both the quality of code changes and the efficiency of code review. GTCT comprises two components: code graph embedding and code transformation learning. To address the missing syntactic structural information, the code graph embedding component captures the types and patterns of code changes by encoding the source code into a code graph built from its lexical and syntactic representations. The code transformation learning component then uses multi-head attention and positional encoding to address the long-range dependency limitation. Extensive experiments evaluate GTCT through both quantitative and qualitative analyses. In the quantitative analysis, GTCT outperforms the baseline on six datasets by relative margins of 210%, 342.86%, 135%, 29.41%, 109.09%, and 91.67% in terms of perfect prediction. In the qualitative analysis, GTCT outperforms the baseline method for each type of code change in the taxonomy: bug fixes, refactoring, and other changes.
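The quantitative results above are relative improvements, so a figure such as 210% means GTCT produced roughly 3.1 times as many perfect predictions as the baseline, not 2.1 times. The sketch below is only an illustration of that arithmetic: it assumes the standard definition of relative improvement, (candidate - baseline) / baseline, and the function names are hypothetical rather than taken from the paper. The only data used are the six percentages quoted in the abstract.

```python
# Relative improvements in perfect prediction quoted in the abstract (percent).
REPORTED_GAINS = [210.0, 342.86, 135.0, 29.41, 109.09, 91.67]


def relative_improvement(baseline: float, candidate: float) -> float:
    """Percentage by which `candidate` exceeds `baseline`, relative to `baseline`.

    Assumes the usual definition; the paper's exact formula is not given in the abstract.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline * 100.0


def gain_to_ratio(gain_percent: float) -> float:
    """Convert a relative improvement in percent into a candidate/baseline ratio."""
    return 1.0 + gain_percent / 100.0


# How many times the baseline's perfect-prediction figure each reported gain implies.
ratios = [round(gain_to_ratio(g), 2) for g in REPORTED_GAINS]
print(ratios)
```

As a usage example with made-up counts (not results from the paper), `relative_improvement(10, 31)` evaluates to about 210, which matches the first reported margin only in form: a baseline of 10 perfect predictions against 31 for the candidate.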



Source journal

IET Software (Engineering and Technology - Computer Science: Software Engineering)

CiteScore

4.20

Self-citation rate

0.00%

Articles published

Review time

9 months

About the journal: IET Software publishes papers on all aspects of the software lifecycle, including design, development, implementation and maintenance. The focus of the journal is on the methods used to develop and maintain software, and their practical application. Authors are especially encouraged to submit papers on the following topics, although papers on all aspects of software engineering are welcome: software and systems requirements engineering; formal methods, design methods, practice and experience; software architecture, aspect and object orientation, reuse and re-engineering; testing, verification and validation techniques; software dependability and measurement; human systems engineering and human-computer interaction; knowledge engineering, expert and knowledge-based systems, intelligent agents; information systems engineering; application of software engineering in industry and commerce; software engineering technology transfer; management of software development; theoretical aspects of software development; machine learning; big data and big code; cloud computing.

Current special issues (calls for papers): Knowledge Discovery for Software Development - https://digital-library.theiet.org/files/IET_SEN_CFP_KDSD.pdf; Big Data Analytics for Sustainable Software Development - https://digital-library.theiet.org/files/IET_SEN_CFP_BDASSD.pdf