Lixuan Li, Jie Li, Yihui Xu, Hao Zhu, Xiaofang Zhang
{"title":"用图嵌入和预训练模型增强代码摘要","authors":"Lixuan Li, Jie Li, Yihui Xu, Hao Zhu, Xiaofang Zhang","doi":"10.1142/s0218194023410024","DOIUrl":null,"url":null,"abstract":"Code summarization is a task that aims at automatically producing descriptions of source code. Recently many deep-learning-based approaches have been proposed to generate accurate code summaries, among which pre-trained models (PTMs) for programming languages have achieved promising results. It is well known that source code written in programming languages is highly structured and unambiguous. Though previous work pre-trained the model with well-design tasks to learn universal representation from a large scale of data, they have not considered structure information during the fine-tuning stage. To make full use of both the pre-trained programming language model and the structure information of source code, we utilize Flow-Augmented Abstract Syntax Tree (FA-AST) of source code for structure information and propose GraphPLBART — Graph-augmented Programming Language and Bi-directional Auto-Regressive Transformer, which can effectively introduce structure information to a well PTM through a cross attention layer. Compared with the best-performing baselines, GraphPLBART still improves by 3.2%, 7.1%, and 1.2% in terms of BLEU, METEOR, and ROUGE-L, respectively, on Java dataset, and also improves by 4.0%, 6.3%, and 2.1% on Python dataset. Further experiment shows that the structure information from FA-AST has significant benefits for the performance of GraphPLBART. In addition, our meticulous manual evaluation experiment further reinforces the superiority of our proposed approach. This demonstrates its remarkable abstract quality and solidifies its position as a promising solution in the field of code summarization.","PeriodicalId":50288,"journal":{"name":"International Journal of Software Engineering and Knowledge Engineering","volume":null,"pages":null},"PeriodicalIF":0.6000,"publicationDate":"2023-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Enhancing Code Summarization with Graph Embedding and Pre-trained Model\",\"authors\":\"Lixuan Li, Jie Li, Yihui Xu, Hao Zhu, Xiaofang Zhang\",\"doi\":\"10.1142/s0218194023410024\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Code summarization is a task that aims at automatically producing descriptions of source code. Recently many deep-learning-based approaches have been proposed to generate accurate code summaries, among which pre-trained models (PTMs) for programming languages have achieved promising results. It is well known that source code written in programming languages is highly structured and unambiguous. Though previous work pre-trained the model with well-design tasks to learn universal representation from a large scale of data, they have not considered structure information during the fine-tuning stage. To make full use of both the pre-trained programming language model and the structure information of source code, we utilize Flow-Augmented Abstract Syntax Tree (FA-AST) of source code for structure information and propose GraphPLBART — Graph-augmented Programming Language and Bi-directional Auto-Regressive Transformer, which can effectively introduce structure information to a well PTM through a cross attention layer. 
Compared with the best-performing baselines, GraphPLBART still improves by 3.2%, 7.1%, and 1.2% in terms of BLEU, METEOR, and ROUGE-L, respectively, on Java dataset, and also improves by 4.0%, 6.3%, and 2.1% on Python dataset. Further experiment shows that the structure information from FA-AST has significant benefits for the performance of GraphPLBART. In addition, our meticulous manual evaluation experiment further reinforces the superiority of our proposed approach. This demonstrates its remarkable abstract quality and solidifies its position as a promising solution in the field of code summarization.\",\"PeriodicalId\":50288,\"journal\":{\"name\":\"International Journal of Software Engineering and Knowledge Engineering\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.6000,\"publicationDate\":\"2023-10-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Software Engineering and Knowledge Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1142/s0218194023410024\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Software Engineering and Knowledge Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/s0218194023410024","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Enhancing Code Summarization with Graph Embedding and Pre-trained Model
Code summarization aims to automatically produce natural-language descriptions of source code. Recently, many deep-learning-based approaches have been proposed to generate accurate code summaries, among which pre-trained models (PTMs) for programming languages have achieved promising results. It is well known that source code written in programming languages is highly structured and unambiguous. Although previous work pre-trained models with well-designed tasks to learn universal representations from large-scale data, it did not consider structure information during the fine-tuning stage. To make full use of both the pre-trained programming language model and the structure information of source code, we use the Flow-Augmented Abstract Syntax Tree (FA-AST) of the source code as the source of structure information and propose GraphPLBART (Graph-augmented Programming Language and Bi-directional Auto-Regressive Transformer), which effectively introduces structure information into a well-trained PTM through a cross-attention layer. Compared with the best-performing baselines, GraphPLBART improves BLEU, METEOR, and ROUGE-L by 3.2%, 7.1%, and 1.2%, respectively, on the Java dataset, and by 4.0%, 6.3%, and 2.1% on the Python dataset. Further experiments show that the structure information from the FA-AST significantly benefits GraphPLBART's performance. In addition, a careful manual evaluation further confirms the superiority of the proposed approach, demonstrating the high quality of its generated summaries and solidifying its position as a promising solution in the field of code summarization.
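To make the fusion mechanism described in the abstract concrete, the sketch below shows one way a cross-attention layer could inject graph embeddings of the FA-AST into the token representations of a pre-trained encoder such as PLBART. This is an illustrative PyTorch sketch under stated assumptions, not the authors' implementation: the class name, hidden dimensions, and the use of nn.MultiheadAttention are assumptions, and the graph encoder producing the node embeddings is left abstract.

# Minimal sketch (illustrative, not the paper's code) of cross-attention fusion
# between pre-trained encoder token states and FA-AST graph-node embeddings.
import torch
import torch.nn as nn

class GraphCrossAttentionFusion(nn.Module):
    """Fuses graph-node embeddings (e.g. from a GNN over the FA-AST) into
    token representations produced by a pre-trained encoder such as PLBART."""
    def __init__(self, hidden_dim: int = 768, num_heads: int = 8):
        super().__init__()
        # Queries come from the PTM token states; keys/values from graph nodes.
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, token_states: torch.Tensor, graph_states: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden_dim) from the pre-trained encoder
        # graph_states: (batch, num_nodes, hidden_dim) from a graph encoder over the FA-AST
        attended, _ = self.cross_attn(query=token_states, key=graph_states, value=graph_states)
        # Residual connection keeps the original pre-trained representations intact.
        return self.norm(token_states + attended)

# Usage sketch: the fused states would then feed the auto-regressive decoder
# that generates the natural-language summary.
fusion = GraphCrossAttentionFusion()
tokens = torch.randn(2, 128, 768)  # dummy PLBART encoder outputs
nodes = torch.randn(2, 64, 768)    # dummy FA-AST node embeddings
fused = fusion(tokens, nodes)      # shape: (2, 128, 768)

The residual-plus-normalization pattern is a common way to add an auxiliary signal without disturbing the pre-trained representations too strongly; whether GraphPLBART uses exactly this arrangement is not specified in the abstract.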
Journal description:
The International Journal of Software Engineering and Knowledge Engineering is intended to serve as a forum for researchers, practitioners, and developers to exchange ideas and results for the advancement of software engineering and knowledge engineering. Three types of papers will be published:
Research papers reporting original research results
Technology trend surveys reviewing an area of research in software engineering and knowledge engineering
Survey articles surveying a broad area in software engineering and knowledge engineering
In addition, tool reviews (no more than three manuscript pages) and book reviews (no more than two manuscript pages) are also welcome.
A central theme of this journal is the interplay between software engineering and knowledge engineering: how knowledge engineering methods can be applied to software engineering, and vice versa. The journal publishes papers in the areas of software engineering methods and practices, object-oriented systems, rapid prototyping, software reuse, cleanroom software engineering, stepwise refinement/enhancement, formal methods of specification, ambiguity in software development, impact of CASE on software development life cycle, knowledge engineering methods and practices, logic programming, expert systems, knowledge-based systems, distributed knowledge-based systems, deductive database systems, knowledge representations, knowledge-based systems in language translation & processing, software and knowledge-ware maintenance, reverse engineering in software design, and applications in various domains of interest.