DLBT: Deep Learning-Based Transformer to Generate Pseudo-Code from Source Code

IF 1.7 | Zone 4, Computer Science | Q3, Computer Science, Information Systems | Cmc-computers Materials & Continua | Pub Date: 2022-01-01 | DOI: 10.32604/cmc.2022.019884

Walaa K. Gad, Anas Alokla, Waleed Nazih, M. Aref, A. M. Salem

Citations: 5

Abstract

Understanding source code and its regular expressions is very difficult when they are written in an unfamiliar language. Pseudo-code explains and describes what the code does without relying on the syntax or technologies of a particular programming language. However, writing pseudo-code for each code instruction is laborious. Recently, neural machine translation has been used to generate textual descriptions of source code. In this paper, a novel deep learning-based transformer (DLBT) model is proposed for automatic pseudo-code generation from source code. The proposed model uses deep learning based on Neural Machine Translation (NMT) to work as a language translator. The DLBT is based on the transformer, which is an encoder-decoder structure. It has three major components: tokenizer and embeddings, transformer, and post-processing. Each code line is tokenized and embedded as dense vectors. The transformer then captures the relatedness between the source code and the matching pseudo-code without the need for a Recurrent Neural Network (RNN). At the post-processing step, the generated pseudo-code is optimized. The proposed model is assessed on a real Python dataset containing more than 18,800 lines of source code. The experiments show promising performance compared with other machine translation methods such as RNN. The proposed DLBT achieves 47.32 and 68.49 on the accuracy and BLEU performance measures, respectively.
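
The first two components named in the abstract (tokenizer and embeddings, then attention without recurrence) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of Python's standard `tokenize` module, the random embedding table, the embedding size of 8, and the single attention head with identity query/key/value projections are all assumptions made for illustration. A trained transformer would learn the embeddings and projections and would add a decoder that emits the pseudo-code.

```python
import io
import math
import random
import tokenize


def tokenize_code_line(line):
    """Split one line of Python source into lexical tokens (stdlib tokenizer)."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
            tokenize.INDENT, tokenize.DEDENT, tokenize.COMMENT}
    reader = io.StringIO(line.strip() + "\n").readline
    return [tok.string for tok in tokenize.generate_tokens(reader)
            if tok.type not in skip]


def build_embeddings(vocab, dim, seed=0):
    """Hypothetical embedding table: one random dense vector per token.

    A real model learns these vectors during training.
    """
    rng = random.Random(seed)
    return {tok: [rng.gauss(0.0, 1.0) for _ in range(dim)] for tok in vocab}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Every token attends to every other token in one step, so no
    recurrence over the sequence is needed.
    """
    dim = len(vectors[0])
    scale = math.sqrt(dim)
    weights, outputs = [], []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in vectors]
        row = softmax(scores)
        weights.append(row)
        # Each output vector is the attention-weighted average of all inputs.
        outputs.append([sum(w * vec[d] for w, vec in zip(row, vectors))
                        for d in range(dim)])
    return weights, outputs


tokens = tokenize_code_line("x = x + 1")
table = build_embeddings(sorted(set(tokens)), dim=8)
vectors = [table[tok] for tok in tokens]
weights, contextual = self_attention(vectors)

print(tokens)
for tok, row in zip(tokens, weights):
    print(tok, [round(w, 2) for w in row])
```

Each printed row shows how strongly one token attends to every token in the line. Because both occurrences of `x` share the same embedding and no positional encoding is added in this sketch, their rows are identical; a full transformer adds positional information so that repeated tokens can be told apart.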

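Source journal

Cmc-computers Materials & Continua (Engineering & Technology, Materials Science: Multidisciplinary)

CiteScore: 5.30
Self-citation rate: 19.40%
Articles published: 345
Review time: 1 month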

Journal description: This journal publishes original research papers in the areas of computer networks, artificial intelligence, big data management, software engineering, multimedia, cyber security, internet of things, materials genome, integrated materials science, data analysis, modeling, and the engineering design and manufacturing of modern functional and multifunctional materials. Novel high performance computing methods, big data analysis, and artificial intelligence that advance material technologies are especially welcome.