{"title":"基于结构化树输入和AST解码器注意力增强的代码生成方法","authors":"Wenjun Wei, Junhua Wu","doi":"10.1109/QRS-C57518.2022.00077","DOIUrl":null,"url":null,"abstract":"Automatic code generation based on natural language input is important to research in the field of software engineering. In the past, it was mostly a seq2seq structure and used the RNN model. Input and output are regarded as simple sequences, and syntactic structure information in source information is often ignored. This paper proposes a code generation method Tx(Tree-Tree). It uses structured trees to replace simple word sequences so that the model can better learn the syntactic and semantic information in the source information. Therefore, it can alleviate the long dependency problem caused by too long source information. At the same time, the enhanced attention mechanism is adopted in the decoder to distinguish the influence of different historical actions on the current predicted action. The model is validated on three datasets: DJANGO, CONALA, and ATIS. Compared with some typical models, Tx(Tree-Tree) improves both accuracy and BLEU.","PeriodicalId":183728,"journal":{"name":"2022 IEEE 22nd International Conference on Software Quality, Reliability, and Security Companion (QRS-C)","volume":"113 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Code Generation Method based on Structured Tree Input and AST Decoder Attention Augmentation\",\"authors\":\"Wenjun Wei, Junhua Wu\",\"doi\":\"10.1109/QRS-C57518.2022.00077\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatic code generation based on natural language input is important to research in the field of software engineering. In the past, it was mostly a seq2seq structure and used the RNN model. Input and output are regarded as simple sequences, and syntactic structure information in source information is often ignored. This paper proposes a code generation method Tx(Tree-Tree). It uses structured trees to replace simple word sequences so that the model can better learn the syntactic and semantic information in the source information. Therefore, it can alleviate the long dependency problem caused by too long source information. At the same time, the enhanced attention mechanism is adopted in the decoder to distinguish the influence of different historical actions on the current predicted action. The model is validated on three datasets: DJANGO, CONALA, and ATIS. 
Compared with some typical models, Tx(Tree-Tree) improves both accuracy and BLEU.\",\"PeriodicalId\":183728,\"journal\":{\"name\":\"2022 IEEE 22nd International Conference on Software Quality, Reliability, and Security Companion (QRS-C)\",\"volume\":\"113 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE 22nd International Conference on Software Quality, Reliability, and Security Companion (QRS-C)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/QRS-C57518.2022.00077\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 22nd International Conference on Software Quality, Reliability, and Security Companion (QRS-C)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/QRS-C57518.2022.00077","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Code Generation Method based on Structured Tree Input and AST Decoder Attention Augmentation
Wenjun Wei, Junhua Wu
2022 IEEE 22nd International Conference on Software Quality, Reliability, and Security Companion (QRS-C), December 2022
DOI: 10.1109/QRS-C57518.2022.00077
Automatic code generation from natural language input is an important research topic in software engineering. Most previous approaches adopt a seq2seq architecture built on RNNs, treating both input and output as flat token sequences and thereby ignoring the syntactic structure contained in the source information. This paper proposes a code generation method, Tx (Tree-Tree), which replaces plain word sequences with structured trees so that the model can better learn the syntactic and semantic information of the source, and which in turn alleviates the long-range dependency problem caused by overly long inputs. In addition, an augmented attention mechanism in the AST decoder distinguishes the influence of different historical actions on the currently predicted action. The model is evaluated on three datasets, DJANGO, CONALA, and ATIS; compared with several typical models, Tx (Tree-Tree) improves both accuracy and BLEU.
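The abstract only sketches the augmented decoder attention at a high level. As a rough illustration of the general idea, namely giving different previously generated AST actions different weights when predicting the current action, the following Python sketch applies standard scaled dot-product attention over hypothetical embeddings of past actions. The function names, shapes, and the plain dot-product scoring are assumptions for illustration, not the paper's actual formulation.

```python
# Illustrative sketch only: generic attention over previously generated
# AST actions. Not the authors' implementation; all names and dimensions
# are hypothetical.
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attend_over_history(query, history):
    """Weight past action embeddings by relevance to the current decoding step.

    query:   (d,)   embedding of the decoder state for the action being predicted
    history: (t, d) embeddings of the t actions generated so far
    Returns a context vector (d,) in which different past actions contribute
    different amounts of influence.
    """
    scores = history @ query / np.sqrt(query.shape[-1])  # scaled dot-product scores, shape (t,)
    weights = softmax(scores)                            # one weight per historical action
    return weights @ history                             # weighted sum of past action embeddings

# Toy usage: 4 previous actions, 8-dimensional embeddings.
rng = np.random.default_rng(0)
hist = rng.normal(size=(4, 8))
q = rng.normal(size=(8,))
context = attend_over_history(q, hist)
print(context.shape)  # (8,)
```

In a tree-structured decoder of this kind, the returned context vector would typically be combined with the current decoder state (for example, by concatenation and a learned projection) before scoring the next production rule or token, but the exact combination used by Tx (Tree-Tree) is not specified in the abstract.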