学习用统计机器翻译从源代码生成伪代码(T)

2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE) Pub Date : 2015-11-09 DOI:10.1109/ASE.2015.36

Yusuke Oda, Hiroyuki Fudaba, Graham Neubig, Hideaki Hata, S. Sakti, T. Toda, Satoshi Nakamura

{"title":"学习用统计机器翻译从源代码生成伪代码(T)","authors":"Yusuke Oda, Hiroyuki Fudaba, Graham Neubig, Hideaki Hata, S. Sakti, T. Toda, Satoshi Nakamura","doi":"10.1109/ASE.2015.36","DOIUrl":null,"url":null,"abstract":"Pseudo-code written in natural language can aid the comprehension of source code in unfamiliar programming languages. However, the great majority of source code has no corresponding pseudo-code, because pseudo-code is redundant and laborious to create. If pseudo-code could be generated automatically and instantly from given source code, we could allow for on-demand production of pseudo-code without human effort. In this paper, we propose a method to automatically generate pseudo-code from source code, specifically adopting the statistical machine translation (SMT) framework. SMT, which was originally designed to translate between two natural languages, allows us to automatically learn the relationship between source code/pseudo-code pairs, making it possible to create a pseudo-code generator with less human effort. In experiments, we generated English or Japanese pseudo-code from Python statements using SMT, and find that the generated pseudo-code is largely accurate, and aids code understanding.","PeriodicalId":6586,"journal":{"name":"2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"15 1","pages":"574-584"},"PeriodicalIF":0.0000,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"245","resultStr":"{\"title\":\"Learning to Generate Pseudo-Code from Source Code Using Statistical Machine Translation (T)\",\"authors\":\"Yusuke Oda, Hiroyuki Fudaba, Graham Neubig, Hideaki Hata, S. Sakti, T. Toda, Satoshi Nakamura\",\"doi\":\"10.1109/ASE.2015.36\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Pseudo-code written in natural language can aid the comprehension of source code in unfamiliar programming languages. However, the great majority of source code has no corresponding pseudo-code, because pseudo-code is redundant and laborious to create. If pseudo-code could be generated automatically and instantly from given source code, we could allow for on-demand production of pseudo-code without human effort. In this paper, we propose a method to automatically generate pseudo-code from source code, specifically adopting the statistical machine translation (SMT) framework. SMT, which was originally designed to translate between two natural languages, allows us to automatically learn the relationship between source code/pseudo-code pairs, making it possible to create a pseudo-code generator with less human effort. In experiments, we generated English or Japanese pseudo-code from Python statements using SMT, and find that the generated pseudo-code is largely accurate, and aids code understanding.\",\"PeriodicalId\":6586,\"journal\":{\"name\":\"2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)\",\"volume\":\"15 1\",\"pages\":\"574-584\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-11-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"245\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASE.2015.36\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASE.2015.36","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 245

摘要

用自然语言编写的伪代码可以帮助理解用不熟悉的编程语言编写的源代码。然而，绝大多数源代码都没有相应的伪代码，因为伪代码是冗余的，而且创建起来很费力。如果伪代码可以从给定的源代码自动地、即时地生成，我们就可以允许按需生产伪代码，而不需要人工的努力。本文提出了一种从源代码自动生成伪代码的方法，具体采用统计机器翻译(SMT)框架。SMT最初设计用于在两种自然语言之间进行翻译，它允许我们自动学习源代码/伪代码对之间的关系，从而可以用更少的人力创建伪代码生成器。在实验中，我们使用SMT从Python语句生成英语或日语伪代码，并发现生成的伪代码在很大程度上是准确的，并且有助于代码理解。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Learning to Generate Pseudo-Code from Source Code Using Statistical Machine Translation (T)

Pseudo-code written in natural language can aid the comprehension of source code in unfamiliar programming languages. However, the great majority of source code has no corresponding pseudo-code, because pseudo-code is redundant and laborious to create. If pseudo-code could be generated automatically and instantly from given source code, we could allow for on-demand production of pseudo-code without human effort. In this paper, we propose a method to automatically generate pseudo-code from source code, specifically adopting the statistical machine translation (SMT) framework. SMT, which was originally designed to translate between two natural languages, allows us to automatically learn the relationship between source code/pseudo-code pairs, making it possible to create a pseudo-code generator with less human effort. In experiments, we generated English or Japanese pseudo-code from Python statements using SMT, and find that the generated pseudo-code is largely accurate, and aids code understanding.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)

自引率

0.00%

发文量

期刊最新文献

Cost-Efficient Sampling for Performance Prediction of Configurable Systems (T) Refactorings for Android Asynchronous Programming Study and Refactoring of Android Asynchronous Programming (T) The iMPAcT Tool: Testing UI Patterns on Mobile Applications Combining Deep Learning with Information Retrieval to Localize Buggy Files for Bug Reports (N)