In-IDE Code Generation from Natural Language: Promise and Challenges

ACM Transactions on Software Engineering and Methodology (TOSEM) Pub Date : 2021-01-27 DOI:10.1145/3487569

Frank F. Xu, Bogdan Vasilescu, Graham Neubig

Publication type: Journal Article. Pages: 1 - 47.

Citations: 68

Abstract

A great part of software development involves conceptualizing or communicating the underlying procedures and logic that need to be expressed in programs. One major difficulty of programming is turning concept into code, especially when dealing with the APIs of unfamiliar libraries. Recently, there has been a proliferation of machine learning methods for code generation and retrieval from natural language queries, but these have been evaluated primarily on retrieval accuracy or on the overlap of generated code with developer-written code, and the actual effect of these methods on the developer workflow is surprisingly unattested. In this article, we perform the first comprehensive investigation of the promise and challenges of using such technology inside the PyCharm IDE, asking, “At the current state of technology does it improve developer productivity or accuracy, how does it affect the developer experience, and what are the remaining gaps and challenges?” To facilitate the study, we first develop a plugin for the PyCharm IDE that implements a hybrid of code generation and code retrieval functionality, and we orchestrate virtual environments to enable collection of many user events (e.g., web browsing, keystrokes, fine-grained code edits). We ask developers with various backgrounds to complete 7 varieties of 14 Python programming tasks ranging from basic file manipulation to machine learning or data visualization, with or without the help of the plugin. While qualitative surveys of developer experience are largely positive, quantitative results with regard to increased productivity, code quality, or program correctness are inconclusive. Further analysis identifies several pain points whose resolution could improve the effectiveness of future machine learning-based code generation/retrieval developer assistants and demonstrates when developers prefer code generation over code retrieval and vice versa. We release all data and software to pave the road for future empirical studies on this topic, as well as development of better code generation models.

