CodeExp: Explanatory Code Document Generation

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Pub Date: 2022-11-25. DOI: 10.48550/arXiv.2211.15395

Haotian Cui, Chenglong Wang, Junjie Huang, J. Inala, Todd Mytkowicz, Bolong Wang, Jian Gao, Nan Duan

Volume: 62 1. Pages: 2342-2354.

Citations: 1

Abstract

Developing models that can automatically generate detailed code explanations can greatly benefit software maintenance and programming education. However, existing code-to-text generation models often produce only high-level summaries of code that do not capture the implementation-level choices essential for these scenarios. To fill this gap, we propose the code explanation generation task. We first conducted a human study to identify the criteria for high-quality explanatory docstrings for code. Based on that, we collected and refined a large-scale code docstring corpus and formulated automatic evaluation metrics that best match human assessments. Finally, we present a multi-stage fine-tuning strategy and baseline models for the task. Our experiments show that (1) our refined training dataset lets models achieve better performance on the explanation generation task than unrefined data that is 15x larger, and (2) fine-tuned models can generate well-structured long docstrings comparable to human-written ones. We envision that our training dataset, human-evaluation protocol, recommended metrics, and fine-tuning strategy can boost future code explanation research. The code and annotated data are available at https://github.com/subercui/CodeExp.
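To make the distinction between a high-level summary and an explanatory docstring concrete, the sketch below documents the same function both ways. It is only an illustration: the function names and docstring wording are hypothetical and are not taken from the CodeExp dataset or from the paper's quality criteria.

```python
import inspect


def dedupe_summary(items):
    """Remove duplicates from a list."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def dedupe_explained(items):
    """Remove duplicates from a list while preserving first-seen order.

    A set tracks the items already encountered, so each membership check
    takes constant time on average. The result is built in a separate
    list rather than by converting the set, because a set does not keep
    insertion order. Items must be hashable.

    Args:
        items: An iterable of hashable values.

    Returns:
        A new list with each distinct value at its first position.
    """
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


if __name__ == "__main__":
    # Both functions behave identically; only the documentation differs.
    for fn in (dedupe_summary, dedupe_explained):
        doc = inspect.getdoc(fn)
        print(f"{fn.__name__}: {len(doc.splitlines())} docstring line(s)")
```

The first docstring states what the function does; the second also records why a set and a separate list are used and what the inputs must satisfy, which is the kind of implementation-level detail the abstract says summaries tend to omit.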


