Deep API Learning Revisited

2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC). Pub Date: 2022-05-01. DOI: 10.1145/3524610.3527872

James Martin, Jinrong Guo


Citations: 4

Abstract

Understanding correct API usage sequences is one of the most important tasks for programmers working with unfamiliar libraries. However, programmers often struggle to find the appropriate information, owing to either poor-quality API documentation or ineffective query-based search strategies. To address this issue, researchers have proposed various methods that suggest a sequence of APIs given a natural language query expressing the programmer's information need. Among such efforts, Gu et al. adopted a deep learning method, specifically an RNN Encoder-Decoder architecture, for this task and obtained promising results on common Java APIs. In this work, we aim to reproduce their results and apply the same method to Python APIs. Additionally, we compare its performance with that of a more recent Transformer-based method, CodeBERT, on the same task. Our experiment reveals a clear drop in performance measures when careful data cleaning is performed. Owing to its pretraining on a large number of source code files and its effective encoding technique, CodeBERT outperforms the method of Gu et al. by a large margin.



