When Code Completion Fails: A Case Study on Real-World Completions

Vincent J. Hellendoorn, Sebastian Proksch, Harald C. Gall, Alberto Bacchelli
{"title":"When Code Completion Fails: A Case Study on Real-World Completions","authors":"V. Hellendoorn, Sebastian Proksch, H. Gall, Alberto Bacchelli","doi":"10.1109/ICSE.2019.00101","DOIUrl":null,"url":null,"abstract":"Code completion is commonly used by software developers and is integrated into all major IDE's. Good completion tools can not only save time and effort but may also help avoid incorrect API usage. Many proposed completion tools have shown promising results on synthetic benchmarks, but these benchmarks make no claims about the realism of the completions they test. This lack of grounding in real-world data could hinder our scientific understanding of developer needs and of the efficacy of completion models. This paper presents a case study on 15,000 code completions that were applied by 66 real developers, which we study and contrast with artificial completions to inform future research and tools in this area. We find that synthetic benchmarks misrepresent many aspects of real-world completions; tested completion tools were far less accurate on real-world data. Worse, on the few completions that consumed most of the developers' time, prediction accuracy was less than 20% -- an effect that is invisible in synthetic benchmarks. Our findings have ramifications for future benchmarks, tool design and real-world efficacy: Benchmarks must account for completions that developers use most, such as intra-project APIs; models should be designed to be amenable to intra-project data; and real-world developer trials are essential to quantifying performance on the least predictable completions, which are both most time-consuming and far more typical than artificial data suggests. We publicly release our preprint [https://doi.org/10.5281/zenodo.2565673] and replication data and materials [https://doi.org/10.5281/zenodo.2562249].","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"55","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE.2019.00101","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 55

Abstract

Code completion is commonly used by software developers and is integrated into all major IDEs. Good completion tools can not only save time and effort but may also help avoid incorrect API usage. Many proposed completion tools have shown promising results on synthetic benchmarks, but these benchmarks make no claims about the realism of the completions they test. This lack of grounding in real-world data could hinder our scientific understanding of developer needs and of the efficacy of completion models. This paper presents a case study on 15,000 code completions that were applied by 66 real developers, which we study and contrast with artificial completions to inform future research and tools in this area. We find that synthetic benchmarks misrepresent many aspects of real-world completions; tested completion tools were far less accurate on real-world data. Worse, on the few completions that consumed most of the developers' time, prediction accuracy was less than 20%, an effect that is invisible in synthetic benchmarks. Our findings have ramifications for future benchmarks, tool design, and real-world efficacy: Benchmarks must account for completions that developers use most, such as intra-project APIs; models should be designed to be amenable to intra-project data; and real-world developer trials are essential to quantifying performance on the least predictable completions, which are both most time-consuming and far more typical than artificial data suggests. We publicly release our preprint [https://doi.org/10.5281/zenodo.2565673] and replication data and materials [https://doi.org/10.5281/zenodo.2562249].
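To make the contrast concrete, the sketch below illustrates how a synthetic completion benchmark is typically constructed: take finished code, mask each method-call site, and score a completion engine by whether its top suggestion matches the hidden token. This is a minimal illustrative sketch, not the paper's actual evaluation harness; the `model` object and its `complete` method are hypothetical stand-ins for any completion engine.

```python
import re

# Match call sites of the form `receiver.method(`; group 1 is the method name.
CALL_SITE = re.compile(r"\.(\w+)\(")

def make_synthetic_tasks(source: str):
    """Yield (prefix, expected_token) pairs by masking each call site."""
    for match in CALL_SITE.finditer(source):
        prefix = source[: match.start(1)]  # code up to (and including) the dot
        expected = match.group(1)          # the method name we hid
        yield prefix, expected

def top1_accuracy(model, source: str) -> float:
    """Fraction of masked call sites where the model's first suggestion
    equals the original token. `model.complete(prefix)` is assumed to
    return a ranked list of candidate tokens (hypothetical interface)."""
    tasks = list(make_synthetic_tasks(source))
    if not tasks:
        return 0.0
    hits = sum(model.complete(prefix)[0] == expected
               for prefix, expected in tasks)
    return hits / len(tasks)
```

Note the design gap the paper highlights: a benchmark built this way exercises every call site uniformly, whereas the real-world data shows developers invoke completion selectively, often on intra-project APIs that a model trained on other projects has never seen.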