NL2Type:从自然语言信息推断JavaScript函数类型

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE) Pub Date : 2019-05-01 DOI:10.1109/ICSE.2019.00045

Rabee Sohail Malik, Jibesh Patra, Michael Pradel

{"title":"NL2Type:从自然语言信息推断JavaScript函数类型","authors":"Rabee Sohail Malik, Jibesh Patra, Michael Pradel","doi":"10.1109/ICSE.2019.00045","DOIUrl":null,"url":null,"abstract":"JavaScript is dynamically typed and hence lacks the type safety of statically typed languages, leading to suboptimal IDE support, difficult to understand APIs, and unexpected runtime behavior. Several gradual type systems have been proposed, e.g., Flow and TypeScript, but they rely on developers to annotate code with types. This paper presents NL2Type, a learning-based approach for predicting likely type signatures of JavaScript functions. The key idea is to exploit natural language information in source code, such as comments, function names, and parameter names, a rich source of knowledge that is typically ignored by type inference algorithms. We formulate the problem of predicting types as a classification problem and train a recurrent, LSTM-based neural model that, after learning from an annotated code base, predicts function types for unannotated code. We evaluate the approach with a corpus of 162,673 JavaScript files from real-world projects. NL2Type predicts types with a precision of 84.1% and a recall of 78.9% when considering only the top-most suggestion, and with a precision of 95.5% and a recall of 89.6% when considering the top-5 suggestions. The approach outperforms both JSNice, a state-of-the-art approach that analyzes implementations of functions instead of natural language information, and DeepTyper, a recent type prediction approach that is also based on deep learning. Beyond predicting types, NL2Type serves as a consistency checker for existing type annotations. We show that it discovers 39 inconsistencies that deserve developer attention (from a manual analysis of 50 warnings), most of which are due to incorrect type annotations.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"21 1","pages":"304-315"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"105","resultStr":"{\"title\":\"NL2Type: Inferring JavaScript Function Types from Natural Language Information\",\"authors\":\"Rabee Sohail Malik, Jibesh Patra, Michael Pradel\",\"doi\":\"10.1109/ICSE.2019.00045\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"JavaScript is dynamically typed and hence lacks the type safety of statically typed languages, leading to suboptimal IDE support, difficult to understand APIs, and unexpected runtime behavior. Several gradual type systems have been proposed, e.g., Flow and TypeScript, but they rely on developers to annotate code with types. This paper presents NL2Type, a learning-based approach for predicting likely type signatures of JavaScript functions. The key idea is to exploit natural language information in source code, such as comments, function names, and parameter names, a rich source of knowledge that is typically ignored by type inference algorithms. We formulate the problem of predicting types as a classification problem and train a recurrent, LSTM-based neural model that, after learning from an annotated code base, predicts function types for unannotated code. We evaluate the approach with a corpus of 162,673 JavaScript files from real-world projects. NL2Type predicts types with a precision of 84.1% and a recall of 78.9% when considering only the top-most suggestion, and with a precision of 95.5% and a recall of 89.6% when considering the top-5 suggestions. The approach outperforms both JSNice, a state-of-the-art approach that analyzes implementations of functions instead of natural language information, and DeepTyper, a recent type prediction approach that is also based on deep learning. Beyond predicting types, NL2Type serves as a consistency checker for existing type annotations. We show that it discovers 39 inconsistencies that deserve developer attention (from a manual analysis of 50 warnings), most of which are due to incorrect type annotations.\",\"PeriodicalId\":6736,\"journal\":{\"name\":\"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)\",\"volume\":\"21 1\",\"pages\":\"304-315\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"105\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSE.2019.00045\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE.2019.00045","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 105

摘要

JavaScript是动态类型的，因此缺乏静态类型语言的类型安全性，导致IDE支持不够理想，难以理解api，以及意外的运行时行为。已经提出了几种渐进式类型系统，例如Flow和TypeScript，但它们依赖于开发人员用类型注释代码。本文介绍了NL2Type，一种基于学习的方法，用于预测JavaScript函数可能的类型签名。关键思想是利用源代码中的自然语言信息，例如注释、函数名和参数名，这是通常被类型推断算法忽略的丰富的知识来源。我们将预测类型的问题表述为一个分类问题，并训练一个循环的、基于lstm的神经模型，该模型在从带注释的代码库学习后，预测未带注释的代码的函数类型。我们使用来自实际项目的162,673个JavaScript文件的语料库来评估这种方法。NL2Type仅考虑最前面的建议时，预测类型的准确率为84.1%，召回率为78.9%;考虑前5条建议时，预测类型的准确率为95.5%，召回率为89.6%。该方法优于JSNice(一种最先进的方法，分析功能实现而不是自然语言信息)和DeepTyper(一种最新的基于深度学习的类型预测方法)。除了预测类型之外，NL2Type还可以作为现有类型注释的一致性检查器。我们表明，它发现了39个值得开发人员注意的不一致(来自对50个警告的手动分析)，其中大多数是由于不正确的类型注释。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

NL2Type: Inferring JavaScript Function Types from Natural Language Information

JavaScript is dynamically typed and hence lacks the type safety of statically typed languages, leading to suboptimal IDE support, difficult to understand APIs, and unexpected runtime behavior. Several gradual type systems have been proposed, e.g., Flow and TypeScript, but they rely on developers to annotate code with types. This paper presents NL2Type, a learning-based approach for predicting likely type signatures of JavaScript functions. The key idea is to exploit natural language information in source code, such as comments, function names, and parameter names, a rich source of knowledge that is typically ignored by type inference algorithms. We formulate the problem of predicting types as a classification problem and train a recurrent, LSTM-based neural model that, after learning from an annotated code base, predicts function types for unannotated code. We evaluate the approach with a corpus of 162,673 JavaScript files from real-world projects. NL2Type predicts types with a precision of 84.1% and a recall of 78.9% when considering only the top-most suggestion, and with a precision of 95.5% and a recall of 89.6% when considering the top-5 suggestions. The approach outperforms both JSNice, a state-of-the-art approach that analyzes implementations of functions instead of natural language information, and DeepTyper, a recent type prediction approach that is also based on deep learning. Beyond predicting types, NL2Type serves as a consistency checker for existing type annotations. We show that it discovers 39 inconsistencies that deserve developer attention (from a manual analysis of 50 warnings), most of which are due to incorrect type annotations.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助