THANH VAN NGUYEN, Peter C. Rigby, A. Nguyen, Mark Karanfil, T. Nguyen
{"title":"T2API: synthesizing API code usage templates from English texts with statistical translation","authors":"THANH VAN NGUYEN, Peter C. Rigby, A. Nguyen, Mark Karanfil, T. Nguyen","doi":"10.1145/2950290.2983931","DOIUrl":null,"url":null,"abstract":"In this work, we develop T2API, a statistical machine translation-based tool that takes a given English description of a programming task as a query, and synthesizes the API usage template for the task by learning from training data. T2API works in two steps. First, it derives the API elements relevant to the task described in the input by statistically learning from a StackOverflow corpus of text descriptions and corresponding code. To infer those API elements, it also considers the context of the words in the textual input and the context of API elements that often go together in the corpus. The inferred API elements with their relevance scores are ensembled into an API usage by our novel API usage synthesis algorithm that learns the API usages from a large code corpus via a graph-based language model. Importantly, T2API is capable of generating new API usages from smaller, previously-seen usages.","PeriodicalId":20532,"journal":{"name":"Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering","volume":"50 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2016-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"51","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2950290.2983931","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 51
Abstract
In this work, we develop T2API, a statistical machine translation-based tool that takes a given English description of a programming task as a query, and synthesizes the API usage template for the task by learning from training data. T2API works in two steps. First, it derives the API elements relevant to the task described in the input by statistically learning from a StackOverflow corpus of text descriptions and corresponding code. To infer those API elements, it also considers the context of the words in the textual input and the context of API elements that often go together in the corpus. The inferred API elements with their relevance scores are ensembled into an API usage by our novel API usage synthesis algorithm that learns the API usages from a large code corpus via a graph-based language model. Importantly, T2API is capable of generating new API usages from smaller, previously-seen usages.