{"title":"Exploring API Embedding for API Usages and Applications","authors":"Trong Duc Nguyen, A. Nguyen, H. Phan, T. Nguyen","doi":"10.1109/ICSE.2017.47","DOIUrl":null,"url":null,"abstract":"Word2Vec is a class of neural network models that as being trainedfrom a large corpus of texts, they can produce for each unique word acorresponding vector in a continuous space in which linguisticcontexts of words can be observed. In this work, we study thecharacteristics of Word2Vec vectors, called API2VEC or API embeddings, for the API elements within the API sequences in source code. Ourempirical study shows that the close proximity of the API2VEC vectorsfor API elements reflects the similar usage contexts containing thesurrounding APIs of those API elements. Moreover, API2VEC can captureseveral similar semantic relations between API elements in API usagesvia vector offsets. We demonstrate the usefulness of API2VEC vectorsfor API elements in three applications. First, we build a tool thatmines the pairs of API elements that share the same usage relationsamong them. The other applications are in the code migrationdomain. We develop API2API, a tool to automatically learn the APImappings between Java and C# using a characteristic of the API2VECvectors for API elements in the two languages: semantic relationsamong API elements in their usages are observed in the two vectorspaces for the two languages as similar geometric arrangements amongtheir API2VEC vectors. Our empirical evaluation shows that API2APIrelatively improves 22.6% and 40.1% top-1 and top-5 accuracy over astate-of-the-art mining approach for API mappings. Finally, as anotherapplication in code migration, we are able to migrate equivalent APIusages from Java to C# with up to 90.6% recall and 87.2% precision.","PeriodicalId":6505,"journal":{"name":"2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE)","volume":"36 1","pages":"438-449"},"PeriodicalIF":0.0000,"publicationDate":"2017-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"145","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE.2017.47","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 145
Abstract
Word2Vec is a class of neural network models that as being trainedfrom a large corpus of texts, they can produce for each unique word acorresponding vector in a continuous space in which linguisticcontexts of words can be observed. In this work, we study thecharacteristics of Word2Vec vectors, called API2VEC or API embeddings, for the API elements within the API sequences in source code. Ourempirical study shows that the close proximity of the API2VEC vectorsfor API elements reflects the similar usage contexts containing thesurrounding APIs of those API elements. Moreover, API2VEC can captureseveral similar semantic relations between API elements in API usagesvia vector offsets. We demonstrate the usefulness of API2VEC vectorsfor API elements in three applications. First, we build a tool thatmines the pairs of API elements that share the same usage relationsamong them. The other applications are in the code migrationdomain. We develop API2API, a tool to automatically learn the APImappings between Java and C# using a characteristic of the API2VECvectors for API elements in the two languages: semantic relationsamong API elements in their usages are observed in the two vectorspaces for the two languages as similar geometric arrangements amongtheir API2VEC vectors. Our empirical evaluation shows that API2APIrelatively improves 22.6% and 40.1% top-1 and top-5 accuracy over astate-of-the-art mining approach for API mappings. Finally, as anotherapplication in code migration, we are able to migrate equivalent APIusages from Java to C# with up to 90.6% recall and 87.2% precision.