Exploring API Embedding for API Usages and Applications

Trong Duc Nguyen, A. Nguyen, H. Phan, T. Nguyen
{"title":"Exploring API Embedding for API Usages and Applications","authors":"Trong Duc Nguyen, A. Nguyen, H. Phan, T. Nguyen","doi":"10.1109/ICSE.2017.47","DOIUrl":null,"url":null,"abstract":"Word2Vec is a class of neural network models that as being trainedfrom a large corpus of texts, they can produce for each unique word acorresponding vector in a continuous space in which linguisticcontexts of words can be observed. In this work, we study thecharacteristics of Word2Vec vectors, called API2VEC or API embeddings, for the API elements within the API sequences in source code. Ourempirical study shows that the close proximity of the API2VEC vectorsfor API elements reflects the similar usage contexts containing thesurrounding APIs of those API elements. Moreover, API2VEC can captureseveral similar semantic relations between API elements in API usagesvia vector offsets. We demonstrate the usefulness of API2VEC vectorsfor API elements in three applications. First, we build a tool thatmines the pairs of API elements that share the same usage relationsamong them. The other applications are in the code migrationdomain. We develop API2API, a tool to automatically learn the APImappings between Java and C# using a characteristic of the API2VECvectors for API elements in the two languages: semantic relationsamong API elements in their usages are observed in the two vectorspaces for the two languages as similar geometric arrangements amongtheir API2VEC vectors. Our empirical evaluation shows that API2APIrelatively improves 22.6% and 40.1% top-1 and top-5 accuracy over astate-of-the-art mining approach for API mappings. Finally, as anotherapplication in code migration, we are able to migrate equivalent APIusages from Java to C# with up to 90.6% recall and 87.2% precision.","PeriodicalId":6505,"journal":{"name":"2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE)","volume":"36 1","pages":"438-449"},"PeriodicalIF":0.0000,"publicationDate":"2017-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"145","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE.2017.47","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 145

Abstract

Word2Vec is a class of neural network models that as being trainedfrom a large corpus of texts, they can produce for each unique word acorresponding vector in a continuous space in which linguisticcontexts of words can be observed. In this work, we study thecharacteristics of Word2Vec vectors, called API2VEC or API embeddings, for the API elements within the API sequences in source code. Ourempirical study shows that the close proximity of the API2VEC vectorsfor API elements reflects the similar usage contexts containing thesurrounding APIs of those API elements. Moreover, API2VEC can captureseveral similar semantic relations between API elements in API usagesvia vector offsets. We demonstrate the usefulness of API2VEC vectorsfor API elements in three applications. First, we build a tool thatmines the pairs of API elements that share the same usage relationsamong them. The other applications are in the code migrationdomain. We develop API2API, a tool to automatically learn the APImappings between Java and C# using a characteristic of the API2VECvectors for API elements in the two languages: semantic relationsamong API elements in their usages are observed in the two vectorspaces for the two languages as similar geometric arrangements amongtheir API2VEC vectors. Our empirical evaluation shows that API2APIrelatively improves 22.6% and 40.1% top-1 and top-5 accuracy over astate-of-the-art mining approach for API mappings. Finally, as anotherapplication in code migration, we are able to migrate equivalent APIusages from Java to C# with up to 90.6% recall and 87.2% precision.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
探索API用法和应用的API嵌入
Word2Vec是一类从大量文本语料库中训练出来的神经网络模型,它们可以在一个连续的空间中为每个独特的单词产生相应的向量,在这个空间中可以观察到单词的语言上下文。在这项工作中,我们研究了Word2Vec向量的特征,称为API2VEC或API嵌入,用于源代码中API序列中的API元素。我们的实证研究表明,API元素的API2VEC向量的接近性反映了包含这些API元素周围API的相似使用上下文。此外,API2VEC可以通过向量偏移捕获API使用中API元素之间的几个类似的语义关系。我们在三个应用程序中演示了API2VEC向量对API元素的有用性。首先,我们构建一个工具来挖掘API元素对,它们之间共享相同的使用关系。其他应用程序位于代码迁移域中。我们开发了API2API,这是一个使用两种语言中API元素的api2vecvector特性自动学习Java和c#之间的API映射的工具:在两种语言的两个向量空间中观察到API元素在其使用中的语义关系,就像它们的API2VEC向量之间的相似几何排列一样。我们的经验评估表明,api2api相对于最先进的API映射挖掘方法提高了22.6%和40.1%的top-1和top-5准确率。最后,作为代码迁移中的另一个应用程序,我们能够以高达90.6%的召回率和87.2%的精度将等效的api用法从Java迁移到c#。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Adaptive Unpacking of Android Apps Symbolic Model Extraction for Web Application Verification On Cross-Stack Configuration Errors Syntactic and Semantic Differencing for Combinatorial Models of Test Designs Fuzzy Fine-Grained Code-History Analysis
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1