On the use of statistical machine translation for suggesting variable names for decompiled code: The Pharo case

Journal of Computer Languages · IF 1.7 · Zone 3 (Computer Science) · JCR Q3 (Computer Science, Software Engineering) · Pub Date: 2024-04-19 · DOI: 10.1016/j.cola.2024.101271
Juan Pablo Sandoval Alcocer, Harold Camacho-Jaimes, Geraldine Galindo-Gutierrez, Andrés Neyem, Alexandre Bergel, Stéphane Ducasse
Journal of Computer Languages, volume 79, Article 101271
https://www.sciencedirect.com/science/article/pii/S2590118424000145

Abstract

Adequately selecting variable names is a difficult activity for practitioners. In 2018, Jaffe et al. proposed the use of statistical machine translation (SMT) to suggest descriptive variable names for decompiled code. A large corpus of decompiled C code was used to train the SMT model. Our paper presents the results of a partial replication of Jaffe’s experiment. We apply the same technique and methodology to a dataset made of code written in the Pharo programming language. We selected Pharo since its syntax is simple – it fits on half of a postcard – and because the optimizations performed by the compiler are limited to method scope. Our results indicate that SMT may recover between 8.9% and 69.88% of the variable names depending on the training set. Our replication concludes that: (i) the accuracy depends on the code similarity between the training and testing sets; (ii) the simplicity of the Pharo syntax and the satisfactory decompiled code alignment have a positive impact on predicting variable names; and (iii) a relatively small code corpus is sufficient to train the SMT model, which shows the applicability of the approach to less popular programming languages. Additionally, to assess SMT’s potential in improving original variable names, ten Pharo developers reviewed 400 SMT name suggestions, with four reviews per variable. Only 15 suggestions (3.75%) were unanimously viewed as improvements, while 45 (11.25%) were perceived as improvements by at least two reviewers, highlighting SMT’s limitations in providing suitable alternatives.
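The reviewer figures quoted above can be re-derived from the counts in the abstract. The following is a minimal sketch rather than the authors' evaluation code: the counts (400 suggestions, four reviews per variable, 15 and 45 improvements) come from the abstract, while `recovery_rate`, its exact-match criterion, and the example variable names are assumptions introduced only for illustration.

```python
from fractions import Fraction


def share(count: int, total: int) -> float:
    """Return count/total as a percentage, rounded to two decimals."""
    return round(float(Fraction(count, total) * 100), 2)


# Counts taken from the abstract.
TOTAL_SUGGESTIONS = 400
REVIEWS_PER_VARIABLE = 4

unanimous = share(15, TOTAL_SUGGESTIONS)      # improvements agreed on by all reviewers
at_least_two = share(45, TOTAL_SUGGESTIONS)   # improvements seen by two or more reviewers
total_reviews = TOTAL_SUGGESTIONS * REVIEWS_PER_VARIABLE


def recovery_rate(original: list[str], suggested: list[str]) -> float:
    """Hypothetical metric (an assumption, not the paper's definition):
    percentage of variables whose suggested name equals the original exactly."""
    if len(original) != len(suggested):
        raise ValueError("name lists must have the same length")
    if not original:
        return 0.0
    hits = sum(o == s for o, s in zip(original, suggested))
    return round(100 * hits / len(original), 2)


if __name__ == "__main__":
    print(unanimous, at_least_two, total_reviews)
    # Made-up Pharo-style names, for illustration only.
    print(recovery_rate(["aStream", "index", "each", "result"],
                        ["aStream", "i", "each", "tmp1"]))
```

Under these assumptions, the two shares match the 3.75% and 11.25% reported above, and the review protocol implies 1,600 individual reviews in total. Exact string equality is only one plausible way to count a name as recovered; the abstract does not say which criterion the study used.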

Source journal: Journal of Computer Languages (Computer Science: Computer Networks and Communications)
CiteScore: 5.00 · Self-citation rate: 13.60% · Articles published: 36