Probing Numeracy and Logic of Language Models of Code

Razan Baltaji, Parth Thakkar
{"title":"Probing Numeracy and Logic of Language Models of Code","authors":"Razan Baltaji, Parth Thakkar","doi":"10.1109/InteNSE59150.2023.00006","DOIUrl":null,"url":null,"abstract":"Machine learning techniques have found a widespread use in the software engineering community. In particular, language models (LMs) trained on code form the backbone of a majority of these applications, spanning tasks such as code completion, summarization, refactoring, execution prediction, and test generation. These tasks require reasoning about both the syntax and semantics of code. Recent work has shown that language models learn to capture the syntactic properties of code, but it is unclear to what extent they can reason about the semantics of code. In this work, we explore the ability of 3 language models of code to reason about a specific kind of semantics: numerical and logical properties of code. We propose several probing tasks to test the numerical and logical reasoning abilities of these models. We find that the models we explore - CodeBERT, GraphCodeBERT and CodeGen do indeed learn many numerical and logical properties of code, such as finding maximum in a list of numbers, comparing numbers, evaluating boolean expressions and representing numbers. They do not perform as well on complex tasks such as evaluating arithmetic expressions and substituting variables in such expressions. Our results indicate that while these models hold promise, there is a lot of room for improvement of their numeric and logical reasoning abilities.","PeriodicalId":166762,"journal":{"name":"2023 IEEE/ACM International Workshop on Interpretability and Robustness in Neural Software Engineering (InteNSE)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/ACM International Workshop on Interpretability and Robustness in Neural Software Engineering (InteNSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/InteNSE59150.2023.00006","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Machine learning techniques have found widespread use in the software engineering community. In particular, language models (LMs) trained on code form the backbone of a majority of these applications, spanning tasks such as code completion, summarization, refactoring, execution prediction, and test generation. These tasks require reasoning about both the syntax and the semantics of code. Recent work has shown that language models learn to capture the syntactic properties of code, but it is unclear to what extent they can reason about its semantics. In this work, we explore the ability of three language models of code to reason about a specific kind of semantics: the numerical and logical properties of code. We propose several probing tasks to test the numerical and logical reasoning abilities of these models. We find that the models we explore (CodeBERT, GraphCodeBERT, and CodeGen) do indeed learn many numerical and logical properties of code, such as finding the maximum in a list of numbers, comparing numbers, evaluating Boolean expressions, and representing numbers. They do not perform as well on more complex tasks, such as evaluating arithmetic expressions and substituting variables in such expressions. Our results indicate that while these models hold promise, there is considerable room for improvement in their numerical and logical reasoning abilities.
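To make the probing setup concrete, below is a minimal sketch of one such task ("find the maximum in a list of numbers") in Python. It assumes a standard probing recipe: freeze the pretrained model, embed small synthetic code snippets, and train a lightweight classifier on the frozen embeddings. The HuggingFace checkpoint name, the snippet format, and the logistic-regression probe are illustrative assumptions, not the paper's exact protocol.

```python
# Sketch of a numeracy probe on a frozen code LM, assuming a standard
# probing recipe (frozen embeddings + shallow classifier). The task
# framing and hyperparameters here are illustrative, not the authors'.
import random

import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")
model.eval()  # the LM stays frozen; only the probe is trained


def embed(snippet: str) -> torch.Tensor:
    """Mean-pool the frozen model's last hidden states for one snippet."""
    inputs = tokenizer(snippet, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)  # (768,)


def make_example():
    """'Find the maximum' task: label is the index of the largest number."""
    nums = random.sample(range(100), 3)
    snippet = f"xs = [{nums[0]}, {nums[1]}, {nums[2]}]"
    return snippet, nums.index(max(nums))


data = [make_example() for _ in range(2000)]
X = torch.stack([embed(s) for s, _ in data]).numpy()
y = [label for _, label in data]

# If the probe beats chance (~33% here), the frozen embeddings encode
# enough numeric information to compare the numbers in the list.
probe = LogisticRegression(max_iter=1000).fit(X[:1500], y[:1500])
print("probe accuracy:", probe.score(X[1500:], y[1500:]))
```

The other probing tasks named in the abstract (comparing numbers, evaluating Boolean or arithmetic expressions, substituting variables) fit the same template: only `make_example` changes, so probe accuracy across tasks isolates what the frozen representations encode rather than what the classifier can learn.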