B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests

Mouxiang Chen, Zhongxin Liu, He Tao, Yusu Hong, David Lo, Xin Xia, Jianling Sun
arXiv - CS - Software Engineering · Published 2024-09-13 · DOI: arxiv-2409.08692 · Citations: 0

Abstract

Selecting the best code solution from multiple generated ones is an essential task in code generation, which can be achieved by using some reliable validators (e.g., developer-written test cases) for assistance. Since reliable test cases are not always available and can be expensive to build in practice, researchers propose to automatically generate test cases to assess code solutions. However, when both code solutions and test cases are plausible and not reliable, selecting the best solution becomes challenging. Although some heuristic strategies have been proposed to tackle this problem, they lack a strong theoretical guarantee, and it remains an open question whether an optimal selection strategy exists. Our work contributes in two ways. First, we show that within a Bayesian framework, the optimal selection strategy can be defined based on the posterior probability of the observed passing states between solutions and tests. The problem of identifying the best solution is then framed as an integer programming problem. Second, we propose an efficient approach for approximating this optimal (yet uncomputable) strategy, where the approximation error is bounded by the correctness of prior knowledge. We then incorporate effective prior knowledge tailored to code generation tasks. Both theoretical and empirical studies confirm that existing heuristics are limited in selecting the best solutions with plausible test cases. Our proposed approximated optimal strategy B4 significantly surpasses existing heuristics in selecting code solutions generated by large language models (LLMs) with LLM-generated tests, achieving a relative performance improvement of up to 50% over the strongest heuristic and 246% over random selection in the most challenging scenarios. Our code is publicly available at https://github.com/ZJU-CTAG/B4.
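To make the setting concrete: the selection problem operates on an N×M boolean "passing states" matrix between N candidate solutions and M plausible tests. The sketch below is a minimal illustrative baseline in this family, not the paper's B4 strategy: it groups solutions by their passing pattern (a consensus set) and scores each group by the product of its size and the number of tests the pattern passes, a common agreement-style heuristic.

```python
from collections import defaultdict

def select_by_agreement(pass_matrix):
    """Pick a solution index from an N x M boolean pass matrix
    (pass_matrix[i][j] is True iff solution i passes test j).

    Illustrative agreement heuristic, NOT the paper's B4 strategy:
    group solutions by identical passing patterns, then score each
    group by (#solutions in group) * (#tests the pattern passes).
    """
    groups = defaultdict(list)
    for i, row in enumerate(pass_matrix):
        groups[tuple(row)].append(i)
    # Larger consensus sets that pass more tests score higher.
    best_pattern = max(groups, key=lambda p: len(groups[p]) * sum(p))
    return groups[best_pattern][0]

# 3 candidate solutions evaluated against 4 plausible tests.
matrix = [
    [True, True, False, True],   # solutions 0 and 1 agree,
    [True, True, False, True],   # passing 3 tests: score 2 * 3 = 6
    [False, True, True, False],  # lone solution: score 1 * 2 = 2
]
print(select_by_agreement(matrix))  # → 0
```

The paper's point is that heuristics like this lack a theoretical guarantee; B4 instead approximates the Bayes-optimal choice derived from the posterior over these observed passing states.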