LLM4VV:探索用于验证和核查测试套件的 LLM 即法官

Zachariah Sollenberger, Jay Patel, Christian Munley, Aaron Jarmusch, Sunita Chandrasekaran
{"title":"LLM4VV:探索用于验证和核查测试套件的 LLM 即法官","authors":"Zachariah Sollenberger, Jay Patel, Christian Munley, Aaron Jarmusch, Sunita Chandrasekaran","doi":"arxiv-2408.11729","DOIUrl":null,"url":null,"abstract":"Large Language Models (LLM) are evolving and have significantly\nrevolutionized the landscape of software development. If used well, they can\nsignificantly accelerate the software development cycle. At the same time, the\ncommunity is very cautious of the models being trained on biased or sensitive\ndata, which can lead to biased outputs along with the inadvertent release of\nconfidential information. Additionally, the carbon footprints and the\nun-explainability of these black box models continue to raise questions about\nthe usability of LLMs. With the abundance of opportunities LLMs have to offer, this paper explores\nthe idea of judging tests used to evaluate compiler implementations of\ndirective-based programming models as well as probe into the black box of LLMs.\nBased on our results, utilizing an agent-based prompting approach and setting\nup a validation pipeline structure drastically increased the quality of\nDeepSeek Coder, the LLM chosen for the evaluation purposes.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LLM4VV: Exploring LLM-as-a-Judge for Validation and Verification Testsuites\",\"authors\":\"Zachariah Sollenberger, Jay Patel, Christian Munley, Aaron Jarmusch, Sunita Chandrasekaran\",\"doi\":\"arxiv-2408.11729\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Large Language Models (LLM) are evolving and have significantly\\nrevolutionized the landscape of software development. If used well, they can\\nsignificantly accelerate the software development cycle. At the same time, the\\ncommunity is very cautious of the models being trained on biased or sensitive\\ndata, which can lead to biased outputs along with the inadvertent release of\\nconfidential information. Additionally, the carbon footprints and the\\nun-explainability of these black box models continue to raise questions about\\nthe usability of LLMs. With the abundance of opportunities LLMs have to offer, this paper explores\\nthe idea of judging tests used to evaluate compiler implementations of\\ndirective-based programming models as well as probe into the black box of LLMs.\\nBased on our results, utilizing an agent-based prompting approach and setting\\nup a validation pipeline structure drastically increased the quality of\\nDeepSeek Coder, the LLM chosen for the evaluation purposes.\",\"PeriodicalId\":501197,\"journal\":{\"name\":\"arXiv - CS - Programming Languages\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Programming Languages\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.11729\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Programming Languages","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.11729","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

大型语言模型(LLM)不断发展,极大地改变了软件开发的格局。如果使用得当,它们可以显著加快软件开发周期。与此同时,业界也非常担心模型在有偏见或敏感数据的基础上进行训练,这可能会导致有偏见的输出以及无意中泄露机密信息。此外,这些黑盒子模型的碳足迹和无法解释性继续引发人们对 LLM 可用性的质疑。基于我们的研究结果,利用基于代理的提示方法和建立验证流水线结构大大提高了用于评估的 LLM--DeepSeek Coder 的质量。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
LLM4VV: Exploring LLM-as-a-Judge for Validation and Verification Testsuites
Large Language Models (LLM) are evolving and have significantly revolutionized the landscape of software development. If used well, they can significantly accelerate the software development cycle. At the same time, the community is very cautious of the models being trained on biased or sensitive data, which can lead to biased outputs along with the inadvertent release of confidential information. Additionally, the carbon footprints and the un-explainability of these black box models continue to raise questions about the usability of LLMs. With the abundance of opportunities LLMs have to offer, this paper explores the idea of judging tests used to evaluate compiler implementations of directive-based programming models as well as probe into the black box of LLMs. Based on our results, utilizing an agent-based prompting approach and setting up a validation pipeline structure drastically increased the quality of DeepSeek Coder, the LLM chosen for the evaluation purposes.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Repr Types: One Abstraction to Rule Them All $μλεδ$-Calculus: A Self Optimizing Language that Seems to Exhibit Paradoxical Transfinite Cognitive Capabilities Expressing and Analyzing Quantum Algorithms with Qualtran Conversational Concurrency The MLIR Transform Dialect. Your compiler is more powerful than you think
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1