RETAIN: Interactive Tool for Regression Testing Guided LLM Migration

Tanay Dixit, Daniel Lee, Sally Fang, Sai Sree Harsha, Anirudh Sureshan, Akash Maharaj, Yunyao Li
{"title":"RETAIN: Interactive Tool for Regression Testing Guided LLM Migration","authors":"Tanay Dixit, Daniel Lee, Sally Fang, Sai Sree Harsha, Anirudh Sureshan, Akash Maharaj, Yunyao Li","doi":"arxiv-2409.03928","DOIUrl":null,"url":null,"abstract":"Large Language Models (LLMs) are increasingly integrated into diverse\napplications. The rapid evolution of LLMs presents opportunities for developers\nto enhance applications continuously. However, this constant adaptation can\nalso lead to performance regressions during model migrations. While several\ninteractive tools have been proposed to streamline the complexity of prompt\nengineering, few address the specific requirements of regression testing for\nLLM Migrations. To bridge this gap, we introduce RETAIN (REgression Testing\nguided LLM migrAtIoN), a tool designed explicitly for regression testing in LLM\nMigrations. RETAIN comprises two key components: an interactive interface\ntailored to regression testing needs during LLM migrations, and an error\ndiscovery module that facilitates understanding of differences in model\nbehaviors. The error discovery module generates textual descriptions of various\nerrors or differences between model outputs, providing actionable insights for\nprompt refinement. Our automatic evaluation and empirical user studies\ndemonstrate that RETAIN, when compared to manual evaluation, enabled\nparticipants to identify twice as many errors, facilitated experimentation with\n75% more prompts, and achieves 12% higher metric scores in a given time frame.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"55 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.03928","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Large Language Models (LLMs) are increasingly integrated into diverse applications. The rapid evolution of LLMs presents opportunities for developers to enhance applications continuously. However, this constant adaptation can also lead to performance regressions during model migrations. While several interactive tools have been proposed to streamline the complexity of prompt engineering, few address the specific requirements of regression testing for LLM Migrations. To bridge this gap, we introduce RETAIN (REgression Testing guided LLM migrAtIoN), a tool designed explicitly for regression testing in LLM Migrations. RETAIN comprises two key components: an interactive interface tailored to regression testing needs during LLM migrations, and an error discovery module that facilitates understanding of differences in model behaviors. The error discovery module generates textual descriptions of various errors or differences between model outputs, providing actionable insights for prompt refinement. Our automatic evaluation and empirical user studies demonstrate that RETAIN, when compared to manual evaluation, enabled participants to identify twice as many errors, facilitated experimentation with 75% more prompts, and achieves 12% higher metric scores in a given time frame.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
RETAIN:引导 LLM 迁移的回归测试互动工具
大型语言模型(LLM)越来越多地被集成到各种应用中。LLM 的快速发展为开发人员提供了不断增强应用的机会。然而,这种不断的适应性也会导致模型迁移过程中的性能下降。虽然已经提出了一些交互式工具来简化复杂的提示工程,但很少有工具能满足 LLM 迁移回归测试的特定要求。为了弥合这一差距,我们引入了 RETAIN(REgression Testingguided LLM migrAtIoN),这是一款专门为 LLMM 迁移中的回归测试而设计的工具。RETAIN 由两个关键部分组成:在 LLM 迁移过程中满足回归测试需求的交互式界面,以及有助于理解模型行为差异的错误发现模块。错误发现模块生成各种错误或模型输出之间差异的文本描述,为及时改进提供可操作的见解。我们的自动评估和实证用户研究表明,与人工评估相比,RETAIN 使参与者识别的错误数量增加了一倍,促进实验的提示增加了 75%,在给定时间内获得的指标分数提高了 12%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference Retrieve, Annotate, Evaluate, Repeat: Leveraging Multimodal LLMs for Large-Scale Product Retrieval Evaluation Active Reconfigurable Intelligent Surface Empowered Synthetic Aperture Radar Imaging FLARE: Fusing Language Models and Collaborative Architectures for Recommender Enhancement Basket-Enhanced Heterogenous Hypergraph for Price-Sensitive Next Basket Recommendation
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1