Automatic Generation of Behavioral Test Cases For Natural Language Processing Using Clustering and Prompting

Ying Li, Rahul Singh, Tarun Joshi, Agus Sudjianto
{"title":"Automatic Generation of Behavioral Test Cases For Natural Language Processing Using Clustering and Prompting","authors":"Ying Li, Rahul Singh, Tarun Joshi, Agus Sudjianto","doi":"arxiv-2408.00161","DOIUrl":null,"url":null,"abstract":"Recent work in behavioral testing for natural language processing (NLP)\nmodels, such as Checklist, is inspired by related paradigms in software\nengineering testing. They allow evaluation of general linguistic capabilities\nand domain understanding, hence can help evaluate conceptual soundness and\nidentify model weaknesses. However, a major challenge is the creation of test\ncases. The current packages rely on semi-automated approach using manual\ndevelopment which requires domain expertise and can be time consuming. This\npaper introduces an automated approach to develop test cases by exploiting the\npower of large language models and statistical techniques. It clusters the text\nrepresentations to carefully construct meaningful groups and then apply\nprompting techniques to automatically generate Minimal Functionality Tests\n(MFT). The well-known Amazon Reviews corpus is used to demonstrate our\napproach. We analyze the behavioral test profiles across four different\nclassification algorithms and discuss the limitations and strengths of those\nmodels.","PeriodicalId":501168,"journal":{"name":"arXiv - CS - Emerging Technologies","volume":"35 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Emerging Technologies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.00161","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Recent work in behavioral testing for natural language processing (NLP) models, such as Checklist, is inspired by related paradigms in software engineering testing. They allow evaluation of general linguistic capabilities and domain understanding, hence can help evaluate conceptual soundness and identify model weaknesses. However, a major challenge is the creation of test cases. The current packages rely on semi-automated approach using manual development which requires domain expertise and can be time consuming. This paper introduces an automated approach to develop test cases by exploiting the power of large language models and statistical techniques. It clusters the text representations to carefully construct meaningful groups and then apply prompting techniques to automatically generate Minimal Functionality Tests (MFT). The well-known Amazon Reviews corpus is used to demonstrate our approach. We analyze the behavioral test profiles across four different classification algorithms and discuss the limitations and strengths of those models.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用聚类和提示自动生成自然语言处理行为测试用例
自然语言处理(NLP)模型的行为测试(如核对表)方面的最新研究受到了软件工程测试相关范例的启发。它们允许对一般语言能力和领域理解进行评估,因此有助于评估概念的合理性和识别模型的弱点。然而,创建测试用例是一项重大挑战。当前的软件包依赖于使用人工开发的半自动化方法,这需要领域专业知识,而且可能很耗时。本文介绍了一种利用大型语言模型和统计技术开发测试用例的自动化方法。它对文本表述进行聚类,仔细构建有意义的组,然后应用提示技术自动生成最小功能测试(MFT)。著名的亚马逊评论语料库被用来演示我们的方法。我们分析了四种不同分类算法的行为测试概况,并讨论了这些模型的局限性和优势。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Pennsieve - A Collaborative Platform for Translational Neuroscience and Beyond Analysing Attacks on Blockchain Systems in a Layer-based Approach Exploring Utility in a Real-World Warehouse Optimization Problem: Formulation Based on Quantun Annealers and Preliminary Results High Definition Map Mapping and Update: A General Overview and Future Directions Detection Made Easy: Potentials of Large Language Models for Solidity Vulnerabilities
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1