PyTester: Deep reinforcement learning for text-to-testcase generation

Journal of Systems and Software | Impact Factor: 4.1 | CAS Zone 2 (Computer Science) | JCR Q1 (Computer Science, Software Engineering) | Pub Date: 2025-02-17 | DOI: 10.1016/j.jss.2025.112381

Wannita Takerngsaksiri, Rujikorn Charakorn, Chakkrit Tantithamthavorn, Yuan-Fang Li

Journal of Systems and Software, Volume 224, Article 112381. Publication type: Journal Article. Open access: no. Full text: https://www.sciencedirect.com/science/article/pii/S0164121225000494

Citations: 0

Abstract

Test-driven development (TDD) is a widely employed software development practice that mandates writing test cases based on a textual description before writing the actual code. While writing test cases is the centerpiece of TDD, it is time-consuming, expensive, and often shunned by developers. To address these issues, automated test case generation approaches have recently been investigated. Such approaches take source code as input, but not the textual description. Existing work therefore does not fully support true TDD, because actual code is required to generate test cases. In addition, current deep learning-based test case generation approaches are trained with a single learning objective, i.e., to generate test cases that exactly match the ground-truth test cases. Such an objective may limit the model's ability to generate different yet correct test cases. In this paper, we introduce PyTester, a Text-to-Testcase generation approach that can automatically generate syntactically correct, executable, complete, and effective test cases while being aligned with a given textual description. We evaluate PyTester on the public APPS benchmark dataset, and the results show that our Deep RL approach enables PyTester, a small language model, to outperform much larger language models such as GPT3.5, StarCoder, and InCoder. Our findings suggest that future research could consider improving small LMs, rather than relying on large ones, for better resource efficiency by integrating software engineering (SE) domain knowledge into the design of the reinforcement learning architecture.
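To make the properties named in the abstract more concrete, the following is a minimal, hypothetical sketch of how a generated test case could be scored for syntactic correctness, presence of an assertion, and successful execution. It is not the reward design used by PyTester, which the abstract does not specify. The function names `check_test_case` and `toy_reward`, the equal weighting of the signals, and the availability of a reference solution to run the test against are all assumptions made for illustration.

```python
import ast


def check_test_case(test_src: str, solution_src: str) -> dict:
    """Collect illustrative quality signals for one generated test case.

    Hypothetical helper: the signal names and checks are assumptions,
    not the criteria defined in the paper.
    """
    signals = {"syntax_ok": False, "has_assert": False, "passes": False}

    # Signal 1: the generated test must parse as valid Python.
    try:
        tree = ast.parse(test_src)
    except SyntaxError:
        return signals
    signals["syntax_ok"] = True

    # Signal 2: the test should contain at least one assert statement.
    signals["has_assert"] = any(
        isinstance(node, ast.Assert) for node in ast.walk(tree)
    )

    # Signal 3: the test runs without error against a reference solution.
    # NOTE: exec on untrusted code is unsafe; real use needs a sandbox.
    namespace = {}
    try:
        exec(solution_src, namespace)  # define the function under test
        exec(test_src, namespace)      # run the generated assertions
        signals["passes"] = True
    except Exception:
        pass
    return signals


def toy_reward(signals: dict) -> float:
    """Average the boolean signals into a score in [0, 1] (assumed weighting)."""
    return sum(signals.values()) / len(signals)


if __name__ == "__main__":
    solution = "def add(a, b):\n    return a + b\n"
    for test in ("assert add(1, 2) == 3", "assert add(1, 2) ==", "assert add(1, 2) == 4"):
        print(repr(test), toy_reward(check_test_case(test, solution)))
```

A graded score of this kind, unlike exact matching against a ground-truth test, does not penalize a test case that differs from the reference but is still correct, which is the limitation the abstract attributes to single-objective training.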

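Source journal: Journal of Systems and Software (Engineering & Technology; Computer Science: Theory & Methods)

CiteScore: 8.60

Self-citation rate: 5.70%

Articles published: 193

Review time: 16 weeks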

About the journal: The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:

• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes

The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.

Latest articles from this journal

• LogGen: Integrating traditional model and LLM with code analysis for precise log generation
• Editorial Board
• Smart contract vulnerabilities, tools, and benchmarks: An updated systematic literature review
• Investigating the potential of using worked examples to help resolve issues in a GitHub project
• Reference architecture for autonomy and adaptivity in satellites