Predictive Mutation Analysis via the Natural Language Channel in Source Code

ACM Transactions on Software Engineering and Methodology (TOSEM), Pub Date: 2021-04-22, DOI: 10.1145/3510417

Jinhan Kim, Juyoung Jeon, Shin Hong, S. Yoo


Citations: 7

Abstract

Mutation analysis can provide valuable insights into both the system under test and its test suite. However, it is not scalable due to the cost of building and testing a large number of mutants. Predictive Mutation Testing (PMT) has been proposed to reduce the cost of mutation testing, but it can only provide statistical inference about whether a mutant will be killed or not by the entire test suite. We propose Seshat, a Predictive Mutation Analysis (PMA) technique that can accurately predict the entire kill matrix, not just the Mutation Score (MS) of the given test suite. Seshat exploits the natural language channel in code, and learns the relationship between the syntactic and semantic concepts of each test case and the mutants it can kill, from a given kill matrix. The learnt model can later be used to predict the kill matrices for subsequent versions of the program, even after both the source and test code have changed significantly. Empirical evaluation using the programs in Defects4J shows that Seshat can predict kill matrices with an average F-score of 0.83 for versions that are up to years apart. This is an improvement in F-score of 0.14 and 0.45 points over the state-of-the-art PMT technique and a simple coverage-based heuristic, respectively. Seshat also performs as well as PMT when predicting only the MS. When applied to a mutant-based fault localisation technique, the kill matrix predicted by Seshat is successfully used to locate faults within the top 10 positions, showing its usefulness beyond prediction of the MS. Once Seshat trains its model using a concrete mutation analysis, the subsequent predictions made by Seshat are on average 39 times faster than actual test-based analysis. We also show that Seshat can be successfully applied to automatically generated test cases with an experiment using EvoSuite.
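
To make the terms in the abstract concrete, the following minimal Python sketch shows how a kill matrix relates to the Mutation Score, and how a predicted kill matrix could be compared with an actual one using an F-score. It is illustrative only and is not Seshat's implementation. The names `mutation_score` and `f_score`, the toy matrices, and the choice to compute the F-score over individual (mutant, test) cells are assumptions made here; the paper may aggregate its scores differently.

```python
from typing import List

# A kill matrix: one row per mutant, one column per test case.
# cell[m][t] is True when test t kills mutant m.
KillMatrix = List[List[bool]]


def mutation_score(kill_matrix: KillMatrix) -> float:
    """Fraction of mutants killed by at least one test case."""
    if not kill_matrix:
        return 0.0
    killed = sum(1 for row in kill_matrix if any(row))
    return killed / len(kill_matrix)


def f_score(actual: KillMatrix, predicted: KillMatrix) -> float:
    """F1 over (mutant, test) cells, treating 'killed' as the positive class.

    Assumption: both matrices have the same shape.
    """
    tp = fp = fn = 0
    for actual_row, predicted_row in zip(actual, predicted):
        for a, p in zip(actual_row, predicted_row):
            if a and p:
                tp += 1
            elif p and not a:
                fp += 1
            elif a and not p:
                fn += 1
    denominator = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when nothing is positive.
    return 2 * tp / denominator if denominator else 0.0


# Toy data (hypothetical): 4 mutants, 3 test cases.
actual = [
    [True, False, False],
    [False, True, True],
    [False, False, False],  # mutant survives the whole suite
    [True, True, False],
]
predicted = [
    [True, False, False],
    [False, True, False],   # one missed kill (false negative)
    [False, False, True],   # one spurious kill (false positive)
    [True, True, False],
]

print("actual MS:   ", mutation_score(actual))
print("predicted MS:", mutation_score(predicted))
print("cell F-score:", f_score(actual, predicted))
```

The sketch also shows why a full kill matrix carries more information than the MS alone: a single wrongly predicted cell can change the MS (here the surviving mutant is predicted as killed), while the remaining per-test information stays usable for downstream tasks such as mutant-based fault localisation.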

