Leveraging mutants for automatic prediction of metamorphic relations using machine learning

Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation. Pub Date: 2019-08-27. DOI: 10.1145/3340482.3342741

Aravind Nair, K. Meinke, Sigrid Eldh


Citations: 16

Abstract

An oracle is used in software testing to derive the verdict (pass/fail) for a test case. The lack of precise test oracles is one of the major problems in software testing, and it can hinder judgements about quality. Metamorphic testing is an emerging technique that solves both the oracle problem and the test case generation problem by testing special forms of software requirements known as metamorphic requirements. However, manually deriving the metamorphic requirements for a given program requires a high level of domain expertise and is labor-intensive and error-prone. As an alternative, we consider the problem of automatically detecting metamorphic requirements using machine learning (ML). For this problem we can apply graph kernels and support vector machines (SVM). A significant problem for any ML approach is obtaining a large labeled training set of data (in this case, programs) that generalises well. The main contribution of this paper is a general method for generating large volumes of synthetic training data, which can improve ML-assisted detection of metamorphic requirements. For training data synthesis we adopt mutation testing techniques. This research is the first to explore data augmentation techniques for ML-based analysis of software code. We also aim to enhance black-box testing using white-box methodologies. Our results show that the mutants incorporated into the source code corpus not only efficiently scale the dataset size but can also improve the accuracy of classification models.
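The two ideas the abstract combines, metamorphic requirements and mutation-based data synthesis, can be pictured with a small sketch. The code below is not the authors' pipeline, and it leaves out the graph kernel and SVM stages entirely. It assumes a toy program `total`, a single hypothetical mutation operator (`SwapAddSub`, which replaces one `+` with `-`), and one metamorphic relation (permuting the input list must not change the output). All names are illustrative, and `ast.unparse` requires Python 3.9 or later.

```python
import ast
import random

# Toy program under test (an assumption for illustration, not from the paper).
SOURCE = """
def total(xs):
    acc = 0
    for x in xs:
        acc = acc + x
    return acc
"""


class SwapAddSub(ast.NodeTransformer):
    """Hypothetical mutation operator: replace the n-th binary `+` with `-`."""

    def __init__(self, target_index):
        self.target_index = target_index
        self.seen = 0

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            if self.seen == self.target_index:
                node.op = ast.Sub()
            self.seen += 1
        return node


def count_additions(source):
    """Count binary `+` operations, i.e. the available mutation points."""
    tree = ast.parse(source)
    return sum(
        isinstance(n, ast.BinOp) and isinstance(n.op, ast.Add)
        for n in ast.walk(tree)
    )


def make_mutants(source):
    """Return one mutant source string per mutation point."""
    mutants = []
    for i in range(count_additions(source)):
        tree = SwapAddSub(i).visit(ast.parse(source))
        ast.fix_missing_locations(tree)
        mutants.append(ast.unparse(tree))
    return mutants


def load(source, name="total"):
    """Compile a source string and return the named function."""
    namespace = {}
    exec(compile(source, "<program>", "exec"), namespace)
    return namespace[name]


def satisfies_permutation_mr(func, trials=200, seed=0):
    """Check empirically that func(xs) == func(shuffled xs); no oracle needed."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 8))]
        shuffled = xs[:]
        rng.shuffle(shuffled)
        if func(xs) != func(shuffled):
            return False
    return True


# Build a small (program, label) corpus from the original plus its mutants.
mutants = make_mutants(SOURCE)
corpus = [(SOURCE, satisfies_permutation_mr(load(SOURCE)))]
corpus += [(m, satisfies_permutation_mr(load(m))) for m in mutants]
```

The single mutant here computes the negated sum, which is still invariant under permutation, so it enters the corpus with the same label as the original. In a setting like the one the abstract describes, such labeled program variants would then be turned into graph representations and passed to a kernel-based classifier; how the authors assign labels to mutants is not stated in the abstract, so the empirical check above is only one possible choice.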

