Combinatorial Testing Metrics for Machine Learning

2021 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW). Pub Date: 2021-04-01. DOI: 10.1109/ICSTW52544.2021.00025

Erin Lanus, Laura J. Freeman, D. R. Kuhn, R. Kacker


Citations: 15

Abstract

This paper defines a set difference metric for comparing machine learning (ML) datasets and proposes that the difference between datasets be a function of combinatorial coverage. We illustrate its utility for evaluating and predicting the performance of ML models. Identifying and measuring differences between datasets is of significant value for ML problems, where the accuracy of the model depends heavily on the degree to which training data are sufficiently representative of data encountered in application. The method is illustrated for transfer learning without retraining, the problem of predicting the performance of a model trained on one dataset and applied to another.
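The abstract does not spell out the metric, so the following is only a minimal sketch of one plausible reading: treat each dataset as a set of t-way interactions (combinations of feature-value pairs that occur together in some row) and report the fraction of the target dataset's interactions that never appear in the source dataset. The function names, the ratio used, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
from itertools import combinations


def t_way_interactions(rows, t):
    """Collect every t-way interaction present in `rows`.

    An interaction is a tuple of (column_index, value) pairs taken from
    t distinct columns of a single row.
    """
    interactions = set()
    for row in rows:
        for cols in combinations(range(len(row)), t):
            interactions.add(tuple((c, row[c]) for c in cols))
    return interactions


def set_difference_coverage(source, target, t):
    """Fraction of the target's t-way interactions missing from the source.

    Assumed definition (not confirmed by the abstract):
    |T_t minus S_t| / |T_t|. Returns 0.0 when the target has no interactions.
    """
    source_t = t_way_interactions(source, t)
    target_t = t_way_interactions(target, t)
    if not target_t:
        return 0.0
    return len(target_t - source_t) / len(target_t)


# Hypothetical toy datasets with three categorical features per row.
source = [("sunny", "day", "urban"), ("rainy", "day", "rural")]
target = [("sunny", "day", "urban"), ("sunny", "night", "urban")]

for t in (1, 2):
    print(t, set_difference_coverage(source, target, t))
```

Under this reading, a value near 0 means the source data already contains almost every feature combination seen in the target, while a value near 1 means the model would be applied to many combinations it never saw in training, which is the situation the abstract describes for transfer learning without retraining.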
