Utilizing Statistical Tests for Comparing Machine Learning Algorithms

Kurdistan Journal of Applied Research Pub Date : 2021-07-07 DOI:10.24017/SCIENCE.2021.1.8

H. Hamarashid


Citations: 5

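Abstract

The mean performance of machine learning models is estimated using k-fold cross-validation. The algorithm with the best average performance is expected to outperform those with poorer averages. But what if the difference in average results is merely a statistical fluke? A statistical hypothesis test is used to determine whether the difference in mean results between two algorithms is genuine. This study demonstrates how to compare machine learning algorithms using statistical hypothesis testing. During model selection, the outputs of several machine learning algorithms or simulation pipelines are compared. The model that performs best on the chosen performance measure becomes the final model, which can then be used to make predictions on new data. This applies to classification and regression predictive models built with either traditional machine learning or deep learning methods. The difficulty lies in determining whether the difference between two models is real.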
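The comparison described in the abstract can be illustrated with a small sketch. The abstract does not say which hypothesis test the study uses, so the exact paired sign-flip (permutation) test below is an assumption chosen because it needs only the Python standard library. The function name `paired_sign_flip_test` and the per-fold accuracies are hypothetical and are not results from the paper.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided sign-flip test on paired per-fold scores.

    Returns (mean difference A - B, p-value).
    Assumption: both models were evaluated on the same k folds.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both models need one score per fold")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))

    # Under the null hypothesis (no real difference) the sign of each
    # per-fold difference is arbitrary, so enumerate all 2^k assignments.
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(mean([s * d for s, d in zip(signs, diffs)]))
        if flipped >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
    return mean(diffs), extreme / total


# Hypothetical accuracies from 5-fold cross-validation (illustrative only).
scores_a = [0.81, 0.79, 0.84, 0.80, 0.83]
scores_b = [0.78, 0.77, 0.80, 0.79, 0.80]

mean_diff, p_value = paired_sign_flip_test(scores_a, scores_b)
print(f"mean difference = {mean_diff:.3f}, p-value = {p_value:.4f}")
```

In this made-up example model A wins on every fold, yet the p-value is 2/32 = 0.0625, the smallest value this test can produce with five folds. A consistent advantage in mean score can therefore still fall short of a 0.05 significance threshold when few folds are used. Scores from overlapping training folds are also not fully independent, so any such p-value should be read with caution.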




Source journal

Kurdistan Journal of Applied Research

Self-citation rate

0.00%

Publication volume

Review time

12 weeks