Strengthening Machine Learning Reproducibility for Image Classification
G. Shao, H. Zhang, J. Shao, K. Woeste, Lina Tang
Adv. Artif. Intell. Mach. Learn. DOI: 10.54364/aaiml.2022.1132 (https://doi.org/10.54364/aaiml.2022.1132)
Assessing the reproducibility of machine learning (ML) models requires reliable evaluation measures. However, image classification is routinely evaluated with metrics that are highly sensitive to class prevalence. Consequently, class imbalance introduces noise that leaves the reproducibility of ML models unclear. We suggest routinely using evaluation metrics that are resistant to class imbalance, including balanced accuracy, area under the precision-recall curve, and image classification efficacy, to evaluate the reproducibility of ML models. These metrics are conceptually consistent and logically complementary, and their joint use can help explain different aspects of classification performance at both the whole-class level and the individual-class level. They can be used for the validation, testing, and/or transfer of ML classifiers. Routine, comprehensive analysis with these metrics strengthens the reproducibility of ML models.
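To illustrate why prevalence-sensitive metrics can mislead, the following minimal sketch compares plain accuracy with balanced accuracy on a hypothetical imbalanced label set, and adds a step-wise estimate of the area under the precision-recall curve (commonly called average precision). The function names, the toy data, and the choice of average precision as the area estimate are assumptions made for illustration, not taken from the paper. Image classification efficacy is not defined in the abstract, so it is left out.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def average_precision(y_true, scores, positive=1):
    """Step-wise estimate of the area under the precision-recall curve.

    Samples are ranked by descending score; precision is accumulated at
    each rank where a positive sample is retrieved.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, t in ranked if t == positive)
    hits = 0
    ap = 0.0
    for rank, (_, t) in enumerate(ranked, start=1):
        if t == positive:
            hits += 1
            ap += hits / rank
    return ap / n_pos


# Hypothetical toy data: 90 images of class 0 and 10 images of class 1.
y_true = [0] * 90 + [1] * 10
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

print("accuracy:", accuracy(y_true, y_pred))
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

In this toy case the majority-class classifier scores high on plain accuracy simply because class 0 dominates, while balanced accuracy drops to chance level for two classes, which is the kind of prevalence effect the abstract warns about.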