Eric D Pelletier, Sean D Jeffries, Kevin Song, Thomas M Hemmerling
Anesthesia and Analgesia. Published 2024-08-08. DOI: 10.1213/ANE.0000000000007088
Comparative Analysis of Machine-Learning Model Performance in Image Analysis: The Impact of Dataset Diversity and Size.
Background: This study presents an analysis of machine-learning model performance in image analysis, with a specific focus on videolaryngoscopy procedures. The research aimed to explore how dataset diversity and size affect the performance of machine-learning models, an issue vital to the advancement of clinical artificial intelligence tools.
Methods: A total of 377 videolaryngoscopy videos from YouTube were used to create 6 varied datasets, each differing in patient diversity and image count. Data augmentation techniques were also applied to enhance these datasets further. Two machine-learning models, YOLOv5-Small and YOLOv8-Small, were trained and evaluated on metrics such as F1 score (the harmonic mean of precision and recall, which combines both into a single metric), precision, recall, mAP@50, and mAP@50-95.
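For readers unfamiliar with the F1 score named in the Methods, it can be computed from detection counts as sketched below. This is a minimal illustration, not the study's evaluation code: the function names and the example counts are hypothetical, and the abstract does not report the underlying counts.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive,
    and false-negative counts. A zero denominator yields 0.0."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not results from the study.
    p, r = precision_recall(tp=8, fp=2, fn=2)
    print(f"precision={p:.2f} recall={r:.2f} F1={f1_score(p, r):.2f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model cannot reach a high F1 by maximizing precision at the expense of recall, or the reverse.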
Results: The findings indicate a significant impact of dataset configuration on model performance, especially the balance between diversity and quantity. The Multi-25 × 10 dataset, featuring 25 images from 10 different patients, demonstrates superior performance, highlighting the value of a well-balanced dataset. The study also finds that the effects of data augmentation vary across different types of datasets.
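The trade-off between diversity (number of patients) and quantity (images per patient) can be made concrete with a small sampling sketch. It assumes that a name such as Multi-25 × 10 means a fixed number of frames drawn from each of a fixed number of patients, which the abstract does not spell out. The function name, the `(patient_id, frame_path)` layout, and the file names are all hypothetical, not the authors' pipeline.

```python
import random
from collections import defaultdict


def sample_balanced(frames, n_patients, per_patient, seed=0):
    """Pick `per_patient` frames from each of `n_patients` patients.

    `frames` is a list of (patient_id, frame_path) tuples (assumed layout).
    Raises ValueError if too few patients have enough frames.
    """
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible

    # Group frame paths by the patient they came from.
    by_patient = defaultdict(list)
    for patient_id, path in frames:
        by_patient[patient_id].append(path)

    # Only patients with enough frames can contribute a full share.
    eligible = sorted(p for p, paths in by_patient.items()
                      if len(paths) >= per_patient)
    if len(eligible) < n_patients:
        raise ValueError("not enough patients with the requested frame count")

    chosen = rng.sample(eligible, n_patients)
    return [(p, path)
            for p in chosen
            for path in rng.sample(by_patient[p], per_patient)]


if __name__ == "__main__":
    # Synthetic stand-in data: 12 patients with 30 frames each.
    frames = [(f"patient_{i}", f"patient_{i}/frame_{j}.jpg")
              for i in range(12) for j in range(30)]
    subset = sample_balanced(frames, n_patients=10, per_patient=25)
    print(len(subset), "frames from",
          len({p for p, _ in subset}), "patients")
```

Varying `n_patients` and `per_patient` while holding their product roughly constant is one way to compare diverse-but-shallow datasets against narrow-but-deep ones.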
Conclusions: Overall, this study emphasizes the critical role of dataset structure in the performance of machine-learning models in medical image analysis. It underscores the necessity of striking an optimal balance between dataset size and diversity, thereby illuminating the complexities inherent in data-driven machine-learning development.
About the journal:
Anesthesia & Analgesia exists for the benefit of patients under the care of health care professionals engaged in the disciplines broadly related to anesthesiology, perioperative medicine, critical care medicine, and pain medicine. The Journal furthers the care of these patients by reporting the fundamental advances in the science of these clinical disciplines and by documenting the clinical, laboratory, and administrative advances that guide therapy. Anesthesia & Analgesia seeks a balance between definitive clinical and management investigations and outstanding basic scientific reports. The Journal welcomes original manuscripts containing rigorous design and analysis, even if unusual in their approach.