Boosting wisdom of the crowd for medical image annotation using training performance and task features.

Cognitive Research-Principles and Implications (IF 3.1; Zone 2, Psychology; JCR Q1, PSYCHOLOGY, EXPERIMENTAL). Pub Date: 2024-05-20. DOI: 10.1186/s41235-024-00558-6

Eeshan Hasan, Erik Duhaime, Jennifer S Trueblood

Journal Article. Cognitive Research-Principles and Implications, 9(1), 31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11102897/pdf/


Abstract

A crucial bottleneck in medical artificial intelligence (AI) is the scarcity of high-quality labeled medical datasets. In this paper, we test a large variety of wisdom-of-the-crowd algorithms to label medical images that were initially classified by individuals recruited through an app-based platform. Individuals classified skin lesions from the International Skin Lesion Challenge 2018 into seven categories. There was a large dispersion in the geographical location, experience, training, and performance of the recruited individuals. We tested several wisdom-of-the-crowd algorithms of varying complexity, from a simple unweighted average to more complex Bayesian models that account for individual patterns of errors. Using a switchboard analysis, we observe that the best-performing algorithms rely on selecting top performers, weighting decisions by training accuracy, and taking into account the task environment. These algorithms far exceed expert performance. We conclude by discussing the implications of these approaches for the development of medical AI.
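The simplest aggregation strategies named in the abstract (an unweighted vote, selecting top performers, and weighting by training accuracy) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `aggregate_votes`, its parameters, the rater ids, the labels, and the accuracy values are all hypothetical, and the Bayesian models and task-feature adjustments mentioned in the abstract are not covered.

```python
from collections import defaultdict


def aggregate_votes(votes, training_accuracy=None, top_k=None, weighted=False):
    """Return the label with the largest vote total for one image.

    votes: dict mapping rater id -> label chosen (hypothetical data layout)
    training_accuracy: dict mapping rater id -> accuracy on training items
    top_k: if set, keep only the k raters with the highest training accuracy
    weighted: if True, each vote counts in proportion to training accuracy
    """
    if (top_k is not None or weighted) and training_accuracy is None:
        raise ValueError("training_accuracy is required for top_k or weighted voting")

    raters = list(votes)
    if top_k is not None:
        # Select the top performers according to training accuracy.
        raters = sorted(raters, key=lambda r: training_accuracy[r], reverse=True)[:top_k]

    totals = defaultdict(float)
    for rater in raters:
        totals[votes[rater]] += training_accuracy[rater] if weighted else 1.0

    # Iterating over sorted labels makes ties resolve alphabetically.
    return max(sorted(totals), key=totals.get)


# Illustrative data only: five raters labeling one image.
votes = {"a": "nevus", "b": "nevus", "c": "melanoma", "d": "nevus", "e": "melanoma"}
accuracy = {"a": 0.55, "b": 0.50, "c": 0.90, "d": 0.45, "e": 0.85}

print(aggregate_votes(votes))                           # nevus (3 votes to 2)
print(aggregate_votes(votes, accuracy, weighted=True))  # melanoma (1.75 vs 1.50)
print(aggregate_votes(votes, accuracy, top_k=3))        # melanoma (2 of the top 3)
```

In this toy case the unweighted majority and the accuracy-aware variants disagree, which shows how rater selection and weighting can change the crowd's answer when the most accurate raters are in the minority.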
