Boosting wisdom of the crowd for medical image annotation using training performance and task features.

IF 3.4 | Zone 2, Psychology | Q1 PSYCHOLOGY, EXPERIMENTAL | Cognitive Research-Principles and Implications | Pub Date: 2024-05-20 | DOI: 10.1186/s41235-024-00558-6
Eeshan Hasan, Erik Duhaime, Jennifer S Trueblood
Cognitive Research-Principles and Implications, Volume 9, Issue 1, p. 31 (Journal Article). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11102897/pdf/
Citations: 0

Abstract

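A crucial bottleneck in medical artificial intelligence (AI) is the availability of high-quality labeled medical datasets. In this paper, we test a wide variety of wisdom-of-the-crowd algorithms for labeling medical images that were initially classified by individuals recruited through an app-based platform. Individuals classified skin lesions from the International Skin Lesion Challenge 2018 into 7 different categories. The recruited individuals varied widely in geographical location, experience, training, and performance. We tested several wisdom-of-the-crowd algorithms of varying complexity, ranging from a simple unweighted average to more complex Bayesian models that account for individual patterns of error. Using a switchboard analysis, we observe that the best-performing algorithms rely on selecting top performers, weighting decisions by training accuracy, and taking the task environment into account. These algorithms far exceed expert performance. We conclude by discussing the implications of these approaches for the development of medical AI.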
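Three of the aggregation ideas named in the abstract (an unweighted vote, selecting top performers, and weighting by training accuracy) can be made concrete with a small Python sketch. It is not the authors' code. The function names (`plurality_vote`, `top_k_raters`, `weighted_vote`) are hypothetical, the log-odds weight log((K-1)p/(1-p)) comes from an assumed symmetric error model, and using class base rates as a stand-in for the "task environment" is also an assumption. The Bayesian error-pattern models mentioned in the abstract are not reproduced here.

```python
import math
from collections import Counter

# Illustrative sketch only: function names are hypothetical and the weighting
# scheme is an assumption, not the paper's exact method.


def plurality_vote(votes):
    """Unweighted aggregation: the most frequent label wins.

    votes maps rater id -> class index; ties go to the smallest class index.
    """
    counts = Counter(votes.values())
    best = max(counts.values())
    return min(label for label, n in counts.items() if n == best)


def top_k_raters(train_acc, k):
    """Select the ids of the k raters with the highest training accuracy."""
    ranked = sorted(train_acc, key=train_acc.get, reverse=True)
    return set(ranked[:k])


def weighted_vote(votes, train_acc, n_classes, prior=None, eps=1e-3):
    """Weight each vote by log((K - 1) * p / (1 - p)), p = training accuracy.

    Assumed model: a rater is correct with probability p and otherwise picks
    one of the K - 1 wrong classes uniformly. Raters below chance (p < 1/K)
    therefore get a negative weight. If a class prior is given, its log is
    added, giving the MAP label under that assumed model.
    """
    scores = [math.log(prior[c]) if prior else 0.0 for c in range(n_classes)]
    for rater, label in votes.items():
        p = min(max(train_acc[rater], eps), 1 - eps)  # keep the log finite
        scores[label] += math.log((n_classes - 1) * p / (1 - p))
    return max(range(n_classes), key=lambda c: scores[c])


K = 7  # seven lesion categories, encoded here as 0..6
train_acc = {"a": 0.9, "b": 0.3, "c": 0.3, "d": 0.3}  # made-up accuracies
votes = {"a": 2, "b": 1, "c": 1, "d": 1}  # one image, four raters
prior = [0.01] * K
prior[1] = 0.94  # hypothetical base rates in which class 1 is very common

top = top_k_raters(train_acc, 1)
print(plurality_vote(votes))  # 1
print(plurality_vote({r: v for r, v in votes.items() if r in top}))  # 2
print(weighted_vote(votes, train_acc, K))  # 2
print(weighted_vote(votes, train_acc, K, prior=prior))  # 1
```

In this toy case the three weak raters outvote the strong one under plurality, while top-performer selection and accuracy weighting both side with the strong rater. A skewed base rate can flip the weighted decision back, which illustrates why the task environment can matter. Top-k selection is just the special case of weighting in which everyone outside the top k gets weight zero.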
Source journal metrics: CiteScore 6.80; self-citation rate 7.30%; articles published 96; review time 25 weeks.
Latest articles in this journal:
Fixation durations on familiar items are longer due to attenuation of exploration.
Different facets of age perception in people with developmental prosopagnosia and "super-recognisers".
Self-evaluations and the language of the beholder: objective performance and language solidarity predict L2 and L1 self-evaluations in bilingual adults.
Correction: Distress reactions and susceptibility to misinformation for an analogue trauma event.
Jack of all trades, master of one: domain-specific and domain-general contributions to perceptual expertise in visual comparison.