Crowdsourcing Skin Demarcations of Chronic Graft-Versus-Host Disease in Patient Photographs: Training Versus Performance Study.

JMIR Dermatology, vol. 6, e48589. Published 2023-12-26. DOI: 10.2196/48589
Andrew J McNeil, Kelsey Parks, Xiaoqi Liu, Bohan Jiang, Joseph Coco, Kira McCool, Daniel Fabbri, Erik P Duhaime, Benoit M Dawant, Eric R Tkaczyk

Abstract

Background: Chronic graft-versus-host disease (cGVHD) is a significant cause of long-term morbidity and mortality in patients after allogeneic hematopoietic cell transplantation. Skin is the most commonly affected organ, and visual assessment of cGVHD can have low reliability. Crowdsourcing data from nonexpert participants has been used for numerous medical applications, including image labeling and segmentation tasks.

Objective: This study aimed to assess the ability of crowds of nonexpert raters (individuals without any prior training in identifying or marking cGVHD) to demarcate photos of cGVHD-affected skin. We also studied the effect of training and feedback on crowd performance.

Methods: Using a Canfield Vectra H1 3D camera, 360 photographs of the skin of 36 patients with cGVHD were taken. Ground truth demarcations were provided in 3D by a trained expert and reviewed by a board-certified dermatologist. In total, 3000 2D images (projections from various angles) were created for crowd demarcation through the DiagnosUs mobile app. Raters were split into high and low feedback groups. The performance of 4 different nonexpert crowds was analyzed: 17 raters per image from the high feedback group, 17 raters per image from the low feedback group, 32-35 raters per image from the low feedback group, and the top 5 performers for each image from the low feedback group.
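The abstract does not say how the 17 (or 32-35) demarcations of an image are merged into one crowd consensus, nor how the top 5 performers are chosen. The sketch below is for illustration only. It assumes a pixelwise majority vote over binary masks and a per-image filter that keeps only the k raters with the lowest tracked error (for example, mean error on images that have expert ground truth). `majority_vote`, `top_k_consensus`, the 0.5 threshold, and the error scores are all hypothetical and not taken from the study.

```python
from typing import Dict, List

Mask = List[List[int]]  # H x W grid of 0/1 values, 1 = marked as affected


def majority_vote(masks: List[Mask], threshold: float = 0.5) -> Mask:
    """Pixelwise consensus: keep a pixel if more than `threshold` of the
    raters marked it (hypothetical rule, not confirmed by the study)."""
    if not masks:
        raise ValueError("need at least one mask")
    n = len(masks)
    h, w = len(masks[0]), len(masks[0][0])
    return [
        [1 if sum(m[i][j] for m in masks) / n > threshold else 0 for j in range(w)]
        for i in range(h)
    ]


def top_k_consensus(
    masks_by_rater: Dict[str, Mask],
    rater_error: Dict[str, float],
    k: int = 5,
) -> Mask:
    """Consensus built only from the k raters of this image with the lowest
    tracked error (hypothetical reliability score)."""
    ranked = sorted(masks_by_rater, key=lambda r: rater_error[r])
    return majority_vote([masks_by_rater[r] for r in ranked[:k]])


if __name__ == "__main__":
    # Toy 2 x 2 demarcations from four hypothetical raters of one image.
    masks = {
        "a": [[1, 0], [1, 1]],
        "b": [[1, 0], [0, 1]],
        "c": [[0, 0], [1, 1]],
        "d": [[0, 1], [0, 0]],
    }
    errors = {"a": 0.05, "b": 0.08, "c": 0.10, "d": 0.40}
    print(majority_vote(list(masks.values())))
    print(top_k_consensus(masks, errors, k=3))
```

In the toy data, the unreliable rater `d` pulls two pixels down to exactly half the votes, so the full-crowd vote drops them. Keeping only the 3 most reliable raters restores them.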

Results: Across 8 demarcation competitions, 130 raters were recruited to the high feedback group and 161 to the low feedback group. This resulted in a total of 54,887 individual demarcations from the high feedback group and 78,967 from the low feedback group. The nonexpert crowds achieved good overall performance for segmenting cGVHD-affected skin with minimal training, with a median surface area error of less than 12% of skin pixels for all crowds in both the high and low feedback groups. The low feedback crowds performed slightly worse than the high feedback crowd, even when a larger crowd was used. Tracking the 5 most reliable raters from the low feedback group for each image recovered performance similar to that of the high feedback crowd. Higher variability between raters for a given image did not correlate with lower performance of the crowd consensus demarcation, so it cannot be used as a measure of reliability. No significant learning was observed during the task as raters saw more photos and received more feedback.
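The abstract reports error as a percentage of skin pixels but does not give the formula. The sketch below assumes one plausible definition: take the number of affected pixels in the crowd consensus and in the expert ground truth, both restricted to a skin mask, and divide their absolute difference by the number of skin pixels. The function name `surface_area_error` and the mask layout are hypothetical, and the study's exact formula may differ.

```python
import statistics
from typing import List

Mask = List[List[int]]  # H x W grid of 0/1 values


def surface_area_error(consensus: Mask, expert: Mask, skin: Mask) -> float:
    """Percent of skin pixels by which the consensus over- or under-estimates
    the expert's affected area (assumed definition, for illustration only).

    Pixels outside the skin mask are ignored.
    """
    skin_px = crowd_px = expert_px = 0
    for c_row, e_row, s_row in zip(consensus, expert, skin):
        for c, e, s in zip(c_row, e_row, s_row):
            if s:
                skin_px += 1
                crowd_px += c
                expert_px += e
    if skin_px == 0:
        raise ValueError("skin mask contains no pixels")
    return 100.0 * abs(crowd_px - expert_px) / skin_px


if __name__ == "__main__":
    skin = [[1, 1, 1], [1, 1, 0]]  # 5 skin pixels
    expert = [[1, 0, 0], [0, 0, 0]]
    crowd = [[1, 1, 0], [0, 0, 0]]
    per_image = [
        surface_area_error(crowd, expert, skin),   # 1 extra pixel out of 5
        surface_area_error(expert, expert, skin),  # perfect agreement
    ]
    # The abstract summarizes errors by their median across images.
    print(per_image, statistics.median(per_image))
```

Because this definition compares only areas, a consensus that marks the wrong region at the right size still scores zero. Counting mismatched pixels instead would be a stricter alternative.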

Conclusions: Crowds of nonexpert raters can demarcate cGVHD images with good overall performance. Tracking the top 5 most reliable raters gave the best results, achieving the highest performance with the fewest expert demarcations needed for adequate training. However, agreement among individual nonexperts does not help predict whether the crowd has provided an accurate result. Future work should explore the performance of crowdsourcing in standard clinical photos and further methods to estimate the reliability of consensus demarcations.
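Whether rater agreement predicts accuracy can be tested by checking, across images, whether a per-image disagreement score tracks the per-image consensus error. The abstract does not name the statistic used. The sketch below assumes Spearman's rank correlation and implements it with the standard library only. The function names and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical per-image values: rater disagreement and consensus error (%).
    disagreement = [0.10, 0.30, 0.20, 0.40]
    error = [8.0, 5.0, 12.0, 6.0]
    print(round(spearman(disagreement, error), 3))
```

A coefficient near zero over many images would be consistent with the study's finding that disagreement cannot flag an unreliable consensus. With only four toy images, as here, the value carries no evidential weight.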
