A Comparative Analysis of Large Language Model Accuracy for Image-Based Hair Disease Identification in Diverse Skin Tones

Journal of the National Medical Association. IF 2.5; Zone 4, Medicine; Q1, MEDICINE, GENERAL & INTERNAL. Pub Date: 2024-08-01. DOI: 10.1016/j.jnma.2024.07.035

Willow D. Pastard MS, Zane Sejdiu BS, Alexis Arza BS, James Cross MBA, Razmig Garabet BS, Anna Chacon MD, Ellen N. Pritchett MD

Journal of the National Medical Association, Volume 116, Issue 4, Page 426. Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S0027968424001160

Citation count: 0

Abstract

Purpose

The rapid integration of artificial intelligence (AI) in dermatology shows promise for supporting clinical practice and democratizing access to diagnosis. However, significant limitations and ethical concerns persist. Despite growing research into AI's effectiveness in identifying skin conditions, fewer studies have explored its ability to accurately diagnose hair disorders.

Methods

This study explores the capacity of the large language model (LLM) ChatGPT to correctly identify alopecia areata, androgenetic alopecia, traction alopecia, and central centrifugal cicatricial alopecia across a range of skin tones. Using the Monk Skin Tone Scale, we sorted images of hair disorders into lighter (Monk Scale 1-5) and darker (Monk Scale 6-10) categories. Images were sourced from publicly accessible databases.
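
The two-category split described above can be sketched in a few lines. This is only an illustration of the stated cutoffs (1-5 lighter, 6-10 darker); the function name `monk_group` is hypothetical, and the abstract does not describe how the authors actually assigned or recorded scale values.

```python
def monk_group(tone: int) -> str:
    """Map a Monk Skin Tone Scale value (1-10) to the study's two categories.

    Hypothetical helper: the cutoffs come from the abstract, the rest is assumed.
    """
    if not 1 <= tone <= 10:
        raise ValueError(f"Monk Skin Tone value must be 1-10, got {tone}")
    # Values 1-5 are the "lighter" category, values 6-10 the "darker" category.
    return "lighter" if tone <= 5 else "darker"
```

Rejecting out-of-range values explicitly keeps a mislabeled image from silently falling into the darker group.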

Results

Our analysis revealed significant differences in diagnosis rates. ChatGPT was more likely to correctly identify disease in lighter skin, notably for alopecia areata (p<.001) and androgenetic alopecia (p=.003). This trend was also seen in overall diagnosis rates (p<.001). Notably, ChatGPT repeatedly misidentified hair conditions in darker skin as traction alopecia, doing so for 24.48% of all such images. Additionally, although this study initially sought to explore ChatGPT's ability to diagnose common nail disorders across skin tones, this could not be completed because too few images depicting nail disorders in darker skin were available.
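
The abstract does not say which statistical test produced these p-values. As one plausible way to compare correct-identification rates between the two skin-tone groups, the sketch below computes a two-sided Fisher's exact test on a 2x2 table using only the standard library. The choice of test, the function name `fisher_exact_two_sided`, and the counts in the usage line are assumptions for illustration, not the authors' method or data.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows are the lighter / darker groups,
    columns are correct / incorrect identifications.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x correct results in row 1,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        # Count every table that is no more probable than the observed one;
        # the small tolerance guards against floating-point ties.
        if p <= p_obs * (1 + 1e-9):
            total += p
    return total


# Hypothetical counts only (not study data): 40/50 correct vs. 25/50 correct.
p_value = fisher_exact_two_sided(40, 10, 25, 25)
```

For larger samples, a chi-square test on the same table would be a common alternative; the exact test is used here because it needs no large-sample approximation.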

Conclusion

These findings highlight some of the limitations of LLMs in accurately diagnosing diseases of the hair and nails. They also point to potential implications for the performance of artificial intelligence trained on dermatologic databases with limited representation.

Source journal

Journal of the National Medical Association (Medicine: General & Internal Medicine)

CiteScore: 4.80
Self-citation rate: 3.00%
Articles published: 139
Review time: 98 days

About the journal: Journal of the National Medical Association, the official journal of the National Medical Association, is a peer-reviewed publication whose purpose is to address medical care disparities of persons of African descent. The Journal of the National Medical Association is focused on specialized clinical research activities related to the health problems of African Americans and other minority groups. Special emphasis is placed on the application of medical science to improve the healthcare of underserved populations both in the United States and abroad. The Journal has the following objectives: (1) to expand the base of original peer-reviewed literature and the quality of that research on the topic of minority health; (2) to provide greater dissemination of this research; (3) to offer appropriate and timely recognition of the significant contributions of physicians who serve these populations; and (4) to promote engagement by member and non-member physicians in the overall goals and objectives of the National Medical Association.