A Comparative Analysis of Large Language Model Accuracy for Image-Based Hair Disease Identification in Diverse Skin Tones
Willow D. Pastard MS, Willow Pastard MS, Zane Sejdiu BS, Alexis Arza BS, James Cross MBA, Razmig Garabet BS, Anna Chacon MD, Ellen N. Pritchett MD
Journal of the National Medical Association, Volume 116, Issue 4, Page 426. Published 2024-08-01. DOI: 10.1016/j.jnma.2024.07.035
https://www.sciencedirect.com/science/article/pii/S0027968424001160
Abstract
Purpose
The rapid integration of artificial intelligence (AI) into dermatology shows promise for supporting clinical practice and democratizing access to diagnosis. Significant limitations and ethical concerns persist, however. Despite growing research into AI's effectiveness in identifying skin conditions, fewer studies have explored its ability to accurately diagnose hair disorders.
Methods
This study explores the capacity of the large language model (LLM) ChatGPT to correctly identify alopecia areata, androgenetic alopecia, traction alopecia, and central centrifugal cicatricial alopecia across a range of skin tones. Images of hair disorders were sourced from publicly accessible databases and sorted, using the Monk Skin Tone Scale, into lighter (Monk Scale 1-5) and darker (Monk Scale 6-10) categories.
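The two-category split described in the Methods can be sketched in a few lines of Python. This is only an illustration of the stated cutoffs (1-5 lighter, 6-10 darker); the function name `monk_group` and the sample image records are hypothetical and are not taken from the study.

```python
from collections import Counter


def monk_group(mst: int) -> str:
    """Map a Monk Skin Tone value (1-10) to the abstract's two categories."""
    if not 1 <= mst <= 10:
        raise ValueError(f"Monk Skin Tone must be between 1 and 10, got {mst}")
    return "lighter" if mst <= 5 else "darker"


# Hypothetical image records (condition, Monk Skin Tone rating); not study data.
images = [
    ("alopecia areata", 2),
    ("androgenetic alopecia", 5),
    ("traction alopecia", 6),
    ("central centrifugal cicatricial alopecia", 9),
]

# Count how many images fall into each skin tone category.
counts = Counter(monk_group(mst) for _, mst in images)
print(counts)
```

Validating the rating before binning keeps a mislabeled image from silently landing in the darker category.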
Results
Our analysis revealed significant differences in diagnosis rates. ChatGPT was more likely to correctly identify disease in lighter skin, notably for alopecia areata (p<.001) and androgenetic alopecia (p=.003). This trend was also seen in overall diagnosis rates (p<.001). Notably, the program misidentified 24.48% of all hair conditions in dark skin as traction alopecia. Additionally, although this study initially sought to explore ChatGPT's ability to diagnose common nail disorders across skin tones, this could not be completed because too few images depicting nail disorders in darker skin were available.
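The abstract does not state which statistical test produced these p-values. As one plausible way to compare correct-diagnosis rates between two skin tone groups, the sketch below implements a two-sided Fisher's exact test on a 2x2 table using only the Python standard library. The choice of test, the function name `fisher_exact_two_sided`, and the counts are all assumptions for illustration; the counts are invented and are not the study's results.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell
        # when the row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    total = sum(p for x in range(lo, hi + 1) if (p := prob(x)) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts, NOT study data:
# rows = skin tone group, columns = (correct, incorrect) diagnoses.
lighter_correct, lighter_incorrect = 40, 10
darker_correct, darker_incorrect = 20, 30

p_value = fisher_exact_two_sided(
    lighter_correct, lighter_incorrect, darker_correct, darker_incorrect
)
print(f"two-sided p = {p_value:.2g}")
```

An exact test is a common choice when some cells of the table are small; with larger counts, a chi-square test of independence would be a reasonable alternative.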
Conclusion
These findings highlight some of the limitations of LLMs in accurately diagnosing diseases of the hair and nails. They also point to potential implications for the performance of artificial intelligence trained on dermatologic databases with limited representation.
About the journal:
Journal of the National Medical Association, the official journal of the National Medical Association, is a peer-reviewed publication whose purpose is to address medical care disparities of persons of African descent.
The Journal of the National Medical Association is focused on specialized clinical research activities related to the health problems of African Americans and other minority groups. Special emphasis is placed on the application of medical science to improve the healthcare of underserved populations both in the United States and abroad. The Journal has the following objectives: (1) to expand the base of original peer-reviewed literature and the quality of that research on the topic of minority health; (2) to provide greater dissemination of this research; (3) to offer appropriate and timely recognition of the significant contributions of physicians who serve these populations; and (4) to promote engagement by member and non-member physicians in the overall goals and objectives of the National Medical Association.