{"title":"Evaluating the diagnostic performance of a large language model-powered chatbot for providing immunohistochemistry recommendations in dermatopathology","authors":"Myles R. McCrary MD, PhD, Justine Galambus MD, Wei-Shen Chen MD, PhD","doi":"10.1111/cup.14631","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Background</h3>\n \n <p>Large language model (LLM)-powered chatbots such as ChatGPT have numerous applications. However, their effectiveness in dermatopathology has not been formally evaluated. Dermatopathological cases often require immunohistochemical workup. Here, we evaluate the performance of a chatbot in providing diagnostically useful information on immunohistochemistry relating to dermatological diseases.</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>We queried a commonly used chatbot for the immunophenotypes of 51 cutaneous diseases, including a diverse variety of epidermal, adnexal, hematolymphoid, and soft tissue entities. We requested it to provide references for each diagnosis. All tests were repeated, compiled, quantified, and then compared with established literature standards.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>Clustering analysis demonstrated that recommendations correlated with tumor type, suggesting chatbots can supply appropriate panels. However, a significant portion of recommendations were factually incorrect (13.9%). Citations were rarely clinically useful (24.5%). Many were confabulated (27.2%). Prompt responses for cutaneous adnexal lesions tended to be less accurate while literature references were less useful. Reference retrieval performance was associated with the number of PubMed entries per entity.</p>\n </section>\n \n <section>\n \n <h3> Conclusions</h3>\n \n <p>This foundational study suggests that LLM-powered chatbots may be useful for generating immunohistochemical panels for dermatologic diagnoses. However, specific performance capabilities and biases must be considered. In addition, extreme caution is advised regarding the tendencies to fabricate material. Future models intentionally fine-tuned to augment diagnostic medicine may prove to be valuable.</p>\n </section>\n </div>","PeriodicalId":15407,"journal":{"name":"Journal of Cutaneous Pathology","volume":null,"pages":null},"PeriodicalIF":1.6000,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Cutaneous Pathology","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/cup.14631","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"DERMATOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Background
Large language model (LLM)-powered chatbots such as ChatGPT have numerous applications. However, their effectiveness in dermatopathology has not been formally evaluated. Dermatopathological cases often require immunohistochemical workup. Here, we evaluate the performance of a chatbot in providing diagnostically useful information on immunohistochemistry relating to dermatological diseases.
Methods
We queried a commonly used chatbot for the immunophenotypes of 51 cutaneous diseases, including a diverse variety of epidermal, adnexal, hematolymphoid, and soft tissue entities. We requested it to provide references for each diagnosis. All tests were repeated, compiled, quantified, and then compared with established literature standards.
Results
Clustering analysis demonstrated that recommendations correlated with tumor type, suggesting chatbots can supply appropriate panels. However, a significant portion of recommendations were factually incorrect (13.9%). Citations were rarely clinically useful (24.5%). Many were confabulated (27.2%). Prompt responses for cutaneous adnexal lesions tended to be less accurate while literature references were less useful. Reference retrieval performance was associated with the number of PubMed entries per entity.
Conclusions
This foundational study suggests that LLM-powered chatbots may be useful for generating immunohistochemical panels for dermatologic diagnoses. However, specific performance capabilities and biases must be considered. In addition, extreme caution is advised regarding the tendencies to fabricate material. Future models intentionally fine-tuned to augment diagnostic medicine may prove to be valuable.
期刊介绍:
Journal of Cutaneous Pathology publishes manuscripts broadly relevant to diseases of the skin and mucosae, with the aims of advancing scientific knowledge regarding dermatopathology and enhancing the communication between clinical practitioners and research scientists. Original scientific manuscripts on diagnostic and experimental cutaneous pathology are especially desirable. Timely, pertinent review articles also will be given high priority. Manuscripts based on light, fluorescence, and electron microscopy, histochemistry, immunology, molecular biology, and genetics, as well as allied sciences, are all welcome, provided their principal focus is on cutaneous pathology. Publication time will be kept as short as possible, ensuring that articles will be quickly available to all interested in this speciality.