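Evaluating the diagnostic performance of a large language model-powered chatbot for providing immunohistochemistry recommendations in dermatopathology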

IF 1.1 | Zone 4 (Medicine) | JCR Q3 (Dermatology) | Journal of Cutaneous Pathology | Pub Date: 2024-05-14 | DOI: 10.1111/cup.14631

Myles R. McCrary MD, PhD, Justine Galambus MD, Wei-Shen Chen MD, PhD

Journal of Cutaneous Pathology 51(9): 689-695.

Citations: 0

Abstract



Evaluating the diagnostic performance of a large language model-powered chatbot for providing immunohistochemistry recommendations in dermatopathology

Background

Large language model (LLM)-powered chatbots such as ChatGPT have numerous applications. However, their effectiveness in dermatopathology has not been formally evaluated. Dermatopathological cases often require immunohistochemical workup. Here, we evaluate the performance of a chatbot in providing diagnostically useful information on immunohistochemistry relating to dermatological diseases.

Methods

We queried a commonly used chatbot for the immunophenotypes of 51 cutaneous diseases, including a diverse variety of epidermal, adnexal, hematolymphoid, and soft tissue entities. We requested it to provide references for each diagnosis. All tests were repeated, compiled, quantified, and then compared with established literature standards.
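The abstract does not say how responses were scored, so the following is only a minimal sketch of one way the "compiled, quantified, and then compared" step could look. The `REFERENCE` table, the `grade` function, the `Tally` class, and the sample chatbot output are all hypothetical; the entries are toy data for illustration and are not the study's dataset or results.

```python
from dataclasses import dataclass

# Hypothetical reference standard: entity -> {marker: expected status}.
# Toy entries for illustration only; not the study's dataset.
REFERENCE = {
    "Merkel cell carcinoma": {"CK20": "+", "TTF-1": "-"},
    "Melanoma": {"SOX10": "+", "S100": "+"},
}


@dataclass
class Tally:
    correct: int = 0
    incorrect: int = 0
    unverified: int = 0  # marker absent from the reference standard

    @property
    def incorrect_rate(self) -> float:
        """Share of graded claims that contradict the reference."""
        graded = self.correct + self.incorrect
        return self.incorrect / graded if graded else 0.0


def grade(responses):
    """Grade (entity, marker, claimed_status) tuples against REFERENCE."""
    tally = Tally()
    for entity, marker, status in responses:
        expected = REFERENCE.get(entity, {}).get(marker)
        if expected is None:
            tally.unverified += 1
        elif expected == status:
            tally.correct += 1
        else:
            tally.incorrect += 1
    return tally


# Made-up chatbot output, pooled over repeated runs.
chatbot_output = [
    ("Merkel cell carcinoma", "CK20", "+"),
    ("Merkel cell carcinoma", "TTF-1", "+"),  # contradicts the reference
    ("Melanoma", "SOX10", "+"),
    ("Melanoma", "S100", "+"),
    ("Melanoma", "HMB-45", "+"),  # not in the toy reference
]

tally = grade(chatbot_output)
graded = tally.correct + tally.incorrect
print(f"incorrect: {tally.incorrect_rate:.1%} of {graded} graded claims")
```

Claims about markers missing from the reference are counted separately rather than as errors, so the incorrect rate reflects only statements that could actually be checked.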

Results

Clustering analysis demonstrated that recommendations correlated with tumor type, suggesting chatbots can supply appropriate panels. However, a significant portion of recommendations were factually incorrect (13.9%). Citations were rarely clinically useful (24.5%). Many were confabulated (27.2%). Prompt responses for cutaneous adnexal lesions tended to be less accurate while literature references were less useful. Reference retrieval performance was associated with the number of PubMed entries per entity.

Conclusions

This foundational study suggests that LLM-powered chatbots may be useful for generating immunohistochemical panels for dermatologic diagnoses. However, specific performance capabilities and biases must be considered. In addition, extreme caution is advised regarding the tendencies to fabricate material. Future models intentionally fine-tuned to augment diagnostic medicine may prove to be valuable.


Source journal

Journal of Cutaneous Pathology (Medicine: Pathology)

CiteScore: 3.20

Self-citation rate: 5.90%

Articles published: 174

Review time: 3-8 weeks

About the journal: Journal of Cutaneous Pathology publishes manuscripts broadly relevant to diseases of the skin and mucosae, with the aims of advancing scientific knowledge regarding dermatopathology and enhancing the communication between clinical practitioners and research scientists. Original scientific manuscripts on diagnostic and experimental cutaneous pathology are especially desirable. Timely, pertinent review articles also will be given high priority. Manuscripts based on light, fluorescence, and electron microscopy, histochemistry, immunology, molecular biology, and genetics, as well as allied sciences, are all welcome, provided their principal focus is on cutaneous pathology. Publication time will be kept as short as possible, ensuring that articles will be quickly available to all interested in this speciality.