Accuracy of generative artificial intelligence models in differential diagnoses of familial Mediterranean fever and deficiency of Interleukin-1 receptor antagonist

Journal of Translational Autoimmunity (IF 3.6, JCR Q2, Immunology) | Pub Date: 2023-10-14 | DOI: 10.1016/j.jtauto.2023.100213

Joshua Pillai , Kathryn Pillai

Volume 7, Article 100213. Full text: https://www.sciencedirect.com/science/article/pii/S2589909023000266

Citations: 1

Abstract

With the rapid development of artificial intelligence, large language models (LLMs) have been used to solve natural language processing tasks. More recently, LLMs have shown potential in numerous medical applications and have been investigated in particular for their clinical reasoning ability. Although the diagnostic accuracy of LLMs in forming differential diagnoses has been reviewed in general internal medicine, much remains unknown for autoinflammatory disorders. Forming a differential diagnosis for autoinflammatory diseases is challenging because symptoms overlap between disorders, and it is even more difficult without genetic screening. In this work, the diagnostic accuracy of the Generative Pre-Trained Transformer Model-4 (GPT-4), GPT-3.5, and Large Language Model Meta AI (LLaMa) was evaluated on clinical vignettes of Deficiency of Interleukin-1 Receptor Antagonist (DIRA) and Familial Mediterranean Fever (FMF). We then compared these models to a control group including one internal medicine physician. GPT-4 did not differ significantly from the internist in correctly identifying DIRA and FMF patients. However, the physician maintained significantly higher accuracy than GPT-3.5 and LLaMa 2 for either disease. Overall, we explore and discuss the unique potential of LLMs in diagnostics for autoimmune diseases.


