Diagnostic accuracy of large language models in psychiatry

Asian Journal of Psychiatry. Published 2024-07-25. DOI: 10.1016/j.ajp.2024.104168. Impact factor 3.8 (JCR Q1, Psychiatry; CAS Medicine, zone 4).
Citations: 0

Abstract

Introduction

Medical decision-making is crucial for effective treatment. This is especially true in psychiatry, where diagnosis often relies on subjective patient reports and few symptoms are highly specific. Artificial intelligence (AI), particularly large language models (LLMs) such as GPT, has emerged as a promising tool for improving diagnostic accuracy in psychiatry. This comparative study uses clinical cases from the DSM-5 to examine the diagnostic capabilities of several AI models: Aya, GPT-3.5, GPT-4, GPT-3.5 clinical assistant (CA), Nemotron and Nemotron CA.

Methods

We curated 20 clinical cases from the DSM-5 Clinical Cases book, covering a wide range of psychiatric diagnoses. Four AI models (GPT-3.5 Turbo, GPT-4, Aya and Nemotron) were prompted to produce a detailed diagnosis together with the reasoning behind it. Performance was evaluated on diagnostic accuracy and quality of reasoning. An additional analysis covered models given access to the DSM-5 text through Retrieval-Augmented Generation (RAG).
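
The abstract does not describe how the RAG variants retrieved DSM-5 content. The sketch below shows only the general RAG pattern, under stated assumptions:

- Reference passages are ranked by overlap with the case text.
- The top few passages are placed into the prompt ahead of the case.
- A simple bag-of-words overlap score stands in for whatever retriever the study actually used.
- The function names, prompt wording and `k` are hypothetical.
- The example passages are invented placeholders, not DSM-5 text.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphabetic tokens; a crude stand-in for a real tokenizer.
    return re.findall(r"[a-z]+", text.lower())


def overlap_score(query: str, passage: str) -> int:
    # Bag-of-words overlap: number of token occurrences shared by query and passage.
    # ASSUMPTION: used in place of the study's (unspecified) retriever.
    shared = Counter(tokenize(query)) & Counter(tokenize(passage))
    return sum(shared.values())


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    # Rank reference passages by overlap with the case text and keep the top k.
    # sorted() stays stable with reverse=True, so ties keep their original order.
    ranked = sorted(passages, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked[:k]


def build_prompt(case_text: str, passages: list[str], k: int = 2) -> str:
    # Hypothetical prompt template: retrieved excerpts, then the case,
    # then an instruction asking for a diagnosis with reasoning.
    context = "\n".join(f"- {p}" for p in retrieve(case_text, passages, k))
    return (
        "Reference excerpts:\n"
        f"{context}\n\n"
        f"Case:\n{case_text}\n\n"
        "State the most likely DSM-5 diagnosis and explain your reasoning."
    )


if __name__ == "__main__":
    # Invented placeholder passages, NOT actual DSM-5 text.
    passages = [
        "elevated mood and decreased need for sleep for one week",
        "persistent depressed mood and loss of interest",
        "hallucinations and delusions for one month",
    ]
    case = "Patient reports decreased need for sleep and elevated mood"
    print(build_prompt(case, passages, k=1))
```

With `k=1`, the script prints a prompt containing only the mood-and-sleep excerpt, because that excerpt shares the most terms with the case. A practical system would more plausibly use dense embeddings and a vector index. The overlap scorer is used here only because it keeps the sketch self-contained.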

Results

Diagnostic accuracy varied across models. GPT-3.5 and GPT-4 performed notably better than Aya and Nemotron in both accuracy and reasoning quality. Some models struggled with specific disorders, such as cyclothymic disorder and disruptive mood dysregulation disorder. Others excelled, particularly at diagnosing psychotic and bipolar disorders. Statistical analysis showed significant differences in accuracy and reasoning quality, favoring the GPT models.

Discussion

The application of AI in psychiatry offers potential improvements in diagnostic accuracy. The superior performance of the GPT models can be attributed to their advanced natural language processing capabilities and extensive training on diverse text data, enabling more effective interpretation of psychiatric language. However, models like Aya and Nemotron showed limitations in reasoning, indicating a need for further refinement in their training and application.

Conclusion

AI holds significant promise for enhancing psychiatric diagnostics, and certain models showed strong potential to interpret complex clinical descriptions accurately. Future research should expand the dataset and integrate multimodal data to further improve the diagnostic capabilities of AI in psychiatry.
