{"title":"精神病学大型语言模型的诊断准确性","authors":"","doi":"10.1016/j.ajp.2024.104168","DOIUrl":null,"url":null,"abstract":"<div><h3>Introduction</h3><p>Medical decision-making is crucial for effective treatment, especially in psychiatry where diagnosis often relies on subjective patient reports and a lack of high-specificity symptoms. Artificial intelligence (AI), particularly Large Language Models (LLMs) like GPT, has emerged as a promising tool to enhance diagnostic accuracy in psychiatry. This comparative study explores the diagnostic capabilities of several AI models, including Aya, GPT-3.5, GPT-4, GPT-3.5 clinical assistant (CA), Nemotron, and Nemotron CA, using clinical cases from the DSM-5.</p></div><div><h3>Methods</h3><p>We curated 20 clinical cases from the DSM-5 Clinical Cases book, covering a wide range of psychiatric diagnoses. Four advanced AI models (GPT-3.5 Turbo, GPT-4, Aya, Nemotron) were tested using prompts to elicit detailed diagnoses and reasoning. The models' performances were evaluated based on accuracy and quality of reasoning, with additional analysis using the Retrieval Augmented Generation (RAG) methodology for models accessing the DSM-5 text.</p></div><div><h3>Results</h3><p>The AI models showed varied diagnostic accuracy, with GPT-3.5 and GPT-4 performing notably better than Aya and Nemotron in terms of both accuracy and reasoning quality. While models struggled with specific disorders such as cyclothymic and disruptive mood dysregulation disorders, others excelled, particularly in diagnosing psychotic and bipolar disorders. Statistical analysis highlighted significant differences in accuracy and reasoning, emphasizing the superiority of the GPT models.</p></div><div><h3>Discussion</h3><p>The application of AI in psychiatry offers potential improvements in diagnostic accuracy. The superior performance of the GPT models can be attributed to their advanced natural language processing capabilities and extensive training on diverse text data, enabling more effective interpretation of psychiatric language. However, models like Aya and Nemotron showed limitations in reasoning, indicating a need for further refinement in their training and application.</p></div><div><h3>Conclusion</h3><p>AI holds significant promise for enhancing psychiatric diagnostics, with certain models demonstrating high potential in interpreting complex clinical descriptions accurately. Future research should focus on expanding the dataset and integrating multimodal data to further enhance the diagnostic capabilities of AI in psychiatry.</p></div>","PeriodicalId":8543,"journal":{"name":"Asian journal of psychiatry","volume":null,"pages":null},"PeriodicalIF":3.8000,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Diagnostic accuracy of large language models in psychiatry\",\"authors\":\"\",\"doi\":\"10.1016/j.ajp.2024.104168\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Introduction</h3><p>Medical decision-making is crucial for effective treatment, especially in psychiatry where diagnosis often relies on subjective patient reports and a lack of high-specificity symptoms. Artificial intelligence (AI), particularly Large Language Models (LLMs) like GPT, has emerged as a promising tool to enhance diagnostic accuracy in psychiatry. This comparative study explores the diagnostic capabilities of several AI models, including Aya, GPT-3.5, GPT-4, GPT-3.5 clinical assistant (CA), Nemotron, and Nemotron CA, using clinical cases from the DSM-5.</p></div><div><h3>Methods</h3><p>We curated 20 clinical cases from the DSM-5 Clinical Cases book, covering a wide range of psychiatric diagnoses. Four advanced AI models (GPT-3.5 Turbo, GPT-4, Aya, Nemotron) were tested using prompts to elicit detailed diagnoses and reasoning. The models' performances were evaluated based on accuracy and quality of reasoning, with additional analysis using the Retrieval Augmented Generation (RAG) methodology for models accessing the DSM-5 text.</p></div><div><h3>Results</h3><p>The AI models showed varied diagnostic accuracy, with GPT-3.5 and GPT-4 performing notably better than Aya and Nemotron in terms of both accuracy and reasoning quality. While models struggled with specific disorders such as cyclothymic and disruptive mood dysregulation disorders, others excelled, particularly in diagnosing psychotic and bipolar disorders. Statistical analysis highlighted significant differences in accuracy and reasoning, emphasizing the superiority of the GPT models.</p></div><div><h3>Discussion</h3><p>The application of AI in psychiatry offers potential improvements in diagnostic accuracy. The superior performance of the GPT models can be attributed to their advanced natural language processing capabilities and extensive training on diverse text data, enabling more effective interpretation of psychiatric language. However, models like Aya and Nemotron showed limitations in reasoning, indicating a need for further refinement in their training and application.</p></div><div><h3>Conclusion</h3><p>AI holds significant promise for enhancing psychiatric diagnostics, with certain models demonstrating high potential in interpreting complex clinical descriptions accurately. Future research should focus on expanding the dataset and integrating multimodal data to further enhance the diagnostic capabilities of AI in psychiatry.</p></div>\",\"PeriodicalId\":8543,\"journal\":{\"name\":\"Asian journal of psychiatry\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":3.8000,\"publicationDate\":\"2024-07-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Asian journal of psychiatry\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1876201824002612\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"PSYCHIATRY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Asian journal of psychiatry","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1876201824002612","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"PSYCHIATRY","Score":null,"Total":0}
Diagnostic accuracy of large language models in psychiatry
Introduction
Medical decision-making is crucial for effective treatment, especially in psychiatry where diagnosis often relies on subjective patient reports and a lack of high-specificity symptoms. Artificial intelligence (AI), particularly Large Language Models (LLMs) like GPT, has emerged as a promising tool to enhance diagnostic accuracy in psychiatry. This comparative study explores the diagnostic capabilities of several AI models, including Aya, GPT-3.5, GPT-4, GPT-3.5 clinical assistant (CA), Nemotron, and Nemotron CA, using clinical cases from the DSM-5.
Methods
We curated 20 clinical cases from the DSM-5 Clinical Cases book, covering a wide range of psychiatric diagnoses. Four advanced AI models (GPT-3.5 Turbo, GPT-4, Aya, Nemotron) were tested using prompts to elicit detailed diagnoses and reasoning. The models' performances were evaluated based on accuracy and quality of reasoning, with additional analysis using the Retrieval Augmented Generation (RAG) methodology for models accessing the DSM-5 text.
Results
The AI models showed varied diagnostic accuracy, with GPT-3.5 and GPT-4 performing notably better than Aya and Nemotron in terms of both accuracy and reasoning quality. While models struggled with specific disorders such as cyclothymic and disruptive mood dysregulation disorders, others excelled, particularly in diagnosing psychotic and bipolar disorders. Statistical analysis highlighted significant differences in accuracy and reasoning, emphasizing the superiority of the GPT models.
Discussion
The application of AI in psychiatry offers potential improvements in diagnostic accuracy. The superior performance of the GPT models can be attributed to their advanced natural language processing capabilities and extensive training on diverse text data, enabling more effective interpretation of psychiatric language. However, models like Aya and Nemotron showed limitations in reasoning, indicating a need for further refinement in their training and application.
Conclusion
AI holds significant promise for enhancing psychiatric diagnostics, with certain models demonstrating high potential in interpreting complex clinical descriptions accurately. Future research should focus on expanding the dataset and integrating multimodal data to further enhance the diagnostic capabilities of AI in psychiatry.
期刊介绍:
The Asian Journal of Psychiatry serves as a comprehensive resource for psychiatrists, mental health clinicians, neurologists, physicians, mental health students, and policymakers. Its goal is to facilitate the exchange of research findings and clinical practices between Asia and the global community. The journal focuses on psychiatric research relevant to Asia, covering preclinical, clinical, service system, and policy development topics. It also highlights the socio-cultural diversity of the region in relation to mental health.