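Evaluating Diagnostic Accuracy and Treatment Efficacy in Mental Health: A Comparative Analysis of Large Language Model Tools and Mental Health Professionals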

European Journal of Investigation in Health Psychology and Education (IF 2.6, Q1, Psychology, Clinical). Pub Date: 2025-01-18. DOI: 10.3390/ejihpe15010009

Inbar Levkovich

Citations: 0

Abstract

Large language models (LLMs) offer promising possibilities in mental health, yet their ability to assess disorders and recommend treatments remains underexplored. This quantitative cross-sectional study evaluated four LLMs (Gemini (Gemini 2.0 Flash Experimental), Claude (Claude 3.5 Sonnet), ChatGPT-3.5, and ChatGPT-4) using text vignettes representing conditions such as depression, suicidal ideation, early and chronic schizophrenia, social phobia, and PTSD. Each model's diagnostic accuracy, treatment recommendations, and predicted outcomes were compared with norms established by mental health professionals. Findings indicated that for certain conditions, including depression and PTSD, models like ChatGPT-4 achieved higher diagnostic accuracy compared to human professionals. However, in more complex cases, such as early schizophrenia, LLM performance varied, with ChatGPT-4 achieving only 55% accuracy, while other LLMs and professionals performed better. LLMs tended to suggest a broader range of proactive treatments, whereas professionals recommended more targeted psychiatric consultations and specific medications. In terms of outcome predictions, professionals were generally more optimistic regarding full recovery, especially with treatment, while LLMs predicted lower full recovery rates and higher partial recovery rates, particularly in untreated cases. While LLMs recommend a broader treatment range, their conservative recovery predictions, particularly for complex conditions, highlight the need for professional oversight. LLMs provide valuable support in diagnostics and treatment planning but cannot replace professional discretion.

Source journal