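Evaluating Diagnostic Accuracy and Treatment Efficacy in Mental Health: A Comparative Analysis of Large Language Model Tools and Mental Health Professionals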

Inbar Levkovich
European Journal of Investigation in Health Psychology and Education, Volume 15, Issue 1. Published 2025-01-18. Journal Article.
Impact factor: 2.6. JCR: Q1 (Psychology, Clinical).
DOI: https://doi.org/10.3390/ejihpe15010009
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11765082/pdf/
Citations: 0

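Abstract

Large language models (LLMs) offer promising possibilities in mental health, yet their ability to assess disorders and recommend treatments remains underexplored. This quantitative cross-sectional study evaluated four LLMs (Gemini (Gemini 2.0 Flash Experimental), Claude (Claude 3.5 Sonnet), ChatGPT-3.5, and ChatGPT-4) using text vignettes representing conditions such as depression, suicidal ideation, early and chronic schizophrenia, social phobia, and PTSD. Each model's diagnostic accuracy, treatment recommendations, and predicted outcomes were compared with norms established by mental health professionals. Findings indicated that for certain conditions, including depression and PTSD, models like ChatGPT-4 achieved higher diagnostic accuracy than human professionals. However, in more complex cases, such as early schizophrenia, LLM performance varied, with ChatGPT-4 achieving only 55% accuracy, while other LLMs and professionals performed better. LLMs tended to suggest a broader range of proactive treatments, whereas professionals recommended more targeted psychiatric consultations and specific medications. In terms of outcome predictions, professionals were generally more optimistic regarding full recovery, especially with treatment, while LLMs predicted lower full recovery rates and higher partial recovery rates, particularly in untreated cases. While LLMs recommend a broader treatment range, their conservative recovery predictions, particularly for complex conditions, highlight the need for professional oversight. LLMs provide valuable support in diagnostics and treatment planning but cannot replace professional discretion.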
