Comparing the Perspectives of Generative AI, Mental Health Experts, and the General Public on Schizophrenia Recovery: Case Vignette Study
Authors: Zohar Elyoseph, Inbar Levkovich
Journal: JMIR Mental Health (Impact Factor 4.8; JCR Q1, Psychiatry)
Publication type: Journal Article
Published: 2024-03-18
DOI: 10.2196/53043 (https://doi.org/10.2196/53043)
Citations: 0
Abstract
Background: The current paradigm in mental healthcare focuses on clinical recovery and symptom remission. This model’s efficacy is influenced by the therapist's trust in the patient's recovery potential and by the depth of the therapeutic relationship. Schizophrenia is a chronic illness with severe symptoms, and the possibility of recovery is a matter of debate. As artificial intelligence (AI) becomes integrated into the healthcare field, it is important to examine its ability to assess recovery potential in major psychiatric disorders such as schizophrenia.
Objective: To evaluate the ability of large language models (LLMs), in comparison to mental health professionals, to assess the prognosis of schizophrenia with and without treatment and the long-term positive and negative outcomes.
Methods: Vignettes were input into LLM interfaces and assessed ten times by four AI platforms: ChatGPT-3.5, ChatGPT-4, Google Bard, and Claude. A total of 80 evaluations were collected and benchmarked against existing norms describing what mental health professionals (general practitioners, psychiatrists, clinical psychologists, and mental health nurses) and the general public think about schizophrenia prognosis with and without treatment and about the positive and negative long-term outcomes of schizophrenia interventions.
Results: Prognosis with professional help: ChatGPT-3.5 was notably pessimistic, whereas ChatGPT-4, Claude, and Bard aligned with professional views but differed from the general public. All LLMs believed that, without professional help, schizophrenia would remain static or worsen. Long-term outcomes: ChatGPT-4 and Claude predicted more negative outcomes than Bard and ChatGPT-3.5. For positive outcomes, ChatGPT-3.5 and Claude were more negative than Bard and ChatGPT-4.
Conclusions: The finding that three of the four LLMs aligned closely with the predictions of mental health professionals in the 'with treatment' condition demonstrates the potential of this technology to provide professional clinical prognosis. The pessimistic assessment by ChatGPT-3.5 is a disturbing finding, since it may reduce patients' motivation to start or persist with treatment for schizophrenia. Overall, while LLMs hold promise in augmenting healthcare, their application necessitates rigorous validation and a harmonious blend with human expertise.
About the journal:
JMIR Mental Health (JMH, ISSN 2368-7959) is a PubMed-indexed, peer-reviewed sister journal of JMIR, the leading eHealth journal (Impact Factor 2016: 5.175).
JMIR Mental Health focusses on digital health and Internet interventions, technologies and electronic innovations (software and hardware) for mental health, addictions, online counselling and behaviour change. This includes formative evaluation and system descriptions, theoretical papers, review papers, viewpoint/vision papers, and rigorous evaluations.