Assessing bias in AI-driven psychiatric recommendations: A comparative cross-sectional study of chatbot-classified and CANMAT 2023 guideline for adjunctive therapy in difficult-to-treat depression
Yu Chang, Yi-Chun Liu, Si-Sheng Huang, Wen-Yu Hsu
Psychiatry Research, vol. 348, Article 116501 (2025). DOI: 10.1016/j.psychres.2025.116501 (Epub 15 April 2025)
Citations: 0
Abstract
The integration of chatbots into psychiatry offers a novel way to support clinical decision-making, but biases in their recommendations raise significant concerns. This study investigates potential biases in chatbot-generated recommendations for adjunctive therapy in difficult-to-treat depression by comparing these outputs with the Canadian Network for Mood and Anxiety Treatments (CANMAT) 2023 guidelines. Cohen’s kappa coefficients were calculated to measure the overall agreement between chatbot-generated classifications and the CANMAT guidelines, and differences between chatbot-generated and CANMAT classifications for each medication were assessed using the Wilcoxon signed-rank test. High-performing models showed substantial agreement: Google AI’s Gemini 2.0 Flash achieved the highest Cohen’s kappa, 0.82 (SE = 0.052), whereas OpenAI’s o1 model showed lower agreement (kappa = 0.746, SE = 0.057). Notable discrepancies included overestimation of medications such as quetiapine and lithium and underestimation of modafinil and ketamine. In addition, OpenAI’s chatbots showed a distinct bias pattern, tending to over-recommend lithium and bupropion. Our study highlights both the promise and the challenges of employing AI tools in psychiatric practice and advocates multi-model approaches to mitigate bias and improve clinical reliability.
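As a rough illustration of the two statistics named in the abstract, the Python sketch below computes unweighted Cohen’s kappa between two lists of category labels and a paired Wilcoxon signed-rank test using a tie-corrected normal approximation. It is not the authors’ analysis code. The ordinal coding (1 = first-line, 2 = second-line, 3 = third-line), the function names `cohens_kappa` and `wilcoxon_signed_rank`, and all example ratings are hypothetical. The abstract also does not say whether a weighted kappa, an exact p-value, or a particular pairing of ratings was used.

```python
from collections import Counter
from math import erfc, sqrt


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items both raters put in the same category.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test on the differences x - y.

    Zero differences are dropped, tied |differences| get average ranks, and
    the two-sided p-value uses the tie-corrected normal approximation without
    continuity correction (only rough for small samples).
    Returns (w_plus, w_minus, z, p).
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 0.0, 1.0
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j over a run of equal absolute differences (a tie group).
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        t = j - i + 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / sqrt(var)
    p = erfc(abs(z) / sqrt(2))  # two-sided: 2 * (1 - Phi(|z|))
    return w_plus, w_minus, z, p


if __name__ == "__main__":
    # Hypothetical codes for eight medications: one chatbot vs. the guideline.
    chatbot = [1, 1, 2, 2, 3, 1, 2, 3]
    canmat = [1, 1, 2, 3, 3, 1, 2, 2]
    print(round(cohens_kappa(chatbot, canmat), 3))  # 0.619

    # Hypothetical: six chatbots rate one medication the guideline codes as 2.
    runs = [1, 1, 2, 1, 1, 3]
    w_plus, w_minus, z, p = wilcoxon_signed_rank(runs, [2] * len(runs))
    print(w_plus, w_minus)  # 3.0 12.0
```

Under this hypothetical coding, a negative difference (chatbot minus guideline) means the chatbot placed a medication on an earlier treatment line than CANMAT, i.e. overestimated it, so a W⁻ much larger than W⁺ points toward overestimation. For real analyses, established implementations such as `sklearn.metrics.cohen_kappa_score` and `scipy.stats.wilcoxon` are preferable to this hand-rolled version.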
About the journal:
Psychiatry Research offers swift publication of comprehensive research reports and reviews within the field of psychiatry.
The scope of the journal encompasses:
Biochemical, physiological, neuroanatomic, genetic, neurocognitive, and psychosocial determinants of psychiatric disorders.
Diagnostic assessments of psychiatric disorders.
Evaluations that pursue hypotheses about the cause or causes of psychiatric diseases.
Evaluations of pharmacologic and non-pharmacologic psychiatric treatments.
Basic neuroscience studies related to animal or neurochemical models for psychiatric disorders.
Methodological advances, such as instrumentation, clinical scales, and assays directly applicable to psychiatric research.