Assessing bias in AI-driven psychiatric recommendations: A comparative cross-sectional study of chatbot-classified and CANMAT 2023 guideline for adjunctive therapy in difficult-to-treat depression

Psychiatry Research (IF 3.9; Zone 2, Medicine; JCR Q1, Psychiatry). Pub Date: 2025-06-01. Epub Date: 2025-04-15. DOI: 10.1016/j.psychres.2025.116501
Yu Chang, Yi-Chun Liu, Si-Sheng Huang, Wen-Yu Hsu
Psychiatry Research, volume 348, Article 116501. Journal article. https://www.sciencedirect.com/science/article/pii/S0165178125001490

Abstract

The integration of chatbots into psychiatry introduces a novel approach to support clinical decision-making, but biases in their recommendations pose significant concerns. This study investigates potential biases in chatbot-generated recommendations for adjunctive therapy in difficult-to-treat depression, comparing these outputs with the Canadian Network for Mood and Anxiety Treatments (CANMAT) 2023 guidelines. The analysis involved calculating Cohen’s kappa coefficients to measure the overall level of agreement between chatbot-generated classifications and CANMAT guidelines. Differences between chatbot-generated and CANMAT classifications for each medication were assessed using the Wilcoxon signed-rank test. Results reveal substantial agreement for high-performing models, such as Google AI's Gemini 2.0 Flash, which achieved the highest Cohen’s kappa value of 0.82 (SE = 0.052). In contrast, OpenAI’s o1 model showed a lower agreement of 0.746 (SE = 0.057). Notable discrepancies were observed in the overestimation of medications such as quetiapine and lithium and the underestimation of modafinil and ketamine. Additionally, a distinct bias pattern was observed in OpenAI’s chatbots, which demonstrated a tendency to over-recommend lithium and bupropion. Our study highlights both the promise and the challenges of employing AI tools in psychiatric practice, and advocates for multi-model approaches to mitigate bias and improve clinical reliability.
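
The two statistics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The abstract does not say whether the kappa was weighted, so the unweighted form is assumed here, and the standard error is not computed. The function names (cohen_kappa, signed_rank_sums) and the sample ratings are hypothetical, with medications coded by an assumed line of treatment (1, 2 or 3).

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per label.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def signed_rank_sums(x, y):
    """Wilcoxon signed-rank sums (W+, W-) for paired values.

    Zero differences are dropped and tied absolute differences share
    their average rank. No p-value is computed in this sketch.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical data: line of treatment assigned to eight medications.
guideline = [1, 1, 2, 2, 3, 3, 1, 2]
chatbot = [1, 1, 2, 1, 3, 3, 1, 3]

kappa = cohen_kappa(guideline, chatbot)
w_plus, w_minus = signed_rank_sums(chatbot, guideline)
print(f"kappa = {kappa:.3f}, W+ = {w_plus}, W- = {w_minus}")
```

With these made-up ratings, six of eight labels match and chance agreement is 21/64, so kappa is 27/43, roughly 0.63. For real work, an established statistics library would normally be used, since it also provides p-values and handles edge cases.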
Source journal: Psychiatry Research (Medicine: Psychiatry)
CiteScore: 17.40
Self-citation rate: 1.80%
Articles published: 527
Review time: 57 days
Journal description: Psychiatry Research offers swift publication of comprehensive research reports and reviews within the field of psychiatry. The scope of the journal encompasses: biochemical, physiological, neuroanatomic, genetic, neurocognitive, and psychosocial determinants of psychiatric disorders; diagnostic assessments of psychiatric disorders; evaluations that pursue hypotheses about the cause or causes of psychiatric diseases; evaluations of pharmacologic and non-pharmacologic psychiatric treatments; basic neuroscience studies related to animal or neurochemical models for psychiatric disorders; and methodological advances, such as instrumentation, clinical scales, and assays directly applicable to psychiatric research.