Evaluating the Efficacy of AI-Based Interactive Assessments Using Large Language Models for Depression Screening

Zheng Jin, Dandan Bi, Jiaxing Hu, Kaibin Zhao
DOI: 10.1101/2024.07.19.24310543 (https://doi.org/10.1101/2024.07.19.24310543)
Source: medRxiv - Psychiatry and Clinical Psychology
Publication date: 2024-07-21
Citations: 0

Abstract

The evolution of language models, particularly Large Language Models (LLMs) such as ChatGPT, has opened new avenues for psychological assessment and could transform the rating-scale methods that have been in use for over a century. This study introduces a new Automated Assessment Paradigm (AAP), which aims to integrate natural language processing (NLP) techniques with traditional measurement methods. The integration is intended to improve the accuracy and depth of mental health evaluations while also addressing participants' acceptance and subjective experience, areas that have not been extensively measured before. A pilot study was conducted with 32 participants, seven of whom had been diagnosed with depression by licensed psychiatrists using the Clinical Interview Schedule-Revised (CIS-R). In a private setting, participants completed the BDI-Fast Screen (BDI-FS) through a custom ChatGPT (GPTs) interface (BDI-FS-GPTs) and the Chinese version of the PHQ-9. They then completed the Subjective Evaluation Scale. Spearman's correlation analysis showed a high correlation between the total scores of the PHQ-9 and the BDI-FS-GPTs. The agreement between the diagnoses produced by the two measures, assessed with Cohen's kappa, was also significant. The BDI-FS-GPTs diagnosis showed significantly higher agreement with the current clinical diagnosis of depression. However, given the small sample of the pilot study, the AUC of 1.00 and the sensitivity of 0.80 at a cutoff of 0.5, with a false positive rate of zero, likely overstate the classifier's performance. Bayes factors suggest that participants may feel more comfortable expressing their true feelings and opinions through this method. For follow-up research, a total sample of approximately 104 participants, including about 26 diagnosed individuals, may be required to achieve a power of 0.80 at an alpha level of 0.05. Nonetheless, these findings provide a promising foundation for validating the AAP in larger-scale studies to confirm its validity and reliability.
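The agreement and accuracy statistics named in the abstract (Cohen's kappa, sensitivity, false positive rate, AUC) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the ten labels and scores are hypothetical toy data, not the study's data, and the function names are not taken from the authors' analysis. The toy scores are chosen so that every diagnosed case outranks every non-diagnosed case (AUC of 1.00) while one diagnosed case still falls below the 0.5 cutoff (sensitivity of 0.80), which shows how those two figures can coexist.

```python
from typing import Sequence, Tuple


def confusion_counts(truth: Sequence[int], pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels coded 1 = depressed, 0 = not."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def sensitivity(truth: Sequence[int], pred: Sequence[int]) -> float:
    """Share of true cases that the screen flags."""
    tp, _, fn, _ = confusion_counts(truth, pred)
    return tp / (tp + fn)


def false_positive_rate(truth: Sequence[int], pred: Sequence[int]) -> float:
    """Share of non-cases that the screen wrongly flags."""
    _, fp, _, tn = confusion_counts(truth, pred)
    return fp / (fp + tn)


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Chance-corrected agreement between two binary ratings."""
    tp, fp, fn, tn = confusion_counts(a, b)
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals of each rater.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return (observed - expected) / (1 - expected)


def auc(truth: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random case outscores a random non-case (ties count 0.5)."""
    pos = [s for t, s in zip(truth, scores) if t == 1]
    neg = [s for t, s in zip(truth, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 5 diagnosed, 5 not diagnosed.
truth = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.90, 0.80, 0.70, 0.60, 0.45, 0.40, 0.30, 0.20, 0.10, 0.05]
CUTOFF = 0.5  # same cutoff value as quoted in the abstract
pred = [1 if s >= CUTOFF else 0 for s in scores]

print(f"sensitivity = {sensitivity(truth, pred):.2f}")
print(f"false positive rate = {false_positive_rate(truth, pred):.2f}")
print(f"Cohen's kappa = {cohens_kappa(truth, pred):.2f}")
print(f"AUC = {auc(truth, scores):.2f}")
```

With so few cases, moving a single score across the cutoff or past one non-case changes these values substantially, which is consistent with the abstract's caution that the pilot figures likely overstate performance.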