Moving Toward Meaningful Evaluations of Monitoring in e-Mental Health Based on the Case of a Web-Based Grief Service for Older Mourners: Mixed Methods Study.
Lena Brandl, Stephanie Jansen-Kosterink, Jeannette Brodbeck, Sofia Jacinto, Bettina Mooser, Dirk Heylen
{"title":"Moving Toward Meaningful Evaluations of Monitoring in e-Mental Health Based on the Case of a Web-Based Grief Service for Older Mourners: Mixed Methods Study.","authors":"Lena Brandl, Stephanie Jansen-Kosterink, Jeannette Brodbeck, Sofia Jacinto, Bettina Mooser, Dirk Heylen","doi":"10.2196/63262","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Artificial intelligence (AI) tools hold much promise for mental health care by increasing the scalability and accessibility of care. However, current development and evaluation practices of AI tools limit their meaningfulness for health care contexts and therefore also the practical usefulness of such tools for professionals and clients alike.</p><p><strong>Objective: </strong>The aim of this study is to demonstrate the evaluation of an AI monitoring tool that detects the need for more intensive care in a web-based grief intervention for older mourners who have lost their spouse, with the goal of moving toward meaningful evaluation of AI tools in e-mental health.</p><p><strong>Methods: </strong>We leveraged the insights from three evaluation approaches: (1) the F1-score evaluated the tool's capacity to classify user monitoring parameters as either in need of more intensive support or recommendable to continue using the web-based grief intervention as is; (2) we used linear regression to assess the predictive value of users' monitoring parameters for clinical changes in grief, depression, and loneliness over the course of a 10-week intervention; and (3) we collected qualitative experience data from e-coaches (N=4) who incorporated the monitoring in their weekly email guidance during the 10-week intervention.</p><p><strong>Results: </strong>Based on n=174 binary recommendation decisions, the F1-score of the monitoring tool was 0.91. Due to minimal change in depression and loneliness scores after the 10-week intervention, only 1 linear regression was conducted. 
The difference score in grief before and after the intervention was included as a dependent variable. Participants' (N=21) mean score on the self-report monitoring and the estimated slope of individually fitted growth curves and its standard error (ie, participants' response pattern to the monitoring questions) were used as predictors. Only the mean monitoring score exhibited predictive value for the observed change in grief (R2=1.19, SE 0.33; t16=3.58, P=.002). The e-coaches appreciated the monitoring tool as an opportunity to confirm their initial impression about intervention participants, personalize their email guidance, and detect when participants' mental health deteriorated during the intervention.</p><p><strong>Conclusions: </strong>The monitoring tool evaluated in this paper identified a need for more intensive support reasonably well in a nonclinical sample of older mourners, had some predictive value for the change in grief symptoms during a 10-week intervention, and was appreciated as an additional source of mental health information by e-coaches who supported mourners during the intervention. Each evaluation approach in this paper came with its own set of limitations, including (1) skewed class distributions in prediction tasks based on real-life health data and (2) choosing meaningful statistical analyses based on clinical trial designs that are not targeted at evaluating AI tools. 
However, combining multiple evaluation methods facilitates drawing meaningful conclusions about the clinical value of AI monitoring tools for their intended mental health context.</p>","PeriodicalId":14841,"journal":{"name":"JMIR Formative Research","volume":"8 ","pages":"e63262"},"PeriodicalIF":2.0000,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11620699/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR Formative Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2196/63262","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Abstract
Background: Artificial intelligence (AI) tools hold much promise for mental health care by increasing the scalability and accessibility of care. However, current development and evaluation practices of AI tools limit their meaningfulness for health care contexts and therefore also the practical usefulness of such tools for professionals and clients alike.
Objective: The aim of this study is to demonstrate the evaluation of an AI monitoring tool that detects the need for more intensive care in a web-based grief intervention for older mourners who have lost their spouse, with the goal of moving toward meaningful evaluation of AI tools in e-mental health.
Methods: We combined insights from three evaluation approaches: (1) we used the F1-score to evaluate the tool's capacity to classify user monitoring parameters as either indicating a need for more intensive support or recommending continued use of the web-based grief intervention as is; (2) we used linear regression to assess the predictive value of users' monitoring parameters for clinical changes in grief, depression, and loneliness over the course of a 10-week intervention; and (3) we collected qualitative experience data from e-coaches (N=4) who incorporated the monitoring into their weekly email guidance during the 10-week intervention.
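Evaluation approach (1) treats the monitoring tool as a binary classifier and scores it with the F1 measure, the harmonic mean of precision and recall. A minimal sketch is below; the labels and predictions are hypothetical and stand in for the paper's n=174 recommendation decisions, where the positive class marks a need for more intensive support.

```python
# Sketch of evaluation approach (1): F1-score over binary
# recommendation decisions. Data here are illustrative only.
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# 1 = "needs more intensive support", 0 = "continue the intervention as is"
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1]
print(round(f1_score(y_true, y_pred), 2))
```

The F1-score is a reasonable choice here because, as the limitations note, real-life health data yield skewed class distributions, under which plain accuracy is misleading.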
Results: Based on n=174 binary recommendation decisions, the F1-score of the monitoring tool was 0.91. Due to minimal change in depression and loneliness scores after the 10-week intervention, only 1 linear regression was conducted. The difference score in grief before and after the intervention was included as a dependent variable. Participants' (N=21) mean score on the self-report monitoring and the estimated slope of individually fitted growth curves and its standard error (ie, participants' response pattern to the monitoring questions) were used as predictors. Only the mean monitoring score exhibited predictive value for the observed change in grief (R2=1.19, SE 0.33; t16=3.58, P=.002). The e-coaches appreciated the monitoring tool as an opportunity to confirm their initial impression about intervention participants, personalize their email guidance, and detect when participants' mental health deteriorated during the intervention.
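Evaluation approach (2) can be sketched as follows, under assumed data shapes: each participant's weekly self-report monitoring scores are summarized as a mean plus the slope (and its standard error) of an individually fitted linear growth curve, and these three summaries then predict the pre-post difference score in grief via ordinary least squares. All numbers below are simulated illustrations, not the study's data.

```python
# Sketch of evaluation approach (2): per-participant growth-curve
# summaries as predictors of pre-post change in grief. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
weeks = np.arange(10)   # 10-week intervention
n = 21                  # sample size reported in the paper

def per_participant_features(scores):
    """Mean monitoring score, fitted slope, and the slope's standard error."""
    slope, intercept = np.polyfit(weeks, scores, 1)
    resid = scores - (intercept + slope * weeks)
    # SE of the slope from the residual variance of the linear fit
    se = np.sqrt(resid.var(ddof=2) / ((weeks - weeks.mean()) ** 2).sum())
    return scores.mean(), slope, se

# Simulated monitoring trajectories and grief difference scores
X = np.array([per_participant_features(rng.normal(3, 1, size=10)) for _ in range(n)])
grief_change = rng.normal(0, 1, size=n)

# OLS: grief change ~ intercept + mean + slope + slope SE
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, grief_change, rcond=None)
print(coef.shape)
```

In the study, only the first of these predictors (the mean monitoring score) showed predictive value for the change in grief.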
Conclusions: The monitoring tool evaluated in this paper identified the need for more intensive support reasonably well in a nonclinical sample of older mourners, had some predictive value for the change in grief symptoms during a 10-week intervention, and was appreciated as an additional source of mental health information by the e-coaches who supported mourners during the intervention. Each evaluation approach came with its own limitations, including (1) skewed class distributions in prediction tasks based on real-life health data and (2) the difficulty of choosing meaningful statistical analyses for clinical trial designs that are not targeted at evaluating AI tools. However, combining multiple evaluation methods facilitates drawing meaningful conclusions about the clinical value of AI monitoring tools in their intended mental health context.