{"title":"Quality evaluation of digital voice assistants for the management of mental health conditions","authors":"Vanessa Kai Lin Chua, L. Wong, K. Yap","doi":"10.3934/medsci.2022028","DOIUrl":null,"url":null,"abstract":"Background Digital voice assistants (DVAs) are gaining increasing popularity as a tool for accessing online mental health information. However, the quality of information provided by DVAs is not known. This study seeks to evaluate the quality of DVA responses to mental health-related queries in relation to six quality domains: comprehension ability, relevance, comprehensiveness, accuracy, understandability and reliability. Materials and methods Four smartphone DVAs were evaluated: Apple Siri, Samsung Bixby, Google Assistant and Amazon Alexa. Sixty-six questions and answers on mental health conditions (depression, anxiety, obsessive-compulsive disorder (OCD) and bipolar disorder) were compiled from authoritative sources, clinical guidelines and public search trends. Three evaluators scored the DVAs from an in-house-developed evaluation rubric. Data were analyzed by using the Kruskal-Wallis and Wilcoxon rank sum tests. Results Across all questions, Google Assistant scored the highest (78.9%), while Alexa scored the lowest (64.5%). Siri (83.9%), Bixby (87.7%) and Google Assistant (87.4%) scored the best for questions on depression, while Alexa (72.3%) scored the best for OCD questions. Bixby scored the lowest for questions on general mental health (0%) and OCD (0%) compared to all other DVAs. In terms of the quality domains, Google Assistant scored significantly higher for comprehension ability compared to Siri (100% versus 88.9%, p < 0.001) and Bixby (100% versus 94.5%, p < 0.001). Moreover, Google Assistant also scored significantly higher than Siri (100% versus 66.7%, p < 0.001) and Alexa (100% versus 75.0%, p < 0.001) in terms of relevance. 
In contrast, Alexa scored the worst in terms of accuracy (75.0%), reliability (58.3%) and comprehensiveness (22.2%) compared to all other DVAs. Conclusion Overall, Google Assistant performed the best in terms of responding to the mental health-related queries, while Alexa performed the worst. While the comprehension abilities of the DVAs were good, the DVAs had differing performances in the other quality domains. The responses by DVAs should be supplemented with other information from authoritative sources, and users should seek the help and advice of a healthcare professional when managing their mental health conditions.","PeriodicalId":43011,"journal":{"name":"AIMS Medical Science","volume":null,"pages":null},"PeriodicalIF":0.4000,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"AIMS Medical Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3934/medsci.2022028","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"MEDICINE, RESEARCH & EXPERIMENTAL","Score":null,"Total":0}
Citations: 0
Abstract
Background Digital voice assistants (DVAs) are gaining popularity as a tool for accessing online mental health information; however, the quality of the information they provide is not known. This study evaluates the quality of DVA responses to mental health-related queries across six quality domains: comprehension ability, relevance, comprehensiveness, accuracy, understandability and reliability.

Materials and methods Four smartphone DVAs were evaluated: Apple Siri, Samsung Bixby, Google Assistant and Amazon Alexa. Sixty-six questions and answers on mental health conditions (depression, anxiety, obsessive-compulsive disorder (OCD) and bipolar disorder) were compiled from authoritative sources, clinical guidelines and public search trends. Three evaluators scored the DVAs using an in-house-developed evaluation rubric, and the data were analyzed with the Kruskal-Wallis and Wilcoxon rank sum tests.

Results Across all questions, Google Assistant scored the highest (78.9%) and Alexa the lowest (64.5%). Siri (83.9%), Bixby (87.7%) and Google Assistant (87.4%) performed best on depression questions, while Alexa (72.3%) performed best on OCD questions. Bixby scored the lowest of all DVAs on questions about general mental health (0%) and OCD (0%). Among the quality domains, Google Assistant scored significantly higher for comprehension ability than Siri (100% versus 88.9%, p < 0.001) and Bixby (100% versus 94.5%, p < 0.001), and significantly higher for relevance than Siri (100% versus 66.7%, p < 0.001) and Alexa (100% versus 75.0%, p < 0.001). In contrast, Alexa scored the worst of all DVAs for accuracy (75.0%), reliability (58.3%) and comprehensiveness (22.2%).

Conclusion Overall, Google Assistant performed the best at responding to mental health-related queries, while Alexa performed the worst. Although the comprehension abilities of the DVAs were good, their performance varied across the other quality domains. DVA responses should be supplemented with information from authoritative sources, and users should seek the help and advice of a healthcare professional when managing their mental health conditions.
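The Methods section states that score data were compared with the Kruskal-Wallis test. As an illustration only (the study's own data and tooling are not given here), a minimal pure-Python sketch of the Kruskal-Wallis H statistic, using tie-averaged ranks and omitting the tie-correction factor; all scores in the example are hypothetical:

```python
from itertools import chain

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for k independent samples.

    Ranks are averaged over ties; the tie-correction divisor is
    omitted for brevity, so H is slightly conservative with many ties.
    """
    pooled = sorted(chain.from_iterable(groups))
    n = len(pooled)

    # Assign each distinct value the mean of the 1-based ranks it spans.
    ranks = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    # H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)
    h = 0.0
    for g in groups:
        rank_sum = sum(ranks[x] for x in g)
        h += rank_sum * rank_sum / len(g)
    return 12.0 / (n * (n + 1)) * h - 3 * (n + 1)

# Hypothetical per-question scores (%) for two assistants:
h = kruskal_wallis_h([[80, 90, 85, 70], [60, 65, 75, 55]])
```

In practice one would compare H against a chi-squared distribution with k-1 degrees of freedom (or use a library routine that also returns the p-value) rather than computing the statistic by hand.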