Quality evaluation of digital voice assistants for the management of mental health conditions

AIMS Medical Science | IF 0.4 | Q4, Medicine, Research & Experimental | Pub Date: 2022-01-01 | DOI: 10.3934/medsci.2022028
Vanessa Kai Lin Chua, L. Wong, K. Yap

Abstract

Background: Digital voice assistants (DVAs) are increasingly popular as a tool for accessing online mental health information. However, the quality of the information they provide is not known. This study evaluates the quality of DVA responses to mental health-related queries across six quality domains: comprehension ability, relevance, comprehensiveness, accuracy, understandability and reliability.

Materials and methods: Four smartphone DVAs were evaluated: Apple Siri, Samsung Bixby, Google Assistant and Amazon Alexa. Sixty-six questions and answers on mental health conditions (depression, anxiety, obsessive-compulsive disorder (OCD) and bipolar disorder) were compiled from authoritative sources, clinical guidelines and public search trends. Three evaluators scored the DVAs using an evaluation rubric developed in-house. Data were analyzed using the Kruskal-Wallis and Wilcoxon rank sum tests.

Results: Across all questions, Google Assistant scored the highest (78.9%), while Alexa scored the lowest (64.5%). Siri (83.9%), Bixby (87.7%) and Google Assistant (87.4%) scored best on questions about depression, while Alexa (72.3%) scored best on OCD questions. Bixby scored the lowest of all DVAs on questions about general mental health (0%) and OCD (0%). Among the quality domains, Google Assistant scored significantly higher for comprehension ability than Siri (100% versus 88.9%, p < 0.001) and Bixby (100% versus 94.5%, p < 0.001). Google Assistant also scored significantly higher for relevance than Siri (100% versus 66.7%, p < 0.001) and Alexa (100% versus 75.0%, p < 0.001). In contrast, Alexa scored the worst of all DVAs for accuracy (75.0%), reliability (58.3%) and comprehensiveness (22.2%).

Conclusion: Overall, Google Assistant performed the best in responding to the mental health-related queries, while Alexa performed the worst. While the comprehension abilities of the DVAs were good, their performance differed in the other quality domains. Responses from DVAs should be supplemented with information from authoritative sources, and users should seek the help and advice of a healthcare professional when managing their mental health conditions.
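
The abstract names the Kruskal-Wallis test as the method for comparing the four DVAs but does not spell out the computation. The following is a minimal sketch of the tie-corrected H statistic in plain Python, not the authors' analysis code. The function names (`average_ranks`, `kruskal_wallis_h`) and the scores in `example_scores` are invented for illustration and are not data from the study.

```python
from itertools import chain


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Accumulate (rank sum)^2 / group size, walking the pooled ranks in group order.
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Invented rubric scores for four assistants (placeholders, NOT study data).
example_scores = {
    "Assistant A": [7, 8, 6, 9, 7],
    "Assistant B": [5, 6, 6, 4, 7],
    "Assistant C": [9, 9, 8, 10, 9],
    "Assistant D": [3, 5, 4, 4, 6],
}
print("H =", kruskal_wallis_h(list(example_scores.values())))
```

Under the null hypothesis, H is approximately chi-square distributed with k - 1 degrees of freedom for k groups, which is where a p-value would come from. In practice a library routine such as `scipy.stats.kruskal` would normally be used, with pairwise Wilcoxon rank sum tests as the follow-up comparison between individual DVAs.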