Are Virtual Assistants Trustworthy for Medicare Information: An Examination of Accuracy and Reliability.

IF 4.6 | Zone 2 (Medicine) | Q1 GERONTOLOGY | Gerontologist | Pub Date: 2024-08-01 | DOI: 10.1093/geront/gnae062
Emily Langston, Neil Charness, Walter Boot
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11258897/pdf/
Citations: 0

Abstract

Background and objectives: Advances in artificial intelligence (AI)-based virtual assistants provide a potential opportunity for older adults to use this technology in the context of health information-seeking. Meta-analysis on trust in AI shows that users are influenced by the accuracy and reliability of the AI trustee. We evaluated these dimensions for responses to Medicare queries.

Research design and methods: During the summer of 2023, we assessed the accuracy and reliability of Alexa, Google Assistant, Bard, and ChatGPT-4 on Medicare terminology and general content from a large, standardized question set. We compared the accuracy of these AI systems to that of a large representative sample of Medicare beneficiaries who were queried twenty years prior.

Results: Alexa and Google Assistant were found to be highly inaccurate when compared to beneficiaries' mean accuracy of 68.4% on terminology queries and 53.0% on general Medicare content. Bard and ChatGPT-4 answered Medicare terminology queries perfectly and performed much better on general Medicare content queries (Bard = 96.3%, ChatGPT-4 = 92.6%) than the average Medicare beneficiary. About one month to a month-and-a-half later, we found that Bard and Alexa's accuracy stayed the same, whereas ChatGPT-4's performance nominally decreased, and Google Assistant's performance nominally increased.

Discussion and implications: LLM-based assistants generate trustworthy information in response to carefully phrased queries about Medicare, in contrast to Alexa and Google Assistant. Further studies will be needed to determine what factors beyond accuracy and reliability influence the adoption and use of such technology for Medicare decision-making.

Source journal: Gerontologist (Gerontology)
CiteScore: 11.00
Self-citation rate: 8.80%
Articles published: 171
About the journal: The Gerontologist, published since 1961, is a bimonthly journal of The Gerontological Society of America that provides a multidisciplinary perspective on human aging by publishing research and analysis on applied social issues. It informs the broad community of disciplines and professions involved in understanding the aging process and providing care to older people. Articles should include a conceptual framework and testable hypotheses. Implications for policy or practice should be highlighted. The Gerontologist publishes quantitative and qualitative research and encourages manuscript submissions of various types including: research articles, intervention research, review articles, measurement articles, forums, and brief reports. Book and media reviews, International Spotlights, and award-winning lectures are commissioned by the editors.
Latest articles in this journal

Caregiving Challenges from Persistent Pain Among Family Caregivers to People with Dementia.
Usability Testing of the PACE-App to Support Family Caregivers in Managing Pain for People with Dementia.
The Evolution in Dementia Caregiving Research: NIA's Catalyst Role.
Celebrating the National Institute on Aging's 50th Anniversary: The Gerontologist Special Collection.
Digital Contact as Strain or Support: How Does Type of Contact Shape the Association between Mother-child Interactions and Adult Children's Depressive Symptoms in Later-life Families?