Are Virtual Assistants Trustworthy for Medicare Information: An Examination of Accuracy and Reliability.

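Gerontologist | IF 4.6 | CAS Zone 2 (Medicine) | JCR Q1 (Gerontology) | Pub Date: 2024-08-01 | DOI: 10.1093/geront/gnae062
PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11258897/pdf/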

Emily Langston, Neil Charness, Walter Boot



Abstract

Background and objectives: Advances in artificial intelligence (AI)-based virtual assistants provide a potential opportunity for older adults to use this technology in the context of health information-seeking. Meta-analysis on trust in AI shows that users are influenced by the accuracy and reliability of the AI trustee. We evaluated these dimensions for responses to Medicare queries.

Research design and methods: During the summer of 2023, we assessed the accuracy and reliability of Alexa, Google Assistant, Bard, and ChatGPT-4 on Medicare terminology and general content from a large, standardized question set. We compared the accuracy of these AI systems to that of a large representative sample of Medicare beneficiaries who were queried twenty years prior.

Results: Alexa and Google Assistant were found to be highly inaccurate when compared to beneficiaries' mean accuracy of 68.4% on terminology queries and 53.0% on general Medicare content. Bard and ChatGPT-4 answered Medicare terminology queries perfectly and performed much better on general Medicare content queries (Bard = 96.3%, ChatGPT-4 = 92.6%) than the average Medicare beneficiary. About one month to a month-and-a-half later, we found that Bard's and Alexa's accuracy stayed the same, whereas ChatGPT-4's performance nominally decreased and Google Assistant's performance nominally increased.

Discussion and implications: Large language model (LLM)-based assistants (Bard and ChatGPT-4) generate trustworthy information in response to carefully phrased queries about Medicare, in contrast to Alexa and Google Assistant. Further studies will be needed to determine what factors beyond accuracy and reliability influence the adoption and use of such technology for Medicare decision-making.
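
The accuracy figures in the Results are percentages of queries answered correctly. As a rough illustration of that arithmetic only, the sketch below computes such a percentage in Python. The item count of 27 is an assumption: the abstract does not state how many general-content queries were asked, and 27 is used here simply because 26/27 and 25/27 round to the reported 96.3% and 92.6%. The function name `accuracy_pct` is likewise hypothetical and not taken from the paper.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Return the share of correctly answered queries as a percentage,
    rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return round(100.0 * correct / total, 1)


# ASSUMPTION: the abstract does not report the number of general-content
# queries; 27 is a hypothetical count chosen only because it reproduces
# the reported percentages.
ASSUMED_TOTAL = 27

bard_general = accuracy_pct(26, ASSUMED_TOTAL)      # 96.3
chatgpt4_general = accuracy_pct(25, ASSUMED_TOTAL)  # 92.6

# Beneficiary baseline for general Medicare content, as reported in the abstract.
BENEFICIARY_GENERAL = 53.0

for name, score in [("Bard", bard_general), ("ChatGPT-4", chatgpt4_general)]:
    margin = round(score - BENEFICIARY_GENERAL, 1)
    print(f"{name}: {score}% ({margin} points above the beneficiary mean)")
```

Under this assumed count, a one-item difference moves the score by about 3.7 percentage points, which is one way to read the "nominal" changes reported at retest.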




Source journal: Gerontologist (Gerontology)

CiteScore: 11.00

Self-citation rate: 8.80%

Articles published: 171

About the journal: The Gerontologist, published since 1961, is a bimonthly journal of The Gerontological Society of America that provides a multidisciplinary perspective on human aging by publishing research and analysis on applied social issues. It informs the broad community of disciplines and professions involved in understanding the aging process and providing care to older people. Articles should include a conceptual framework and testable hypotheses. Implications for policy or practice should be highlighted. The Gerontologist publishes quantitative and qualitative research and encourages manuscript submissions of various types including: research articles, intervention research, review articles, measurement articles, forums, and brief reports. Book and media reviews, International Spotlights, and award-winning lectures are commissioned by the editors.