{"title":"使用基于大语言模型的聊天机器人管理门诊脑卒中幸存者和护理人员的康复问题和教育需求","authors":"Jin Rui Edmund Neo, Joon Sin Ser, San San Tay","doi":"10.3389/fdgth.2024.1395501","DOIUrl":null,"url":null,"abstract":"The utility of large language model-based (LLM) artificial intelligence (AI) chatbots in many aspects of healthcare is becoming apparent though their ability to address patient concerns remains unknown. We sought to evaluate the performance of two well-known, freely-accessible chatbots, ChatGPT and Google Bard, in responding to common questions about stroke rehabilitation posed by patients and their caregivers.We collected questions from outpatients and their caregivers through a survey, categorised them by theme, and created representative questions to be posed to both chatbots. We then evaluated the chatbots' responses based on accuracy, safety, relevance, and readability. Interrater agreement was also tracked.Although both chatbots achieved similar overall scores, Google Bard performed slightly better in relevance and safety. Both provided readable responses with some general accuracy, but struggled with hallucinated responses, were often not specific, and lacked awareness of the possibility for emotional situations with the potential to turn dangerous. Additionally, interrater agreement was low, highlighting the variability in physician acceptance of their responses.AI chatbots show potential in patient-facing support roles, but issues remain regarding safety, accuracy, and relevance. Future chatbots should address these problems to ensure that they can reliably and independently manage the concerns and questions of stroke patients and their caregivers.","PeriodicalId":504480,"journal":{"name":"Frontiers in Digital Health","volume":" 5","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Use of large language model-based chatbots in managing the rehabilitation concerns and education needs of outpatient stroke survivors and caregivers\",\"authors\":\"Jin Rui Edmund Neo, Joon Sin Ser, San San Tay\",\"doi\":\"10.3389/fdgth.2024.1395501\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The utility of large language model-based (LLM) artificial intelligence (AI) chatbots in many aspects of healthcare is becoming apparent though their ability to address patient concerns remains unknown. We sought to evaluate the performance of two well-known, freely-accessible chatbots, ChatGPT and Google Bard, in responding to common questions about stroke rehabilitation posed by patients and their caregivers.We collected questions from outpatients and their caregivers through a survey, categorised them by theme, and created representative questions to be posed to both chatbots. We then evaluated the chatbots' responses based on accuracy, safety, relevance, and readability. Interrater agreement was also tracked.Although both chatbots achieved similar overall scores, Google Bard performed slightly better in relevance and safety. Both provided readable responses with some general accuracy, but struggled with hallucinated responses, were often not specific, and lacked awareness of the possibility for emotional situations with the potential to turn dangerous. 
Additionally, interrater agreement was low, highlighting the variability in physician acceptance of their responses.AI chatbots show potential in patient-facing support roles, but issues remain regarding safety, accuracy, and relevance. Future chatbots should address these problems to ensure that they can reliably and independently manage the concerns and questions of stroke patients and their caregivers.\",\"PeriodicalId\":504480,\"journal\":{\"name\":\"Frontiers in Digital Health\",\"volume\":\" 5\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-05-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in Digital Health\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/fdgth.2024.1395501\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Digital Health","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fdgth.2024.1395501","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
The utility of large language model-based (LLM) artificial intelligence (AI) chatbots in many aspects of healthcare is becoming apparent, though their ability to address patient concerns remains unknown. We sought to evaluate the performance of two well-known, freely accessible chatbots, ChatGPT and Google Bard, in responding to common questions about stroke rehabilitation posed by patients and their caregivers. We collected questions from outpatients and their caregivers through a survey, categorised them by theme, and created representative questions to pose to both chatbots. We then evaluated the chatbots' responses for accuracy, safety, relevance, and readability, and tracked interrater agreement. Although both chatbots achieved similar overall scores, Google Bard performed slightly better in relevance and safety. Both provided readable responses that were broadly accurate, but they produced hallucinated content, were often not specific, and lacked awareness that emotionally charged situations could turn dangerous. Additionally, interrater agreement was low, highlighting variability in physician acceptance of their responses. AI chatbots show potential in patient-facing support roles, but issues remain regarding safety, accuracy, and relevance. Future chatbots should address these problems to ensure that they can reliably and independently manage the concerns and questions of stroke patients and their caregivers.
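
The abstract notes that interrater agreement between the physician raters was low. As a rough, hedged illustration of how such agreement is commonly quantified, the Python sketch below computes Cohen's kappa for two raters scoring the same set of chatbot responses. The ratings, the binary acceptable/not-acceptable scoring scheme, and the function name are illustrative assumptions for this sketch, not the study's actual data or analysis code.

# Minimal sketch (not the authors' analysis code) of quantifying interrater
# agreement between two physician raters with Cohen's kappa.
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical scores to the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings (1 = acceptable, 0 = not acceptable) from two physicians
# for the same eight chatbot responses; these numbers are made up for illustration.
physician_1 = [1, 1, 0, 1, 0, 1, 1, 0]
physician_2 = [1, 0, 0, 1, 1, 1, 0, 0]
print(f"Cohen's kappa: {cohen_kappa(physician_1, physician_2):.2f}")

On this hypothetical data the observed agreement is 0.625 against a chance agreement of 0.5, giving a kappa of 0.25, which would conventionally be read as low agreement, consistent with the variability in physician acceptance that the abstract describes.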