Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot

Bhuvan Sachdeva, Pragnya Ramjee, Geeta Fulari, Kaushik Murali, Mohit Jain
{"title":"从大规模部署 LLM 驱动的专家在线医疗聊天机器人中汲取经验","authors":"Bhuvan Sachdeva, Pragnya Ramjee, Geeta Fulari, Kaushik Murali, Mohit Jain","doi":"arxiv-2409.10354","DOIUrl":null,"url":null,"abstract":"Large Language Models (LLMs) are widely used in healthcare, but limitations\nlike hallucinations, incomplete information, and bias hinder their reliability.\nTo address these, researchers released the Build Your Own expert Bot (BYOeB)\nplatform, enabling developers to create LLM-powered chatbots with integrated\nexpert verification. CataractBot, its first implementation, provides\nexpert-verified responses to cataract surgery questions. A pilot evaluation\nshowed its potential; however the study had a small sample size and was\nprimarily qualitative. In this work, we conducted a large-scale 24-week\ndeployment of CataractBot involving 318 patients and attendants who sent 1,992\nmessages, with 91.71\\% of responses verified by seven experts. Analysis of\ninteraction logs revealed that medical questions significantly outnumbered\nlogistical ones, hallucinations were negligible, and experts rated 84.52\\% of\nmedical answers as accurate. As the knowledge base expanded with expert\ncorrections, system performance improved by 19.02\\%, reducing expert workload.\nThese insights guide the design of future LLM-powered chatbots.","PeriodicalId":501541,"journal":{"name":"arXiv - CS - Human-Computer Interaction","volume":"6 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot\",\"authors\":\"Bhuvan Sachdeva, Pragnya Ramjee, Geeta Fulari, Kaushik Murali, Mohit Jain\",\"doi\":\"arxiv-2409.10354\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Large Language Models (LLMs) are widely used in healthcare, but limitations\\nlike hallucinations, incomplete information, and bias hinder their reliability.\\nTo address these, researchers released the Build Your Own expert Bot (BYOeB)\\nplatform, enabling developers to create LLM-powered chatbots with integrated\\nexpert verification. CataractBot, its first implementation, provides\\nexpert-verified responses to cataract surgery questions. A pilot evaluation\\nshowed its potential; however the study had a small sample size and was\\nprimarily qualitative. In this work, we conducted a large-scale 24-week\\ndeployment of CataractBot involving 318 patients and attendants who sent 1,992\\nmessages, with 91.71\\\\% of responses verified by seven experts. Analysis of\\ninteraction logs revealed that medical questions significantly outnumbered\\nlogistical ones, hallucinations were negligible, and experts rated 84.52\\\\% of\\nmedical answers as accurate. 
As the knowledge base expanded with expert\\ncorrections, system performance improved by 19.02\\\\%, reducing expert workload.\\nThese insights guide the design of future LLM-powered chatbots.\",\"PeriodicalId\":501541,\"journal\":{\"name\":\"arXiv - CS - Human-Computer Interaction\",\"volume\":\"6 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Human-Computer Interaction\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.10354\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Human-Computer Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.10354","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Large Language Models (LLMs) are widely used in healthcare, but limitations like hallucinations, incomplete information, and bias hinder their reliability. To address these, researchers released the Build Your Own expert Bot (BYOeB) platform, enabling developers to create LLM-powered chatbots with integrated expert verification. CataractBot, its first implementation, provides expert-verified responses to cataract surgery questions. A pilot evaluation showed its potential; however, the study had a small sample size and was primarily qualitative. In this work, we conducted a large-scale 24-week deployment of CataractBot involving 318 patients and attendants who sent 1,992 messages, with 91.71% of responses verified by seven experts. Analysis of interaction logs revealed that medical questions significantly outnumbered logistical ones, hallucinations were negligible, and experts rated 84.52% of medical answers as accurate. As the knowledge base expanded with expert corrections, system performance improved by 19.02%, reducing expert workload. These insights guide the design of future LLM-powered chatbots.
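
The abstract describes an expert-in-the-loop workflow: the chatbot drafts an answer grounded in a knowledge base, an expert verifies or corrects the draft, and corrections are folded back into the knowledge base so later answers improve and expert workload drops. The Python sketch below illustrates that loop in minimal form; all names (`KnowledgeBase`, `draft_answer`, `answer_with_expert_verification`) are hypothetical and do not reflect the actual BYOeB or CataractBot implementation.

```python
# Minimal sketch of an expert-in-the-loop answering loop, as described in the abstract.
# All class and function names here are illustrative assumptions, not the BYOeB API.

from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    entries: list[str] = field(default_factory=list)

    def retrieve(self, question: str) -> list[str]:
        # Naive keyword overlap as a stand-in for real retrieval.
        words = question.lower().split()
        return [e for e in self.entries if any(w in e.lower() for w in words)]

    def add_correction(self, question: str, corrected_answer: str) -> None:
        # Expert corrections are folded back in, so future drafts improve.
        self.entries.append(f"Q: {question}\nA: {corrected_answer}")


def draft_answer(question: str, context: list[str]) -> str:
    # Placeholder for the LLM call that grounds its answer in retrieved context.
    return f"Draft answer to '{question}' based on {len(context)} knowledge-base passages."


def answer_with_expert_verification(question: str, kb: KnowledgeBase, expert_review) -> str:
    context = kb.retrieve(question)
    draft = draft_answer(question, context)
    verdict, correction = expert_review(question, draft)  # expert approves or corrects
    if verdict == "corrected":
        kb.add_correction(question, correction)  # knowledge base grows with corrections
        return correction
    return draft


if __name__ == "__main__":
    kb = KnowledgeBase(entries=[
        "Q: When can I bathe after cataract surgery?\nA: Typically after 24 hours, per your surgeon's advice."
    ])

    def mock_expert(question: str, draft: str):
        # An expert either approves the draft or returns a corrected answer.
        return "approved", None

    print(answer_with_expert_verification("Can I bathe after surgery?", kb, mock_expert))
```

In this sketch, every correction enlarges the knowledge base, which is one plausible reading of how the deployed system's performance could improve by 19.02% as corrections accumulate.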