Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot

Bhuvan Sachdeva, Pragnya Ramjee, Geeta Fulari, Kaushik Murali, Mohit Jain
{"title":"从大规模部署 LLM 驱动的专家在线医疗聊天机器人中汲取经验","authors":"Bhuvan Sachdeva, Pragnya Ramjee, Geeta Fulari, Kaushik Murali, Mohit Jain","doi":"arxiv-2409.10354","DOIUrl":null,"url":null,"abstract":"Large Language Models (LLMs) are widely used in healthcare, but limitations\nlike hallucinations, incomplete information, and bias hinder their reliability.\nTo address these, researchers released the Build Your Own expert Bot (BYOeB)\nplatform, enabling developers to create LLM-powered chatbots with integrated\nexpert verification. CataractBot, its first implementation, provides\nexpert-verified responses to cataract surgery questions. A pilot evaluation\nshowed its potential; however the study had a small sample size and was\nprimarily qualitative. In this work, we conducted a large-scale 24-week\ndeployment of CataractBot involving 318 patients and attendants who sent 1,992\nmessages, with 91.71\\% of responses verified by seven experts. Analysis of\ninteraction logs revealed that medical questions significantly outnumbered\nlogistical ones, hallucinations were negligible, and experts rated 84.52\\% of\nmedical answers as accurate. As the knowledge base expanded with expert\ncorrections, system performance improved by 19.02\\%, reducing expert workload.\nThese insights guide the design of future LLM-powered chatbots.","PeriodicalId":501541,"journal":{"name":"arXiv - CS - Human-Computer Interaction","volume":"6 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot\",\"authors\":\"Bhuvan Sachdeva, Pragnya Ramjee, Geeta Fulari, Kaushik Murali, Mohit Jain\",\"doi\":\"arxiv-2409.10354\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Large Language Models (LLMs) are widely used in healthcare, but limitations\\nlike hallucinations, incomplete information, and bias hinder their reliability.\\nTo address these, researchers released the Build Your Own expert Bot (BYOeB)\\nplatform, enabling developers to create LLM-powered chatbots with integrated\\nexpert verification. CataractBot, its first implementation, provides\\nexpert-verified responses to cataract surgery questions. A pilot evaluation\\nshowed its potential; however the study had a small sample size and was\\nprimarily qualitative. In this work, we conducted a large-scale 24-week\\ndeployment of CataractBot involving 318 patients and attendants who sent 1,992\\nmessages, with 91.71\\\\% of responses verified by seven experts. Analysis of\\ninteraction logs revealed that medical questions significantly outnumbered\\nlogistical ones, hallucinations were negligible, and experts rated 84.52\\\\% of\\nmedical answers as accurate. 
As the knowledge base expanded with expert\\ncorrections, system performance improved by 19.02\\\\%, reducing expert workload.\\nThese insights guide the design of future LLM-powered chatbots.\",\"PeriodicalId\":501541,\"journal\":{\"name\":\"arXiv - CS - Human-Computer Interaction\",\"volume\":\"6 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Human-Computer Interaction\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.10354\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Human-Computer Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.10354","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Large Language Models (LLMs) are widely used in healthcare, but limitations like hallucinations, incomplete information, and bias hinder their reliability. To address these, researchers released the Build Your Own expert Bot (BYOeB) platform, enabling developers to create LLM-powered chatbots with integrated expert verification. CataractBot, its first implementation, provides expert-verified responses to cataract surgery questions. A pilot evaluation showed its potential; however, the study had a small sample size and was primarily qualitative. In this work, we conducted a large-scale 24-week deployment of CataractBot involving 318 patients and attendants who sent 1,992 messages, with 91.71% of responses verified by seven experts. Analysis of interaction logs revealed that medical questions significantly outnumbered logistical ones, hallucinations were negligible, and experts rated 84.52% of medical answers as accurate. As the knowledge base expanded with expert corrections, system performance improved by 19.02%, reducing expert workload. These insights guide the design of future LLM-powered chatbots.
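
The abstract describes an expert-in-the-loop workflow: the chatbot drafts an answer grounded in a knowledge base, an expert verifies or corrects the draft, and corrections are folded back into the knowledge base so later answers improve and expert workload drops. The Python sketch below illustrates that loop in minimal form; all names (`KnowledgeBase`, `draft_answer`, `answer_with_expert_verification`) are hypothetical and do not reflect the actual BYOeB or CataractBot implementation.

```python
# Minimal sketch of an expert-in-the-loop answering loop, as described in the abstract.
# All class and function names here are illustrative assumptions, not the BYOeB API.

from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    entries: list[str] = field(default_factory=list)

    def retrieve(self, question: str) -> list[str]:
        # Naive keyword overlap as a stand-in for real retrieval.
        words = question.lower().split()
        return [e for e in self.entries if any(w in e.lower() for w in words)]

    def add_correction(self, question: str, corrected_answer: str) -> None:
        # Expert corrections are folded back in, so future drafts improve.
        self.entries.append(f"Q: {question}\nA: {corrected_answer}")


def draft_answer(question: str, context: list[str]) -> str:
    # Placeholder for the LLM call that grounds its answer in retrieved context.
    return f"Draft answer to '{question}' based on {len(context)} knowledge-base passages."


def answer_with_expert_verification(question: str, kb: KnowledgeBase, expert_review) -> str:
    context = kb.retrieve(question)
    draft = draft_answer(question, context)
    verdict, correction = expert_review(question, draft)  # expert approves or corrects
    if verdict == "corrected":
        kb.add_correction(question, correction)  # knowledge base grows with corrections
        return correction
    return draft


if __name__ == "__main__":
    kb = KnowledgeBase(entries=[
        "Q: When can I bathe after cataract surgery?\nA: Typically after 24 hours, per your surgeon's advice."
    ])

    def mock_expert(question: str, draft: str):
        # An expert either approves the draft or returns a corrected answer.
        return "approved", None

    print(answer_with_expert_verification("Can I bathe after surgery?", kb, mock_expert))
```

In this sketch, every correction enlarges the knowledge base, which is one plausible reading of how the deployed system's performance could improve by 19.02% as corrections accumulate.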