{"title":"Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot","authors":"Bhuvan Sachdeva, Pragnya Ramjee, Geeta Fulari, Kaushik Murali, Mohit Jain","doi":"arxiv-2409.10354","DOIUrl":null,"url":null,"abstract":"Large Language Models (LLMs) are widely used in healthcare, but limitations\nlike hallucinations, incomplete information, and bias hinder their reliability.\nTo address these, researchers released the Build Your Own expert Bot (BYOeB)\nplatform, enabling developers to create LLM-powered chatbots with integrated\nexpert verification. CataractBot, its first implementation, provides\nexpert-verified responses to cataract surgery questions. A pilot evaluation\nshowed its potential; however the study had a small sample size and was\nprimarily qualitative. In this work, we conducted a large-scale 24-week\ndeployment of CataractBot involving 318 patients and attendants who sent 1,992\nmessages, with 91.71\\% of responses verified by seven experts. Analysis of\ninteraction logs revealed that medical questions significantly outnumbered\nlogistical ones, hallucinations were negligible, and experts rated 84.52\\% of\nmedical answers as accurate. As the knowledge base expanded with expert\ncorrections, system performance improved by 19.02\\%, reducing expert workload.\nThese insights guide the design of future LLM-powered chatbots.","PeriodicalId":501541,"journal":{"name":"arXiv - CS - Human-Computer Interaction","volume":"6 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Human-Computer Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.10354","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Large Language Models (LLMs) are widely used in healthcare, but limitations like hallucinations, incomplete information, and bias hinder their reliability. To address these, researchers released the Build Your Own expert Bot (BYOeB) platform, enabling developers to create LLM-powered chatbots with integrated expert verification. CataractBot, its first implementation, provides expert-verified responses to cataract surgery questions. A pilot evaluation showed its potential; however, the study had a small sample size and was primarily qualitative. In this work, we conducted a large-scale 24-week deployment of CataractBot involving 318 patients and attendants who sent 1,992 messages, with 91.71% of responses verified by seven experts. Analysis of interaction logs revealed that medical questions significantly outnumbered logistical ones, hallucinations were negligible, and experts rated 84.52% of medical answers as accurate. As the knowledge base expanded with expert corrections, system performance improved by 19.02%, reducing expert workload. These insights guide the design of future LLM-powered chatbots.
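The abstract describes the expert-in-the-loop workflow only at a high level: the LLM drafts an answer from a knowledge base, an expert verifies or corrects it, and corrections are folded back into the knowledge base so similar future questions need less expert effort. The sketch below is a minimal illustration of that general pattern, not the BYOeB or CataractBot implementation; the names (`KnowledgeBase`, `ExpertInTheLoopBot`) and the stubbed LLM/expert callbacks are hypothetical.

```python
# Illustrative sketch only (assumed names, not the authors' code): an
# expert-in-the-loop loop in which an LLM answer is drafted from a knowledge
# base, sent to an expert for verification, and the expert's correction is
# added back to the knowledge base.
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class KnowledgeBase:
    entries: dict[str, str] = field(default_factory=dict)  # question -> verified answer

    def lookup(self, question: str) -> Optional[str]:
        # Naive exact-match retrieval; a real system would use semantic search.
        return self.entries.get(question.strip().lower())

    def add(self, question: str, verified_answer: str) -> None:
        self.entries[question.strip().lower()] = verified_answer


@dataclass
class ExpertInTheLoopBot:
    llm: Callable[[str, str], str]          # (question, context) -> draft answer
    ask_expert: Callable[[str, str], str]   # (question, draft) -> approved/corrected answer
    kb: KnowledgeBase = field(default_factory=KnowledgeBase)

    def answer(self, question: str) -> str:
        cached = self.kb.lookup(question)
        if cached is not None:
            return cached                        # previously verified; no expert workload
        context = "\n".join(self.kb.entries.values())
        draft = self.llm(question, context)       # LLM drafts from the current knowledge base
        final = self.ask_expert(question, draft)  # expert verifies or corrects the draft
        self.kb.add(question, final)              # correction expands the knowledge base
        return final


# Hypothetical usage with stubbed LLM and expert callbacks.
if __name__ == "__main__":
    bot = ExpertInTheLoopBot(
        llm=lambda q, ctx: f"Draft answer to: {q}",
        ask_expert=lambda q, draft: draft + " (verified)",
    )
    print(bot.answer("When can I resume exercise after cataract surgery?"))
    print(bot.answer("When can I resume exercise after cataract surgery?"))  # served from KB
```

Under this (assumed) design, repeated or similar questions are increasingly answered from already-verified knowledge, which is consistent with the reported effect of knowledge-base growth reducing expert workload.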