Background: Artificial intelligence-driven large language models demonstrate immense potential in the medical field. It remains unclear whether ChatGPT can provide recommendations for patients with inflammatory bowel disease (IBD) comparable to those of gastroenterologists. This study quantitatively assessed ChatGPT-generated IBD-related recommendations from the distinct perspectives of gastroenterologists and patients.
Methods: Healthcare questions regarding IBD were solicited from IBD patients and specialized physicians. These questions were then presented to GPT-4 Omni and three independent senior gastroenterologists for responses. The responses were subsequently evaluated by a blinded panel of five board-certified gastroenterologists using a five-point Likert scale assessing accuracy, completeness, and readability. In addition, 10 IBD patients served as blinded assessors and rated both ChatGPT's and the gastroenterologists' responses.
Results: Thirty high-frequency questions were selected, encompassing the domains of basic knowledge, treatment, and management. ChatGPT demonstrated high reproducibility in responding to these questions. In accuracy and readability, ChatGPT's performance was comparable to that of gastroenterologists. In completeness of responses, ChatGPT outperformed gastroenterologists (4.42 ± 0.67 vs 4.19 ± 0.65; P = 0.012). Overall, IBD patients were satisfied with both ChatGPT's and the gastroenterologists' responses, but for treatment-related questions patients rated gastroenterologists higher than ChatGPT (4.54 ± 0.32 vs 4.21 ± 0.38; P = 0.040).
Conclusions: ChatGPT has the potential to provide stable, accurate, comprehensive, and comprehensible healthcare-related information for IBD patients. Further validation of the reliability and practicality of large language models in real-world clinical settings is crucial.