Jimmy Daza, Lucas Soares Bezerra, Laura Santamaría, Roberto Rueda-Esteban, Heike Bantel, Marcos Girala, Matthias Ebert, Florian Van Bömmel, Andreas Geier, Andres Gomez Aldana, Kevin Yau, Mario Alvares-da-Silva, Markus Peck-Radosavljevic, Ezequiel Ridruejo, Arndt Weinmann, Andreas Teufel
Annals of Hepatology, 30(1), Article 101537. Published 2024-08-13. DOI: 10.1016/j.aohep.2024.101537
https://www.sciencedirect.com/science/article/pii/S1665268124003314
Evaluation of four chatbots in autoimmune liver disease: A comparative analysis
Introduction and Objectives
Autoimmune liver diseases (AILDs) are rare and require precise evaluation, which is often challenging for medical providers. Chatbots are innovative solutions to assist healthcare professionals in clinical management. In our study, ten liver specialists systematically evaluated four chatbots to determine their utility as clinical decision support tools in the field of AILDs.
Materials and Methods
We constructed a 56-question questionnaire focusing on AILD evaluation, diagnosis, and management of Autoimmune Hepatitis (AIH), Primary Biliary Cholangitis (PBC), and Primary Sclerosing Cholangitis (PSC). Four chatbots (ChatGPT 3.5, Claude, Microsoft Copilot, and Google Bard) were presented with the questions in their free tiers in December 2023. Responses underwent critical evaluation by ten liver specialists using a standardized 1 to 10 Likert scale. The analysis included mean scores, the number of highest-rated replies, and the identification of common shortcomings in chatbot performance.
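The two summary statistics named above (mean scores and the number of highest-rated replies) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the data layout, the function names `summarize` and `best_rated_counts`, the toy scores, and the rule that ties credit every tied chatbot are all assumptions. The abstract does not state how ties were handled.

```python
from statistics import mean, stdev

# Hypothetical toy data (NOT from the study): ratings[chatbot][question] is a
# list of 1-10 Likert scores, one per rater.
ratings = {
    "A": [[8, 7, 9], [6, 6, 7], [5, 5, 6]],
    "B": [[7, 7, 8], [6, 7, 6], [9, 8, 9]],
}


def summarize(ratings):
    """Return {chatbot: (mean, sample SD)} pooled over all questions and raters."""
    summary = {}
    for bot, per_question in ratings.items():
        pooled = [score for question in per_question for score in question]
        summary[bot] = (mean(pooled), stdev(pooled))
    return summary


def best_rated_counts(ratings):
    """Count, per chatbot, the questions on which it has the top mean score.

    Assumption: when several chatbots tie for the top mean on a question,
    each of them is credited with a best-rated reply.
    """
    n_questions = len(next(iter(ratings.values())))
    counts = {bot: 0 for bot in ratings}
    for i in range(n_questions):
        question_means = {bot: mean(ratings[bot][i]) for bot in ratings}
        top = max(question_means.values())
        for bot, value in question_means.items():
            if value == top:
                counts[bot] += 1
    return counts


if __name__ == "__main__":
    for bot, (m, sd) in summarize(ratings).items():
        print(f"{bot}: mean={m:.2f}, SD={sd:.2f}")
    print(best_rated_counts(ratings))
```

In the toy data the second question is a tie, so the best-rated counts sum to more than the number of questions. Under this tie rule, per-chatbot counts need not add up to the questionnaire length.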
Results
Among the assessed chatbots, specialists rated Claude highest with a mean score of 7.37 (SD = 1.91), followed by ChatGPT (7.17, SD = 1.89), Microsoft Copilot (6.63, SD = 2.10), and Google Bard (6.52, SD = 2.27). Claude also excelled with 27 best-rated replies, outperforming ChatGPT (20), while Microsoft Copilot and Google Bard lagged with only 6 and 9, respectively. Common deficiencies included listing details rather than giving specific advice, limited dosing options, inaccuracies for pregnant patients, insufficient recent data, over-reliance on CT and MRI imaging, and inadequate discussion of off-label use and fibrates in PBC treatment. Notably, internet access for Microsoft Copilot and Google Bard did not enhance precision compared to pre-trained models.
Conclusions
Chatbots hold promise in AILD support, but our study underscores key areas for improvement. Refinement is needed in providing specific advice, accuracy, and focused up-to-date information. Addressing these shortcomings is essential for enhancing the utility of chatbots in AILD management, guiding future development, and ensuring their effectiveness as clinical decision-support tools.
About the journal:
Annals of Hepatology publishes original research on the biology and diseases of the liver in both humans and experimental models. Contributions may be submitted as regular articles. The journal also publishes concise reviews of both basic and clinical topics.