Comparative evaluation of artificial intelligence systems' accuracy in providing medical drug dosages: A methodological study.

Swaminathan Ramasubramanian, Sangeetha Balaji, Tejashri Kannan, Naveen Jeyaraman, Shilpa Sharma, Filippo Migliorini, Suhasini Balasubramaniam, Madhan Jeyaraman
World Journal of Methodology, 2024; 14(4): 92802
DOI: 10.5662/wjm.v14.i4.92802
Published: 2024-12-20 (Journal Article)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11287534/pdf/

Abstract

Background: Medication errors, especially in dosage calculation, pose risks in healthcare. Artificial intelligence (AI) systems like ChatGPT and Google Bard may help reduce errors, but their accuracy in providing medication information remains to be evaluated.

Aim: To evaluate the accuracy of AI systems (ChatGPT 3.5, ChatGPT 4, Google Bard) in providing drug dosage information per Harrison's Principles of Internal Medicine.

Methods: A set of natural language queries mimicking real-world medical dosage inquiries was presented to the AI systems. Responses were analyzed using a 3-point Likert scale. The analysis, conducted with Python and its libraries, focused on basic statistics, overall system accuracy, and disease-specific and organ system accuracies.
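The abstract does not specify how the Likert ratings were aggregated, so the following is a minimal sketch of one plausible scheme, assuming scores of 2 (correct), 1 (partially correct), and 0 (incorrect), with weighted accuracy taken as the mean score normalised to [0, 1]. The sample data and the exact scoring map are illustrative assumptions, not the authors' actual dataset.

```python
# Sketch: per-system accuracy metrics from 3-point Likert ratings.
# Assumed scoring (not stated in the abstract): 2 = correct,
# 1 = partially correct, 0 = incorrect.
import pandas as pd

# Hypothetical graded responses for two systems.
data = pd.DataFrame({
    "system": ["ChatGPT 4"] * 4 + ["Google Bard"] * 4,
    "score":  [2, 2, 1, 2, 1, 0, 2, 1],
})

# Rate of fully correct responses per system.
correct_rate = (
    data.assign(correct=data["score"].eq(2))
        .groupby("system")["correct"].mean()
)

# Weighted accuracy: mean Likert score rescaled to [0, 1].
weighted_acc = data.groupby("system")["score"].mean() / 2

print(correct_rate)
print(weighted_acc)
```

The same groupby pattern extends naturally to the disease-specific and organ-system breakdowns reported in the Results, by grouping on those columns instead of `system`.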

Results: ChatGPT 4 outperformed the other systems, showing the highest rate of correct responses (83.77%) and the best overall weighted accuracy (0.6775). Disease-specific accuracy varied notably across systems: some diseases were answered accurately, while others showed significant discrepancies. Organ-system accuracy was similarly variable, underscoring system-specific strengths and weaknesses.

Conclusion: ChatGPT 4 demonstrates superior reliability in medical dosage information, yet variations across diseases emphasize the need for ongoing improvements. These results highlight AI's potential in aiding healthcare professionals, urging continuous development for dependable accuracy in critical medical situations.
