Performance and risks of ChatGPT used in drug information: an exploratory real-world analysis

Benedict Morath, Ute Chiriac, Elena Jaszkowski, Carolin Deiß, Hannah Nürnberg, Katrin Hörth, Torsten Hoppe-Tichy, Kim Green

European Journal of Hospital Pharmacy: Science and Practice, 2024, pages 491-497. doi:10.1136/ejhpharm-2023-003750
Abstract
Objectives: To investigate the performance of, and risk associated with, the use of Chat Generative Pre-trained Transformer (ChatGPT) to answer drug-related questions.
Methods: A sample of 50 drug-related questions was consecutively collected and entered into the artificial intelligence software application ChatGPT. Answers were documented and rated in a standardised consensus process by six senior hospital pharmacists in the domains of content (correct, incomplete, false), patient management (possible, insufficient, not possible) and risk (no risk, low risk, high risk). As a reference, answers were researched in adherence to the German guideline on drug information and stratified into four categories according to the sources used. In addition, the reproducibility of ChatGPT's answers was analysed by repeatedly entering three questions at different timepoints (day 1, day 2, week 2, week 3).
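As an illustration of the reproducibility check described above, the following minimal Python sketch compares repeated answers verbatim across the four timepoints. This is our own sketch, not the authors' code: the question labels and answer texts are placeholders, and in the study the questions were entered into the ChatGPT interface manually.

TIMEPOINTS = ["day 1", "day 2", "week 2", "week 3"]

# answers[question][timepoint] -> transcribed ChatGPT answer.
# All texts here are placeholders, not the study's actual data.
answers = {
    "Q1": {"day 1": "answer A", "day 2": "answer A", "week 2": "answer B", "week 3": "answer A"},
    "Q2": {"day 1": "answer C", "day 2": "answer D", "week 2": "answer D", "week 3": "answer E"},
    "Q3": {"day 1": "answer F", "day 2": "answer F", "week 2": "answer G", "week 3": "answer H"},
}

for question, by_time in answers.items():
    baseline = by_time[TIMEPOINTS[0]]       # first answer serves as reference
    repeats = TIMEPOINTS[1:]
    # An answer counts as reproducible here only if it matches verbatim;
    # the study reports that only 3 of the 12 collected answers were identical.
    identical = sum(by_time[t] == baseline for t in repeats)
    print(f"{question}: {identical}/{len(repeats)} repeat answers identical to day 1")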
Results: Overall, only 13 of 50 answers provided correct content and contained enough information to initiate management with no risk of patient harm. The majority of answers were either false (38%, n=19) or only partly correct (36%, n=18), and no references were provided. A high risk of patient harm was likely in 26% (n=13) of the cases, and the risk was judged low in 28% (n=14). In all high-risk cases, actions could have been initiated based on the provided information. ChatGPT's answers varied when the same questions were entered repeatedly over time; only three out of 12 answers were identical, indicating no to low reproducibility.
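As a quick arithmetic check, the short Python snippet below (our illustration, not part of the study) confirms that the reported percentages correspond to the stated counts out of 50 answers.

TOTAL = 50  # number of drug-related questions in the sample
counts = {
    "correct, actionable, no risk": 13,
    "false content": 19,
    "partly correct content": 18,
    "high risk of patient harm": 13,
    "low risk": 14,
}
for label, n in counts.items():
    # e.g. 19/50 -> 38%, matching the abstract's reported proportion
    print(f"{label}: n={n} ({n / TOTAL:.0%} of {TOTAL})")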
Conclusion: In a real-world sample of 50 drug-related questions, ChatGPT answered the majority of questions incorrectly or only partly correctly. The use of artificial intelligence applications in drug information is not possible as long as barriers such as incorrect content, missing references and lack of reproducibility remain.
Journal Introduction:
European Journal of Hospital Pharmacy (EJHP) offers a high-quality, peer-reviewed platform for the publication of practical and innovative research that aims to strengthen the profile and professional status of hospital pharmacists. EJHP is committed to being the leading journal on all aspects of hospital pharmacy, thereby advancing the science, practice and profession of hospital pharmacy. The journal aims to become a major source of education and inspiration to improve practice and the standard of patient care in hospitals and related institutions worldwide.
EJHP is the only official journal of the European Association of Hospital Pharmacists.