Is ChatGPT reliable and accurate in answering pharmacotherapy-related inquiries in both Turkish and English?
Nur Ozturk, Irem Yakak, Melih Buğra Ağ, Nilay Aksoy
Currents in Pharmacy Teaching and Learning, 16(7), Article 102101 (2024). DOI: 10.1016/j.cptl.2024.04.017
https://www.sciencedirect.com/science/article/pii/S1877129724001205
Abstract
Introduction
Artificial intelligence (AI), particularly ChatGPT, is becoming increasingly prevalent in healthcare for tasks such as disease diagnosis and medical record analysis. The objective of this study was to evaluate the proficiency and accuracy of ChatGPT across different domains of clinical pharmacy cases and queries.
Methods
The study compared ChatGPT's responses to pharmacotherapy cases and questions drawn from McGraw Hill's NAPLEX® Review Questions, 4th edition, covering 10 different chronic conditions, against the answers provided by the book's authors. The proportion of correct responses was recorded and analyzed using the Statistical Package for the Social Sciences (SPSS) version 29.
Results
ChatGPT achieved higher mean scores when tested in English than in Turkish, although the difference was not statistically significant: the mean accuracy score was 0.41 ± 0.49 for English and 0.32 ± 0.46 for Turkish (p = 0.18). Responses to questions beginning with "Which of the following is correct?" were considerably more accurate than those beginning with "Mark all the incorrect answers": 0.66 ± 0.47 versus 0.16 ± 0.36 (p = 0.01) in English, and 0.50 ± 0.50 versus 0.14 ± 0.34 (p < 0.05) in Turkish.
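For readers who want to see how such figures are derived, the following is a minimal sketch, not the authors' analysis (which used SPSS version 29). It assumes each question's outcome is scored as binary correctness (1 = correct, 0 = incorrect), consistent with the reported means near 0.41 and 0.32, and assumes a paired comparison since each question was posed in both languages; the abstract does not name the statistical test, and the data below are hypothetical.

```python
# Hedged sketch of the accuracy comparison; illustrative data only.
import numpy as np
from scipy import stats

# Hypothetical binary correctness vectors (1 = ChatGPT answered correctly).
english = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
turkish = np.array([0, 0, 1, 1, 0, 0, 1, 0, 0, 0])

# Mean ± SD, matching the abstract's "mean accuracy score" format.
for name, scores in [("English", english), ("Turkish", turkish)]:
    print(f"{name}: {scores.mean():.2f} ± {scores.std():.2f}")

# Paired t-test across the same questions in both languages
# (an assumption; SPSS offers this among other paired tests).
t_stat, p_value = stats.ttest_rel(english, turkish)
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.2f}")
```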
Conclusion
ChatGPT displayed a moderate level of accuracy when responding to English inquiries but only a low level of accuracy when responding to Turkish inquiries, contingent upon the question format. Improving ChatGPT's accuracy in languages other than English will require further refinement. Integrating the English version of ChatGPT into clinical practice has the potential to improve the effectiveness, precision, and standard of patient care by supplementing personal expertise and professional judgment. However, it is crucial to use the technology as an adjunct to, not a replacement for, human decision-making and critical thinking.