Accuracy of Spanish and English-generated ChatGPT responses to commonly asked patient questions about labor epidurals: a survey-based study among bilingual obstetric anesthesia experts
Antonio Gonzalez Fiol, Allison A. Mootz, Zili He, Carlos Delgado, Vilma Ortiz, Sharon C. Reale
International Journal of Obstetric Anesthesia, volume 61, Article 104290. Journal Article, published 2024-11-06.
DOI: 10.1016/j.ijoa.2024.104290
https://www.sciencedirect.com/science/article/pii/S0959289X24003029
Citations: 0
Abstract
Background
Large language models (LLMs), of which ChatGPT is the best known, are now available to patients seeking medical advice in various languages. However, the accuracy of the information used to train these models remains unknown.
Methods
Ten commonly asked questions regarding labor epidurals were translated from English to Spanish, and all 20 questions were entered into ChatGPT version 3.5. The answers were transcribed. A survey was then sent to 10 bilingual fellowship-trained obstetric anesthesiologists, who assessed the accuracy of these answers using a 5-point Likert scale.
Results
Overall, accuracy scores for the ChatGPT-generated answers were lower in Spanish than in English, with a median score of 34 (IQR 33–36.5) versus 40.5 (IQR 39–44.3), respectively (P value 0.02). Answers to two questions scored significantly lower in Spanish: “Do epidurals prolong labor?” (2 (IQR 2–2.5) versus 4 (IQR 4–4.5), P value 0.03) and “Do epidurals increase the risk of needing cesarean delivery?” (3 (IQR 2–4) versus 4 (IQR 4–5), P value 0.03). There was strong agreement that answers to the question “Do epidurals cause autism?” were accurate in both Spanish and English.
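The median and IQR summaries reported above can be illustrated with a minimal sketch. It assumes the overall score is each rater's sum of 5-point Likert ratings across the ten questions, which the abstract does not spell out. The function names, the quartile method, and the ratings are hypothetical and are not the study's data or analysis code.

```python
from statistics import median, quantiles


def total_score(ratings):
    """Sum one rater's 5-point Likert ratings across all questions."""
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("Likert ratings must be between 1 and 5")
    return sum(ratings)


def summarize(totals):
    """Return (median, Q1, Q3) for a list of scores.

    The 'inclusive' quartile method is an assumption; statistical
    packages differ in how they interpolate quartiles.
    """
    q1, _, q3 = quantiles(totals, n=4, method="inclusive")
    return median(totals), q1, q3


# Made-up ratings for illustration only: 3 raters x 10 questions.
hypothetical_ratings = [
    [4, 4, 3, 5, 4, 2, 3, 4, 4, 5],
    [3, 4, 3, 4, 4, 2, 2, 4, 3, 5],
    [4, 5, 4, 5, 4, 3, 3, 4, 4, 5],
]

totals = [total_score(r) for r in hypothetical_ratings]
med, q1, q3 = summarize(totals)
print(f"median {med} (IQR {q1}-{q3})")
```

The same `summarize` call applies to a single question's ratings, which is how per-question figures such as "2 (IQR 2–2.5)" would be summarized under this assumption.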
Conclusion
ChatGPT-generated answers in Spanish to ten questions about labor epidurals scored lower for accuracy than answers generated in English, particularly regarding the effect of labor epidurals on labor course and mode of delivery. This disparity in ChatGPT-generated information may extend already-known health inequities among non-English-speaking patients and perpetuate misinformation.
About the journal:
The International Journal of Obstetric Anesthesia is the only journal publishing original articles devoted exclusively to obstetric anesthesia and bringing together all three of its principal components: anesthesia care for operative delivery and the perioperative period, pain relief in labour, and care of the critically ill obstetric patient.
• Original research (both clinical and laboratory), short reports and case reports will be considered.
• The journal also publishes invited review articles and debates on topical and controversial subjects in the area of obstetric anesthesia.
• Articles on related topics such as perinatal physiology and pharmacology and all subjects of importance to obstetric anaesthetists/anesthesiologists are also welcome.
The journal is peer-reviewed by international experts. Scholarship is stressed to include the focus on discovery, application of knowledge across fields, and informing the medical community. Through the peer-review process, we hope to attest to the quality of scholarship and guide the Journal to extend and transform knowledge in this important and expanding area.