Readability, quality and accuracy of generative artificial intelligence chatbots for commonly asked questions about labor epidurals: a comparison of ChatGPT and Bard

IF 2.6 | Zone 3, Medicine | Q2 ANESTHESIOLOGY | International Journal of Obstetric Anesthesia | Pub Date: 2025-02-01 | DOI: 10.1016/j.ijoa.2024.104317
D. Lee, M. Brown, J. Hammond, M. Zakowski
Volume 61, Article 104317. Full text: https://www.sciencedirect.com/science/article/pii/S0959289X24003297
Citations: 0

Abstract

Introduction

Over 90% of pregnant women and 76% of expectant fathers search for pregnancy health information. We examined the readability, accuracy and quality of answers to common obstetric anesthesia questions from the popular generative artificial intelligence (AI) chatbots ChatGPT and Bard.

Methods

Twenty questions for the generative AI chatbots were derived from frequently asked questions on professional society, hospital and consumer websites. ChatGPT and Bard were queried in November 2023. Answers were graded for accuracy by four obstetric anesthesiologists. Quality was measured using the Patient Education Materials Assessment Tool for Print (PEMAT). Readability was measured using six readability indices. Accuracy, quality and readability were compared using independent t-tests.
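The comparison step relies on an independent two-sample t-test. The abstract does not say whether the pooled-variance (Student) or unequal-variance (Welch) form was used, or what the unit of analysis was. The Python sketch below assumes the pooled-variance form and works from summary statistics (mean, SD, n). The function name `pooled_t_test` is hypothetical, and the example numbers are made up for illustration, not taken from the study.

```python
import math
from typing import NamedTuple


class TTestResult(NamedTuple):
    t: float  # t statistic for (group 1 mean - group 2 mean)
    df: int   # degrees of freedom, n1 + n2 - 2


def pooled_t_test(mean1: float, sd1: float, n1: int,
                  mean2: float, sd2: float, n2: int) -> TTestResult:
    """Student's independent two-sample t-test, assuming equal variances."""
    df = n1 + n2 - 2
    # Pooled variance: each group's sample variance weighted by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return TTestResult(t=(mean1 - mean2) / se, df=df)


if __name__ == "__main__":
    # Hypothetical summary statistics, for illustration only.
    result = pooled_t_test(mean1=10.0, sd1=2.0, n1=10, mean2=8.0, sd2=2.0, n2=10)
    print(f"t = {result.t:.3f}, df = {result.df}")  # prints: t = 2.236, df = 18
```

For a p-value, SciPy's `scipy.stats.ttest_ind_from_stats(mean1, std1, nobs1, mean2, std2, nobs2, equal_var=True)` computes the same statistic together with its two-sided p-value; passing `equal_var=False` switches to Welch's test.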

Results

Bard’s readability scores were at a high school level, significantly easier than ChatGPT’s college level on all scoring metrics (P < 0.001). Bard gave significantly longer answers (P < 0.001), yet accuracy was similar for Bard (85% ± 10) and ChatGPT (87% ± 14) (P = 0.5). PEMAT understandability scores did not differ significantly (P = 0.06). PEMAT actionability scores were significantly higher for Bard than for ChatGPT (22% vs. 9%, P = 0.007).
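PEMAT understandability and actionability are reported as percentages. Under the usual PEMAT scoring rule, each applicable item is rated Agree (1 point) or Disagree (0 points), items rated not applicable are excluded, and the score is points earned divided by applicable items, times 100. The sketch below applies that rule to one rater's ratings of one answer. How the study combined ratings across answers or raters is not stated in the abstract, and the names `Rating` and `pemat_score` are hypothetical.

```python
from enum import Enum
from typing import Sequence


class Rating(Enum):
    """Hypothetical encoding of one rater's judgment on one PEMAT item."""
    AGREE = "agree"         # earns 1 point
    DISAGREE = "disagree"   # earns 0 points
    NOT_APPLICABLE = "n/a"  # excluded from the denominator


def pemat_score(ratings: Sequence[Rating]) -> float:
    """Return the percentage of applicable items rated AGREE."""
    applicable = [r for r in ratings if r is not Rating.NOT_APPLICABLE]
    if not applicable:
        raise ValueError("no applicable items to score")
    points = sum(1 for r in applicable if r is Rating.AGREE)
    return 100.0 * points / len(applicable)


if __name__ == "__main__":
    # Made-up actionability ratings for one answer, for illustration only.
    example = [Rating.AGREE, Rating.DISAGREE, Rating.DISAGREE,
               Rating.DISAGREE, Rating.NOT_APPLICABLE]
    print(f"{pemat_score(example):.1f}%")  # 1 of 4 applicable items: prints 25.0%
```

Running the same function separately over the understandability items and the actionability items gives the two PEMAT domain scores.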

Conclusion

Answers to questions about “labor epidurals” should be accurate, high quality, and easy to read. Bard, at a high school reading level, was well above the goal of a 4th to 6th grade level suggested for patient materials. Consumers, health care providers, hospitals and governmental agencies should be aware of the quality of information generated by chatbots. Chatbots answering health-related questions should meet readability and understandability standards, to aid public understanding and enhance shared decision-making.
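Readability indices convert text statistics into a U.S. school grade level. The abstract does not name the six indices used; the sketch below assumes the widely used Flesch-Kincaid Grade Level, 0.39 × (words/sentences) + 11.8 × (syllables/words) - 15.59, and checks the result against the upper end of the 4th to 6th grade goal. The helpers `count_syllables`, `flesch_kincaid_grade` and `meets_target` are hypothetical, and the vowel-group syllable counter is a crude heuristic, so scores will differ from those of dedicated readability software.

```python
import re

TARGET_MAX_GRADE = 6  # upper end of the 4th to 6th grade goal for patient materials


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level computed with naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def meets_target(text: str) -> bool:
    """True if the estimated grade level is at or below the target."""
    return flesch_kincaid_grade(text) <= TARGET_MAX_GRADE
```

For example, `meets_target("Neuraxial analgesia provides effective intrapartum pain relief.")` returns `False`: seven mostly polysyllabic words in a single sentence push the estimated grade far above 6.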
Source journal: International Journal of Obstetric Anesthesia
CiteScore: 4.70
Self-citation rate: 7.10%
Articles published: 285
Review time: 58 days
About the journal: The International Journal of Obstetric Anesthesia is the only journal publishing original articles devoted exclusively to obstetric anesthesia and bringing together all three of its principal components: anesthesia care for operative delivery and the perioperative period, pain relief in labour, and care of the critically ill obstetric patient.
• Original research (both clinical and laboratory), short reports and case reports will be considered.
• The journal also publishes invited review articles and debates on topical and controversial subjects in the area of obstetric anesthesia.
• Articles on related topics such as perinatal physiology and pharmacology, and all subjects of importance to obstetric anaesthetists/anesthesiologists, are also welcome.
The journal is peer-reviewed by international experts. It stresses scholarship focused on discovery, application of knowledge across fields, and informing the medical community. Through the peer-review process, we hope to attest to the quality of scholarship and guide the Journal to extend and transform knowledge in this important and expanding area.