Efficacy of large language models and their potential in Obstetrics and Gynecology education.

Journal: Obstetrics and Gynecology Science (Impact factor: 2; JCR: Q2, Obstetrics & Gynecology) Pub Date: 2024-11-01 Epub Date: 2024-10-02 DOI: 10.5468/ogs.24211
Kyung Jin Eoh, Gu Yeun Kwon, Eun Jin Lee, JoonHo Lee, Inha Lee, Young Tae Kim, Eun Ji Nam
Pages: 550-556 Publication type: Journal Article
Citations: 0

Abstract

Objective: The performance of large language models (LLMs) and their potential utility in obstetric and gynecological education are topics of ongoing debate. This study aimed to contribute to this discussion by examining the recent advancements in LLM technology and their transformative potential in artificial intelligence.

Methods: This study assessed the performance of generative pre-trained transformer (GPT)-3.5 and GPT-4 in understanding clinical information, as well as their potential implications for obstetric and gynecological education. Obstetrics and gynecology residents at three hospitals took an annual promotion examination. Of the 170 questions set over 4 years (2020-2023), 116 were analyzed; the 54 questions containing images were excluded. The scores achieved by GPT-3.5, GPT-4, and the 100 residents were compared.

Results: The average scores across all 4 years for GPT-3.5 and GPT-4 were 38.79 (standard deviation [SD], 5.65) and 79.31 (SD, 3.67), respectively. For first-, second-, and third-year residents, the cumulative annual average scores were 79.12 (SD, 9.00), 80.95 (SD, 5.86), and 83.60 (SD, 6.82), respectively. No statistically significant differences were observed between the scores of GPT-4 and those of the residents. On questions specific to obstetrics, the average scores for GPT-3.5 and GPT-4 were 33.44 (SD, 10.18) and 90.22 (SD, 7.68), respectively.
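
The question counts and overall averages above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the reported scores are percentages of correctly answered questions, which the abstract does not state, and the helper names `analyzed_questions` and `percent_correct` are hypothetical. Under that assumption, the reported averages of 38.79 and 79.31 coincide with 45 and 92 correct answers out of 116. This is an observation from the numbers, not a figure given by the authors.

```python
# Counts taken from the Methods paragraph of the abstract.
TOTAL_QUESTIONS = 170
IMAGE_QUESTIONS = 54  # excluded because they contain images


def analyzed_questions(total: int, excluded: int) -> int:
    """Number of questions left after removing the excluded ones."""
    return total - excluded


def percent_correct(correct: int, analyzed: int) -> float:
    """Score as a percentage of analyzed questions, rounded to 2 decimals.

    Assumption: scores in the abstract are percent-correct values.
    """
    return round(100 * correct / analyzed, 2)


analyzed = analyzed_questions(TOTAL_QUESTIONS, IMAGE_QUESTIONS)
print(analyzed)

# Hypothetical correct-answer counts chosen to match the reported averages.
print(percent_correct(45, analyzed))
print(percent_correct(92, analyzed))
```

The check covers only the pooled averages. The standard deviations in the abstract cannot be reproduced this way because per-year scores are not reported.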

Conclusion: GPT-4 demonstrated exceptional performance in obstetrics, in different types of data interpretation, and in problem solving, showcasing the potential utility of LLMs in these areas. However, it is crucial to acknowledge the limitations of LLMs, and their use should augment human expertise and discernment.

Source journal
Obstetrics and Gynecology Science (Medicine: Obstetrics and Gynecology)
CiteScore: 3.80
Self-citation rate: 15.80%
Articles published: 58
Review time: 16 weeks
Journal description: Obstetrics & Gynecology Science (NLM title: Obstet Gynecol Sci) is an international peer-reviewed journal that publishes basic, translational, and clinical research, as well as clinical practice guidelines, to promote women's health and prevent obstetric and gynecologic disorders. The journal has an international editorial board and is published in English on the 15th day of every other month. Submitted manuscripts should not contain previously published material and should not be under consideration for publication elsewhere. The journal has been publishing articles since 1958. Its aim is to publish original articles, reviews, case reports, short communications, letters to the editor, and video articles that have the potential to change practice in women's health care. The journal's main focus is the diagnosis, treatment, prediction, and prevention of obstetric and gynecologic disorders. Because the life expectancy of Korean and Asian women is increasing, the journal's editors are particularly interested in the health of elderly women in these population groups. The journal also publishes articles on reproductive biology, stem cell research, and artificial intelligence research for women, and provides insights into the physiology and mechanisms of obstetric and gynecologic diseases.
Latest articles from this journal
Current approach of patients with Mayer-Rokitansky-Küster-Hauser syndrome.
Usefulness and limitations of ChatGPT in getting information on teratogenic drugs exposed in pregnancy.
Creation of neovagina in women with Müllerian agenesis (Mayer-Rokitansky-Küster-Hauser syndrome) using fresh human amnion.
Lenvatinib and pembrolizumab versus platinum doublet chemotherapy as second-line therapy for advanced or recurrent endometrial cancer.
Navigating the thyroid-gynecologic interplay: a systematic review and meta-analysis.