Efficacy of large language models and their potential in Obstetrics and Gynecology education.

Obstetrics and Gynecology Science (IF 2.0; JCR Q2, Obstetrics & Gynecology) | Pub Date: 2024-11-01 | Epub Date: 2024-10-02 | DOI: 10.5468/ogs.24211

Kyung Jin Eoh, Gu Yeun Kwon, Eun Jin Lee, JoonHo Lee, Inha Lee, Young Tae Kim, Eun Ji Nam

Obstetrics and Gynecology Science, pp. 550-556. Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11581811/pdf/

Abstract

Objective: The performance of large language models (LLMs) and their potential utility in obstetric and gynecological education are topics of ongoing debate. This study aimed to contribute to this discussion by examining the recent advancements in LLM technology and their transformative potential in artificial intelligence.

Methods: This study assessed the performance of generative pre-trained transformer (GPT)-3.5 and GPT-4 in understanding clinical information, as well as their potential implications for obstetric and gynecological education. Obstetrics and gynecology residents at three hospitals took an annual promotion examination. Of the 170 questions set over 4 years (2020-2023), 116 were analyzed after excluding 54 questions containing images. The scores achieved by GPT-3.5, GPT-4, and the 100 residents were compared.

Results: The average scores across all 4 years for GPT-3.5 and GPT-4 were 38.79 (standard deviation [SD], 5.65) and 79.31 (SD, 3.67), respectively. For first-, second-, and third-year residents, the cumulative annual average scores were 79.12 (SD, 9.00), 80.95 (SD, 5.86), and 83.60 (SD, 6.82), respectively. No statistically significant differences were observed between the scores of GPT-4 and those of the residents. On questions specific to obstetrics, the average scores for GPT-3.5 and GPT-4 were 33.44 (SD, 10.18) and 90.22 (SD, 7.68), respectively.

Conclusion: GPT-4 demonstrated exceptional performance in obstetrics, different types of data interpretation, and problem solving, showcasing the potential utility of LLMs in these areas. However, acknowledging the constraints of LLMs is crucial, and their use should augment human expertise and discernment.

Source journal

Obstetrics and Gynecology Science (Medicine: Obstetrics and Gynecology)

CiteScore: 3.80

Self-citation rate: 15.80%

Review time: 16 weeks

About the journal: Obstetrics & Gynecology Science (NLM title: Obstet Gynecol Sci) is an international peer-reviewed journal that publishes basic, translational, and clinical research, as well as clinical practice guidelines, to promote women's health and prevent obstetric and gynecologic disorders. The journal has an international editorial board and is published in English on the 15th day of every other month. Submitted manuscripts should not contain previously published material and should not be under consideration for publication elsewhere. The journal has been publishing articles since 1958. The aim of the journal is to publish original articles, reviews, case reports, short communications, letters to the editor, and video articles that have the potential to change practice in women's health care. The journal's main focus is the diagnosis, treatment, prediction, and prevention of obstetric and gynecologic disorders. Because the life expectancy of Korean and Asian women is increasing, the journal's editors are particularly interested in the health of elderly women in these population groups. The journal also publishes articles about reproductive biology, stem cell research, and artificial intelligence research for women; additionally, it provides insights into the physiology and mechanisms of obstetric and gynecologic diseases.