Establishing a novel score system and using it to assess and compare the quality of ChatGPT-4 consultation with physician consultation for obstetrics and gynecology: A pilot study.
{"title":"Establishing a novel score system and using it to assess and compare the quality of ChatGPT-4 consultation with physician consultation for obstetrics and gynecology: A pilot study.","authors":"Lan Lan, Ling Yang, Jinyan Li, Jia Hou, Yunsheng Yan, Yaozong Zhang","doi":"10.1002/ijgo.15934","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>In the current study, we aimed to establish a quantified scoring system for evaluating consultation quality. Subsequently, using the score system to assess the quality of ChatGPT-4 consultations, we compared them with physician consultations when presented with the same clinical cases from obstetrics and gynecology.</p><p><strong>Methods: </strong>This study was conducted in the Women and Children's Hospital of Chongqing Medical University, a tertiary-care hospital with approximately 16 000-20 000 deliveries and 8500-12 000 gynecologic surgeries per year. The detailed data from obstetric and gynecologic medical records were analyzed by ChatGPT-4 and physicians; the consultation opinions were then generated respectively. All consultation opinions were graded by eight junior doctors using the novel score system; subsequently, the correlation, agreement, and comparison between the two types of consultation opinions were then evaluated.</p><p><strong>Results: </strong>A total of 100 medical records from obstetrics and 100 medical records from gynecology were randomly selected. Pearson correlation analysis suggested a noncorrelation or weak correlation between consultations from ChatGPT-4 and physicians. Bland-Altman plot showed an unacceptable agreement between the two types of consultation opinions. 
Paired t tests showed that the scores of physician consultations were significantly higher than those generated by ChatGPT-4 in both obstetric and gynecologic patients.</p><p><strong>Conclusion: </strong>At present, ChatGPT-4 may not be a substitute for physicians in consultations for obstetric and gynecologic patients. Therefore, it is crucial to pay careful attention and conduct ongoing evaluations to ensure the quality of consultation opinions generated by ChatGPT-4.</p>","PeriodicalId":14164,"journal":{"name":"International Journal of Gynecology & Obstetrics","volume":null,"pages":null},"PeriodicalIF":2.6000,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Gynecology & Obstetrics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1002/ijgo.15934","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"OBSTETRICS & GYNECOLOGY","Score":null,"Total":0}
Citations: 0
Abstract
Objectives: In the current study, we aimed to establish a quantitative scoring system for evaluating consultation quality. We then used this scoring system to assess the quality of ChatGPT-4 consultations and compare them with physician consultations for the same clinical cases from obstetrics and gynecology.
Methods: This study was conducted at the Women and Children's Hospital of Chongqing Medical University, a tertiary-care hospital with approximately 16 000-20 000 deliveries and 8500-12 000 gynecologic surgeries per year. Detailed data from obstetric and gynecologic medical records were analyzed by ChatGPT-4 and by physicians, each generating consultation opinions independently. All consultation opinions were graded by eight junior doctors using the novel scoring system; the correlation, agreement, and differences between the two types of consultation opinions were then evaluated.
Results: A total of 100 medical records from obstetrics and 100 medical records from gynecology were randomly selected. Pearson correlation analysis suggested no or only weak correlation between consultations from ChatGPT-4 and those from physicians. Bland-Altman plots showed unacceptable agreement between the two types of consultation opinions. Paired t tests showed that the scores of physician consultations were significantly higher than those generated by ChatGPT-4 in both obstetric and gynecologic patients.
Conclusion: At present, ChatGPT-4 may not be a substitute for physicians in consultations for obstetric and gynecologic patients. Therefore, it is crucial to pay careful attention and conduct ongoing evaluations to ensure the quality of consultation opinions generated by ChatGPT-4.
Journal description:
The International Journal of Gynecology & Obstetrics publishes articles on all aspects of basic and clinical research in the fields of obstetrics and gynecology and related subjects, with emphasis on matters of worldwide interest.