Establishing a novel score system and using it to assess and compare the quality of ChatGPT-4 consultation with physician consultation for obstetrics and gynecology: A pilot study.
{"title":"Establishing a novel score system and using it to assess and compare the quality of ChatGPT-4 consultation with physician consultation for obstetrics and gynecology: A pilot study.","authors":"Lan Lan, Ling Yang, Jinyan Li, Jia Hou, Yunsheng Yan, Yaozong Zhang","doi":"10.1002/ijgo.15934","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>In the current study, we aimed to establish a quantified scoring system for evaluating consultation quality. Subsequently, using the score system to assess the quality of ChatGPT-4 consultations, we compared them with physician consultations when presented with the same clinical cases from obstetrics and gynecology.</p><p><strong>Methods: </strong>This study was conducted in the Women and Children's Hospital of Chongqing Medical University, a tertiary-care hospital with approximately 16 000-20 000 deliveries and 8500-12 000 gynecologic surgeries per year. The detailed data from obstetric and gynecologic medical records were analyzed by ChatGPT-4 and physicians; the consultation opinions were then generated respectively. All consultation opinions were graded by eight junior doctors using the novel score system; subsequently, the correlation, agreement, and comparison between the two types of consultation opinions were then evaluated.</p><p><strong>Results: </strong>A total of 100 medical records from obstetrics and 100 medical records from gynecology were randomly selected. Pearson correlation analysis suggested a noncorrelation or weak correlation between consultations from ChatGPT-4 and physicians. Bland-Altman plot showed an unacceptable agreement between the two types of consultation opinions. 
Paired t tests showed that the scores of physician consultations were significantly higher than those generated by ChatGPT-4 in both obstetric and gynecologic patients.</p><p><strong>Conclusion: </strong>At present, ChatGPT-4 may not be a substitute for physicians in consultations for obstetric and gynecologic patients. Therefore, it is crucial to pay careful attention and conduct ongoing evaluations to ensure the quality of consultation opinions generated by ChatGPT-4.</p>","PeriodicalId":14164,"journal":{"name":"International Journal of Gynecology & Obstetrics","volume":null,"pages":null},"PeriodicalIF":2.6000,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Gynecology & Obstetrics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1002/ijgo.15934","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"OBSTETRICS & GYNECOLOGY","Score":null,"Total":0}
Citations: 0
Abstract
Objectives: In the current study, we aimed to establish a quantitative scoring system for evaluating consultation quality. We then used this scoring system to assess the quality of ChatGPT-4 consultations and compare them with physician consultations for the same clinical cases from obstetrics and gynecology.
Methods: This study was conducted at the Women and Children's Hospital of Chongqing Medical University, a tertiary-care hospital with approximately 16 000-20 000 deliveries and 8500-12 000 gynecologic surgeries per year. Detailed data from obstetric and gynecologic medical records were analyzed by ChatGPT-4 and by physicians, each generating consultation opinions independently. All consultation opinions were graded by eight junior doctors using the novel scoring system; the correlation, agreement, and differences between the two types of consultation opinions were then evaluated.
Results: A total of 100 medical records from obstetrics and 100 medical records from gynecology were randomly selected. Pearson correlation analysis suggested no or only weak correlation between consultations from ChatGPT-4 and those from physicians. Bland-Altman plots showed unacceptable agreement between the two types of consultation opinions. Paired t tests showed that the scores of physician consultations were significantly higher than those generated by ChatGPT-4 in both obstetric and gynecologic patients.
Conclusion: At present, ChatGPT-4 may not be a substitute for physicians in consultations for obstetric and gynecologic patients. Therefore, it is crucial to pay careful attention and conduct ongoing evaluations to ensure the quality of consultation opinions generated by ChatGPT-4.
Journal description:
The International Journal of Gynecology & Obstetrics publishes articles on all aspects of basic and clinical research in the fields of obstetrics and gynecology and related subjects, with emphasis on matters of worldwide interest.