Gabriel Levin, Walter Gotlieb, Pedro Ramirez, Raanan Meyer, Yoav Brezinov
BJOG: An International Journal of Obstetrics and Gynaecology, volume 132, issue 1, pages 99-101. Publication date: 2024-08-14. DOI: 10.1111/1471-0528.17929. https://onlinelibrary.wiley.com/doi/10.1111/1471-0528.17929
ChatGPT in a gynaecologic oncology multidisciplinary team tumour board: A feasibility study
The practical medical use of artificial intelligence is progressing rapidly. Specifically, the application of ChatGPT has been explored in medical education and even in the evaluation of clinical data [1, 2]. The tumour board is an integral and pivotal part of patient treatment and management in gynaecologic oncology [3]. It entails processing various pathological and clinical parameters, coupled with familiarity with the treatment guidelines that apply to those parameters. The participation of ChatGPT in breast cancer tumour boards was previously studied, with contrasting results [4, 5]. We aimed to study the feasibility of ChatGPT (versions 3.5 and 4) as a support tool for endometrial cancer (EC) and ovarian cancer (OC) according to the NCCN and ESGO guidelines.
Ten EC cases and ten OC cases were fabricated, based on the authors' experience, to represent the most complex scenarios discussed in real practice. For EC, the following data were formulated: age, histology, stage, grade, lymphovascular space invasion, tumour size and molecular classification (MMR, p53 and POLE mutation status). For OC, the following data were formulated: age, histology and stage.
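The case parameters listed above could be held in a small record type before being turned into prompt text. The following is a minimal sketch only: the class names, field names, types, the centimetre unit and the `summarise` helper are all hypothetical and do not come from the study, and the example values are made up.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class EndometrialCase:
    # Field names and types are illustrative assumptions, not the authors' schema.
    age: int
    histology: str
    stage: str             # stage label such as "IA"
    grade: int
    lvsi: bool             # lymphovascular space invasion present
    tumour_size_cm: float  # unit is an assumption
    mmr_deficient: bool
    p53_abnormal: bool
    pole_mutated: bool


@dataclass(frozen=True)
class OvarianCase:
    # The text lists only these three parameters for ovarian cancer.
    age: int
    histology: str
    stage: str


def summarise(case) -> str:
    """Render a case as one line of 'field: value' pairs, in field order."""
    return "; ".join(f"{name}: {value}" for name, value in asdict(case).items())


# Made-up values for illustration only; this is not one of the study's cases.
example = OvarianCase(age=60, histology="high-grade serous", stage="IC3")
print(summarise(example))
```

Keeping the EC and OC records separate mirrors the text, where the molecular markers are specified for EC only.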
We created a new account for ChatGPT 3.5, and purchased and created an account for ChatGPT 4. We used generic prompts for all the cases. The ChatGPT 3.5 and ChatGPT 4 prompts are described in Appendix S1.
For each tumour board case, we accessed the NCCN and ESGO guidelines separately and recorded their recommendations. All ChatGPT recommendations were judged as correct or incorrect by two independent reviewers (G.L. and Y.B.). Data analysis is described in detail in Appendix S1.
We used SPSS 29 for the statistical analysis. Because no patient information was used, no ethics board review was needed for this study.
There were ten cases of EC (stages IA-IIIC, four different histologies) and ten cases of OC (stages IA-IC3, five different histologies). ChatGPT 3.5 was unable to give a concrete recommendation, whereas ChatGPT 4 gave a recommendation for all cases. The reviewers did not disagree on any of the 40 evaluations.
The rate of correct recommendations was 70% (14/20) for NCCN guidelines and 60% (12/20) for ESGO guidelines (p = 0.512; Table 1). Recommendations were correct according to both guidelines in 55% (11/20) of cases, correct according to only one guideline in 20% (4/20) of cases (Figure S1), and incorrect in 25% (5/20) of cases. Of the cases with an incorrect recommendation, 80% (4/5) were EC (stages IA-II, of all histologies) and one was OC (stage IA). All four cases with a recommendation that was correct according to a single guideline were EC; three of them were incorrect according to the ESGO guidelines, including the only two cases with a positive POLE mutation. OC had a higher rate of completely correct recommendations than EC (90% vs. 20%, p = 0.005). ChatGPT 4 suggestions for adjuvant treatment are presented in Tables S1 and S2.
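The counts reported above can be cross-checked with a short calculation. The sketch below is an illustration, not the authors' analysis, which was run in SPSS 29 and is detailed in Appendix S1. It assumes a two-sided Fisher's exact test for the OC versus EC comparison, which the text does not name, and the function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one. Integer counts are compared,
    so no floating-point tolerance is needed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def ways(x: int) -> int:
        # Number of arrangements with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = ways(a)
    extreme = sum(ways(x) for x in range(lowest, highest + 1) if ways(x) <= observed)
    return extreme / total


# Counts reported in the text (20 cases in total).
both_correct, one_correct, none_correct = 11, 4, 5
assert both_correct + one_correct + none_correct == 20

# Of the four single-guideline cases, three were incorrect per ESGO,
# i.e. correct per NCCN only, which leaves one correct per ESGO only.
nccn_only, esgo_only = 3, 1
nccn_correct = both_correct + nccn_only  # 14
esgo_correct = both_correct + esgo_only  # 12

# Correct under both guidelines: OC 9 of 10 (90%), EC 2 of 10 (20%).
p_value = fisher_exact_two_sided(9, 1, 2, 8)
print(f"NCCN {nccn_correct}/20, ESGO {esgo_correct}/20, OC vs EC p = {p_value:.3f}")
```

Under these assumptions the subgroup totals add up to the per-guideline rates of 14/20 and 12/20, and the OC versus EC comparison rounds to p = 0.005, in line with the reported value.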
In this feasibility study, we showed that ChatGPT 4 provided correct recommendations in about two-thirds of the cases evaluated; however, in 25% of cases, mostly endometrial cancer, the recommendation was incorrect. Endometrial cancer had a lower rate of completely correct recommendations, likely owing to the complexity of stage, histology and grade in early stages and to the integration of molecular characterisation. More research is required to assess the credibility of this tool and to configure protocols for its potential use. However, in high-volume clinics, or in regions where expertise is limited, such tools may help physicians maintain evidence-based care. Further studies should focus on ChatGPT's familiarity with ongoing clinical trials, to assess possible patient eligibility.
Our limitations include the small number of cases studied and the restriction of our study to endometrial and ovarian cancer. Additionally, we used the generic ChatGPT tool without any specific training on our data. Moreover, we used only two AI platforms in this study, which may limit the generalisability of our results. Importantly, we did not compare the AI-generated recommendations with those of a multidisciplinary tumour board, which is the ‘gold standard’ in real practice. Finally, all data are correct as of the time this manuscript was written. As ChatGPT is a large language model, it is constantly being trained on prompts, and its output may change and evolve over time. Future prospective real-life evaluation in gynaecologic oncology tumour boards is encouraged, to better delineate the advantages and pitfalls of artificial intelligence tools and their impact on practice.
Gabriel Levin: conception, design, acquisition of data, analysis and interpretation of data, drafting the article, approval of the final version. Walter Gotlieb: acquisition of data, critical revision of the article, approval of the final version. Pedro Ramirez: acquisition of data, critical revision of the article, approval of the final version. Raanan Meyer: acquisition of data, critical revision of the article, approval of the final version. Yoav Brezinov: conception and design, analysis and interpretation of data, critical revision of the article, approval of the final version.
This research received no external funding.
The authors report no conflict of interest.
About the journal:
BJOG is an editorially independent publication owned by the Royal College of Obstetricians and Gynaecologists (RCOG). The Journal publishes original, peer-reviewed work in all areas of obstetrics and gynaecology, including contraception, urogynaecology, fertility, oncology and clinical practice. Its aim is to publish the highest quality medical research in women's health, worldwide.