A105 PILOT STUDY ON THE ACCURACY OF CHATGPT IN ARTICLE SCREENING FOR SYSTEMATIC REVIEWS IN GASTROENTEROLOGY

C. Na, G. Sinanian, N. Gimpaya, A. Mokhtar, D. Chopra, M. Scaffidi, E. Yeung, S. Grover
{"title":"A105 关于 chatgpt 在胃肠病学系统综述文章筛选中的准确性的试点研究","authors":"C. Na, G. Sinanian, N. Gimpaya, A. Mokhtar, D. Chopra, M. Scaffidi, E. Yeung, S. Grover","doi":"10.1093/jcag/gwad061.105","DOIUrl":null,"url":null,"abstract":"Abstract Background Systematic reviews synthesize extant research to answer a research question in a way that minimizes bias. After articles for potential inclusion are identified by sensitive searches, screening requires human expert review, which may be time-consuming and subjective. Large language models such as ChatGPT may have potential for this application. Aims This pilot study aims to assess the accuracy of ChatGPT 3.5 in screening of articles for systematic reviews in gastroenterology by (1) identifying if articles were correctly included and (2) excluding articles reported by authors as difficult to assess. Methods We searched the Cochrane Library for gastroenterology systematic reviews (January 1, 2022 to May 31, 2023) and selected the 10 most cited studies. The test set used to determine the accuracy of Open AI’s ChatGPT 3.5 model for included studies was the final list of included studies for each Cochrane review. The test set used for studies challenging to assess was the “excluded studies” list as defined in the Cochrane Handbook. Figure 1 shows the prompt used for the screening query. Articles were omitted if they did not have digital sources, abstracts or methods. Each article was screened 10 times to account for variability within ChatGPT’s outputs. Articles with ≥5 inclusion results were counted as an included study. Results ChatGPT correctly identified included studies at rates ranging from 60% to 100%. ChatGPT correctly identified exlcuded studies at rates ranging from 0% to 50% (Table 1). A total of 265 articles were screened. Conclusions In this pilot study, we demonstrated that ChatGPT is accurate in identifying articles screened for inclusion in Cochrane reviews; however, it is inaccurate in excluding articles described by the authors as being difficult to assess. We hypothesize that the GPT 3.5 model can read for keywords and broad interventions but is unable to reason cognitively, as an expert would, as to why a study may be excluded. We aim to review reasons for exclusion in future work. Table 1. Screening Results of ChatGPT Review author and date Topic No. of studies included by authors No. of studies excluded by authors No. of studies correctly included by ChatGPT (%) No. of studies correctly excluded by ChatGPT(%) Tse, 2022 Guide-wire assisted cannulation 7 14 7 (100%) 0 (0%) Gordon, 2023 Remote care through telehealth for IBD patients 14 10 14 (100%) 3 (30%) Candy, 2022 Mu-opioid antagonists for opioid-induced bowel dysfunction 10 7 10 (100%) 0 (0%) El-Nakeep, 2022 Stem cell transplantation in Crohn 7 10 7 (100%) 5 (50%) Okabayashi, 2022 Certolizumab pegol in Crohn 5 5 3 (60%) 1 (20%) Gordon, 2023 Patient education in IBD management 19 20 18 (95%) 2 (10%) Dichman, 2022 Antibiotics for uncomplicated diverticulitis 6 6 6 (100%) 0 (0%) Grobbee, 2022 Faecal occult blood tests versus faecal immunochemical tests for colorectal cancer screening 53 26 46 (87%) 2 (8%) Midya, 2022 Fundoplication in laparoscopic Heller 9 3 8 (89%) 0 (0%) Imdad, 2023 Fecal transplantation for IBD 13 21 13 (100%) 2 (10%) Figure 1. 
ChatGPT Screening Prompt Funding Agencies None","PeriodicalId":508018,"journal":{"name":"Journal of the Canadian Association of Gastroenterology","volume":"523 1","pages":"76 - 78"},"PeriodicalIF":0.0000,"publicationDate":"2024-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A105 PILOT STUDY ON THE ACCURACY OF CHATGPT IN ARTICLE SCREENING FOR SYSTEMATIC REVIEWS IN GASTROENTEROLOGY\",\"authors\":\"C. Na, G. Sinanian, N. Gimpaya, A. Mokhtar, D. Chopra, M. Scaffidi, E. Yeung, S. Grover\",\"doi\":\"10.1093/jcag/gwad061.105\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Background Systematic reviews synthesize extant research to answer a research question in a way that minimizes bias. After articles for potential inclusion are identified by sensitive searches, screening requires human expert review, which may be time-consuming and subjective. Large language models such as ChatGPT may have potential for this application. Aims This pilot study aims to assess the accuracy of ChatGPT 3.5 in screening of articles for systematic reviews in gastroenterology by (1) identifying if articles were correctly included and (2) excluding articles reported by authors as difficult to assess. Methods We searched the Cochrane Library for gastroenterology systematic reviews (January 1, 2022 to May 31, 2023) and selected the 10 most cited studies. The test set used to determine the accuracy of Open AI’s ChatGPT 3.5 model for included studies was the final list of included studies for each Cochrane review. The test set used for studies challenging to assess was the “excluded studies” list as defined in the Cochrane Handbook. Figure 1 shows the prompt used for the screening query. Articles were omitted if they did not have digital sources, abstracts or methods. Each article was screened 10 times to account for variability within ChatGPT’s outputs. Articles with ≥5 inclusion results were counted as an included study. Results ChatGPT correctly identified included studies at rates ranging from 60% to 100%. ChatGPT correctly identified exlcuded studies at rates ranging from 0% to 50% (Table 1). A total of 265 articles were screened. Conclusions In this pilot study, we demonstrated that ChatGPT is accurate in identifying articles screened for inclusion in Cochrane reviews; however, it is inaccurate in excluding articles described by the authors as being difficult to assess. We hypothesize that the GPT 3.5 model can read for keywords and broad interventions but is unable to reason cognitively, as an expert would, as to why a study may be excluded. We aim to review reasons for exclusion in future work. Table 1. Screening Results of ChatGPT Review author and date Topic No. of studies included by authors No. of studies excluded by authors No. of studies correctly included by ChatGPT (%) No. 
of studies correctly excluded by ChatGPT(%) Tse, 2022 Guide-wire assisted cannulation 7 14 7 (100%) 0 (0%) Gordon, 2023 Remote care through telehealth for IBD patients 14 10 14 (100%) 3 (30%) Candy, 2022 Mu-opioid antagonists for opioid-induced bowel dysfunction 10 7 10 (100%) 0 (0%) El-Nakeep, 2022 Stem cell transplantation in Crohn 7 10 7 (100%) 5 (50%) Okabayashi, 2022 Certolizumab pegol in Crohn 5 5 3 (60%) 1 (20%) Gordon, 2023 Patient education in IBD management 19 20 18 (95%) 2 (10%) Dichman, 2022 Antibiotics for uncomplicated diverticulitis 6 6 6 (100%) 0 (0%) Grobbee, 2022 Faecal occult blood tests versus faecal immunochemical tests for colorectal cancer screening 53 26 46 (87%) 2 (8%) Midya, 2022 Fundoplication in laparoscopic Heller 9 3 8 (89%) 0 (0%) Imdad, 2023 Fecal transplantation for IBD 13 21 13 (100%) 2 (10%) Figure 1. ChatGPT Screening Prompt Funding Agencies None\",\"PeriodicalId\":508018,\"journal\":{\"name\":\"Journal of the Canadian Association of Gastroenterology\",\"volume\":\"523 1\",\"pages\":\"76 - 78\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-02-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of the Canadian Association of Gastroenterology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/jcag/gwad061.105\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Canadian Association of Gastroenterology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/jcag/gwad061.105","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Systematic reviews synthesize extant research to answer a research question in a way that minimizes bias. After articles for potential inclusion are identified by sensitive searches, screening requires human expert review, which can be time-consuming and subjective. Large language models such as ChatGPT may have potential for this application.

Aims: This pilot study aims to assess the accuracy of ChatGPT 3.5 in screening articles for systematic reviews in gastroenterology by (1) identifying whether articles were correctly included and (2) excluding articles reported by authors as difficult to assess.

Methods: We searched the Cochrane Library for gastroenterology systematic reviews (January 1, 2022 to May 31, 2023) and selected the 10 most cited studies. The test set used to determine the accuracy of OpenAI's ChatGPT 3.5 model for included studies was the final list of included studies for each Cochrane review. The test set used for studies challenging to assess was the "excluded studies" list as defined in the Cochrane Handbook. Figure 1 shows the prompt used for the screening query. Articles were omitted if they lacked digital sources, abstracts, or methods. Each article was screened 10 times to account for variability in ChatGPT's outputs; articles with ≥5 inclusion results were counted as included.

Results: ChatGPT correctly identified included studies at rates ranging from 60% to 100%, and correctly identified excluded studies at rates ranging from 0% to 50% (Table 1). A total of 265 articles were screened.

Conclusions: In this pilot study, we demonstrated that ChatGPT is accurate in identifying articles screened for inclusion in Cochrane reviews; however, it is inaccurate in excluding articles described by the authors as difficult to assess. We hypothesize that the GPT 3.5 model can read for keywords and broad interventions but is unable to reason, as an expert would, about why a study might be excluded. We aim to review reasons for exclusion in future work.

Table 1. Screening results of ChatGPT

| Review author, date | Topic | Studies included by authors | Studies excluded by authors | Correctly included by ChatGPT, n (%) | Correctly excluded by ChatGPT, n (%) |
|---|---|---|---|---|---|
| Tse, 2022 | Guide-wire assisted cannulation | 7 | 14 | 7 (100%) | 0 (0%) |
| Gordon, 2023 | Remote care through telehealth for IBD patients | 14 | 10 | 14 (100%) | 3 (30%) |
| Candy, 2022 | Mu-opioid antagonists for opioid-induced bowel dysfunction | 10 | 7 | 10 (100%) | 0 (0%) |
| El-Nakeep, 2022 | Stem cell transplantation in Crohn's disease | 7 | 10 | 7 (100%) | 5 (50%) |
| Okabayashi, 2022 | Certolizumab pegol in Crohn's disease | 5 | 5 | 3 (60%) | 1 (20%) |
| Gordon, 2023 | Patient education in IBD management | 19 | 20 | 18 (95%) | 2 (10%) |
| Dichman, 2022 | Antibiotics for uncomplicated diverticulitis | 6 | 6 | 6 (100%) | 0 (0%) |
| Grobbee, 2022 | Faecal occult blood tests versus faecal immunochemical tests for colorectal cancer screening | 53 | 26 | 46 (87%) | 2 (8%) |
| Midya, 2022 | Fundoplication in laparoscopic Heller myotomy | 9 | 3 | 8 (89%) | 0 (0%) |
| Imdad, 2023 | Fecal transplantation for IBD | 13 | 21 | 13 (100%) | 2 (10%) |

Figure 1. ChatGPT screening prompt (prompt text not reproduced in this record).

Funding Agencies: None.
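The Methods describe querying ChatGPT 3.5 ten times per article and counting an article as included when at least five of the ten responses vote for inclusion. Below is a minimal sketch of that majority-vote procedure, assuming the openai Python client (v1+). The prompt wording and the screen_article helper are illustrative, since the authors' actual prompt (Figure 1) is not reproduced here, and gpt-3.5-turbo is assumed as the API counterpart of "ChatGPT 3.5".

```python
# Minimal sketch of majority-vote article screening, assuming the
# openai Python client (v1+). Prompt wording is illustrative; the
# authors' actual prompt (Figure 1) is not reproduced in this record.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_TEMPLATE = (
    "You are screening articles for a systematic review.\n"
    "Inclusion criteria:\n{criteria}\n\n"
    "Article abstract:\n{abstract}\n\n"
    "Answer with exactly one word: INCLUDE or EXCLUDE."
)

def screen_article(criteria: str, abstract: str, runs: int = 10) -> bool:
    """Screen one article `runs` times to account for output variability;
    count it as included when at least half the responses vote INCLUDE
    (>=5 of 10, matching the threshold in Methods)."""
    votes = 0
    for _ in range(runs):
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # assumed API equivalent of ChatGPT 3.5
            messages=[{
                "role": "user",
                "content": PROMPT_TEMPLATE.format(
                    criteria=criteria, abstract=abstract
                ),
            }],
        )
        answer = (response.choices[0].message.content or "").strip().upper()
        votes += answer.startswith("INCLUDE")
    return votes >= runs / 2  # >=5 inclusion results counts as included
```

Repeating the query and taking a majority vote is what lets the study report per-review accuracy despite the model's run-to-run variability; a single query per article would conflate that variability with genuine screening errors.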