A105 PILOT STUDY ON THE ACCURACY OF CHATGPT IN ARTICLE SCREENING FOR SYSTEMATIC REVIEWS IN GASTROENTEROLOGY

C. Na, G. Sinanian, N. Gimpaya, A. Mokhtar, D. Chopra, M. Scaffidi, E. Yeung, S. Grover
{"title":"A105 关于 chatgpt 在胃肠病学系统综述文章筛选中的准确性的试点研究","authors":"C. Na, G. Sinanian, N. Gimpaya, A. Mokhtar, D. Chopra, M. Scaffidi, E. Yeung, S. Grover","doi":"10.1093/jcag/gwad061.105","DOIUrl":null,"url":null,"abstract":"Abstract Background Systematic reviews synthesize extant research to answer a research question in a way that minimizes bias. After articles for potential inclusion are identified by sensitive searches, screening requires human expert review, which may be time-consuming and subjective. Large language models such as ChatGPT may have potential for this application. Aims This pilot study aims to assess the accuracy of ChatGPT 3.5 in screening of articles for systematic reviews in gastroenterology by (1) identifying if articles were correctly included and (2) excluding articles reported by authors as difficult to assess. Methods We searched the Cochrane Library for gastroenterology systematic reviews (January 1, 2022 to May 31, 2023) and selected the 10 most cited studies. The test set used to determine the accuracy of Open AI’s ChatGPT 3.5 model for included studies was the final list of included studies for each Cochrane review. The test set used for studies challenging to assess was the “excluded studies” list as defined in the Cochrane Handbook. Figure 1 shows the prompt used for the screening query. Articles were omitted if they did not have digital sources, abstracts or methods. Each article was screened 10 times to account for variability within ChatGPT’s outputs. Articles with ≥5 inclusion results were counted as an included study. Results ChatGPT correctly identified included studies at rates ranging from 60% to 100%. ChatGPT correctly identified exlcuded studies at rates ranging from 0% to 50% (Table 1). A total of 265 articles were screened. Conclusions In this pilot study, we demonstrated that ChatGPT is accurate in identifying articles screened for inclusion in Cochrane reviews; however, it is inaccurate in excluding articles described by the authors as being difficult to assess. We hypothesize that the GPT 3.5 model can read for keywords and broad interventions but is unable to reason cognitively, as an expert would, as to why a study may be excluded. We aim to review reasons for exclusion in future work. Table 1. Screening Results of ChatGPT Review author and date Topic No. of studies included by authors No. of studies excluded by authors No. of studies correctly included by ChatGPT (%) No. of studies correctly excluded by ChatGPT(%) Tse, 2022 Guide-wire assisted cannulation 7 14 7 (100%) 0 (0%) Gordon, 2023 Remote care through telehealth for IBD patients 14 10 14 (100%) 3 (30%) Candy, 2022 Mu-opioid antagonists for opioid-induced bowel dysfunction 10 7 10 (100%) 0 (0%) El-Nakeep, 2022 Stem cell transplantation in Crohn 7 10 7 (100%) 5 (50%) Okabayashi, 2022 Certolizumab pegol in Crohn 5 5 3 (60%) 1 (20%) Gordon, 2023 Patient education in IBD management 19 20 18 (95%) 2 (10%) Dichman, 2022 Antibiotics for uncomplicated diverticulitis 6 6 6 (100%) 0 (0%) Grobbee, 2022 Faecal occult blood tests versus faecal immunochemical tests for colorectal cancer screening 53 26 46 (87%) 2 (8%) Midya, 2022 Fundoplication in laparoscopic Heller 9 3 8 (89%) 0 (0%) Imdad, 2023 Fecal transplantation for IBD 13 21 13 (100%) 2 (10%) Figure 1. 
ChatGPT Screening Prompt Funding Agencies None","PeriodicalId":508018,"journal":{"name":"Journal of the Canadian Association of Gastroenterology","volume":"523 1","pages":"76 - 78"},"PeriodicalIF":0.0000,"publicationDate":"2024-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A105 PILOT STUDY ON THE ACCURACY OF CHATGPT IN ARTICLE SCREENING FOR SYSTEMATIC REVIEWS IN GASTROENTEROLOGY\",\"authors\":\"C. Na, G. Sinanian, N. Gimpaya, A. Mokhtar, D. Chopra, M. Scaffidi, E. Yeung, S. Grover\",\"doi\":\"10.1093/jcag/gwad061.105\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Background Systematic reviews synthesize extant research to answer a research question in a way that minimizes bias. After articles for potential inclusion are identified by sensitive searches, screening requires human expert review, which may be time-consuming and subjective. Large language models such as ChatGPT may have potential for this application. Aims This pilot study aims to assess the accuracy of ChatGPT 3.5 in screening of articles for systematic reviews in gastroenterology by (1) identifying if articles were correctly included and (2) excluding articles reported by authors as difficult to assess. Methods We searched the Cochrane Library for gastroenterology systematic reviews (January 1, 2022 to May 31, 2023) and selected the 10 most cited studies. The test set used to determine the accuracy of Open AI’s ChatGPT 3.5 model for included studies was the final list of included studies for each Cochrane review. The test set used for studies challenging to assess was the “excluded studies” list as defined in the Cochrane Handbook. Figure 1 shows the prompt used for the screening query. Articles were omitted if they did not have digital sources, abstracts or methods. Each article was screened 10 times to account for variability within ChatGPT’s outputs. Articles with ≥5 inclusion results were counted as an included study. Results ChatGPT correctly identified included studies at rates ranging from 60% to 100%. ChatGPT correctly identified exlcuded studies at rates ranging from 0% to 50% (Table 1). A total of 265 articles were screened. Conclusions In this pilot study, we demonstrated that ChatGPT is accurate in identifying articles screened for inclusion in Cochrane reviews; however, it is inaccurate in excluding articles described by the authors as being difficult to assess. We hypothesize that the GPT 3.5 model can read for keywords and broad interventions but is unable to reason cognitively, as an expert would, as to why a study may be excluded. We aim to review reasons for exclusion in future work. Table 1. Screening Results of ChatGPT Review author and date Topic No. of studies included by authors No. of studies excluded by authors No. of studies correctly included by ChatGPT (%) No. 
of studies correctly excluded by ChatGPT(%) Tse, 2022 Guide-wire assisted cannulation 7 14 7 (100%) 0 (0%) Gordon, 2023 Remote care through telehealth for IBD patients 14 10 14 (100%) 3 (30%) Candy, 2022 Mu-opioid antagonists for opioid-induced bowel dysfunction 10 7 10 (100%) 0 (0%) El-Nakeep, 2022 Stem cell transplantation in Crohn 7 10 7 (100%) 5 (50%) Okabayashi, 2022 Certolizumab pegol in Crohn 5 5 3 (60%) 1 (20%) Gordon, 2023 Patient education in IBD management 19 20 18 (95%) 2 (10%) Dichman, 2022 Antibiotics for uncomplicated diverticulitis 6 6 6 (100%) 0 (0%) Grobbee, 2022 Faecal occult blood tests versus faecal immunochemical tests for colorectal cancer screening 53 26 46 (87%) 2 (8%) Midya, 2022 Fundoplication in laparoscopic Heller 9 3 8 (89%) 0 (0%) Imdad, 2023 Fecal transplantation for IBD 13 21 13 (100%) 2 (10%) Figure 1. ChatGPT Screening Prompt Funding Agencies None\",\"PeriodicalId\":508018,\"journal\":{\"name\":\"Journal of the Canadian Association of Gastroenterology\",\"volume\":\"523 1\",\"pages\":\"76 - 78\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-02-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of the Canadian Association of Gastroenterology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/jcag/gwad061.105\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Canadian Association of Gastroenterology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/jcag/gwad061.105","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Systematic reviews synthesize extant research to answer a research question in a way that minimizes bias. After articles for potential inclusion are identified by sensitive searches, screening requires human expert review, which can be time-consuming and subjective. Large language models such as ChatGPT may have potential for this application.

Aims: This pilot study aims to assess the accuracy of ChatGPT 3.5 in screening articles for systematic reviews in gastroenterology by (1) identifying whether articles were correctly included and (2) excluding articles reported by authors as difficult to assess.

Methods: We searched the Cochrane Library for gastroenterology systematic reviews (January 1, 2022 to May 31, 2023) and selected the 10 most cited studies. The test set used to determine the accuracy of OpenAI's ChatGPT 3.5 model for included studies was the final list of included studies for each Cochrane review. The test set used for studies challenging to assess was the "excluded studies" list as defined in the Cochrane Handbook. Figure 1 shows the prompt used for the screening query. Articles were omitted if they lacked digital sources, abstracts, or methods. Each article was screened 10 times to account for variability in ChatGPT's outputs; articles with ≥5 inclusion results were counted as included.

Results: ChatGPT correctly identified included studies at rates ranging from 60% to 100%, and correctly identified excluded studies at rates ranging from 0% to 50% (Table 1). A total of 265 articles were screened.

Conclusions: In this pilot study, we demonstrated that ChatGPT is accurate in identifying articles screened for inclusion in Cochrane reviews; however, it is inaccurate in excluding articles described by the authors as difficult to assess. We hypothesize that the GPT 3.5 model can read for keywords and broad interventions but is unable to reason, as an expert would, about why a study might be excluded. We aim to review reasons for exclusion in future work.

Table 1. Screening results of ChatGPT

| Review author, date | Topic | Studies included by authors | Studies excluded by authors | Correctly included by ChatGPT, n (%) | Correctly excluded by ChatGPT, n (%) |
|---|---|---|---|---|---|
| Tse, 2022 | Guide-wire assisted cannulation | 7 | 14 | 7 (100%) | 0 (0%) |
| Gordon, 2023 | Remote care through telehealth for IBD patients | 14 | 10 | 14 (100%) | 3 (30%) |
| Candy, 2022 | Mu-opioid antagonists for opioid-induced bowel dysfunction | 10 | 7 | 10 (100%) | 0 (0%) |
| El-Nakeep, 2022 | Stem cell transplantation in Crohn's disease | 7 | 10 | 7 (100%) | 5 (50%) |
| Okabayashi, 2022 | Certolizumab pegol in Crohn's disease | 5 | 5 | 3 (60%) | 1 (20%) |
| Gordon, 2023 | Patient education in IBD management | 19 | 20 | 18 (95%) | 2 (10%) |
| Dichman, 2022 | Antibiotics for uncomplicated diverticulitis | 6 | 6 | 6 (100%) | 0 (0%) |
| Grobbee, 2022 | Faecal occult blood tests versus faecal immunochemical tests for colorectal cancer screening | 53 | 26 | 46 (87%) | 2 (8%) |
| Midya, 2022 | Fundoplication in laparoscopic Heller myotomy | 9 | 3 | 8 (89%) | 0 (0%) |
| Imdad, 2023 | Fecal transplantation for IBD | 13 | 21 | 13 (100%) | 2 (10%) |

Figure 1. ChatGPT screening prompt (prompt text not reproduced in this record).

Funding Agencies: None.
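The Methods describe querying ChatGPT 3.5 ten times per article and counting an article as included when at least five of the ten responses vote for inclusion. Below is a minimal sketch of that majority-vote procedure, assuming the openai Python client (v1+). The prompt wording and the screen_article helper are illustrative, since the authors' actual prompt (Figure 1) is not reproduced here, and gpt-3.5-turbo is assumed as the API counterpart of "ChatGPT 3.5".

```python
# Minimal sketch of majority-vote article screening, assuming the
# openai Python client (v1+). Prompt wording is illustrative; the
# authors' actual prompt (Figure 1) is not reproduced in this record.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_TEMPLATE = (
    "You are screening articles for a systematic review.\n"
    "Inclusion criteria:\n{criteria}\n\n"
    "Article abstract:\n{abstract}\n\n"
    "Answer with exactly one word: INCLUDE or EXCLUDE."
)

def screen_article(criteria: str, abstract: str, runs: int = 10) -> bool:
    """Screen one article `runs` times to account for output variability;
    count it as included when at least half the responses vote INCLUDE
    (>=5 of 10, matching the threshold in Methods)."""
    votes = 0
    for _ in range(runs):
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # assumed API equivalent of ChatGPT 3.5
            messages=[{
                "role": "user",
                "content": PROMPT_TEMPLATE.format(
                    criteria=criteria, abstract=abstract
                ),
            }],
        )
        answer = (response.choices[0].message.content or "").strip().upper()
        votes += answer.startswith("INCLUDE")
    return votes >= runs / 2  # >=5 inclusion results counts as included
```

Repeating the query and taking a majority vote is what lets the study report per-review accuracy despite the model's run-to-run variability; a single query per article would conflate that variability with genuine screening errors.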