
Latest publications from Cochrane Evidence Synthesis and Methods

Meta-Analysis Using Time-to-Event Data: A Tutorial
Pub Date : 2025-08-26 DOI: 10.1002/cesm.70041
Ashma Krishan, Kerry Dwan

This tutorial focuses on trials that assess time-to-event outcomes. We explain what hazard ratios are and how to interpret them, and demonstrate how to include time-to-event data in a meta-analysis. Examples are presented to aid understanding. Accompanying the tutorial is a micro learning module, where we demonstrate a few approaches and give you the chance to practice calculating the hazard ratio: https://links.cochrane.org/cesm/tutorials/time-to-event-data.
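The generic inverse-variance pooling that such a meta-analysis relies on can be sketched in a few lines. This is an illustrative outline only (the trial numbers are invented), not the tutorial's worked example:

```python
import math

def pool_hazard_ratios(hrs_and_cis):
    """Fixed-effect inverse-variance pooling of hazard ratios.

    Each entry is (HR, lower 95% CI, upper 95% CI). The analysis is
    done on the log scale, where the log hazard ratio is approximately
    normally distributed.
    """
    weights, weighted_logs = [], []
    for hr, lo, hi in hrs_and_cis:
        log_hr = math.log(hr)
        # SE recovered from the 95% CI width on the log scale
        se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
        w = 1 / se**2
        weights.append(w)
        weighted_logs.append(w * log_hr)
    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return (math.exp(pooled_log),
            math.exp(pooled_log - 1.96 * pooled_se),
            math.exp(pooled_log + 1.96 * pooled_se))

# Two hypothetical trials, each as (HR, lower CI, upper CI)
hr, lo, hi = pool_hazard_ratios([(0.75, 0.60, 0.94), (0.82, 0.65, 1.03)])
```

Pooling on the log scale and exponentiating at the end is the standard generic inverse-variance approach; a random-effects variant would add a between-study variance term to each weight.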

Citations: 0
Lifecycles of Cochrane Systematic Reviews (2003–2024): A Bibliographic Study
Pub Date : 2025-08-17 DOI: 10.1002/cesm.70043
Shiyin Li, Chong Wu, Zichen Zhang, Mengli Xiao, Mohammad Hassan Murad, Lifeng Lin

Background and Objectives

The relevance of Cochrane systematic reviews depends on timely completion and updates. This study aimed to empirically assess the lifecycles of Cochrane reviews published from 2003 to 2024, including transitions from protocol to review, update patterns, and withdrawals.

Methods

We extracted data from Cochrane Library publications between 2003 and 2024. Each review topic was identified using a unique six-digit DOI-based ID. We recorded protocol publication, review publication, updates, and withdrawals (i.e., removed from the Cochrane Library for editorial or procedural reasons), calculating time intervals between stages and conducting subgroup analyses by review type.
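The stage-interval calculation described above reduces to date arithmetic; a minimal sketch (the dates and the 30.44-day average-month convention are illustrative, not the study's exact procedure):

```python
from datetime import date
from statistics import median

def months_between(start: date, end: date) -> float:
    """Approximate interval in months, using a 30.44-day average month."""
    return (end - start).days / 30.44

# Hypothetical protocol-publication -> review-publication date pairs
pairs = [
    (date(2015, 1, 10), date(2017, 3, 2)),
    (date(2018, 6, 1), date(2020, 1, 15)),
]
median_interval = median(months_between(p, r) for p, r in pairs)
```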

Results

Of 8137 protocols, 71.9% progressed to reviews (median 25.7 months), 2.4% were updated during the protocol stage, and 10.0% were withdrawn. Among 8477 reviews, 64.3% were never updated by the time of our analysis; for those updated at least once, the median interval between updates was 57.2 months. Withdrawal occurred in 2.5% of reviews (median 67.6 months post-publication). Subgroup analyses showed variation across review types; diagnostic and qualitative reviews tended to have longer protocol-to-review times than other types of reviews.

Conclusions

Cochrane reviews show long development and update intervals, with variation by review type. Greater use of automation and targeted support may improve review efficiency and timeliness.

Citations: 0
Optimizing Research Impact: A Toolkit for Stakeholder-Driven Prioritization of Systematic Review Topics
Pub Date : 2025-08-14 DOI: 10.1002/cesm.70039
Dyon Hoekstra, Stefan K. Lhachimi

Intro

The prioritization of topics for evidence synthesis is crucial for maximizing the relevance and impact of systematic reviews. This article introduces a comprehensive toolkit designed to facilitate a structured, multi-step framework for engaging a broad spectrum of stakeholders in the prioritization process, ensuring the selection of topics that are both relevant and applicable.

Methods

We detail an open-source framework comprising 11 coherent steps, segmented into scoping and Delphi stages, to offer a flexible and resource-efficient approach for stakeholder involvement in research priority setting.
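As an illustration of the kind of Delphi-stage aggregation such a framework ends with, here is a hypothetical sketch (topic names, ratings, and the consensus threshold are all invented; the toolkit's own scoring may differ):

```python
from statistics import mean, pstdev

# Hypothetical second-round Delphi ratings on a 1-9 importance scale,
# one list of stakeholder ratings per candidate review topic
ratings = {
    "Topic A": [8, 9, 7, 8, 9],
    "Topic B": [5, 6, 4, 7, 5],
    "Topic C": [9, 8, 9, 9, 8],
}

def rank_topics(ratings, threshold=7.0):
    """Rank topics by mean rating, flagging those that meet a consensus
    threshold; the standard deviation serves as a crude agreement measure."""
    rows = [(topic, mean(rs), pstdev(rs)) for topic, rs in ratings.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return [(topic, m, sd, m >= threshold) for topic, m, sd in rows]
```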

Results

The toolkit provides ready-to-use tools for the development, application, and analysis of the framework, including templates for online surveys developed with free open-source software, ensuring ease of replication and adaptation in various research fields. The framework supports the transparent and systematic development and assessment of systematic review topics, with a particular focus on stakeholder-refined assessment criteria.

Conclusion

Our toolkit enhances the transparency and ease of the priority-setting process. Targeted primarily at organizations and research groups seeking to allocate resources for future research based on stakeholder needs, this toolkit stands as a valuable resource for informed decision-making in research prioritization.

Citations: 0
ChatGPT-4o Compared With Human Researchers in Writing Plain-Language Summaries for Cochrane Reviews: A Blinded, Randomized Non-Inferiority Controlled Trial
Pub Date : 2025-07-28 DOI: 10.1002/cesm.70037
Dagný Halla Ágústsdóttir, Jacob Rosenberg, Jason Joe Baker
Introduction

Plain language summaries in Cochrane reviews are designed to present key information in a way that is understandable to individuals without a medical background. Despite Cochrane's author guidelines, these summaries often fail to achieve their intended purpose. Studies show that they are generally difficult to read and vary in their adherence to the guidelines. Artificial intelligence is increasingly used in medicine and academia, with its potential being tested in various roles. This study aimed to investigate whether ChatGPT-4o could produce plain language summaries that are as good as the already published plain language summaries in Cochrane reviews.

Methods

We conducted a randomized, single-blinded study with a total of 36 plain language summaries: 18 human written and 18 ChatGPT-4o generated summaries, where both versions were for the same Cochrane reviews. The sample size was calculated to be 36 and each summary was evaluated four times. Each summary was reviewed twice by members of a Cochrane editorial group and twice by laypersons. The summaries were assessed in three different ways: First, all assessors evaluated the summaries for informativeness, readability, and level of detail using a Likert scale from 1 to 10. They were also asked whether they would submit the summary and whether they could identify who had written it. Second, members of a Cochrane editorial group assessed the summaries using a checklist based on Cochrane's guidelines for plain language summaries, with scores ranging from 0 to 10. Finally, the readability of the summaries was analyzed using objective tools such as Lix and Flesch-Kincaid scores. Randomization and allocation to either ChatGPT-4o or human written summaries were conducted using random.org's random sequence generator, and assessors were blinded to the authorship of the summaries.

Results

The plain language summaries generated by ChatGPT-4o scored 1 point higher on information (p < .001) and level of detail (p = .004), and 2 points higher on readability (p = .002) compared to human written summaries. Lix and Flesch-Kincaid scores were high for both groups of summaries, though ChatGPT was slightly easier to read (p < .001). Assessors found it difficult to distinguish between ChatGPT and human written summaries, with only 20% correctly identifying ChatGPT generated text. ChatGPT summaries were preferred for submission compared to the human written summaries (64% vs. 36%, p < .001).
Conclusion

ChatGPT-4o shows promise in creating plain language summaries for Cochrane reviews that are at least as good as those written by humans, and in some cases slightly better. This study indicates that ChatGPT-4o can be a tool for drafting understandable plain language summaries for Cochrane reviews, with quality that approaches or matches that of human authors. The clinical trial registration and protocol are available at https://osf.io/aq6r5.
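The Lix score used in the objective readability analysis has a simple closed form: average words per sentence plus the percentage of words longer than six letters. A minimal sketch of that standard definition (the tokenization rules here are a simplification):

```python
import re

def lix(text: str) -> float:
    """Lix readability: words/sentences + 100 * long_words/words,
    where a long word has more than six letters."""
    words = re.findall(r"[A-Za-z]+", text)
    # Lix conventionally counts sentence-ending punctuation (and colons)
    sentences = [s for s in re.split(r"[.!?:]+", text) if s.strip()]
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)
```

Higher scores indicate harder text; roughly, values above 55 are conventionally read as very difficult.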
Citations: 0
Analyzing the Utility of Openalex to Identify Studies for Systematic Reviews: Methods and a Case Study
Pub Date : 2025-07-24 DOI: 10.1002/cesm.70038
Claire Stansfield, Hossein Dehdarirad, James Thomas, Silvy Mathew, Alison O'Mara-Eves

Open access scholarly resources have potential to simplify the literature search process, support more equitable access to research knowledge, and reduce biases from lack of access to relevant literature. OpenAlex is the world's largest open access database of academic research. However, it is not known whether OpenAlex is suitable for comprehensively identifying research for systematic reviews. We present an approach to measure the utility of OpenAlex as part of undertaking a systematic review, and present findings in the context of undertaking a systematic map on the implementation of diabetic eye screening. Procedures were developed to investigate OpenAlex's content coverage and capture, focusing on: (1) availability of relevant research records; (2) retrieval of relevant records from a Boolean search of OpenAlex; (3) retrieval of relevant records from combining a PubMed Boolean search with a citations and related-items search of OpenAlex; and (4) efficient estimation of relevant records not identified elsewhere. The searches were conducted in July 2024 and repeated in March 2025 following removal of certain closed access abstracts from the OpenAlex data set. The original systematic review searches yielded 131 relevant records, and 128 (98%) of these are present in OpenAlex. OpenAlex Boolean searches retrieved 126 (96%) of the 131 records, and partial screening yielded two relevant records not previously known to the review team. Retrieval was reduced to 123 (94%) when the searches were repeated in March 2025. However, the volume of records from the OpenAlex Boolean search was considerably greater than assessed for the original systematic map. Combining a Boolean search from PubMed with OpenAlex network graph searches yielded 93% recall. It is feasible and useful to investigate the use of OpenAlex as a key information resource for health topics. This approach can be modified to investigate OpenAlex for other systematic reviews. However, the volume of records obtained from searches is larger than that obtained from conventional sources, something that could be reduced using machine learning. Further investigation is needed, and our approach should be replicated in other reviews.
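For orientation, composing a works query against the public OpenAlex API is a one-liner; a minimal sketch (the query terms are illustrative, not the review's actual search strategy; `search` and `per-page` are documented parameters of the `/works` endpoint):

```python
from urllib.parse import urlencode

def openalex_works_url(query: str, per_page: int = 200) -> str:
    """Build a query URL for the OpenAlex /works endpoint.

    `search` does full-text relevance search over titles and abstracts;
    `per-page` controls page size (the API caps it at 200).
    """
    base = "https://api.openalex.org/works"
    return f"{base}?{urlencode({'search': query, 'per-page': per_page})}"

# Illustrative query only, not the review's Boolean strategy
url = openalex_works_url('"diabetic retinopathy" screening implementation')
```

Fetching the URL (e.g. with `urllib.request` or `requests`) returns JSON with a `results` list; paging through large result sets is done with the API's cursor mechanism.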

Citations: 0
Enhancing nursing and other healthcare professionals' knowledge of childhood sexual abuse through self-assessment: A realist review
Pub Date : 2025-07-23 DOI: 10.1002/cesm.70019
Dr. Olumide Adisa, Ms. Katie Tyrrell, Dr. Katherine Allen

Aim

To explore how child sexual abuse/exploitation (CSA/E) self-assessment tools are being used to enhance healthcare professionals' knowledge and confidence.

Background

Child sexual abuse/exploitation is common and associated with lifelong health impacts. In particular, nurses are well-placed to facilitate disclosures by adult survivors of child sexual abuse/exploitation and promote timely access to support. However, research shows that many are reluctant to enquire about abuse and feel underprepared for disclosures. Self-assessment provides a participatory method for evaluating competencies and identifying areas that need improvement.

Evaluation

Researchers adopted a realist synthesis approach, searching relevant databases for healthcare professionals' self-assessment tools/protocols relevant to adult survivors. In total, researchers reviewed 247 full-text articles. Twenty-five items met the criteria for data extraction, and relevant contexts (C), mechanisms (M), and outcomes (O) were identified and mapped. Eight of these were included in the final synthesis, based on papers that identified two key ‘families’ of abuse-related self-assessment interventions for healthcare contexts: PREMIS, a validated survey instrument to assess HCP knowledge, confidence, and practice about domestic violence and abuse (DVA); and trauma-informed practice/care (TIP/C) organisational self-assessment protocols. Two revised programme theories were formulated: (1) individual self-assessment can promote organisational accountability; and (2) organisational self-assessment can increase the coherence and sustainability of changes in practice.

Conclusions

There is a lack of self-assessment tools/protocols designed to improve healthcare professionals' knowledge and confidence. Our review contributes to the evidence base on improving healthcare responses to CSA/E survivors, illustrating that self-assessment tools or protocols designed to improve HCP responses to adult survivors of CSA/E remain underdeveloped and under-studied. Refined programme theories developed during synthesis regarding DVA and TIP/C-related tools or protocols suggest areas for CSA/E-specific future research with stakeholders and service users.

Citations: 0
Using Artificial Intelligence Tools as Second Reviewers for Data Extraction in Systematic Reviews: A Performance Comparison of Two AI Tools Against Human Reviewers
Pub Date : 2025-07-14 DOI: 10.1002/cesm.70036
T. Helms Andersen, T. M. Marcussen, A. D. Termannsen, T. W. H. Lawaetz, O. Nørgaard

Background

Systematic reviews are essential but time-consuming and expensive. Large language models (LLMs) and artificial intelligence (AI) tools could potentially automate data extraction, but no comprehensive workflow has been tested for different review types.

Objective

To evaluate Elicit's and ChatGPT's abilities to extract data from journal articles as a replacement for one of two human data extractors in systematic reviews.

Methods

Human-extracted data from three systematic reviews (30 articles in total) were compared to data extracted by Elicit and ChatGPT. The AI tools extracted population characteristics, study design, and review-specific variables. Performance metrics were calculated against human double-extracted data as the gold standard, followed by a detailed error analysis.
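As a reminder of how per-variable extraction judgments roll up into the reported metrics, here is a minimal sketch of precision, recall and F1 computed from true positive, false positive and false negative counts. The counts used are hypothetical, not the study's data.

```python
# Minimal sketch: precision, recall and F1 for extraction decisions scored
# against a human double-extracted gold standard. Counts are hypothetical.

def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true positives, false positives,
    and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

if __name__ == "__main__":
    p, r, f = prf1(tp=92, fp=8, fn=8)  # hypothetical counts
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

When precision and recall are equal, as in this toy example, F1 equals them both; F1 only drops below the average when the two diverge.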

Results

Precision, recall and F1-score were all 92% for Elicit and 91%, 89% and 90% for ChatGPT. Recall was highest for study design (Elicit: 100%; ChatGPT: 90%) and population characteristics (Elicit: 100%; ChatGPT: 97%), while review-specific variables achieved 77% in Elicit and 80% in ChatGPT. Elicit had four instances of confabulation while ChatGPT had three. There was no significant difference between the two AI tools' performance (recall difference: 3.3% points, 95% CI: –5.2%–11.9%, p = 0.445).
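The recall comparison above is a difference between two proportions. A minimal sketch of one common way to interval-estimate such a difference, a Wald 95% confidence interval, is below; the counts are hypothetical, the study does not report which interval method it used, and this simple form treats the two samples as independent even though both tools were scored on the same items.

```python
import math

# Minimal sketch: Wald 95% CI for the difference between two proportions,
# e.g. the recall of two tools. Counts are hypothetical, and the interval
# treats the two samples as independent.

def prop_diff_ci(x1: int, n1: int, x2: int, n2: int, z: float = 1.96):
    """Return (difference, lower, upper) for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

if __name__ == "__main__":
    d, lo, hi = prop_diff_ci(46, 50, 43, 50)  # hypothetical counts
    print(f"diff={d:.3f} 95% CI [{lo:.3f}, {hi:.3f}]")
```

An interval that spans zero, as in this toy example, is consistent with the "no significant difference" conclusion the abstract reports.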

Conclusion

AI tools demonstrated high and similar performance in data extraction compared to human reviewers, particularly for standardized variables. Error analysis revealed confabulations in 4% of data points. We propose adopting AI-assisted extraction to replace the second human extractor, with the second human instead focusing on reconciling discrepancies between AI and the primary human extractor.

Citations: 0
Creating Interactive Data Dashboards for Evidence Syntheses
Pub Date : 2025-06-25 DOI: 10.1002/cesm.70035
Leslie A. Perdue, Shaina D. Trevino, Sean Grant, Jennifer S. Lin, Emily E. Tanner-Smith

Systematic review findings are typically disseminated via static outputs, such as scientific manuscripts, which can limit the accessibility and usability for diverse audiences. Interactive data dashboards transform systematic review data into dynamic, user-friendly visualizations, allowing deeper engagement with evidence synthesis findings. We propose a workflow for creating interactive dashboards to display evidence synthesis results, including three key phases: planning, development, and deployment. Planning involves defining the dashboard objectives and key audiences, selecting the appropriate software (e.g., Tableau or R Shiny) and preparing the data. Development includes designing a user-friendly interface and specifying interactive elements. Lastly, deployment focuses on making it available to users and utilizing user-testing. Throughout all phases, we emphasize seeking and incorporating interest-holder input and aligning dashboards with the intended audience's needs. To demonstrate this workflow, we provide two examples from previous systematic reviews. The first dashboard, created in Tableau, presents findings from a meta-analysis to support a U.S. Preventive Services Task Force recommendation on lipid disorder screening in children, while the second utilizes R Shiny to display data from a scoping review on the 4-day school week among K-12 students in the U.S. Both dashboards incorporate interactive elements to present complex evidence tailored to different interest-holders, including non-research audiences. Interactive dashboards can enhance the utility of evidence syntheses by providing a user-friendly tool for interest-holders to explore data relevant to their specific needs. This workflow can be adapted to create interactive dashboards in flexible formats to increase the use and accessibility of systematic review findings.
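The "preparing the data" step of the planning phase often amounts to melting wide per-study results into tidy long-format rows, the layout dashboard tools such as Tableau or R Shiny typically expect, and exporting them for upload. A minimal Python sketch follows; the study names and fields are hypothetical.

```python
import csv
import io

# Minimal sketch of dashboard data preparation: melt wide per-study results
# into tidy long-format (study, variable, value) rows and serialize as CSV.
# The study names and fields below are hypothetical.

WIDE_ROWS = [
    {"study": "Trial A", "effect_size": "0.42", "risk_of_bias": "low"},
    {"study": "Trial B", "effect_size": "0.10", "risk_of_bias": "high"},
]

def to_long(rows):
    """Turn one dict per study into one (study, variable, value) row per field."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key != "study":
                long_rows.append(
                    {"study": row["study"], "variable": key, "value": value}
                )
    return long_rows

def to_csv(rows):
    """Serialize tidy rows to CSV text ready for upload to a dashboard tool."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["study", "variable", "value"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

if __name__ == "__main__":
    print(to_csv(to_long(WIDE_ROWS)))
```

The long format makes each dashboard filter (study, variable) a single column, which is why most visualization tools prefer it over one column per outcome.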

Citations: 0
Data Extractions Using a Large Language Model (Elicit) and Human Reviewers in Randomized Controlled Trials: A Systematic Comparison
Pub Date : 2025-06-08 DOI: 10.1002/cesm.70033
Joleen Bianchi, Julian Hirt, Magdalena Vogt, Janine Vetsch

Aim

We aimed to compare data extracted from randomized controlled trials by Elicit with data extracted by human reviewers.

Background

Elicit is an artificial intelligence tool that may automate specific steps in conducting systematic reviews. However, the tool's performance and accuracy have not been independently assessed.

Methods

For comparison, we sampled 20 randomized controlled trials from which data were extracted manually by a human reviewer. We assessed the variables: study objectives, sample characteristics and size, study design, interventions, outcomes measured, and intervention effects, and classified the results into “more,” “equal to,” “partially equal,” and “deviating” extractions. The STROBE checklist was used to report the study.

Results

We analysed 20 randomized controlled trials from 11 countries. The studies covered diverse healthcare topics. Across all seven variables, Elicit extracted “more” data in 29.3% of cases, “equal” in 20.7%, “partially equal” in 45.7%, and “deviating” in 4.3%. Elicit provided “more” information for the variable study design (100%) and sample characteristics (45%). In contrast, for more nuanced variables, such as “intervention effects,” Elicit's extractions were less detailed, with 95% rated as “partially equal.”
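The percentage breakdown above is a simple tally of per-variable comparison labels. A minimal sketch of that tally follows; the label list is hypothetical, not the study's data.

```python
from collections import Counter

# Minimal sketch: turning per-variable comparison labels into a percentage
# breakdown across the categories "more", "equal to", "partially equal",
# and "deviating". The label list below is hypothetical.

def label_percentages(labels: list[str]) -> dict[str, float]:
    """Percentage of comparisons falling into each agreement category."""
    counts = Counter(labels)
    total = len(labels)
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

if __name__ == "__main__":
    labels = (["more"] * 3 + ["equal to"] * 2
              + ["partially equal"] * 4 + ["deviating"])
    print(label_percentages(labels))
```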

Conclusions

Elicit was capable of extracting partly correct data for our predefined variables. Variables like “intervention effect” or “intervention” may require a human reviewer to complete the data extraction. Our results suggest that verification by human reviewers is necessary to ensure that all relevant information is captured completely and correctly by Elicit.

Implications

Systematic reviews are labor-intensive. The data extraction process may be facilitated by artificial intelligence tools. Use of Elicit may require a human reviewer to double-check the extracted data.

Citations: 0
Using GPT-4 for Title and Abstract Screening in a Literature Review of Public Policies: A Feasibility Study
Pub Date : 2025-05-22 DOI: 10.1002/cesm.70031
Max Rubinstein, Sean Grant, Beth Ann Griffin, Seema Choksy Pessar, Bradley D. Stein

Introduction

We describe the first known use of large language models (LLMs) to screen titles and abstracts in a review of public policy literature. Our objective was to assess the percentage of articles GPT-4 recommended for exclusion that should have been included (“false exclusion rate”).

Methods

We used GPT-4 to exclude articles from a database for a literature review of quantitative evaluations of federal and state policies addressing the opioid crisis. We exported our bibliographic database to a CSV file containing titles, abstracts, and keywords and asked GPT-4 to recommend whether to exclude each article. We conducted preliminary testing of these recommendations using a subset of articles and a final test on a sample of the entire database. We designated a false exclusion rate of 10% as an adequate performance threshold.

Results

GPT-4 recommended excluding 41,742 of the 43,480 articles (96%) containing an abstract. Our preliminary test identified only one false exclusion; our final test identified no false exclusions, yielding an estimated false exclusion rate of 0.00 [0.00, 0.05]. Fewer than 1%—417 of the 41,742 articles—were incorrectly excluded. After manually assessing the eligibility of all remaining articles, we found that only 608 of the 1738 articles GPT-4 did not exclude were eligible: 65% of the articles recommended for inclusion should have been excluded.
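A zero-event false exclusion rate like the one above still carries uncertainty. For 0 events out of n, the exact (Clopper-Pearson) two-sided interval has a closed form. A minimal sketch follows; the validation sample size n = 72 is hypothetical, chosen only to illustrate an upper bound near 0.05, since the study's actual sample size is not stated here.

```python
# Minimal sketch: exact (Clopper-Pearson) 95% upper bound for a proportion
# when 0 of n sampled articles are false exclusions. With zero events the
# two-sided interval collapses to [0, 1 - (alpha / 2) ** (1 / n)].
# The sample size n = 72 used below is hypothetical.

def zero_event_upper(n: int, alpha: float = 0.05) -> float:
    """Upper Clopper-Pearson limit for 0/n observed events."""
    return 1 - (alpha / 2) ** (1 / n)

if __name__ == "__main__":
    print(round(zero_event_upper(72), 3))
```

The bound shrinks as the validation sample grows, which is why the size of the manually checked sample, not just its zero-error result, determines how strong a claim the interval supports.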

Discussion/Conclusions

GPT-4 performed well at recommending articles to exclude from our literature review, resulting in substantial time and cost savings. A key limitation is that we did not use GPT-4 to determine inclusions, nor did our model perform well on this task. However, GPT-4 dramatically reduced the number of articles requiring review. Systematic reviewers should conduct performance evaluations to ensure that an LLM meets a minimally acceptable quality standard before relying on its recommendations.

Citations: 0