
Cochrane Evidence Synthesis and Methods: Latest Articles

Promoting the Implementation of Co-Produced Cochrane Evidence: An Exploratory Study of Improving Partnering With Consumers
Pub Date : 2026-02-03 DOI: 10.1002/cesm.70071
Bronwen Merner, Louisa Walsh, Janet Jull, Nora Refahi, Vasileios Tsialtas, Benjamin Shemesh, Mel Kotze, Rebecca Ryan

Introduction

Co-production of evidence syntheses has the potential to facilitate translation of research findings into policy and practice. However, few studies have explored the process of implementing co-produced evidence. This gap limits our understanding of how, and to what extent, co-production promotes knowledge translation.

In this study, we used an implementation science lens to explore factors influencing the implementation of the Best Practice Principles in partnering with consumers (BPP) in hospitals in Melbourne, Australia. The BPP were developed as part of a co-produced Cochrane qualitative evidence synthesis exploring consumers' and health providers' experiences and perceptions of partnering. We use the findings of our study to develop strategies for evidence synthesis teams engaged in co-production to optimize the implementation of their review findings.

Methods

This exploratory, qualitative study was informed by cooperative inquiry and normalization process theory (NPT). A six-member panel, including researchers, policymakers, and consumers, guided data collection and analysis. Data collection involved semi-structured interviews with eleven participants (including consumer engagement leads, consumer representatives, and a policymaker) about how to implement the BPP in Melbourne hospitals. Interviews were analyzed using framework analysis.

Results

Interview participants reported the BPP were relevant to practice, consumer-centered, practical, and flexible. There were several additional factors that could impact their uptake into practice. These included integration of the BPP into government policies and guidelines, evidence of the cost/benefit of BPP implementation, endorsement from health service leadership, involvement of consumers throughout the implementation process, a structured implementation, and flexible measurement of implementation success.

Conclusion

This exploratory study suggested that the BPP, a tool developed through a co-produced Cochrane qualitative evidence synthesis, promoted knowledge translation. Other factors at the macro- (political and economic), meso- (systems and organizations), and micro- (individual) levels could influence the implementation's success. Implications for evidence synthesis teams aiming to optimize the knowledge translation of their review results are discussed.

Citations: 0
Correction to “Sensitivity Analysis in Meta-Analysis: A Tutorial”
Pub Date : 2026-01-28 DOI: 10.1002/cesm.70070

N. M. Aung, I. Jurak, S. Mehmood, and E. Axon, “Sensitivity Analysis in Meta-Analysis: A Tutorial,” Cochrane Evidence Synthesis and Methods 4 (2026): 1–7. https://doi.org/10.1002/cesm.70067.

The article category has been corrected from “METHODS ARTICLE” to “TUTORIAL.”

We apologize for this error.

[This corrects the article DOI: 10.1002/cesm.70067.]
Citations: 0
Don't Stop Me Now, `Cause I'm Having a Good Time Screening: Evaluation of Stopping Methods for Safe Use of Priority Screening in Systematic Reviews
Pub Date : 2026-01-21 DOI: 10.1002/cesm.70068
Tim Repke, Francesca Tinsdeall, Diana Danilenko, Sergio Graziosi, Finn Müller-Hansen, Lena Schmidt, James Thomas, Gert van Valkenhoef

Introduction

Priority screening has the potential to reduce the number of records that need to be annotated in systematic literature reviews. So-called technology-assisted reviews (TAR) use machine learning with prior include/exclude annotations to continuously rank unseen records by their predicted relevance, so that relevant records are found earlier. In this article, we present a systematic evaluation of methods to determine when it is safe to stop screening when using prioritization.

Methods

We implement an open-source evaluation framework that features a novel method to generate rankings and simulate priority screening processes for 81 real-world data sets. We use these simulations to evaluate 15 statistical or rule-based (heuristic) stopping methods, testing a range of hyperparameters for each.
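
To make the idea of a rule-based stopping method concrete, the following is a minimal sketch, not the authors' framework. It assumes a hypothetical toy ranking (`TOY_LABELS`) and one simple heuristic, "stop after a fixed number of consecutive excluded records" (the `patience` parameter, a name chosen here for illustration). The function and variable names are assumptions and do not come from the article.

```python
def simulate_priority_screening(labels_in_ranked_order, patience):
    """Screen records in ranked order and stop after `patience`
    consecutive exclusions (a simple heuristic stopping rule).

    labels_in_ranked_order: list of 1 (relevant) / 0 (irrelevant),
        already sorted by a hypothetical ranking model.
    Returns (n_screened, recall, work_saved).
    """
    total = len(labels_in_ranked_order)
    if total == 0:
        # Nothing to screen: recall is trivially complete, no work saved.
        return 0, 1.0, 0.0

    total_relevant = sum(labels_in_ranked_order)
    found = 0
    consecutive_excludes = 0
    n_screened = 0

    for label in labels_in_ranked_order:
        n_screened += 1
        if label:
            found += 1
            consecutive_excludes = 0  # a relevant record resets the counter
        else:
            consecutive_excludes += 1
            if consecutive_excludes >= patience:
                break  # stopping criterion met

    recall = found / total_relevant if total_relevant else 1.0
    work_saved = 1 - n_screened / total
    return n_screened, recall, work_saved


# Hypothetical toy ranking: 20 records, 5 relevant, one relevant record
# ranked late (position 12) to show how an early stop can miss it.
TOY_LABELS = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

for patience in (3, 5):
    n, recall, saved = simulate_priority_screening(TOY_LABELS, patience)
    print(f"patience={patience}: screened={n}, recall={recall:.2f}, "
          f"work saved={saved:.2f}")
```

On this toy ranking, a patience of 3 stops after 10 records with a recall of 0.8, while a patience of 5 stops after 17 records with full recall. This illustrates the trade-off the abstract describes: a rule may stop early and miss relevant records, or stop conservatively and save little work.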

Results

The work-saving potential and performance of stopping criteria heavily rely on “good” rankings, which are typically not achieved by a single ranking algorithm across the entire screening process. Our evaluation shows that almost all existing stopping methods either fail to reliably stop without missing relevant records or fail to utilize the full potential work-savings. Only one method reliably meets the set recall target, but stops conservatively.

Conclusions

Many digital evidence synthesis tools provide priority screening features that are already used in many research projects. However, the theoretical work-savings demonstrated in retrospective simulations of prioritization can only be unlocked with safe and reproducible stopping criteria. Our results highlight the need for improved stopping methods and guidelines on how to responsibly use priority screening. We also urge screening platforms to provide indicators and authors to transparently report metrics when automating (parts of) their synthesis.

Citations: 0
Methods of Engaging Interest-Holders in Healthcare Evidence Syntheses: A Scoping Review
Pub Date : 2026-01-15 DOI: 10.1002/cesm.70066
Alex Todhunter-Brown, Jennifer Petkovic, Christine Chang, Ursula Griebler, Ailish Hannigan, Jennifer Hilgart, Basharat Hussain, Janet Jull, Christina Koscher-Kien, Dominic Ledinger, Barbara Nussbaumer-Streit, Oyekola Oloyede, Eve Tomlinson, Shoba Dawson, Omar Dewidar, Sean Grant, Lyubov Lytvyn, Thomas W. Concannon, Leonila Dans, Denny John, Zoe Jordan, Evan Mayo-Wilson, Chris McCutcheon, Francesco Nonino, Danielle Pollock, Karine Toupin April, Pauline Campbell, Joanne Khabsa, Olivia Magwood, Vivian Welch, Peter Tugwell

Introduction

Engaging interest-holders in health care evidence syntheses may make evidence syntheses more relevant, useful, and accessible. However, the best way(s) to engage interest-holders within the evidence synthesis process remain unknown. A previous scoping review collated 291 publications that reported interest-holder engagement in evidence syntheses, but conclusions were limited due to poor reporting. In the present scoping review, our aim was to identify and collate up-to-date publications focussed on interest-holder engagement in healthcare evidence syntheses, describe reported methods of engagement, and compare the results with those from the previous review.

Methods

We updated a scoping review, following JBI guidance, using a pre-published protocol that defined all key terminology in this field. We systematically searched five electronic databases (MEDLINE, CINAHL, EMBASE, PsycInfo, and SCOPUS). Searches were conducted from January 2016 to February 2024. Records were imported into Covidence and screened by pairs of independent reviewers, including any publications that reported engagement of interest-holders in evidence syntheses. We extracted and coded key data relating to the evidence synthesis topic and ACTIVE framework domains (who was engaged, when, and in what way). Two reviewers independently made a judgment of the comprehensiveness of the description of methods of engagement, using a “traffic-light” system, coding evidence syntheses with comprehensive descriptions as “green,” brief or partial descriptions as “amber,” and those with few details as “red”; disagreements were resolved through discussion. Additional detailed data relating to the engagement methods were extracted from “green” evidence syntheses. Any disagreements were resolved through discussion. Data were synthesized within tables, and narrative summaries were written to provide an overview of key methods of engaging interest-holders within the identified evidence syntheses.

Results

We identified 302 publications published since the previous review. Most (272/302, 90%) reported interest-holder engagement in a single evidence synthesis; of these, 74% (200/272) engaged patients and/or their carers, while 17% (46/272) engaged other interest-holders only, and the remainder (26/272, 9.6%) was unclear. Over three-quarters of the evidence syntheses were conducted either in the United Kingdom, United States, Canada, or Australia (215/272, 79%). Most often (113/272, 42%), interest-holders were engaged at both the initial (scope and question setting) and final (interpretation of results) review stages (termed a “top and tail” approach). Nineteen percent (51/272) were judged to provide a comprehensive (“green”) description of one or more methods or approaches to engagement in the evidence synthesis, enabling detailed data extraction and description. Of these, most engaged both patients/members of the public and other interest-holder groups (30/51, 59%); used a “closed” recruitment strategy (30/51, 59%); engaged interest-holders at the interpretation of findings stage (39/51, 76%); and had at least one interest-holder as a co-author (27/51, 52%). Interest-holders generally attended meetings that did not use formal engagement methods. It was common for interest-holders to be engaged in multiple activities throughout the review process.

Our international team from the MuSE Consortium has updated a previous scoping review, collating up-to-date evidence on interest-holder engagement in evidence syntheses. We collated 302 publications and described the methods of interest-holder engagement reported in the 51 evidence syntheses that we judged to provide the most comprehensive information. Interest-holders were engaged at all stages of the process, using a wide range of engagement methods, but with no clear patterns relating to the type or focus of the evidence synthesis. Most commonly, both patient/public and professional interest-holders were engaged, but around a quarter of our examples engaged only patients/members of the public, and a few engaged only professional interest-holders. We identified a number of different engagement strategies and used these to inform a potential decision tool to support the selection of engagement strategies. We make recommendations for the conduct and reporting of interest-holder engagement in evidence syntheses and for future research to advance this field.

Citations: 0
Systematic Reviews as Part of Doctoral Theses and for the Promotion to Associate Professor: A Descriptive Study of University Policies in Sweden
Pub Date : 2026-01-14 DOI: 10.1002/cesm.70069
Martin Ringsten, Lea Styrmisdottir, Matilda Naesström, Minna Johansson, Matteo Bruschettini, Susanna M. Wallerstedt

Background

Almost a decade ago, about half of biomedical PhD programs across Europe specifically stated that systematic reviews could not be accepted as part of a doctoral thesis, illustrating limited merit value at that time. The aim of this study was to explore current Swedish university policies on this research design.

Methods

Policy documents for PhD theses and applications to associate professor positions were obtained from all medical faculties at universities in Sweden. Instructions regarding systematic reviews, with focus on their merit value and related aspects, were independently extracted and categorized by two authors, with discrepancies resolved in consensus discussions.

Results

All seven medical faculties accepted at least one systematic review within a PhD thesis, five restricted the number of such studies accepted, and five provided instructions regarding this study design. Regarding policies for promotion to associate professor, six medical faculties accepted at least one published systematic review to merit recognition (the remaining one required meta-analyses for acceptance), and three explicitly restricted the number of systematic reviews. No restrictions or guidance were provided for other designs intended to answer specific research questions.

Conclusion

As of 2025, systematic reviews appear to be generally recognized as contributing to authors' academic merit. For this research design exclusively, some universities impose restrictions that may limit their recognition, and some provide guidance which may help ensure quality in reporting. These findings may encourage research to evaluate the merit value of systematic reviews in other settings, and to examine potential implications of restrictions and guidance in policy documents.

Citations: 0
A Co-production Evaluation Tool Informed by Co-production Workshops for Use in Evidence Synthesis Contexts
Pub Date : 2026-01-07 DOI: 10.1002/cesm.70065
Meena Khatwa, Vanessa Bennett, Rachael C. Edwards, Lisa Richardson, Phuong Tu Nguyen, Sajid Saleem, Sylvia Chaires, Alison O'Mara-Eves, Dylan Kneale

Aim

We aimed to co-produce a tool for evaluating co-production within evidence syntheses.

Background

Participatory approaches are recommended to enhance the salience and quality of evidence syntheses, and there is an increasing onus on co-producing evidence synthesis. Co-production is a way of working where research generators, beneficiaries and other interest holders work in equal partnership and for mutual benefit.

Methods

To develop our approach, we: (1) examined selected existing tools and frameworks that could be useful in evaluating co-production; (2) developed an initial tool that was then modified through input from co-production workshops; and (3) piloted the tool and evaluation approach in a project as part of research involving co-producing a logic model to support evidence syntheses.

Results

The existing tools, guidance and resources we examined were deemed to be oriented towards supporting the conduct and reporting of co-production, rather than evaluating what happens and how. This provided a basis for co-producing a new tool. A new tool was developed that captures our perspectives on: positionality and expertise; motivations and expected benefits; clarity of role and expectations; project involvement and contributions; value and recognition; skills, knowledge, and personal growth; relationships and networking; comfort, support, and accessibility; and decision-making and power sharing. We reflected that the tool and process for administering the tool worked well, and we liked the process of collective sensemaking.

Conclusions

We believe that the tool (which we refer to as the STRAPS tool, Synthesising Through Reflection And Participatory Sense-making) could provide a useful resource and starting point to other review teams who wish to evaluate co-production in their reviews and encourage others to share their experiences with us.

Implications

Co-production enhances the quality of evidence syntheses. Using the STRAPS tool can help reviewers unpack the process using a standardized approach.

Sensitivity Analysis in Meta-Analysis: A Tutorial
Pub Date : 2026-01-05 DOI: 10.1002/cesm.70067
Nyan Min Aung, Ivan Jurak, Seemab Mehmood, Emma Axon

This tutorial explains when systematic review authors may consider performing a sensitivity analysis in a meta-analysis. Such scenarios include removing studies at high risk of bias, exploring the effect of outliers and examining differences in study characteristics (e.g., participants’ age, study design). In addition, examples are provided, as well as advice on how to interpret and report the results. The tutorial also explains the differences between subgroup and sensitivity analyses, as well as describing the disadvantages of a sensitivity analysis. To support this tutorial, a link to an online module, which includes videos and quizzes, is also provided.
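The scenarios named in the abstract (removing studies at high risk of bias, exploring the effect of outliers) can be illustrated with a minimal sketch. It is not taken from the tutorial: it assumes a simple inverse-variance fixed-effect model, and the effect sizes, variances, and risk-of-bias flags are hypothetical values chosen only for illustration. The function names `pool_fixed_effect`, `leave_one_out`, and `exclude_flagged` are likewise illustrative.

```python
import math

def pool_fixed_effect(effects, variances):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_weight
    return pooled, math.sqrt(1.0 / total_weight)

def leave_one_out(effects, variances):
    """Re-pool the estimate with each study removed in turn."""
    results = []
    for i in range(len(effects)):
        kept_effects = effects[:i] + effects[i + 1:]
        kept_variances = variances[:i] + variances[i + 1:]
        results.append(pool_fixed_effect(kept_effects, kept_variances))
    return results

def exclude_flagged(effects, variances, flagged):
    """Pool only the studies that are not flagged (e.g., high risk of bias)."""
    kept = [(y, v) for y, v, f in zip(effects, variances, flagged) if not f]
    return pool_fixed_effect([y for y, _ in kept], [v for _, v in kept])

# Hypothetical data: four studies, the last one an outlier rated high risk of bias.
effects = [0.10, 0.20, 0.15, 0.90]
variances = [0.04, 0.04, 0.04, 0.04]
high_risk = [False, False, False, True]

primary, primary_se = pool_fixed_effect(effects, variances)
restricted, restricted_se = exclude_flagged(effects, variances, high_risk)
loo = leave_one_out(effects, variances)

print(f"primary analysis:    {primary:.3f} (SE {primary_se:.3f})")
print(f"excluding high risk: {restricted:.3f} (SE {restricted_se:.3f})")
for i, (est, se) in enumerate(loo, start=1):
    print(f"without study {i}:     {est:.3f} (SE {se:.3f})")
```

If the restricted or leave-one-out estimates differ materially from the primary estimate, the primary result is sensitive to the studies removed and should be interpreted with that in mind. A random-effects model would need an additional between-study variance term, which this sketch deliberately leaves out.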

Responsible Integration of Artificial Intelligence in Rapid Reviews: A Position Statement From the Cochrane Rapid Reviews Methods Group
Pub Date : 2025-11-24 DOI: 10.1002/cesm.70063
Gerald Gartlehner, Barbara Nussbaumer-Streit, Candyce Hamel, Chantelle Garritty, Ursula Griebler, Valerie Jean King, Declan Devane, Chris Kamel
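Rapidly evolving artificial intelligence (AI) technologies are increasingly used to accelerate literature review processes. A recent review and evidence map identified almost 100 studies published since 2021 assessing AI applications in evidence synthesis [1]. These technologies range from machine-learning classifiers to generative large-language models (LLMs). Recently, a preprint reported that a tool powered by LLMs autonomously reproduced and updated 12 Cochrane reviews in just 2 days [2], sparking debate about when and how AI can be used safely and effectively to support systematic and rapid reviews.

In this position statement, the Cochrane Rapid Reviews Methods Group outlines its stance on the use of AI in rapid reviews. Rapid reviews encompass various types of evidence synthesis, and while some AI tools have been developed for specific review types, such as qualitative evidence syntheses, most are designed for more general application across review methodologies.

The main recommendations are summarized in Textbox 1. They complement a recently released position statement by Cochrane and other evidence synthesis organizations on the use of AI in evidence synthesis [3].

Semi-automation of discrete steps in the evidence synthesis process, where algorithms assist but do not replace human reviewers, is not new. Cochrane, for instance, was an early adopter with the development of the randomized controlled trial (RCT) Classifier, a machine learning tool that identifies RCTs during abstract screening [4]. Semi-automation plays a different role in rapid reviews than in traditional systematic reviews, where methodological certainty is typically prioritized. Because rapid reviews already balance rigor and timeliness, teams may be more willing to adopt efficiency-enhancing tools sooner.

The advent of generative LLMs, such as ChatGPT [5] or Gemini [6], has substantially expanded the potential for AI to support tasks in evidence synthesis. Unlike earlier machine learning tools that required extensive task-specific training data, LLMs can be deployed in zero-shot settings, meaning they can be applied to a given task without prior training or fine-tuning. This dramatically lowers the barrier to entry, offering a more accessible pathway for integrating AI into review workflows. Multiple studies have assessed the utility of generative LLMs to support the development of search strategies [7], literature screening [8-10], risk of bias assessment [11, 12], and data extraction [8, 13-15]. However, findings to date indicate highly variable performance, ranging from high accuracy in some tasks to concerning errors in others [1]. In parallel, developers of literature review software have begun integrating LLMs into their products.

Importantly, in rapid reviews, AI has the potential not only to enhance efficiency but also to improve quality. Many rapid reviews rely on a single reviewer to perform critical tasks, such as study selection, data extraction, risk of bias assessment, or rating the certainty of evidence, which increases the risk of undetected errors. In these situations, AI can serve as a scalable quality control tool, helping to identify inconsistencies, flag missing data, or suggest overlooked studies. For example, Cochrane rapid review guidance recommends switching to single-reviewer screening if agreement between reviewers is high during dual screening of abstracts [16, 17], which may miss some eligible studies. In such cases, AI can complement human judgment and mitigate the risks associated with single-reviewer workflows. Review software with integrated AI can re-check abstracts excluded during single-reviewer screening, reducing the likelihood of erroneous exclusions. Another error-prone step of the review process that could benefit from AI support is data extraction. Studies have shown that, depending on reviewer experience and topic complexity, up to 50% of human-extracted data elements contain errors [19, 20]. Using AI as a second reviewer for data extraction can improve data quality and reduce errors [21]. Ultimately, however, reviewers must decide whether the additional effort of using AI is feasible for their rapid review.

Despite its great potential, AI also poses risks. Specifically, generative LLMs may produce incorrect responses, fabricate data or references, perpetuate biases, and spread misinformation. To ensure that AI integration strengthens rather than undermines the credibility of rapid reviews, continuous human oversight (although itself fallible) must remain a core principle of any AI-supported evidence synthesis. A recent review of AI use in evidence synthesis found that incorrect inclusion decisions by AI tools during screening ranged from 0% to 29% (median = 10%), and incorrect data extractions ranged from 4% to 31% (median = 14%).

The Responsible use of AI in evidence SynthEsis (RAISE) guidance, developed through a multi-stakeholder consensus process, provides foundational principles for the transparent, ethical, and scientifically sound integration of AI in evidence synthesis. Cochrane has been actively involved in RAISE, both as a contributor and as an implementing organization, reflecting its commitment to ensuring methodological rigor in the face of rapid technological advances. Members of the Cochrane Rapid Reviews Methods Group also participated in the development of RAISE, bringing expertise from the field of rapid reviews, where the pressure for speed makes the adoption of AI particularly attractive but also potentially risky.

RAISE emphasizes key principles such as transparency of reporting, human oversight, reproducibility, and fit-for-purpose evaluation. It cautions against overreliance on AI systems that have not been reliably validated and stresses the importance of disclosing when, how, and for which tasks automation was used. As generative LLMs and other AI tools become increasingly accessible, adherence to the RAISE principles is essential for maintaining the credibility and usefulness of AI-assisted evidence synthesis.

When using AI tools, researchers also need to verify that any uploaded or processed material is permitted under fair use or equivalent academic exceptions, and that the AI model itself complies with copyright and data protection standards. For example, some professional or enterprise versions of generative LLMs explicitly guarantee that uploaded material will not be used for model training or redistribution, thereby offering greater assurance of confidentiality and legal compliance. Transparent documentation of AI tool use, including model version, purpose, and data inputs, should be maintained to uphold the reproducibility, accountability, and ethical integrity of evidence syntheses.

Cochrane, together with the Campbell Collaboration, JBI, and the Collaboration for Environmental Evidence, supports RAISE in a position statement on the use of AI in evidence synthesis [10]. The statement emphasizes that evidence synthesists remain fully responsible for their work and must ensure ethical and legal compliance when using AI or automation. Any use of AI should be clearly justified, and the tool must be methodologically sound, ensuring that it does not compromise the credibility or reliability of the review findings. Importantly, all AI-assisted tasks require human oversight, and any AI-generated or AI-informed judgments must be transparently reported in the final synthesis.

Authors should not use AI to fully automate an entire rapid review or any of its methodological steps. Doing so risks introducing errors, bias, and a lack of transparency, ultimately undermining the credibility and reproducibility of the rapid review. Moreover, such approaches violate established Cochrane methodological standards. In addition, authors must continue to follow existing methodological guidance for Cochrane rapid reviews and uphold Cochrane's standards for transparency, conflicts of interest, accountability, and scientific rigor.

The use of AI tools is acceptable, and even encouraged, when it helps to improve review quality. When resource constraints require that a task (such as study selection or data extraction) be completed by a single reviewer, AI tools can be used to provide a second check or independent suggestions. In this way, rapid review authors can introduce an additional layer of quality assurance with minimal extra effort. However, in all uses of AI, human reviewers must remain responsible for verifying all AI outputs and making final decisions. Human reviewers must continue to resolve ambiguities, apply inclusion criteria thoughtfully, and interpret findings in the broader context of clinical relevance or policy implications.

AI tools cannot be listed as authors, nor can they be held accountable for their errors. Transparency about AI use is therefore essential. Review protocols must document the intention to include AI in the review process. The methods section of the Cochrane review report and the new "Disclosure of AI use" section must clearly state which tools were used, how they were applied, and what role they played in the review process. If reviewers use a generative LLM, the model version and prompts need to be documented. This includes the extent of human oversight and any validation …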
Introducing a Series of Reviews Assessing Engagement in Evidence Syntheses
Pub Date : 2025-11-20 DOI: 10.1002/cesm.70057
Jennifer Petkovic, Joanne Khabsa, Lyubov Lytvyn, Alex Todhunter-Brown, Olivia Magwood, Pauline Campbell, Elie A. Akl, Thomas W. Concannon, Holger Schunemann, Vivian Welch, Peter Tugwell
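High-quality evidence syntheses are used in health decision-making, such as policies, legislation, and clinical recommendations [1]. The usefulness, relevance, meaningfulness, and accessibility of evidence syntheses may be improved when people who are affected by those decisions, called "interest-holders," are included in the evidence synthesis process [2-4]. This concept of engagement in research is based on the principle that those affected by the health condition under study or the intervention to address it have a moral right to contribute to the decisions about how the research is conducted [3, 5]. While there are increasing expectations from funders regarding the involvement of interest-holders [6], the most effective methods for engaging different interest-holders in evidence syntheses have not been identified [5]. Additionally, while there is some guidance related to engagement in research, it predominantly focuses on patient and public engagement in primary research, not evidence synthesis, and there is limited guidance for engaging with other interest-holders [3, 4, 7-9].

The aim of this paper is to introduce a series of articles about how to successfully engage different interest-holders when conducting evidence syntheses. The series of articles will consider methods used to engage different interest-holders (including who to involve and in what way), barriers and facilitators to engagement, impacts of engagement, management of conflicts of interest, and factors relating to equity.

This paper presents the shared definitions used across each of the five reviews included in this series. These reviews will inform the development of a guidance checklist and resources for engaging interest-holders through all steps of evidence synthesis. The plan for developing this guidance is described in the project protocol [10].

"Interest-holders" are groups of people with legitimate interests in the health issue under consideration and whose perspectives and views should be considered when conducting this study [2]. Their interests arise and draw their legitimacy from the fact that these people are responsible for or affected by health- and healthcare-related decisions that can be informed by research evidence. Engagement of interest-holders in evidence syntheses can promote transparency, accountability, and trust, and can help to ensure that the needs of interest-holders are included. Engagement can improve the translation of evidence into policy and practice [11]. Interest-holders can contribute throughout the steps of evidence synthesis including, for example, refining the research question and suggesting appropriate outcomes, suggesting additional references to consider, and providing context to interpret the evidence.

This study was conducted by the MuSE Consortium, a group of over 160 individuals from 20 countries with an interest in engagement in health research, evidence syntheses, and health guidelines. This project complements a previous MuSE project that developed guidance for engagement in the development of health guidelines and clinical practice recommendations.

We used a standardized set of terms and definitions that are applied consistently across the reviews in this series. These definitions were developed and agreed upon in collaboration with the MuSE Consortium, through our related work on engagement in research [6, 13, 14] and guidelines [15, 16]; (Petkovic et al., 2022 [2, 10, 12, 17-19]), and in preparing the series of reviews introduced in this paper.

Evidence syntheses bring together research evidence to address healthcare-related questions. They use rigorous, explicit, and transparent methods and include scoping reviews, rapid reviews, and quantitative or qualitative systematic reviews. There are many different types of evidence syntheses, as shown in Table 1.

Interest-holders comprise 11 Ps, including patients and caregivers, the public, providers of care, policymakers, program managers, payers of health research, payers of health services, peer-review editors, and product makers. Definitions of each group are provided in Table 2. This taxonomy of interest-holder groups is relevant to evidence syntheses; other taxonomies are used for other types of research, such as biomedical research, clinical research, and environmental health research. A full explanation of the term "interest-holder" is provided in a commentary in this series [2].

Engagement refers to a two-way relationship between interest-holders and the research team. Other terms, such as "involvement" or "collaboration," may also be used, but for the purposes of our work we use "engagement" (Table 3).

This series includes six papers addressing different aspects of engaging interest-holders in evidence syntheses. The papers were co-authored by a team that includes representatives of the various interest-holder groups we identified, all of whom are members of the MuSE Consortium. Our first paper has been published and introduces the term "interest-holder" [2]. The remaining five papers are evidence syntheses describing different issues related to engagement.

The first synthesis is a scoping review identifying methods for engaging interest-holders in evidence syntheses. This scoping review, an update of a previous review [5], describes methods of engaging interest-holders, including who was engaged, the aims of engagement, how they were engaged, and at which stages of the review process [38]. The second review is a mixed-methods evidence synthesis examining factors that influence interest-holder engagement. Specifically, it aims to identify and synthesize barriers and facilitators to involving interest-holders at all stages of the review cycle, using the Theoretical Domains Framework. The review also examines how contextual factors shape the nature and extent of engagement across different interest-holder groups. The third review assesses the impact of interest-holder engagement on evidence syntheses. In this review, the "impact" of engagement refers to any change in the research process, the research products, the people involved, or wider society as a result of engagement in evidence synthesis. The fourth review describes issues related to conflicts of interest in engagement in evidence syntheses. It identifies the types of conflicts of interest among different interest-holders, how they are managed, and their effects on the evidence synthesis process. Finally, the last review in this series aims to identify and describe equity considerations in interest-holder engagement in evidence syntheses. Equitable engagement focuses on the intentional inclusion of diverse individuals and groups.

These reviews will inform the development of draft guidance for engaging interest-holders throughout the evidence synthesis process. We will explore agreement with the draft guidance through interviews with interest-holders and an international survey. We will use consensus methods to finalize a checklist for engaging the 11 identified interest-holder groups at all stages of the evidence synthesis process. We welcome anyone interested in participating in subsequent phases of this ongoing project.

Jennifer Petkovic: conceptualization, writing - original draft, project administration, writing - review and editing, funding acquisition. Joanne Khabsa: conceptualization, writing - original draft, writing - review and editing, funding acquisition. Lyubov Lytvyn: conceptualization, writing - original draft, writing - review and editing. Alex Todhunter-Brown: conceptualization, writing - original draft, writing - review and editing, funding acquisition. Olivia Magwood: conceptualization, writing - review and editing, writing - original draft, funding acquisition. Pauline Campbell: conceptualization, writing - review and editing. Elie A. Akl: conceptualization, writing - review and editing, funding acquisition. Thomas W. Concannon: conceptualization, writing - review and editing, funding acquisition. Holger Schunemann: conceptualization, writing - review and editing, funding acquisition. Vivian Welch: conceptualization, writing - review and editing, funding acquisition. Peter Tugwell: conceptualization, supervision, writing - review and editing, funding acquisition.

Cochrane Evidence Synthesis and Methods, as a Cochrane Collaboration journal, adheres to Cochrane's conflict of interest policy for Cochrane Library content (2020), which applies to all journal content. For Cochrane Evidence Synthesis and Methods, Cochrane's conflict of interest policy not only requires early declaration of research funding and author interests, but also stipulates that some funding and conflicts of interest prevent people from being authors of a submission. The authors declare no conflicts of interest.

Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.

The peer review history for this article is available at https://www.webofscience.com/api/gateway/wos/peer-review/10.1002/cesm.70057.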
Assessing the Feasibility and Acceptability of a Bespoke Large Language Model Pipeline to Extract Data From Different Study Designs for Public Health Evidence Reviews
Pub Date : 2025-11-04 DOI: 10.1002/cesm.70061
Zalaya Simmons, Beti Evans, Tamsyn Harris, Harry Woolnough, Lauren Dunn, Jonathon Fuller, Kerry Cella, Daphne Duval

Introduction

Data extraction is a critical but resource-intensive step of the evidence review process. Whilst there is evidence that artificial intelligence (AI) and large language models (LLMs) can improve the efficiency of data extraction from randomized controlled trials, their potential for other study designs is unclear. In this context, this study aimed to evaluate the performance of a bespoke LLM pipeline (a retrieval-augmented generation pipeline using LLaMa 3-70B) to automate data extraction from a range of study designs by assessing the accuracy and reliability of the extractions, measured as error types and acceptability.

Methods

Accuracy was assessed by retrospectively comparing the LLM extractions against human extractions from a review previously conducted by the authors. A total of 173 data fields from 24 articles (including experimental, observational, qualitative, and modeling studies) were assessed, of which three were used for prompt engineering. Reliability was assessed by calculating the mean maximum agreement rate (the highest proportion of identical returns from 10 consecutive extractions) for 116 data fields from 16 of the 24 studies. An evaluation framework was developed to assess the accuracy and reliability of LLM outputs measured as error types and acceptability (acceptability was assessed on whether it would be usable in real-world settings if the model acted as one reviewer and a human as a second reviewer).
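The reliability measure described above (the highest proportion of identical returns across repeated extractions, averaged over data fields) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it treats two returns as "identical" only when the strings match exactly, it reports the sample standard deviation (the abstract does not say which variant was used), and the field names and extracted values are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev

def max_agreement_rate(returns):
    """Share of repeated extractions that match the most frequent return."""
    most_common_count = Counter(returns).most_common(1)[0][1]
    return most_common_count / len(returns)

def summarize_reliability(extractions_by_field):
    """Mean and sample SD of the maximum agreement rate across data fields."""
    rates = [max_agreement_rate(r) for r in extractions_by_field.values()]
    return mean(rates), stdev(rates)

# Hypothetical returns from 10 consecutive extractions of three data fields.
extractions = {
    "study design": ["cohort"] * 10,
    "setting": ["UK"] * 7 + ["England"] * 3,
    "outcome": ["mortality"] * 4 + ["death"] * 3 + ["survival"] * 3,
}

mean_rate, sd_rate = summarize_reliability(extractions)
print(f"mean maximum agreement rate: {mean_rate:.2f} (SD: {sd_rate:.2f})")
```

Exact string matching is a strict criterion: semantically equivalent returns such as "UK" and "England" would count as disagreement, so a real evaluation would need an explicit rule for what counts as an identical return.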

Results

Of the 173 data fields evaluated for accuracy, 68% were rated by human reviewers as acceptable (consistent with what is deemed to be acceptable data extraction from a human reviewer). However, acceptability ratings varied depending on the data field extracted (33% to 100%), with at least 90% acceptability for “objective,” “setting,” and “study design,” but 54% or less for data fields such as “outcome” and “time period.” For reliability, the mean maximum agreement rate was 0.71 (SD: 0.28), with variation across different data fields.

Conclusion

This evaluation demonstrates the potential for LLMs, when paired with human quality assurance, to support data extraction in evidence reviews that include a range of study designs. However, further improvements in performance and validation are required before the model can be introduced into review workflows.

数据提取是证据审查过程中一个关键但资源密集的步骤。虽然有证据表明人工智能(AI)和大型语言模型(llm)可以提高随机对照试验中数据提取的效率,但它们在其他研究设计中的潜力尚不清楚。在此背景下,本研究旨在评估定制LLM模型管道(利用LLaMa 3-70B的检索-增强生成管道)的性能,通过评估提取的准确性和可靠性,测量误差类型和可接受性,从一系列研究设计中自动提取数据。方法:通过回顾性比较LLM萃取物与作者先前进行的一项综述的人类萃取物来评估准确性。共评估了来自24篇文章(包括实验、观察、定性和建模研究)的173个数据字段,其中3个用于提示工程。通过计算24项研究中16项的116个数据字段的平均最大一致性率(10个连续提取的相同回报的最高比例)来评估可靠性。开发了一个评估框架来评估LLM输出的准确性和可靠性,以错误类型和可接受性来衡量(可接受性是根据如果模型作为一个审稿人,而人类作为第二个审稿人,它是否在现实环境中可用来评估的)。结果:在评估准确性的173个数据字段中,68%被人工审稿人评为可接受的(与人工审稿人认为可接受的数据提取一致)。然而,可接受度评级因提取的数据字段而异(33%至100%),“目标”、“设置”和“研究设计”的可接受度至少为90%,但“结果”和“时间段”等数据字段的可接受度为54%或更低。对于可靠性,平均最大一致性率为0.71 (SD: 0.28),在不同的数据字段中存在差异。结论:该评价表明llm与人类质量保证相结合,在包括一系列研究设计的证据审查中支持数据提取的潜力。然而,在将模型引入评审工作流之前,还需要进一步改进性能和验证。
Source: Zalaya Simmons, Beti Evans, Tamsyn Harris, Harry Woolnough, Lauren Dunn, Jonathon Fuller, Kerry Cella, Daphne Duval. "Assessing the Feasibility and Acceptability of a Bespoke Large Language Model Pipeline to Extract Data From Different Study Designs for Public Health Evidence Reviews." Cochrane Evidence Synthesis and Methods, vol. 3, no. 6. Pub Date : 2025-11-04 DOI: 10.1002/cesm.70061 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12584109/pdf/
Citations: 0