Cochrane Evidence Synthesis and Methods: Latest Articles

Health Equity in Systematic Reviews: A Tutorial—Part 1 Getting Started With Health Equity in Your Review
Pub Date : 2025-10-30 DOI: 10.1002/cesm.70055
Jennifer Petkovic, Jordi Pardo Pardo, Vivian Welch, Omar Dewidar, Lara J. Maxwell, Andrea Darzi, Tamara Lotfi, Lawrence Mbuagbaw, Kevin Pottie, Peter Tugwell

This tutorial focuses on how to get started with considering health equity in systematic reviews of interventions. We will explain why health equity should be considered, how to frame your question, and which interest-holders to engage. This is the first tutorial in a series on health equity. The second tutorial focuses on implementing health equity methods in your review.

Harnessing Large-Language Models for Efficient Data Extraction in Systematic Reviews: The Role of Prompt Engineering
Pub Date : 2025-10-27 DOI: 10.1002/cesm.70058
Molly Murton, Ellie Boulton, Shona Cross, Ambar Khan, Swati Kumar, Giuseppina Magri, Charlotte Marris, David Slater, Emma Worthington, Elizabeth Lunn

Introduction

Systematic literature reviews (SLRs) of randomized clinical trials (RCTs) underpin evidence-based medicine but can be limited by the intensive resource demands of data extraction. Recent advances in accessible large-language models (LLMs) hold promise for automating this step; however, testing across different outcomes and disease areas has been limited.

Methods

This study developed prompt engineering strategies for GPT-4o to extract data from RCTs across three disease areas: non-small cell lung cancer, endometrial cancer and hypertrophic cardiomyopathy. Prompts were iteratively refined during the development phase, then tested on unseen data. Performance was evaluated via comparison to human extraction of the same data, using F1 scores, precision, recall and percentage accuracy.

Results

The LLM was highly effective for extracting study and baseline characteristics, often equaling human performance, with test F1 scores exceeding 0.85. Complex efficacy and adverse event data proved more challenging, with test F1 scores ranging from 0.22 to 0.50. Transferability of prompts across disease areas was promising but varied, highlighting the need for disease-specific refinement.

Conclusion

Our findings demonstrate the potential of LLMs, guided by rigorous prompt engineering, to augment the SLR process. However, human oversight remains essential, particularly for complex and nuanced data. As these technologies evolve, continued validation of AI tools will be necessary to ensure accuracy and reliability and to safeguard the quality of evidence synthesis.
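
The abstract above reports precision, recall and F1 for LLM extraction measured against human extraction. As a rough illustration of how such scores can be computed, here is a minimal Python sketch. The function name `extraction_scores`, the field names, the exact-match rule and the example values are all assumptions made for illustration; the study's own scoring procedure is not described in the abstract.

```python
def extraction_scores(llm, human):
    """Compare LLM-extracted fields with a human reference extraction.

    Both arguments map a field name to its extracted value; a missing key or
    None means nothing was extracted for that field. Matching is by exact
    equality, which is an assumption of this sketch.
    """
    tp = fp = fn = 0
    for field in set(llm) | set(human):
        predicted, reference = llm.get(field), human.get(field)
        if predicted is not None and predicted == reference:
            tp += 1          # LLM value matches the human value
        else:
            if predicted is not None:
                fp += 1      # LLM reported a value that is wrong or spurious
            if reference is not None:
                fn += 1      # human value was missed or replaced
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical data, not taken from the study.
human = {"n_randomized": 305, "median_age": 64, "orr_percent": 45.2, "grade3_ae_percent": 31.0}
llm = {"n_randomized": 305, "median_age": 64, "orr_percent": 54.2}
scores = extraction_scores(llm, human)
print(scores)
```

In this toy case two fields match, one is wrong and one is missed, so precision is 2/3, recall is 1/2 and F1 is 4/7. A wrong value counts against both precision and recall, which is one reason F1 for complex outcome data can fall well below F1 for simple study characteristics.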

Human-in-the-Loop Artificial Intelligence System for Systematic Literature Review: Methods and Validations for the AutoLit Review Software
Pub Date : 2025-10-25 DOI: 10.1002/cesm.70059
Kevin M. Kallmes, Jade Thurnham, Marius Sauca, Ranita Tarchand, Keith R. Kallmes, Karl J. Holub

Introduction

While artificial intelligence (AI) tools have been utilized for individual stages within the systematic literature review (SLR) process, no tool has previously been shown to support each critical SLR step. In addition, the need for expert oversight has been recognized to ensure the quality of SLR findings. Here, we describe a complete methodology for utilizing our AI SLR tool with human-in-the-loop curation workflows, as well as AI validations, time savings, and approaches to ensure compliance with best review practices.

Methods

SLRs require completing Search, Screening, and Extraction from relevant studies, with meta-analysis and critical appraisal as relevant. We present a full methodological framework for completing SLRs utilizing our AutoLit software (Nested Knowledge). This system integrates AI models into the central steps in SLR: Search strategy generation, Dual Screening of Titles/Abstracts and Full Texts, and Extraction of qualitative and quantitative evidence. The system also offers manual Critical Appraisal and Insight drafting and fully automated Network Meta-analysis. Validations comparing AI performance with that of experts are reported, along with, where relevant, time savings and ‘rapid review’ alternatives to the SLR workflow.

Results

Search strategy generation with the Smart Search AI can turn a Research Question into full Boolean strings with 76.8% and 79.6% Recall in two validation sets. Supervised machine learning tools can achieve 82–97% Recall in reviewer-level Screening. Population, Interventions/Comparators, and Outcomes (PICOs) extraction achieved an F1 of 0.74; accuracies for study type, location, and size were 74%, 78%, and 91%, respectively. Time savings of 50% in Abstract Screening and 70–80% in qualitative extraction were reported. Extraction of user-specified qualitative and quantitative tags and data elements remains exploratory and requires human curation for SLRs.

Conclusion

AI systems can support high-quality, human-in-the-loop execution of key SLR stages. Transparency, replicability, and expert oversight are central to the use of AI SLR tools.

Enhancing Evidence Synthesis Efficiency: Leveraging Large Language Models and Agentic Workflows for Optimized Literature Screening
Pub Date : 2025-10-21 DOI: 10.1002/cesm.70042
Bing Hu, Emmalie Tomini, Tricia Corrin, Kusala Pussegoda, Elias Sandner, Andre Henriques, Alice Simniceanu, Luca Fontana, Andreas Wagner, Stephanie Brazeau, Lisa Waddell

Background

Public health events of international concern highlight the need for up-to-date evidence curated using sustainable and accessible processes. In developing the Global Repository of Epidemiological Parameters (grEPI), we explore the performance of an agentic-AI assisted pipeline (GREP-Agent) for screening evidence, which capitalizes on recent advancements in large language models (LLMs).

Methods

In this study, the performance of the GREP-Agent was evaluated on a data set of 2000 citations from a systematic review on measles using four LLMs (GPT4o, GPT4o-mini, Llama3.1, and Phi4). The GREP-Agent framework integrates multiple LLMs and human feedback to fine-tune its performance and to optimize workload reduction and accuracy in screening research articles. The impact of each part of this Agentic-AI system on performance is presented, measured by accuracy, precision, recall, and F1-score.

Results

The results show how each phase of the GREP-Agent system incrementally improves accuracy regardless of the LLM. We found that GREP-Agent was able to increase sensitivity across a broad range of open source and proprietary LLMs to 84.2%–88.9% after fine-tuning and to 86.4%–95.3% by varying workload reduction strategies. Performance was significantly impacted by the clarity of the screening questions and setting thresholds for optimized workload reduction strategies.

Conclusions

The GREP-Agent shows promise in improving the efficiency and effectiveness of evidence synthesis in dynamic public health contexts. Further development and refinement of adaptable human-in-the-loop AI systems for screening literature are essential to support future public health response activities, while maintaining a human-centric approach.
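
The results above note that sensitivity depends on the thresholds chosen for workload reduction. The trade-off can be sketched in Python as follows. This is not the GREP-Agent implementation: the function `screen_at_threshold`, the idea of a single relevance score per citation, and the example scores and labels are all assumptions made for illustration.

```python
def screen_at_threshold(scores, relevant, threshold):
    """Auto-exclude citations scoring below `threshold`; the rest go to humans.

    scores   -- model relevance score per citation, in [0, 1] (non-empty list)
    relevant -- human gold-standard labels (True = include), same order
    Returns (sensitivity, workload_reduction).
    """
    kept = [s >= threshold for s in scores]
    true_pos = sum(k and r for k, r in zip(kept, relevant))
    total_relevant = sum(relevant)
    sensitivity = true_pos / total_relevant if total_relevant else 0.0
    # Fraction of citations that no longer need human screening.
    workload_reduction = 1 - sum(kept) / len(scores)
    return sensitivity, workload_reduction


# Hypothetical scores and labels for ten citations, not data from the study.
scores = [0.95, 0.80, 0.65, 0.40, 0.35, 0.30, 0.20, 0.15, 0.10, 0.05]
relevant = [True, True, False, True, False, False, False, False, False, False]

for threshold in (0.25, 0.50):
    sens, saved = screen_at_threshold(scores, relevant, threshold)
    print(f"threshold={threshold:.2f} sensitivity={sens:.2f} workload_reduction={saved:.2f}")
```

With these made-up numbers, raising the threshold from 0.25 to 0.50 increases the share of citations excluded automatically from 40% to 70%, but one relevant citation is lost and sensitivity drops from 1.00 to about 0.67.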

Health Equity in Systematic Reviews: A Tutorial—Part 2 Implementing Health Equity Throughout Your Methods
Pub Date : 2025-10-15 DOI: 10.1002/cesm.70054
Jennifer Petkovic, Jordi Pardo Pardo, Vivian Welch, Omar Dewidar, Lara J. Maxwell, Andrea Darzi, Tamara Lotfi, Lawrence Mbuagbaw, Kevin Pottie, Peter Tugwell

This is the second and final tutorial in a series on health equity. It provides detailed guidance for considering health equity in systematic reviews of interventions. We will explain how to include and report health equity in all remaining sections of the review.

Long-Term Outcomes of Invasive vs Noninvasive Treatment for Intermittent Claudication: A Systematic Review and Meta-Analysis
Pub Date : 2025-10-03 DOI: 10.1002/cesm.70053
Anas Elmahi, Nathalie Doolan, Mohiedin Hezima, Anwar Gowey, Daragh Moneley, Seamus McHugh, Sayed Aly, Peter Naughton, Elrasheid A. H. Kheirelseid

Background

Intermittent claudication (IC) is a hallmark symptom of peripheral arterial disease (PAD), causing pain and discomfort during physical activity due to reduced blood flow to the lower extremities. The condition significantly impairs mobility and quality of life (QoL) in affected individuals. Treatment options for IC range from conservative approaches, including best medical therapy (BMT) and supervised exercise therapy (SET), to invasive interventions like angioplasty and open revascularization.

Aim

This systematic review and meta-analysis seeks to assess the long-term results of invasive procedures compared with noninvasive treatments for the management of patients with IC.

Methods

A comprehensive search was conducted in October 2024 across databases including PubMed, MEDLINE, Cochrane Library, Embase, and Scopus. Randomized controlled trials (RCTs) comparing invasive interventions to noninvasive treatments were included. Primary outcomes were quality of life (QoL), ankle-brachial pressure index (ABPI), and maximum walking distance (MWD). Secondary outcomes were major adverse cardiovascular events (MACE), mortality, complications, and re-intervention rates. Data analysis was conducted using the Cochrane Review Manager 5. Follow-up duration was between 2 and 7 years (longest available follow-up within that range, with 2-year data prioritized when present).

Results

A total of 11 RCTs with 1379 patients were included in the analysis. Invasive treatments demonstrated a significant improvement in MWD and ABPI compared to noninvasive treatments (MWD pooled mean difference (MD) = 64.94, 95% CI [10.77, 115.12], p = .02, 5 studies; ABPI pooled MD = 0.15, 95% CI [0.04, 0.26], p = .006, 5 studies). However, invasive interventions were associated with a higher rate of complications, including increased amputation risk (pooled odds ratio (OR) = 2.46, 95% CI [0.44, 13.94], p = .31, 3 studies), though this was not statistically significant. Long-term rates were higher in the noninvasive treatment group (pooled OR = 0.56, 95% CI [0.33, 0.97], p = .04).

Conclusions

Both invasive and noninvasive treatments are effective in managing IC. Invasive treatments provide greater improvement in blood flow and walking distance, but the risk of complications and re-intervention should be considered in treatment decision-making. Further studies with larger sample sizes and designs suited to long-term evaluation are needed to assess the cost-effectiveness and long-term outcomes of invasive treatments.

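
The pooled odds ratios above are reported with 95% confidence intervals and p-values. As a consistency check, a p-value can be approximated from the interval alone. The Python sketch below assumes a Wald-type interval that is symmetric on the log scale; the function name `p_from_ratio_ci` is hypothetical, and the review's own analysis in Review Manager may use a different model, so agreement should only be expected to be approximate.

```python
import math


def p_from_ratio_ci(estimate, lower, upper, z_crit=1.959964):
    """Approximate two-sided p-value for a ratio measure from its 95% CI.

    Assumes the interval is a Wald interval that is symmetric on the log scale.
    """
    # The log-scale CI spans 2 * z_crit standard errors.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided tail area.
    return math.erfc(abs(z) / math.sqrt(2))


# Values copied from the Results paragraph above.
p_amputation = p_from_ratio_ci(2.46, 0.44, 13.94)
p_long_term = p_from_ratio_ci(0.56, 0.33, 0.97)
print(f"{p_amputation:.3f} {p_long_term:.3f}")
```

Under these assumptions the two approximations come out near 0.31 and near 0.035, which is in line with the reported p = .31 and p = .04. An interval that includes 1 always gives p above .05 with this method, and one that excludes 1 gives p below .05.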
Comparison of Elicit AI and Traditional Literature Searching in Evidence Syntheses Using Four Case Studies
Pub Date : 2025-09-27 DOI: 10.1002/cesm.70050
Oscar Lau, Su Golder

Background

Elicit AI aims to simplify and accelerate the systematic review process without compromising accuracy. However, research on Elicit's performance is limited.

Objectives

To determine whether Elicit AI is a viable tool for systematic literature searches and title/abstract screening stages.

Methods

We compared the included studies in four evidence syntheses to those identified using the subscription-based version of Elicit Pro in Review mode. We calculated sensitivity, precision and observed patterns in the performance of Elicit.

Results

The sensitivity of Elicit was poor, averaging 39.5% (25.5–69.2%) compared to 94.5% (91.1–98.0%) in the original reviews. However, Elicit identified some included studies not identified by the original searches and had an average of 41.8% precision (35.6–46.2%) which was higher than the 7.55% average of the original reviews (0.65–14.7%).

Discussion

At the time of this evaluation, Elicit did not search with high enough sensitivity to replace traditional literature searching. However, the high precision of searching in Elicit could prove useful for preliminary searches, and the unique studies identified mean that Elicit can be used by researchers as a useful adjunct.

Conclusion

Whilst Elicit searches are currently not sensitive enough to replace traditional searching, Elicit is continually improving, and further evaluations should be undertaken as new developments take place.
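
Sensitivity and precision of a search, as reported above, can be expressed with simple set operations. The Python sketch below is illustrative only: the function `search_performance`, the record IDs, and the choice of a review's included studies as the reference set are assumptions, and the study's exact denominators may differ.

```python
def search_performance(retrieved, included):
    """Sensitivity and precision of a search against a review's included studies.

    retrieved -- set of record IDs returned by the search
    included  -- set of record IDs of studies included in the final review
    """
    hits = retrieved & included
    # Sensitivity: share of included studies that the search found.
    sensitivity = len(hits) / len(included) if included else 0.0
    # Precision: share of retrieved records that were included studies.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return sensitivity, precision


# Hypothetical record IDs, not data from the four case studies.
included = {"s01", "s02", "s03", "s04", "s05"}
ai_search = {"s01", "s02", "x10", "x11"}
sensitivity, precision = search_performance(ai_search, included)
print(f"sensitivity={sensitivity:.2f} precision={precision:.2f}")
```

Here the search finds two of five included studies (sensitivity 0.40) and half of what it returns is relevant (precision 0.50). A small, focused result set can score well on precision while still missing most of the studies a review needs.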

引出人工智能旨在简化和加速系统审查过程,而不影响准确性。然而,对Elicit的表现的研究是有限的。目的确定Elicit AI是否为系统文献检索和标题/摘要筛选阶段的可行工具。方法我们将四项证据综合纳入的研究与使用基于订阅的Elicit Pro在Review模式下识别的研究进行比较。我们计算了Elicit的灵敏度、精度和观察模式。结果Elicit的敏感性较差,平均为39.5%(25.5 ~ 69.2%),而原始评价的敏感性为94.5%(91.1 ~ 98.0%)。然而,Elicit识别了一些未被原始检索识别的纳入研究,平均准确率为41.8%(35.6-46.2%),高于原始评论的平均准确率7.55%(0.65-14.7%)。在本次评估时,Elicit的检索灵敏度不足以取代传统的文献检索。然而,在Elicit中搜索的高精度可能被证明对初步搜索有用,并且鉴定的独特研究意味着可以被研究人员用作有用的辅助语。结论虽然目前的Elicit搜索不够灵敏,不足以取代传统的搜索,但Elicit仍在不断改进,随着新的发展,应进行进一步的评估。
Citations: 0
Split Body Trials in Systematic Reviews and Meta-Analyses: A Tutorial
Pub Date : 2025-09-24 DOI: 10.1002/cesm.70052
Nuala Livingstone, Kerry Dwan, Marty Chaplin

This tutorial focuses on split body trials in the context of a systematic review and meta-analysis. We will explain what split body trials are, the unit-of-analysis issues they can cause, and how to include data from split body trials in a systematic review.
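
One common form of the unit-of-analysis issue mentioned above is that the two body sites of one participant are correlated, so treating them as independent groups misstates the standard error. The sketch below is a minimal illustration under stated assumptions: the function name `paired_md`, the summary statistics, and the within-person correlation `r` are all hypothetical, and this is not presented as the tutorial's own procedure.

```python
import math


def paired_md(mean_a, sd_a, mean_b, sd_b, n, r):
    """Mean difference and its standard error for a within-person comparison.

    n is the number of participants (each contributes both body sites);
    r is the assumed correlation between the two sites of one participant.
    """
    md = mean_a - mean_b
    # Variance of the paired differences, allowing for correlation r.
    var_diff = sd_a**2 + sd_b**2 - 2 * r * sd_a * sd_b
    return md, math.sqrt(var_diff / n)


# Hypothetical summary data, not taken from the tutorial.
md, se_paired = paired_md(5.0, 2.0, 4.0, 2.0, n=25, r=0.5)
_, se_naive = paired_md(5.0, 2.0, 4.0, 2.0, n=25, r=0.0)  # ignores the pairing
print(f"MD={md:.2f}, paired SE={se_paired:.3f}, naive SE={se_naive:.3f}")
```

With a positive correlation the paired standard error is smaller than the naive one, so ignoring the design would give this trial too little weight in a meta-analysis.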

Split body trials micro learning module

Citations: 0
Leveraging AI for Meta-Analysis: Evaluating LLMs in Detecting Publication Bias for Next-Generation Evidence Synthesis
Pub Date : 2025-09-18 DOI: 10.1002/cesm.70047
Xing Xing, Lifeng Lin, Mohammad Hassan Murad, Jiayi Tong

Introduction

Publication bias (PB) threatens the validity of meta-analyses by distorting effect size estimates, potentially leading to misleading conclusions. With advanced pattern recognition and multimodal capabilities, large language models (LLMs) may be able to evaluate PB and make the systematic review process more efficient.

Methods

We evaluated the ability of two state-of-the-art multimodal LLMs, GPT-4o and Llama 3.2 Vision, to detect PB using funnel plots alone and in combination with quantitative inputs. We simulated meta-analyses under varying conditions, including the absence of PB, different levels of presence of PB, varying total number of studies within a meta-analysis, and differing degrees of between-study heterogeneity.
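
To make the simulation conditions above more concrete, the following sketch generates the (effect estimate, standard error) pairs that a funnel plot displays, with and without selective publication. It is an illustrative selection model only, not the authors' simulation design: the function name, the significance threshold, and every parameter value are assumptions.

```python
import random


def simulate_meta_analysis(k, true_effect=0.3, tau=0.1,
                           p_publish_nonsig=0.3, seed=0):
    """Simulate k published studies as (estimate, standard error) pairs.

    Significant results (one-sided z > 1.645) are always published;
    non-significant ones are published with probability p_publish_nonsig.
    Setting p_publish_nonsig=1.0 gives a no-publication-bias condition.
    tau is the between-study standard deviation (heterogeneity).
    """
    rng = random.Random(seed)
    published = []
    while len(published) < k:
        se = rng.uniform(0.05, 0.5)          # precision of this study
        theta = rng.gauss(true_effect, tau)  # study-specific true effect
        est = rng.gauss(theta, se)           # observed estimate
        significant = est / se > 1.645
        if significant or rng.random() < p_publish_nonsig:
            published.append((est, se))
    return published


biased = simulate_meta_analysis(20, p_publish_nonsig=0.1)
unbiased = simulate_meta_analysis(20, p_publish_nonsig=1.0)
print(len(biased), len(unbiased))
```

Plotting `est` against `se` for each list would give a funnel plot for the corresponding condition; varying `k`, `tau`, and `p_publish_nonsig` mirrors the kinds of factors the abstract says were varied.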

Results

Neither GPT-4o nor Llama 3.2 Vision consistently detected the presence of PB across various settings. Under no-publication-bias conditions, GPT-4o achieved higher specificity than Llama 3.2 Vision, with the difference most pronounced in meta-analyses with 20 or more studies. The inclusion of quantitative inputs alongside funnel plots did not significantly improve performance. Additionally, between-study heterogeneity and patterns of non-reported studies had minimal impact on the models' assessments.

Conclusions

The ability of LLMs to detect PB without fine-tuning is limited at the present time. This study highlights the need for specialized model adaptation before LLMs can be effectively integrated into meta-analysis workflows. Future research can focus on targeted refinements to enhance LLM performance and utility in evidence synthesis.

Citations: 0
Retiring the Term “Weighted Mean Difference” in Contemporary Evidence Synthesis
Pub Date : 2025-09-11 DOI: 10.1002/cesm.70051
Lifeng Lin, Xing Xing, Wenshan Han, Jiayi Tong
Evidence synthesis frequently involves quantitative analyses of continuous outcomes. A cross-sectional study examining Cochrane systematic reviews found that 6672 of 22,453 meta-analyses (29.7%) involved continuous outcomes [1]. The primary effect measures employed in meta-analyses of continuous outcomes are the mean difference (MD) and standardized mean difference (SMD) [2]. The MD is appropriately applied when all included studies measure outcomes using identical scales (e.g., body weight in kilograms). In contrast, the SMD serves as a solution when studies utilize different measurement scales (e.g., varied questionnaire scoring methods). Although alternative measures (e.g., the ratio of means) exist [3], their application remains relatively infrequent.

Despite this conceptual clarity, the term “weighted mean difference” (WMD) appears frequently in the systematic review literature [4], which can lead to confusion about its relationship to the MD. In this article, we first clarify the distinction between MD and WMD, then describe the historical factors underlying the term's adoption and persistence, discuss why contemporary methods render it unnecessary, illustrate examples of misuse, and conclude with practical recommendations for clearer reporting.

The MD represents the straightforward difference between group means (e.g., intervention vs. control) for a continuous outcome. Although the true MD value relates to unknown population-level differences, practical research relies on sample estimates from individual studies. Meta-analysis systematically synthesizes these study-level MD estimates to derive an overall summary effect across studies.

The term WMD emerged historically to emphasize the weighted averaging process of meta-analyses, wherein each study contributes a sample MD weighted by its statistical precision (i.e., inverse variance) [5]. Typically, larger studies with smaller variances or narrower confidence intervals are assigned greater weights. Traditional meta-analytical methods, performed through either fixed-effect (also known as common-effect) or random-effects models, follow this inverse-variance weighting principle. Under fixed-effect models, study weights directly reflect the inverse of their variances, whereas random-effects models incorporate both within-study and between-study variances.

To contextualize the widespread adoption of WMD, we conducted a brief literature search using Google Scholar on June 12, 2025. Using exact-phrase queries in quotation marks, for each calendar year from 1990 to 2024, we recorded the counts for “weighted mean difference” AND “systematic review” and separately for “systematic review,” then calculated the yearly proportion (Figure 1). Google Scholar indexes titles, abstracts, and, when available, full texts, so counts reflect occurrences anywhere in the indexed record, and these counts are approximate.
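We did not screen individual records for correct versus incorrect usage, because our goal was to describe the term's prevalence rather than to quantify misuse. We therefore document how usage evolved over time through the proportions reported in Figure 1.

This analysis shows a marked increase in use of WMD around 1996, closely following the establishment of the Cochrane Database of Systematic Reviews (CDSR) in April 1995. The prominent role of Cochrane reviews likely contributed substantially to the term's spread. Section 6.5 of the latest Cochrane Handbook [6] confirms that the term “weighted mean difference” was common in earlier versions of the CDSR, and Handbook editions since at least 2008 have carried a caution along these lines: analyses based on this effect measure were historically termed WMD analyses in the CDSR, and the name is potentially confusing, because although the meta-analysis computes a weighted average of these differences, no weighting is involved in calculating the statistical summary of a single study. Furthermore, all meta-analyses involve a weighted combination of estimates, yet the word “weighted” is not used when referring to other methods.

Another factor that may contribute to the term's continued use is citation of Andrade's 2020 statement that the pooled MD is more accurately described as a weighted mean difference, or WMD. Although this interpretation is not technically incorrect as a description of the statistical process behind meta-analytic pooling, it may inadvertently encourage broader or careless use of the term WMD.

Despite existing clarifications in the literature, Figure 1 shows that WMD remains widely used. Specifically, although the total number of systematic review publications peaked around 2018 and then declined (Figure 1C), both the number and the proportion of publications mentioning WMD continued to rise through 2024 (Figure 1A,B). Although the term is not misused in every instance, this trend suggests that existing cautions have had limited impact and underscores the value of clearer terminology. These historical and descriptive observations motivate a closer look at current analytic practice and terminology, as discussed below.

Explicitly emphasizing the weighting inherent in the term “WMD” can be misleading, because weighting underlies conventional meta-analytic methods regardless of outcome type (continuous, binary, time-to-event, etc.). Yet analogous terms such as “weighted odds ratio” or “weighted risk ratio” are rarely used. More general terms such as “pooled MD,” “combined MD,” “overall MD,” or “meta-analytic MD” may therefore be more appropriate and consistent.

Moreover, advances in contemporary evidence synthesis methods often go beyond conventional inverse-variance weighting. Modern meta-analyses, in both pairwise and network applications, are often fitted as one-stage generalized linear mixed models or Bayesian hierarchical models, in which treatment effects are estimated jointly through the likelihood [8-10]. In these models, precision is accounted for through the model structure rather than through explicit study-specific inverse-variance weights. Consequently, when outcome scales are identical, the summary estimate is more clearly reported as a pooled MD or with another explicit descriptor such as meta-analytic MD; the term “WMD” is unnecessary and may imply a distinct effect measure. Nevertheless, imprecise usage persists in the current literature, as illustrated below.

Critically, the MD pertains specifically to individual study results, whereas the WMD specifically denotes the meta-analytic synthesis. Despite this clear distinction, some systematic reviews mislabel individual study effects as WMDs [11-14]. For example, a systematic review recently published in JAMA inaccurately reported a pooled weighted mean difference in systolic and diastolic blood pressure between screening and control groups. Here, the pooled MD inherently implies weighting, which makes the added word “weighted” redundant and misleading. In addition, a recent article in the American Journal of Ophthalmology described a forest plot as showing the weighted mean difference (WMD) in each study. Another applied paper labeled a forest plot with WMD and 95% CI; both imply study-level WMDs [13]. Furthermore, a chapter in a methods book explicitly states that its Table 3.4 gives the WMD and 95% confidence interval for each study. Such misuse has persisted over time across systematic reviews, including many published in high-impact journals.

Labeling study-level effects as “WMD” blurs the distinction between a study's MD and the pooled meta-analytic estimate. For example, a figure caption referring to the WMD for each study may suggest that each study yields a WMD rather than an MD, which can confuse evidence users about what was pooled. Clearer labeling (e.g., “MD for each study” versus “pooled MD”) reduces this risk and improves interpretability.

This article highlights the potential inappropriateness of the term “WMD,” particularly its incorrect application to individual studies in evidence synthesis. Stemming largely from early practice in Cochrane systematic reviews, the term no longer fits the needs and rigor of contemporary methods. We therefore recommend retiring the term “WMD” and adopting clearer terminology, using MD for study-level effects and pooled MD or meta-analytic MD for synthesized estimates, to promote clearer, methodologically sound communication.

Lifeng Lin: conceptualization, funding acquisition, investigation, writing (original draft), visualization, writing (review and editing). Xing Xing: investigation, writing (review and editing). Wenshan Han: data curation, writing (review and editing), visualization. Jiayi Tong: conceptualization, writing (review and editing).

The authors declare no conflicts of interest.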
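
The inverse-variance weighting principle described in this article can be sketched in a few lines. The example below is a minimal illustration, not the article's own code: the function name `pool_md` and the toy inputs are assumptions, and the DerSimonian-Laird estimator is used for the between-study variance only because it is a widely known choice, not because the article specifies it.

```python
import math


def pool_md(mds, ses, random_effects=False):
    """Inverse-variance pooled mean difference and its standard error.

    mds: study-level mean differences; ses: their standard errors.
    With random_effects=True, a DerSimonian-Laird estimate of the
    between-study variance (tau^2) is added to each within-study variance.
    """
    variances = [se**2 for se in ses]
    w = [1 / v for v in variances]  # fixed-effect weights
    fixed = sum(wi * d for wi, d in zip(w, mds)) / sum(w)
    if random_effects and len(mds) > 1:
        q = sum(wi * (d - fixed) ** 2 for wi, d in zip(w, mds))
        c = sum(w) - sum(wi**2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (len(mds) - 1)) / c)
        w = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * d for wi, d in zip(w, mds)) / sum(w)
    return pooled, math.sqrt(1 / sum(w))


# Hypothetical study-level MDs and standard errors.
fe_md, fe_se = pool_md([1.0, 3.0], [1.0, 1.0])
re_md, re_se = pool_md([1.0, 3.0], [1.0, 1.0], random_effects=True)
print(f"fixed-effect pooled MD={fe_md:.2f} (SE {fe_se:.3f})")
print(f"random-effects pooled MD={re_md:.2f} (SE {re_se:.3f})")
```

The inputs are each study's MD and the output is a pooled MD, which matches the labeling the article recommends: the weighting happens only at the pooling step, not within any single study.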
Citations: 0