Pub Date: 2025-11-01 | Epub Date: 2025-09-11 | DOI: 10.1017/rsm.2025.10031
Simona Emilova Doneva, Shirin de Viragh, Hanna Hubarava, Stefan Schandelmaier, Matthias Briel, Benjamin Victor Ineichen
Screening, a labor-intensive aspect of systematic reviews, is increasingly challenging due to the rising volume of scientific publications. Recent advances suggest that generative large language models such as the generative pre-trained transformer (GPT) could aid this process by classifying references into study types such as randomized controlled trials (RCTs) or animal studies prior to abstract screening. However, it is unknown how well GPT models classify such scientific study types in the biomedical field. Additionally, their performance has not been directly compared with earlier transformer-based models such as bidirectional encoder representations from transformers (BERT). To address this, we developed a human-annotated corpus of 2,645 PubMed titles and abstracts, annotated for 14 study types, including different types of RCTs and animal studies, systematic reviews, study protocols, case reports, and in vitro studies. Using this corpus, we compared the performance of GPT-3.5 and GPT-4 in automatically classifying these study types against established BERT models. Our results show that fine-tuned pretrained BERT models consistently outperformed GPT models, achieving F1-scores above 0.8, compared to approximately 0.6 for GPT models. Advanced prompting strategies did not substantially boost GPT performance. In conclusion, these findings highlight that, even though GPT models benefit from advanced capabilities and extensive training data, their performance in niche tasks such as multi-class classification of scientific study types is inferior to that of smaller fine-tuned models. Nevertheless, automated methods remain promising for reducing the volume of records, making the screening of large reference libraries more feasible. Our corpus is openly available and can be used to harness other natural language processing (NLP) approaches.
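As a rough illustration of how F1-scores are computed for a multi-class task like this, here is a minimal macro-averaged F1 sketch in Python. The labels and predictions are invented toy data, not the paper's corpus, and macro averaging is only one plausible choice; the abstract does not state which averaging scheme was used.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class F1 scores averaged with equal class weight."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Toy labels standing in for study types (hypothetical, not the paper's data)
y_true = ["rct", "rct", "animal", "review", "animal", "case_report"]
y_pred = ["rct", "animal", "animal", "review", "animal", "rct"]
print(round(macro_f1(y_true, y_pred), 3))  # -> 0.575
```

Macro averaging penalizes poor performance on rare classes equally with common ones, which matters for a 14-class corpus where some study types are scarce.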
StudyTypeTeller: Large language models to automatically classify research study types for systematic reviews. Research Synthesis Methods. 2025;16(6):1005-1024. doi:10.1017/rsm.2025.10031. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657658/pdf/
Pub Date: 2025-11-01 | Epub Date: 2025-06-23 | DOI: 10.1017/rsm.2025.10014
Takehiko Oami, Yohei Okada, Taka-Aki Nakada
Recent studies highlight the potential of large language models (LLMs) in citation screening for systematic reviews; however, the efficiency of individual LLMs for this application remains unclear. This study aimed to compare accuracy, time-related efficiency, cost, and consistency across four LLMs (GPT-4o, Gemini 1.5 Pro, Claude 3.5 Sonnet, and Llama 3.3 70B) for literature screening tasks. The models screened for clinical questions from the Japanese Clinical Practice Guidelines for the Management of Sepsis and Septic Shock 2024. Sensitivity and specificity were calculated for each model based on conventional citation screening results for qualitative assessment. We also recorded the time and cost of screening and assessed consistency to verify reproducibility. A post hoc analysis explored whether integrating outputs from multiple models could enhance screening accuracy. GPT-4o and Llama 3.3 70B achieved high specificity but lower sensitivity, while Gemini 1.5 Pro and Claude 3.5 Sonnet exhibited higher sensitivity at the cost of lower specificity. Citation screening times and costs varied, with GPT-4o being the fastest and Llama 3.3 70B the most cost-effective. Consistency was comparable among the models. An ensemble approach combining model outputs improved sensitivity but increased the number of false positives, requiring additional review effort. Each model demonstrated distinct strengths, effectively streamlining citation screening by saving time and reducing workload. However, reviewing false positives remains a challenge. Combining models may enhance sensitivity, indicating the potential of LLMs to optimize systematic review workflows.
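The sensitivity/specificity comparison and the ensemble idea can be sketched as follows. The record-level decisions are invented toy data, and the union rule ("include if any model includes") is one simple combination strategy; the paper's exact integration method may differ.

```python
def sens_spec(decisions, truth):
    """Sensitivity and specificity of include/exclude decisions vs. the reference screen."""
    tp = sum(d and t for d, t in zip(decisions, truth))
    tn = sum((not d) and (not t) for d, t in zip(decisions, truth))
    fn = sum((not d) and t for d, t in zip(decisions, truth))
    fp = sum(d and (not t) for d, t in zip(decisions, truth))
    return tp / (tp + fn), tn / (tn + fp)

def ensemble_union(*model_decisions):
    """Include a record if ANY model includes it: raises sensitivity,
    but may add false positives that need manual review."""
    return [any(votes) for votes in zip(*model_decisions)]

# Toy decisions for two hypothetical models over 6 records (True = include)
truth   = [True, True, True, False, False, False]
model_a = [True, False, True, False, False, False]  # specific, misses one relevant record
model_b = [True, True, False, True, False, False]   # sensitive, one false positive

combined = ensemble_union(model_a, model_b)
print(sens_spec(model_a, truth))    # model A alone: perfect specificity, imperfect sensitivity
print(sens_spec(combined, truth))   # union: full sensitivity, specificity drops
```

In this toy example the union recovers all three relevant records but flags one extra irrelevant record, mirroring the trade-off the abstract reports.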
Optimal large language models to screen citations for systematic reviews. Research Synthesis Methods. 2025;16(6):859-875. doi:10.1017/rsm.2025.10014. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657656/pdf/
Network meta-analysis (NMA) is becoming increasingly important, especially in the field of medicine, as it allows for comparisons across multiple trials with different interventions. For time-to-event data, that is, survival data, traditional NMA based on the proportional hazards (PH) assumption simply synthesizes reported hazard ratios (HRs). Novel methods for NMA based on the non-PH assumption have been proposed and implemented using R software. However, these methods often involve complex methodologies and require advanced programming skills, creating a barrier for many researchers. Therefore, we developed an R Shiny tool, NMAsurv (https://psurvivala.shinyapps.io/NMAsurv/). NMAsurv allows users with little or no background in R to conduct survival-data-based NMA effortlessly. The tool supports various functions such as drawing network plots, testing the PH assumption, and building NMA models. Users can input either reconstructed pseudo-individual participant data or aggregated data. NMAsurv offers a user-friendly interface for extracting parameter estimates from various NMA models, including fractional polynomial models, piecewise exponential models, parametric survival models, the Cox PH model, and the generalized gamma model. Additionally, it enables users to effortlessly create survival and HR plots. All operations can be performed by an intuitive "point-and-click" interface. In this study, we introduce all the functionalities and features of NMAsurv and demonstrate its application using a real-world NMA example.
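To illustrate why the PH assumption matters for model families like the piecewise exponential models NMAsurv supports, here is a minimal conceptual sketch with hypothetical hazards (NMAsurv itself is an R Shiny tool; this Python snippet is only an illustration of time-varying hazard ratios).

```python
def hazard(t, breakpoints, rates):
    """Piecewise-constant hazard: rates[i] applies from breakpoints[i] onward."""
    for b, r in zip(reversed(breakpoints), reversed(rates)):
        if t >= b:
            return r
    return rates[0]

# Hypothetical two-arm trial, hazards in events per person-year
breaks    = [0.0, 1.0]    # one change-point at t = 1 year
control   = [0.40, 0.40]  # constant hazard
treatment = [0.20, 0.40]  # early benefit that wanes

for t in (0.5, 1.5):
    hr = hazard(t, breaks, treatment) / hazard(t, breaks, control)
    print(f"HR at t={t}: {hr:.2f}")
# The hazard ratio is 0.50 before the change-point and 1.00 after it,
# so a single proportional-hazards HR would misrepresent both periods.
```

This is exactly the situation where synthesizing one reported HR per trial is misleading and non-PH NMA methods are needed.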
Taihang Shao, Mingye Zhao, Fenghao Shi, Mingjun Rui, Wenxi Tang. NMAsurv: An R Shiny application for network meta-analysis based on survival data. Research Synthesis Methods. 2025;16(6):1042-1056. doi:10.1017/rsm.2025.10020. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657653/pdf/
Pub Date: 2025-11-01 | Epub Date: 2025-09-05 | DOI: 10.1017/rsm.2025.10033
Marwin Weber, Simon Lewin, Joerg J Meerpohl, Heather Menzies Munthe-Kaas, Rigmor Berg, Andrew Booth, Claire Glenton, Jane Noyes, Ingrid Toews
Qualitative research addresses important healthcare questions, including patients' experiences with interventions. Qualitative evidence syntheses combine findings from individual studies and are increasingly used to inform health guidelines. However, dissemination bias (the selective non-dissemination of studies or findings) may distort the body of evidence. This study examined reasons for the non-dissemination of qualitative studies. We identified conference abstracts reporting qualitative, health-related studies. We invited authors to answer a survey containing quantitative and qualitative questions. We performed descriptive analyses on the quantitative data and inductive thematic analysis on the qualitative data. Most of the 142 respondents were female, established researchers. About a third reported that their study had not been published in full after their conference presentation. The main reasons were time constraints, career changes, and a lack of interest. Few indicated non-publication due to the nature of the study findings. Decisions not to publish were largely made by author teams. Half of the 72% who published their study reported that all findings were included in the publication. This study highlights researchers' reasons for non-dissemination of qualitative research. One-third of studies presented as conference abstracts remained unpublished, but non-dissemination was rarely linked to the study findings. Further research is needed to understand the systematic non-dissemination of qualitative studies.
What happens to qualitative studies initially presented as conference abstracts: A survey among study authors. Research Synthesis Methods. 2025;16(6):1025-1034. doi:10.1017/rsm.2025.10033. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657646/pdf/
Pub Date: 2025-11-01 | Epub Date: 2025-08-01 | DOI: 10.1017/rsm.2025.10026
Gerta Rücker, Guido Schwarzer
For network meta-analysis (NMA), we usually assume that the treatment arms are independent within each included trial. This assumption is justified for parallel design trials and leads to a property we call consistency of variances for both multi-arm trials and NMA estimates. However, the assumption is violated for trials with correlated arms, for example, split-body trials. For multi-arm trials with correlated arms, the variance of a contrast is not the sum of the arm-based variances, but comes with a correlation term. This may lead to violations of variance consistency, and the inconsistency of variances may even propagate to the NMA estimates. We explain this using a geometric analogy where three-arm trials correspond to triangles and four-arm trials correspond to tetrahedrons. We also investigate which information has to be extracted for a multi-arm trial with correlated arms and provide an algorithm to analyze NMAs including such trials.
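The correlation term described above can be sketched directly: for arms A and B with arm-level variances and correlation rho, Var(A - B) = Var(A) + Var(B) - 2 * rho * sqrt(Var(A) * Var(B)). A minimal illustration with invented numbers (not from the paper):

```python
import math

def contrast_variance(var_a, var_b, correlation=0.0):
    """Variance of the treatment contrast A - B.
    Independent (parallel-design) arms: var_a + var_b.
    Correlated arms (e.g., split-body trials) subtract a covariance term."""
    cov = correlation * math.sqrt(var_a * var_b)
    return var_a + var_b - 2 * cov

# Hypothetical arm-level variances in a three-arm trial
vA, vB = 0.10, 0.15

print(contrast_variance(vA, vB))        # independent arms: sum of arm variances
print(contrast_variance(vA, vB, 0.5))   # positive correlation shrinks the contrast variance
```

With independent arms, contrast variances decompose additively into arm variances, which is what makes the variances of a multi-arm trial mutually consistent; a nonzero correlation breaks that decomposition, matching the violations of variance consistency described in the abstract.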
Trials and triangles: Network meta-analysis of multi-arm trials with correlated arms. Research Synthesis Methods. 2025;16(6):961-974. doi:10.1017/rsm.2025.10026. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657662/pdf/
Pub Date: 2025-11-01 | Epub Date: 2025-07-10 | DOI: 10.1017/rsm.2025.10018
Barbara Nussbaumer-Streit, Dominic Ledinger, Christina Kien, Irma Klerings, Emma Persad, Andrea Chapman, Claus Nowak, Arianna Gadinger, Lisa Affengruber, Maureen Smith, Gerald Gartlehner, Ursula Griebler
Background: Involving knowledge users (KUs) such as patients, clinicians, or health policymakers is particularly relevant when conducting rapid reviews (RRs), as they should be tailored to decision-makers' needs. However, little is known about how common KU involvement currently is in RRs.
Objectives: We aimed to assess the proportion of recently published RRs (2021 onwards) that reported KU involvement; which groups of KUs were involved in each phase of the RR process, and to what extent; and which factors were associated with KU involvement in RRs.
Methods: We conducted a meta-research cross-sectional study. A systematic literature search in Ovid MEDLINE and Epistemonikos in January 2024 identified 2,493 unique records. We dually screened the identified records (partly with assistance from an artificial intelligence (AI)-based application) until we reached the a priori calculated sample size of 104 RRs. We dually extracted data and analyzed it descriptively.
Results: The proportion of RRs that reported KU involvement was 19% (95% confidence interval [CI]: 12%-28%). Most often, KUs were involved during the initial preparation of the RR, the systematic searches, and the interpretation and dissemination of results. Researchers/content experts and public/patient partners were the KU groups most often involved. KU involvement was more common in RRs that focused on patient involvement/shared decision-making, had a published protocol, or were commissioned.
Conclusions: Reporting KU involvement in published RRs is uncommon and often vague. Future research should explore barriers and facilitators for KU involvement and its reporting in RRs. Guidance regarding reporting on KU involvement in RRs is needed.
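A confidence interval like the one reported above can be computed with a Wilson score interval. The counts below (20 of 104) are illustrative, chosen only to match the reported 19% on the stated sample size of 104 RRs; the paper's exact counts and CI method are not given here, which may explain the slight difference from its 12%-28% interval.

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Hypothetical counts: 20 KU-involving reviews out of 104 sampled RRs
lo, hi = wilson_ci(20, 104)
print(f"{20/104:.0%} (95% CI: {lo:.0%}-{hi:.0%})")  # -> 19% (95% CI: 13%-28%)
```

The Wilson interval is a common choice for proportions at this sample size because, unlike the normal approximation, it behaves sensibly near 0% and 100%.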
Knowledge user involvement is still uncommon in published rapid reviews: a meta-research cross-sectional study. Research Synthesis Methods. 2025;16(6):876-899. doi:10.1017/rsm.2025.10018. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657652/pdf/
Pub Date: 2025-11-01 | Epub Date: 2025-07-22 | DOI: 10.1017/rsm.2025.10025
Isa Spiero, Artuur M Leeuwenberg, Karel G M Moons, Lotty Hooft, Johanna A A Damen
Systematic reviews (SRs) synthesize evidence through a rigorous, labor-intensive, and costly process. To accelerate the title-abstract screening phase of SRs, several artificial intelligence (AI)-based semi-automated screening tools have been developed to reduce workload by prioritizing relevant records. However, their performance is primarily evaluated for SRs of intervention studies, which generally have well-structured abstracts. Here, we evaluate whether screening tool performance is equally effective for SRs of prognosis studies that have larger heterogeneity between abstracts. We conducted retrospective simulations on prognosis and intervention reviews using a screening tool (ASReview). We also evaluated the effects of review scope (i.e., breadth of the research question), number of (relevant) records, and modeling methods within the tool. Performance was assessed in terms of recall (i.e., sensitivity), precision at 95% recall (i.e., positive predictive value at 95% recall), and workload reduction (work saved over sampling at 95% recall [WSS@95%]). The WSS@95% was slightly worse for prognosis reviews (range: 0.324-0.597) than for intervention reviews (range: 0.613-0.895). The precision was higher for prognosis (range: 0.115-0.400) compared to intervention reviews (range: 0.024-0.057). These differences were primarily due to the larger number of relevant records in the prognosis reviews. The modeling methods and the scope of the prognosis review did not significantly impact tool performance. We conclude that the larger abstract heterogeneity of prognosis studies does not substantially affect the effectiveness of screening tools for SRs of prognosis. Further evaluation studies including a standardized evaluation framework are needed to enable prospective decisions on the reliable use of screening tools.
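The two headline metrics, precision at 95% recall and WSS@95%, can be computed from a ranked screening order as follows. The ranking below is invented toy data, and WSS@r is taken here in its usual form (TN + FN)/N - (1 - r); the paper's exact simulation setup in ASReview may differ.

```python
import math

def screening_metrics(ranked_relevant, target_recall=0.95):
    """Precision and work saved over sampling (WSS) at a target recall,
    given booleans ordered by the tool's ranking (True = relevant record)."""
    n = len(ranked_relevant)
    total_rel = sum(ranked_relevant)
    needed = math.ceil(target_recall * total_rel)  # records required to hit target recall
    found = 0
    for screened, rel in enumerate(ranked_relevant, start=1):
        found += rel
        if found >= needed:
            precision = found / screened
            wss = (n - screened) / n - (1 - target_recall)
            return precision, wss
    return None

# Hypothetical ranking of 20 records with 4 relevant ones near the top
ranking = [True, True, False, True, False, True] + [False] * 14
precision, wss = screening_metrics(ranking)
print(precision, wss)
```

Here the reviewer stops after 6 of 20 records, having found all 4 relevant ones, so WSS@95% is 0.65: 70% of the screening is skipped, minus the 5% recall conceded by the 95% target.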
Evaluation of semi-automated record screening methods for systematic reviews of prognosis studies and intervention studies. Research Synthesis Methods. 2025;16(6):975-989. doi:10.1017/rsm.2025.10025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657655/pdf/
Pub Date: 2025-11-01 | Epub Date: 2025-08-28 | DOI: 10.1017/rsm.2025.10023
Klas Moberg, Carl Gornitzki
Our objective was to evaluate the recall and number needed to read (NNR) of the Cochrane RCT Classifier compared to, and in combination with, established search filters developed for Ovid MEDLINE and Embase.com. A gold standard set of 1,103 randomized controlled trials (RCTs) was created to calculate recall for the Cochrane RCT Classifier in Covidence, the Cochrane sensitivity-maximizing RCT filter in Ovid MEDLINE, and the Cochrane Embase RCT filter for Embase.com. In addition, the classifier and the filters were validated in three case studies using reports from the Swedish Agency for Health Technology Assessment and Assessment of Social Services to assess impact on search results and NNR. The Cochrane RCT Classifier had the highest recall, at 99.64%, followed by the Cochrane sensitivity-maximizing RCT filter in Ovid MEDLINE at 98.73% and the Cochrane Embase RCT filter at 98.46%. However, the Cochrane RCT Classifier had a higher NNR than the RCT filters in all case studies. Combining the RCT filters with the Cochrane RCT Classifier reduced NNR compared to using the RCT filters alone while achieving a recall of 98.46% for the Ovid MEDLINE/RCT Classifier combination and 98.28% for the Embase/RCT Classifier combination. In conclusion, we found that the Cochrane RCT Classifier in Covidence has a higher recall than established search filters but also a higher NNR. Thus, using the Cochrane RCT Classifier instead of current state-of-the-art RCT filters would lead to an increased workload in the screening process. A viable option with a lower NNR than RCT filters, at the cost of a slight decrease in recall, is to combine the Cochrane RCT Classifier with RCT filters in database searches.
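Recall, NNR, and the effect of requiring both the filter and the classifier to flag a record can be sketched as follows. The record-level decisions are toy data, and NNR is computed as records flagged per relevant record found, i.e., 1/precision.

```python
def recall_and_nnr(flagged, truth):
    """Recall against a gold standard, and number needed to read:
    records screened per relevant record retrieved (1 / precision)."""
    tp = sum(f and t for f, t in zip(flagged, truth))
    recall = tp / sum(truth)
    nnr = sum(flagged) / tp
    return recall, nnr

# Toy data: 10 records, of which 4 are true RCTs
truth      = [True, True, True, True, False, False, False, False, False, False]
filter_hit = [True, True, True, True, True,  True,  False, False, False, False]
classifier = [True, True, True, True, True,  False, True,  False, False, False]

combined = [f and c for f, c in zip(filter_hit, classifier)]  # require both to flag
print(recall_and_nnr(filter_hit, truth))
print(recall_and_nnr(combined, truth))  # fewer records to read; same recall in this toy case
```

In this toy example the combination keeps recall intact while trimming the screening load; in the study, the real combination traded a slight recall loss for a lower NNR.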
Combining search filters for randomized controlled trials with the Cochrane RCT Classifier in Covidence: a methodological validation study.
Research Synthesis Methods, 16(6), 953-960.
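Recall and NNR as used in this study are simple ratios over the gold-standard set. A minimal sketch (function and variable names are my own, not from the study; the retrieved counts are the ones implied by the reported recall percentages against the 1,103-RCT gold standard):

```python
def recall(relevant_retrieved: int, relevant_total: int) -> float:
    """Proportion of gold-standard records that a filter retrieves."""
    return relevant_retrieved / relevant_total

def nnr(records_screened: int, relevant_retrieved: int) -> float:
    """Number needed to read: records screened per relevant record found."""
    return records_screened / relevant_retrieved

# Gold standard: 1,103 RCTs. Retrieved counts implied by the reported recall:
print(round(recall(1099, 1103) * 100, 2))  # 99.64 - Cochrane RCT Classifier
print(round(recall(1089, 1103) * 100, 2))  # 98.73 - Ovid MEDLINE RCT filter
print(round(recall(1086, 1103) * 100, 2))  # 98.46 - Embase RCT filter
```

The trade-off the study reports is visible directly in these two metrics: a filter can raise recall (fewer missed RCTs) while worsening NNR (more non-relevant records to screen per hit).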
In meta-analyses of survival rates, precision information (i.e., standard errors (SEs) or confidence intervals) is often missing from clinical studies. In current practice, such studies are often excluded from the synthesis analyses. However, the naïve deletion of these incomplete data can produce serious biases and loss of precision in pooled estimators. To address these issues, we developed a simple but effective method to impute precision information using commonly available statistics from individual studies, such as sample size, number of events, and risk-set size at a time point of interest. By applying this new method, we can effectively circumvent the deletion of incomplete data and the resulting biases and losses of precision. In extensive simulation studies, the developed method markedly improved the accuracy and precision of the pooled estimators compared to naïve analyses that delete studies with missing precision. Furthermore, the performance of the proposed method was not significantly inferior to the ideal case in which no precision information was missing. However, for studies for which the risk-set size at the time of interest was not available, the proposed method runs the risk of overestimating the SE. Although the proposed method is a single-imputation method, the simulations show no underestimation bias in the SE, even though the method does not account for the uncertainty of the missing values. To demonstrate its robustness, the proposed method was applied in a systematic review of radiotherapy data. An R package was developed to implement the proposed procedure.
Simple imputation method for meta-analysis of survival rates when precision information is missing.
Kazushi Maruo, Yusuke Yamaguchi, Ryota Ishii, Hisashi Noma, Masahiko Gosho. DOI: 10.1017/rsm.2025.10024
Research Synthesis Methods, 16(6), 937-952.
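The abstract does not state the imputation formula itself, so the following is an illustration only: two textbook approximations for the SE of a survival proportion S(t) that use exactly the inputs the abstract mentions (risk-set size, or sample size as a fallback). Peto's approximation is my assumption for illustration, not necessarily the authors' formula.

```python
import math

def impute_se_peto(surv: float, n_risk: int) -> float:
    """Peto's approximation: SE(S(t)) = S(t) * sqrt((1 - S(t)) / n_risk),
    using the risk-set size at the time point of interest."""
    return surv * math.sqrt((1.0 - surv) / n_risk)

def impute_se_binomial(surv: float, n: int) -> float:
    """Naive binomial fallback using total sample size when the risk-set
    size is unavailable (the setting in which the abstract warns the
    imputed SE may be overestimated)."""
    return math.sqrt(surv * (1.0 - surv) / n)

# Hypothetical study: survival 0.8 at the time of interest, 50 at risk.
print(round(impute_se_peto(0.8, 50), 4))
print(round(impute_se_binomial(0.8, 100), 4))
```

With an SE imputed for each otherwise-incomplete study, the study can enter a standard inverse-variance meta-analysis instead of being deleted.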
Pub Date: 2025-11-01. Epub Date: 2025-10-10. DOI: 10.1017/rsm.2025.10034
Viet-Thi Tran, Carolina Grana Possamai, Isabelle Boutron, Philippe Ravaud
A critical step in systematic reviews is the definition of a search strategy, with keywords and Boolean logic, to filter electronic databases. We hypothesize that large language models (LLMs) can be used to screen articles in electronic databases as an alternative to search equations. To investigate this, we compared two methods of identifying randomized controlled trials (RCTs) in electronic databases: filtering databases using the Cochrane highly sensitive search, and assessment by an LLM. We retrieved studies indexed in PubMed with a publication date between September 1 and September 30, 2024 using the sole keyword "diabetes." We compared the performance of the Cochrane highly sensitive search and the assessment of all titles and abstracts extracted directly from the database by GPT-4o-mini at identifying RCTs. The reference standard was the manual screening of retrieved articles by two independent reviewers. The search retrieved 6,377 records, of which 210 (3.5%) were primary reports of RCTs. The Cochrane highly sensitive search filtered 2,197 records and missed one RCT (sensitivity 99.5%, 95% CI 97.4% to 100%; specificity 67.8%, 95% CI 66.6% to 68.9%). Assessment of all titles and abstracts from the electronic database by GPT filtered 1,080 records and included all 210 primary reports of RCTs (sensitivity 100%, 95% CI 98.3% to 100%; specificity 85.9%, 95% CI 85.0% to 86.8%). LLMs can screen all articles in electronic databases to identify RCTs as an alternative to the Cochrane highly sensitive search. This calls for the evaluation of LLMs as an alternative to rigid search strategies.
Using large language models to directly screen electronic databases as an alternative to traditional search strategies such as the Cochrane highly sensitive search for filtering randomized controlled trials in systematic reviews.
Research Synthesis Methods, 16(6), 1035-1041.
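The reported sensitivity and specificity point estimates follow arithmetically from the screening counts in the abstract (6,377 records, 210 true RCTs, and the number of records each method kept). A small sketch (function and variable names are my own) that reproduces them:

```python
def sens_spec(total: int, flagged: int, relevant: int,
              relevant_flagged: int) -> tuple[float, float]:
    """Sensitivity and specificity of a screening filter.
    total: records retrieved; flagged: records the filter kept;
    relevant: true RCTs in the set; relevant_flagged: true RCTs kept."""
    tp = relevant_flagged
    fn = relevant - relevant_flagged
    fp = flagged - relevant_flagged
    tn = (total - relevant) - fp
    return tp / (tp + fn), tn / (tn + fp)

# Cochrane highly sensitive search: kept 2,197 records, missed one RCT.
print([round(x * 100, 1) for x in sens_spec(6377, 2197, 210, 209)])  # [99.5, 67.8]
# GPT-4o-mini: kept 1,080 records, included all 210 RCTs.
print([round(x * 100, 1) for x in sens_spec(6377, 1080, 210, 210)])  # [100.0, 85.9]
```

The comparison in the abstract is visible here: the LLM kept roughly half as many records as the Boolean filter (higher specificity) without losing any RCTs.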