Rebecca Whittle, Joie Ensor, Miriam Hattle, Paula Dhiman, Gary S. Collins, Richard D. Riley
Collecting data for an individual participant data meta-analysis (IPDMA) project can be time-consuming and resource-intensive, and the project may still have insufficient power to answer the question of interest. Therefore, researchers should consider the power of their planned IPDMA before collecting IPD. Here we propose a method to estimate the power of a planned IPDMA project aiming to synthesise multiple cohort studies to investigate the (unadjusted or adjusted) effects of potential prognostic factors for a binary outcome. We consider both binary and continuous factors and provide a three-step approach to estimating the power in advance of collecting IPD, under an assumed true prognostic effect for each factor of interest. The first step uses routinely available (published) aggregate data for each study to approximate Fisher's information matrix and thereby estimate the anticipated variance of the unadjusted prognostic factor effect in each study. These variances are then used in step 2 to estimate the anticipated variance of the summary prognostic effect from the IPDMA. Finally, step 3 uses this variance to estimate the corresponding IPDMA power, based on a two-sided Wald test and the assumed true effect. Extensions are provided to adjust the power calculation for the presence of additional covariates correlated with the prognostic factor of interest (via a variance inflation factor) and to allow for between-study heterogeneity in prognostic effects. An example is provided for illustration, and Stata code is supplied to enable researchers to implement the method.
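To make the three steps concrete, here is a minimal sketch in R (the paper itself supplies Stata code, so this is an illustration rather than the authors' implementation; the aggregate counts, the assumed odds ratio of 1.5, and the common-effect pooling in step 2 are all hypothetical choices):

# Step 1: anticipated variance of the log odds ratio per study, from
# published 2x2 counts (inverse-cell-count approximation to the inverse
# Fisher information for an unadjusted binary prognostic factor).
var_logOR <- function(ai, bi, ci, di) 1/ai + 1/bi + 1/ci + 1/di

# Hypothetical aggregate data for three cohorts: ai/bi = events/non-events
# with the factor, ci/di = events/non-events without it.
studies <- data.frame(ai = c(30, 12, 55), bi = c(170, 88, 445),
                      ci = c(20, 10, 40), di = c(180, 90, 460))
vi <- with(studies, var_logOR(ai, bi, ci, di))

# Step 2: anticipated variance of the inverse-variance summary estimate.
V <- 1 / sum(1 / vi)

# Step 3: power of a two-sided Wald test at level alpha, under an assumed
# true log odds ratio theta.
ipd_power <- function(theta, V, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  pnorm(theta / sqrt(V) - z) + pnorm(-theta / sqrt(V) - z)
}
ipd_power(theta = log(1.5), V = V)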
{"title":"Calculating the power of a planned individual participant data meta-analysis to examine prognostic factor effects for a binary outcome","authors":"Rebecca Whittle, Joie Ensor, Miriam Hattle, Paula Dhiman, Gary S. Collins, Richard D. Riley","doi":"10.1002/jrsm.1737","DOIUrl":"10.1002/jrsm.1737","url":null,"abstract":"<p>Collecting data for an individual participant data meta-analysis (IPDMA) project can be time consuming and resource intensive and could still have insufficient power to answer the question of interest. Therefore, researchers should consider the power of their planned IPDMA before collecting IPD. Here we propose a method to estimate the power of a planned IPDMA project aiming to synthesise multiple cohort studies to investigate the (unadjusted or adjusted) effects of potential prognostic factors for a binary outcome. We consider both binary and continuous factors and provide a three-step approach to estimating the power in advance of collecting IPD, under an assumption of the true prognostic effect of each factor of interest. The first step uses routinely available (published) aggregate data for each study to approximate Fisher's information matrix and thereby estimate the anticipated variance of the unadjusted prognostic factor effect in each study. These variances are then used in step 2 to estimate the anticipated variance of the summary prognostic effect from the IPDMA. Finally, step 3 uses this variance to estimate the corresponding IPDMA power, based on a two-sided Wald test and the assumed true effect. Extensions are provided to adjust the power calculation for the presence of additional covariates correlated with the prognostic factor of interest (by using a variance inflation factor) and to allow for between-study heterogeneity in prognostic effects. An example is provided for illustration, and Stata code is supplied to enable researchers to implement the method.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"15 6","pages":"905-916"},"PeriodicalIF":5.0,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/jrsm.1737","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141750692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lu Qin, Shishun Zhao, Wenlai Guo, Tiejun Tong, Ke Yang
The application of network meta-analysis is becoming increasingly widespread, and successful implementation requires that direct and indirect comparison results be consistent. Proper detection of inconsistency is therefore a key issue in network meta-analysis, determining whether its results can be reliably used for clinical guidance. Among the existing methods for detecting inconsistency, two commonly used models are the design-by-treatment interaction model and the side-splitting models. While the original side-splitting model was estimated using a Bayesian approach, here we employ the frequentist approach. In this paper, we review these two types of models comprehensively and explore their relationship by treating the data structure of network meta-analysis as missing data and parameterizing the potential complete data for each model. Through both analytical and numerical studies, we verify that the side-splitting models are specific instances of the design-by-treatment interaction model, obtained by incorporating additional assumptions or under certain data structures. Moreover, the design-by-treatment interaction model exhibits robust performance in detecting inconsistency across different data structures compared to the side-splitting models. Finally, as practical guidance for inconsistency detection, we recommend the design-by-treatment interaction model when there is no information about the potential location of inconsistency. By contrast, the side-splitting models can serve as a supplementary method, especially when the number of studies in each design is small, enabling a comprehensive assessment of inconsistency from both global and local perspectives.
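For readers who want to try both checks, the netmeta R package offers frequentist implementations of each: decomp.design() fits the design-by-treatment interaction decomposition and netsplit() performs side-splitting. A hedged sketch on invented example data (log odds ratios with standard errors; not the authors' code, and argument names may differ across netmeta versions):

library(netmeta)

# Hypothetical contrast-level data: five two-arm studies forming a closed
# A-B-C loop, so both direct and indirect evidence exist.
dat <- data.frame(
  studlab = c("S1", "S2", "S3", "S4", "S5"),
  treat1  = c("A",  "A",  "B",  "A",  "A"),
  treat2  = c("B",  "C",  "C",  "B",  "C"),
  TE      = c(0.20, 0.45, 0.30, 0.15, 0.55),  # log odds ratios
  seTE    = c(0.10, 0.12, 0.13, 0.11, 0.12)
)

net <- netmeta(TE, seTE, treat1, treat2, studlab,
               data = dat, sm = "OR", random = TRUE)
decomp.design(net)  # global check: design-by-treatment interaction model
netsplit(net)       # local check: side-splitting, direct vs. indirect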
{"title":"A comparison of two models for detecting inconsistency in network meta-analysis","authors":"Lu Qin, Shishun Zhao, Wenlai Guo, Tiejun Tong, Ke Yang","doi":"10.1002/jrsm.1734","DOIUrl":"10.1002/jrsm.1734","url":null,"abstract":"<p>The application of network meta-analysis is becoming increasingly widespread, and for a successful implementation, it requires that the direct comparison result and the indirect comparison result should be consistent. Because of this, a proper detection of inconsistency is often a key issue in network meta-analysis as whether the results can be reliably used as a clinical guidance. Among the existing methods for detecting inconsistency, two commonly used models are the design-by-treatment interaction model and the side-splitting models. While the original side-splitting model was initially estimated using a Bayesian approach, in this context, we employ the frequentist approach. In this paper, we review these two types of models comprehensively as well as explore their relationship by treating the data structure of network meta-analysis as missing data and parameterizing the potential complete data for each model. Through both analytical and numerical studies, we verify that the side-splitting models are specific instances of the design-by-treatment interaction model, incorporating additional assumptions or under certain data structure. Moreover, the design-by-treatment interaction model exhibits robust performance across different data structures on inconsistency detection compared to the side-splitting models. Finally, as a practical guidance for inconsistency detection, we recommend utilizing the design-by-treatment interaction model when there is a lack of information about the potential location of inconsistency. By contrast, the side-splitting models can serve as a supplementary method especially when the number of studies in each design is small, enabling a comprehensive assessment of inconsistency from both global and local perspectives.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"15 6","pages":"851-871"},"PeriodicalIF":5.0,"publicationDate":"2024-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141533057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jens H. Fünderich, Lukas J. Beinhauer, Frank Renkewitz
Multi-lab projects are large-scale collaborations between participating data collection sites that gather empirical evidence and (usually) analyze that evidence using meta-analyses. They are a valuable form of scientific collaboration, produce outstanding data sets, and are a great resource for third-party researchers: their data may be reanalyzed and used in research synthesis, and their repositories and code could provide guidance to future projects of this kind. But while multi-labs are similar in their structure and aggregate their data using meta-analyses, they deploy a variety of different solutions for the storage structure of their repositories, the way the (analysis) code is structured, and the file formats they provide. Continuing this trend implies that anyone who wants to work with data from several of these projects, or to combine their datasets, faces ever-increasing complexity. Some of that complexity can be avoided. Here, we introduce MetaPipeX, a standardized framework to harmonize, document and analyze multi-lab data. It features a pipeline conceptualization of the analysis and documentation process, an R package that implements both, and a Shiny app (https://www.apps.meta-rep.lmu.de/metapipex/) that allows users to explore and visualize these data sets. We introduce the framework by describing its components and applying it to a practical example. Engaging with this form of collaboration and integrating it further into research practice will certainly benefit the quantitative sciences, and we hope the framework provides a structure and tools that reduce effort for anyone who creates, re-uses, harmonizes or learns about multi-lab replication projects.
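MetaPipeX's own interface is documented with the package; purely to illustrate the shared aggregation step such pipelines standardize, here is a small metafor-based sketch with hypothetical per-site summaries (this is not the MetaPipeX API):

library(metafor)

# Hypothetical harmonized multi-lab table: group means, SDs and sample
# sizes per data collection site.
labs <- data.frame(
  site = paste0("site", 1:4),
  m1 = c(5.2, 4.9, 5.5, 5.1), sd1 = c(1.1, 1.0, 1.2, 0.9), n1 = c(40, 55, 38, 60),
  m2 = c(4.8, 4.7, 4.9, 4.8), sd2 = c(1.0, 1.1, 1.1, 1.0), n2 = c(42, 50, 40, 58)
)

# Site-level standardized mean differences (Hedges' g), then a
# random-effects meta-analysis across sites.
es <- escalc(measure = "SMD", m1i = m1, sd1i = sd1, n1i = n1,
             m2i = m2, sd2i = sd2, n2i = n2, data = labs)
rma(yi, vi, data = es, method = "REML")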
{"title":"Reduce, reuse, recycle: Introducing MetaPipeX, a framework for analyses of multi-lab data","authors":"Jens H. Fünderich, Lukas J. Beinhauer, Frank Renkewitz","doi":"10.1002/jrsm.1733","DOIUrl":"10.1002/jrsm.1733","url":null,"abstract":"<p>Multi-lab projects are large scale collaborations between participating data collection sites that gather empirical evidence and (usually) analyze that evidence using meta-analyses. They are a valuable form of scientific collaboration, produce outstanding data sets and are a great resource for third-party researchers. Their data may be reanalyzed and used in research synthesis. Their repositories and code could provide guidance to future projects of this kind. But, while multi-labs are similar in their structure and aggregate their data using meta-analyses, they deploy a variety of different solutions regarding the storage structure in the repositories, the way the (analysis) code is structured and the file-formats they provide. Continuing this trend implies that anyone who wants to work with data from multiple of these projects, or combine their datasets, is faced with an ever-increasing complexity. Some of that complexity could be avoided. Here, we introduce MetaPipeX, a standardized framework to harmonize, document and analyze multi-lab data. It features a pipeline conceptualization of the analysis and documentation process, an R-package that implements both and a Shiny App (https://www.apps.meta-rep.lmu.de/metapipex/) that allows users to explore and visualize these data sets. We introduce the framework by describing its components and applying it to a practical example. Engaging with this form of collaboration and integrating it further into research practice will certainly be beneficial to quantitative sciences and we hope the framework provides a structure and tools to reduce effort for anyone who creates, re-uses, harmonizes or learns about multi-lab replication projects.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"15 6","pages":"1183-1199"},"PeriodicalIF":5.0,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/jrsm.1733","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141464697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Michal Shimonovich, Hilary Thomson, Anna Pearce, Srinivasa Vittal Katikireddi
Background
Bradford Hill (BH) viewpoints are widely used to assess causality in systematic reviews, but their application has often lacked reproducibility. We describe an approach for assessing causality within systematic reviews (‘causal’ reviews), illustrating its application to the topic of income inequality and health. Our approach draws on principles of process tracing, a method used for case study research, to harness BH viewpoints to judge evidence for causal claims.
Methods
In process tracing, a hypothesis may be confirmed by observing highly unique evidence and disconfirmed by observing highly definitive evidence. We drew on these principles to consider the value of finding supportive or contradictory evidence for each BH viewpoint characterised by its uniqueness and definitiveness.
Results
In our exemplar systematic review, we hypothesised that income inequality adversely affects self-rated health and all-cause mortality. BH viewpoints ‘analogy’ and ‘coherence’ were excluded from the causal assessment because of their low uniqueness and low definitiveness. The ‘experiment’ viewpoint was considered highly unique and highly definitive, and thus could be particularly valuable. We propose five steps for using BH viewpoints in a ‘causal’ review: (1) define the hypothesis; (2) characterise each viewpoint; (3) specify the evidence expected for each BH viewpoint for a true or untrue hypothesis; (4) gather evidence for each viewpoint (e.g., systematic review meta-analyses, critical appraisal, background knowledge); (5) consider if each viewpoint was met (supportive evidence) or unmet (contradictory evidence).
Conclusions
Incorporating process tracing has the potential to provide transparency and structure when using BH viewpoints in ‘causal’ reviews.
{"title":"Applying Bradford Hill to assessing causality in systematic reviews: A transparent approach using process tracing","authors":"Michal Shimonovich, Hilary Thomson, Anna Pearce, Srinivasa Vittal Katikireddi","doi":"10.1002/jrsm.1730","DOIUrl":"10.1002/jrsm.1730","url":null,"abstract":"<div>\u0000 \u0000 \u0000 <section>\u0000 \u0000 <h3> Background</h3>\u0000 \u0000 <p>Bradford Hill (BH) viewpoints are widely used to assess causality in systematic reviews, but their application has often lacked reproducibility. We describe an approach for assessing causality within systematic reviews (‘causal’ reviews), illustrating its application to the topic of income inequality and health. Our approach draws on principles of process tracing, a method used for case study research, to harness BH viewpoints to judge evidence for causal claims.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Methods</h3>\u0000 \u0000 <p>In process tracing, a hypothesis may be confirmed by observing highly unique evidence and disconfirmed by observing highly definitive evidence. We drew on these principles to consider the value of finding supportive or contradictory evidence for each BH viewpoint characterised by its uniqueness and definitiveness.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Results</h3>\u0000 \u0000 <p>In our exemplar systematic review, we hypothesised that income inequality adversely affects self-rated health and all-cause mortality. BH viewpoints ‘analogy’ and ‘coherence’ were excluded from the causal assessment because of their low uniqueness and low definitiveness. The ‘experiment’ viewpoint was considered highly unique and highly definitive, and thus could be particularly valuable. We propose five steps for using BH viewpoints in a ‘causal’ review: (1) define the hypothesis; (2) characterise each viewpoint; (3) specify the evidence expected for each BH viewpoint for a true or untrue hypothesis; (4) gather evidence for each viewpoint (e.g., systematic review meta-analyses, critical appraisal, background knowledge); (5) consider if each viewpoint was met (supportive evidence) or unmet (contradictory evidence).</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Conclusions</h3>\u0000 \u0000 <p>Incorporating process tracing has the potential to provide transparency and structure when using BH viewpoints in ‘causal’ reviews.</p>\u0000 </section>\u0000 </div>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"15 6","pages":"826-838"},"PeriodicalIF":5.0,"publicationDate":"2024-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/jrsm.1730","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141508855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Amanda Konet, Ian Thomas, Gerald Gartlehner, Leila Kahwati, Rainer Hilscher, Shannon Kugley, Karen Crotty, Meera Viswanathan, Robert Chew
Accurate data extraction is a key component of evidence synthesis and critical to valid results. The advent of publicly available large language models (LLMs) has generated interest in these tools for evidence synthesis and created uncertainty about the choice of LLM. We compare the performance of two widely available LLMs (Claude 2 and GPT-4) for extracting pre-specified data elements from 10 published articles included in a previously completed systematic review. We use prompts and full study PDFs to compare the outputs from the browser versions of Claude 2 and GPT-4. GPT-4 required a third-party plug-in to upload and parse PDFs. Accuracy was high for Claude 2 (96.3%). The accuracy of GPT-4 with the plug-in was lower (68.8%); however, most of the errors were due to the plug-in. Both LLMs correctly recognized when prespecified data elements were missing from the source PDF and generated correct information for data elements that were not reported explicitly in the articles. A secondary analysis demonstrated that, when provided selected text from the PDFs, Claude 2 and GPT-4 accurately extracted 98.7% and 100% of the data elements, respectively. Limitations include the narrow scope of the study PDFs used, that prompt development was completed using only Claude 2, and that we cannot guarantee the open-access articles were not used to train the LLMs. This study highlights the potential for LLMs to revolutionize data extraction but underscores the importance of accurate PDF parsing. For now, it remains essential for a human investigator to validate LLM extractions.
{"title":"Performance of two large language models for data extraction in evidence synthesis","authors":"Amanda Konet, Ian Thomas, Gerald Gartlehner, Leila Kahwati, Rainer Hilscher, Shannon Kugley, Karen Crotty, Meera Viswanathan, Robert Chew","doi":"10.1002/jrsm.1732","DOIUrl":"10.1002/jrsm.1732","url":null,"abstract":"<p>Accurate data extraction is a key component of evidence synthesis and critical to valid results. The advent of publicly available large language models (LLMs) has generated interest in these tools for evidence synthesis and created uncertainty about the choice of LLM. We compare the performance of two widely available LLMs (Claude 2 and GPT-4) for extracting pre-specified data elements from 10 published articles included in a previously completed systematic review. We use prompts and full study PDFs to compare the outputs from the browser versions of Claude 2 and GPT-4. GPT-4 required use of a third-party plugin to upload and parse PDFs. Accuracy was high for Claude 2 (96.3%). The accuracy of GPT-4 with the plug-in was lower (68.8%); however, most of the errors were due to the plug-in. Both LLMs correctly recognized when prespecified data elements were missing from the source PDF and generated correct information for data elements that were not reported explicitly in the articles. A secondary analysis demonstrated that, when provided selected text from the PDFs, Claude 2 and GPT-4 accurately extracted 98.7% and 100% of the data elements, respectively. Limitations include the narrow scope of the study PDFs used, that prompt development was completed using only Claude 2, and that we cannot guarantee the open-source articles were not used to train the LLMs. This study highlights the potential for LLMs to revolutionize data extraction but underscores the importance of accurate PDF parsing. For now, it remains essential for a human investigator to validate LLM extractions.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"15 5","pages":"818-824"},"PeriodicalIF":5.0,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141417088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hanan Khalil, Danielle Pollock, Patricia McInerney, Catrin Evans, Erica B. Moraes, Christina M. Godfrey, Lyndsay Alexander, Andrea Tricco, Micah D. J. Peters, Dawid Pieper, Ashrita Saran, Daniel Ameen, Petek Eylul Taneri, Zachary Munn
Objective
This paper describes several automation tools and software that can be considered during evidence synthesis projects and provides guidance for their integration in the conduct of scoping reviews.
Study Design and Setting
The guidance presented in this work is adapted from the results of a scoping review and consultations with the JBI Scoping Review Methodology group.
Results
This paper describes several reliable, validated automation tools and software packages that can be used to enhance the conduct of scoping reviews. Developments in the automation of systematic reviews, and more recently scoping reviews, are continuously evolving. We detail several helpful tools in order of the key steps recommended by JBI's methodological guidance for undertaking scoping reviews, including team establishment, protocol development, searching, de-duplication, screening titles and abstracts, data extraction, data charting, and report writing. While we include several reliable tools and software packages that can be used for the automation of scoping reviews, the tools mentioned have some limitations. For example, some are available only in English, and their lack of integration with other tools results in limited interoperability.
Conclusion
This paper highlighted several useful automation tools and software programs for undertaking each step of a scoping review. This guidance has the potential to inform collaborative efforts aimed at developing evidence-informed, integrated automation tools and software packages for enhancing the conduct of high-quality scoping reviews.
{"title":"Automation tools to support undertaking scoping reviews","authors":"Hanan Khalil, Danielle Pollock, Patricia McInerney, Catrin Evans, Erica B. Moraes, Christina M. Godfrey, Lyndsay Alexander, Andrea Tricco, Micah D. J. Peters, Dawid Pieper, Ashrita Saran, Daniel Ameen, Petek Eylul Taneri, Zachary Munn","doi":"10.1002/jrsm.1731","DOIUrl":"10.1002/jrsm.1731","url":null,"abstract":"<div>\u0000 \u0000 \u0000 <section>\u0000 \u0000 <h3> Objective</h3>\u0000 \u0000 <p>This paper describes several automation tools and software that can be considered during evidence synthesis projects and provides guidance for their integration in the conduct of scoping reviews.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Study Design and Setting</h3>\u0000 \u0000 <p>The guidance presented in this work is adapted from the results of a scoping review and consultations with the JBI Scoping Review Methodology group.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Results</h3>\u0000 \u0000 <p>This paper describes several reliable, validated automation tools and software that can be used to enhance the conduct of scoping reviews. Developments in the automation of systematic reviews, and more recently scoping reviews, are continuously evolving. We detail several helpful tools in order of the key steps recommended by the JBI's methodological guidance for undertaking scoping reviews including team establishment, protocol development, searching, de-duplication, screening titles and abstracts, data extraction, data charting, and report writing. While we include several reliable tools and software that can be used for the automation of scoping reviews, there are some limitations to the tools mentioned. For example, some are available in English only and their lack of integration with other tools results in limited interoperability.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Conclusion</h3>\u0000 \u0000 <p>This paper highlighted several useful automation tools and software programs to use in undertaking each step of a scoping review. This guidance has the potential to inform collaborative efforts aiming at the development of evidence informed, integrated automation tools and software packages for enhancing the conduct of high-quality scoping reviews.</p>\u0000 </section>\u0000 </div>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"15 6","pages":"839-850"},"PeriodicalIF":5.0,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/jrsm.1731","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141417087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Michael Gusenbauer
Citation indices providing information on backward citation (BWC) and forward citation (FWC) links are essential for literature discovery, bibliographic analysis, and knowledge synthesis, especially when language barriers impede document identification. However, the suitability of citation indices varies. While some have been analyzed, the majority, whether new or established, lack comprehensive evaluation. Therefore, this study evaluates the citation coverage of the citation indices of 59 databases, encompassing the widely used Google Scholar, Scopus, and Web of Science alongside many others never previously analyzed, such as the emerging Lens, Scite, Dimensions, and OpenAlex or the subject-specific PubMed and JSTOR. Through a comprehensive analysis using 259 journal articles from across disciplines, this research aims to guide scholars in selecting indices with broader document coverage and more accurate and comprehensive backward and forward citation links. Key findings highlight Google Scholar, ResearchGate, Semantic Scholar, and Lens as leading options for FWC searching, with Lens providing superior download capabilities. For BWC searching, the Web of Science Core Collection can be recommended over Scopus for accuracy. BWC information from publisher databases such as IEEE Xplore or ScienceDirect was generally found to be the most accurate, yet only available for a limited number of articles. The findings will help scholars conducting systematic reviews, meta-analyses, and bibliometric analyses to select the most suitable databases for citation searching.
{"title":"Beyond Google Scholar, Scopus, and Web of Science: An evaluation of the backward and forward citation coverage of 59 databases' citation indices","authors":"Michael Gusenbauer","doi":"10.1002/jrsm.1729","DOIUrl":"10.1002/jrsm.1729","url":null,"abstract":"<p>Citation indices providing information on backward citation (BWC) and forward citation (FWC) links are essential for literature discovery, bibliographic analysis, and knowledge synthesis, especially when language barriers impede document identification. However, the suitability of citation indices varies. While some have been analyzed, the majority, whether new or established, lack comprehensive evaluation. Therefore, this study evaluates the citation coverage of the citation indices of 59 databases, encompassing the widely used Google Scholar, Scopus, and Web of Science alongside many others never previously analyzed, such as the emerging Lens, Scite, Dimensions, and OpenAlex or the subject-specific PubMed and JSTOR. Through a comprehensive analysis using 259 journal articles from across disciplines, this research aims to guide scholars in selecting indices with broader document coverage and more accurate and comprehensive backward and forward citation links. Key findings highlight Google Scholar, ResearchGate, Semantic Scholar, and Lens as leading options for FWC searching, with Lens providing superior download capabilities. For BWC searching, the Web of Science Core Collection can be recommended over Scopus for accuracy. BWC information from publisher databases such as IEEE Xplore or ScienceDirect was generally found to be the most accurate, yet only available for a limited number of articles. The findings will help scholars conducting systematic reviews, meta-analyses, and bibliometric analyses to select the most suitable databases for citation searching.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"15 5","pages":"802-817"},"PeriodicalIF":5.0,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/jrsm.1729","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141320051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Suzanne C. Freeman, Alex J. Sutton, Nicola J. Cooper, Alessandro Gasparini, Michael J. Crowther, Neil Hawkins
Background
Traditionally, meta-analysis of time-to-event outcomes reports a single pooled hazard ratio assuming proportional hazards (PH). For health technology assessment evaluations, hazard ratios are frequently extrapolated across a lifetime horizon. However, when treatment effects vary over time, an assumption of PH is not always valid. The Royston-Parmar (RP), piecewise exponential (PE), and fractional polynomial (FP) models can accommodate non-PH and provide plausible extrapolations of survival curves beyond observed data.
Methods
We conducted a simulation study to assess and compare the performance of RP, PE, and FP models in a Bayesian framework, estimating the restricted mean survival time difference (RMSTD) at 50 years from a pairwise meta-analysis with evidence of non-PH. Individual patient data were generated from a mixture Weibull distribution. Twelve scenarios were considered, varying the amount of follow-up data, the number of trials in a meta-analysis, the non-PH interaction coefficient, and the prior distributions. Performance was assessed through bias and mean squared error. Models were applied to a metastatic breast cancer example.
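For intuition, a base-R sketch (not the authors' simulation code) of the two ingredients named here, a mixture-Weibull data-generating survival function and the RMSTD at a restriction time of 50 years, with all parameter values invented:

# Survival function of a two-component Weibull mixture (mixing weight p).
S_mix <- function(t, p, shape1, scale1, shape2, scale2) {
  p * pweibull(t, shape1, scale1, lower.tail = FALSE) +
    (1 - p) * pweibull(t, shape2, scale2, lower.tail = FALSE)
}

# RMST at tstar is the area under S(t) from 0 to tstar; the RMSTD is the
# difference in that area between arms.
rmst <- function(tstar, ...) integrate(S_mix, 0, tstar, ...)$value

rmstd_50 <- rmst(50, p = 0.3, shape1 = 1.2, scale1 = 2, shape2 = 0.8, scale2 = 9) -
  rmst(50, p = 0.3, shape1 = 0.9, scale1 = 4, shape2 = 1.1, scale2 = 6)
rmstd_50  # arms chosen so their hazards cross, i.e., non-PH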
Results
FP models performed best when the non-PH interaction coefficient was 0.2. RP models performed best in scenarios with complete follow-up data. PE models performed well on average across all scenarios. In the metastatic breast cancer example, the RMSTD at 50 years ranged from −14.6 to 8.48 months.
Conclusions
Synthesis of time-to-event outcomes and estimation of the RMSTD in the presence of non-PH can be challenging and computationally intensive. Different approaches make different assumptions regarding extrapolation, so sensitivity analyses varying key assumptions are essential to check the robustness of conclusions to different choices for the underlying survival function.
{"title":"Bayesian pairwise meta-analysis of time-to-event outcomes in the presence of non-proportional hazards: A simulation study of flexible parametric, piecewise exponential and fractional polynomial models","authors":"Suzanne C. Freeman, Alex J. Sutton, Nicola J. Cooper, Alessandro Gasparini, Michael J. Crowther, Neil Hawkins","doi":"10.1002/jrsm.1722","DOIUrl":"10.1002/jrsm.1722","url":null,"abstract":"<div>\u0000 \u0000 \u0000 <section>\u0000 \u0000 <h3> Background</h3>\u0000 \u0000 <p>Traditionally, meta-analysis of time-to-event outcomes reports a single pooled hazard ratio assuming proportional hazards (PH). For health technology assessment evaluations, hazard ratios are frequently extrapolated across a lifetime horizon. However, when treatment effects vary over time, an assumption of PH is not always valid. The Royston-Parmar (RP), piecewise exponential (PE), and fractional polynomial (FP) models can accommodate non-PH and provide plausible extrapolations of survival curves beyond observed data.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Methods</h3>\u0000 \u0000 <p>Simulation study to assess and compare the performance of RP, PE, and FP models in a Bayesian framework estimating restricted mean survival time difference (RMSTD) at 50 years from a pairwise meta-analysis with evidence of non-PH. Individual patient data were generated from a mixture Weibull distribution. Twelve scenarios were considered varying the amount of follow-up data, number of trials in a meta-analysis, non-PH interaction coefficient, and prior distributions. Performance was assessed through bias and mean squared error. Models were applied to a metastatic breast cancer example.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Results</h3>\u0000 \u0000 <p>FP models performed best when the non-PH interaction coefficient was 0.2. RP models performed best in scenarios with complete follow-up data. PE models performed well on average across all scenarios. In the metastatic breast cancer example, RMSTD at 50-years ranged from −14.6 to 8.48 months.</p>\u0000 </section>\u0000 \u0000 <section>\u0000 \u0000 <h3> Conclusions</h3>\u0000 \u0000 <p>Synthesis of time-to-event outcomes and estimation of RMSTD in the presence of non-PH can be challenging and computationally intensive. Different approaches make different assumptions regarding extrapolation and sensitivity analyses varying key assumptions are essential to check the robustness of conclusions to different assumptions for the underlying survival function.</p>\u0000 </section>\u0000 </div>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"15 5","pages":"780-801"},"PeriodicalIF":5.0,"publicationDate":"2024-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/jrsm.1722","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141074418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yasushi Tsujimoto, Yusuke Tsutsumi, Yuki Kataoka, Akihiro Shiroshita, Orestis Efthimiou, Toshi A. Furukawa
Meta-analyses examining dichotomous outcomes often include single-zero studies, where no events occur in the intervention or control group. These pose challenges, and several methods have been proposed to address them. A fixed continuity correction method has been shown to bias estimates, but it is frequently used because software sometimes applies it as a default (e.g., RevMan software in Cochrane reviews). We aimed to empirically compare results using the continuity correction with those using alternative models that do not require correction. To this aim, we reanalyzed the original data from 885 meta-analyses in Cochrane reviews using the following methods: (i) Mantel–Haenszel model with a fixed continuity correction, (ii) random effects inverse variance model with a fixed continuity correction, (iii) Peto method (the three models available in RevMan), (iv) random effects inverse variance model with the treatment arm continuity correction, (v) Mantel–Haenszel model without correction, (vi) logistic regression, and (vii) a Bayesian random effects model with binomial likelihood. For each meta-analysis we calculated ratios of odds ratios between all methods, to assess how the choice of method may impact results. Ratios of odds ratios <0.8 or >1.25 were seen in ~30% of the existing meta-analyses when comparing results between the Mantel–Haenszel model with a fixed continuity correction and either the Mantel–Haenszel model without correction or logistic regression. We concluded that injudicious use of the fixed continuity correction in existing Cochrane reviews may have substantially influenced effect estimates in some cases. Future updates of RevMan should incorporate less biased statistical methods.
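To make methods (i) and (v) concrete: metafor's rma.mh exposes the continuity-correction behaviour directly through its add and to arguments. A hedged sketch on invented single-zero data (not the reanalysis code used in the study):

library(metafor)

# Hypothetical rare-event meta-analysis; trials 2 and 4 are single-zero.
dat <- data.frame(ai = c(2, 0, 1, 0, 3), n1i = c(100, 120, 90, 150, 110),
                  ci = c(1, 2, 1, 3, 1), n2i = c(100, 115, 95, 145, 105))
dat$bi <- dat$n1i - dat$ai
dat$di <- dat$n2i - dat$ci

# (i) Mantel-Haenszel OR, adding 0.5 to all cells of zero-cell studies.
mh_cc <- rma.mh(measure = "OR", ai = ai, bi = bi, ci = ci, di = di,
                data = dat, add = 1/2, to = "only0")
# (v) Mantel-Haenszel OR without correction (MH can pool single-zero
# studies as they are).
mh_nocc <- rma.mh(measure = "OR", ai = ai, bi = bi, ci = ci, di = di,
                  data = dat, add = 0, to = "none")

# Ratio of odds ratios between the two methods, the comparison metric above.
exp(coef(mh_cc) - coef(mh_nocc))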
{"title":"The impact of continuity correction methods in Cochrane reviews with single-zero trials with rare events: A meta-epidemiological study","authors":"Yasushi Tsujimoto, Yusuke Tsutsumi, Yuki Kataoka, Akihiro Shiroshita, Orestis Efthimiou, Toshi A. Furukawa","doi":"10.1002/jrsm.1720","DOIUrl":"10.1002/jrsm.1720","url":null,"abstract":"<p>Meta-analyses examining dichotomous outcomes often include single-zero studies, where no events occur in intervention or control groups. These pose challenges, and several methods have been proposed to address them. A fixed continuity correction method has been shown to bias estimates, but it is frequently used because sometimes software (e.g., RevMan software in Cochrane reviews) uses it as a default. We aimed to empirically compare results using the continuity correction with those using alternative models that do not require correction. To this aim, we reanalyzed the original data from 885 meta-analyses in Cochrane reviews using the following methods: (i) Mantel–Haenszel model with a fixed continuity correction, (ii) random effects inverse variance model with a fixed continuity correction, (iii) Peto method (the three models available in RevMan), (iv) random effects inverse variance model with the treatment arm continuity correction, (v) Mantel–Haenszel model without correction, (vi) logistic regression, and (vii) a Bayesian random effects model with binominal likelihood. For each meta-analysis we calculated ratios of odds ratios between all methods, to assess how the choice of method may impact results. Ratios of odds ratios <0.8 or <1.25 were seen in ~30% of the existing meta-analyses when comparing results between Mantel–Haenszel model with a fixed continuity correction and either Mantel–Haenszel model without correction or logistic regression. We concluded that injudicious use of the fixed continuity correction in existing Cochrane reviews may have substantially influenced effect estimates in some cases. Future updates of RevMan should incorporate less biased statistical methods.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"15 5","pages":"769-779"},"PeriodicalIF":5.0,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/jrsm.1720","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140943578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Edoardo G. Ostinelli, Orestis Efthimiou, Yan Luo, Clara Miguel, Eirini Karyotaki, Pim Cuijpers, Toshi A. Furukawa, Georgia Salanti, Andrea Cipriani
When studies use different scales to measure continuous outcomes, standardised mean differences (SMDs) are required to meta-analyse the data. However, outcomes are often reported as endpoint or change-from-baseline scores. Combining the corresponding SMDs can be problematic, and available guidance advises against this practice. We aimed to examine the impact of combining the two types of SMD in meta-analyses of depression severity. We used individual participant data on pharmacological interventions (89 studies, 27,409 participants) and internet-delivered cognitive behavioural therapy (iCBT; 61 studies, 13,687 participants) for depression to compare endpoint and change-from-baseline SMDs at the study level. Next, we performed pairwise (PWMA) and network meta-analyses (NMA) using endpoint SMDs, change-from-baseline SMDs, or a mixture of the two. Study-specific SMDs calculated from endpoint and change-from-baseline data were largely similar, although for iCBT interventions 25% of the studies at 3 months were associated with important differences between study-specific SMDs (median 0.01, IQR −0.10 to 0.13), especially in smaller trials with baseline imbalances. However, when pooled, the differences between endpoint and change SMDs were negligible. Pooling only the more favourable of the two SMDs did not materially affect meta-analyses, resulting in differences in pooled SMDs of up to 0.05 and 0.13 in the pharmacological and iCBT datasets, respectively. Our findings have implications for meta-analyses in depression, where we showed that the choice between endpoint and change scores for estimating SMDs had an immaterial impact on summary meta-analytic estimates. Future studies should replicate and extend our analyses to fields other than depression.
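For reference, the two SMD types being combined can be written out in a few lines of base R (numbers hypothetical; the baseline-endpoint correlation r, needed for the change SD, is assumed at 0.5):

smd <- function(m1, m2, sd1, sd2, n1, n2) {
  sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
  (m1 - m2) / sd_pooled
}

# Endpoint SMD: compares endpoint means directly.
smd_endpoint <- smd(m1 = 12.1, m2 = 14.3, sd1 = 6.0, sd2 = 6.2, n1 = 80, n2 = 78)

# Change-from-baseline SMD: compares mean changes; each arm's change SD
# follows from the baseline SD, endpoint SD and their correlation r.
sd_change <- function(sd_b, sd_e, r) sqrt(sd_b^2 + sd_e^2 - 2 * r * sd_b * sd_e)
r <- 0.5  # assumed correlation
smd_change <- smd(m1 = -8.4, m2 = -6.1,
                  sd1 = sd_change(5.8, 6.0, r), sd2 = sd_change(5.9, 6.2, r),
                  n1 = 80, n2 = 78)
c(endpoint = smd_endpoint, change = smd_change)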
{"title":"Combining endpoint and change data did not affect the summary standardised mean difference in pairwise and network meta-analyses: An empirical study in depression","authors":"Edoardo G. Ostinelli, Orestis Efthimiou, Yan Luo, Clara Miguel, Eirini Karyotaki, Pim Cuijpers, Toshi A. Furukawa, Georgia Salanti, Andrea Cipriani","doi":"10.1002/jrsm.1719","DOIUrl":"10.1002/jrsm.1719","url":null,"abstract":"<p>When studies use different scales to measure continuous outcomes, standardised mean differences (SMD) are required to meta-analyse the data. However, outcomes are often reported as endpoint or change from baseline scores. Combining corresponding SMDs can be problematic and available guidance advises against this practice. We aimed to examine the impact of combining the two types of SMD in meta-analyses of depression severity. We used individual participant data on pharmacological interventions (89 studies, 27,409 participants) and internet-delivered cognitive behavioural therapy (iCBT; 61 studies, 13,687 participants) for depression to compare endpoint and change from baseline SMDs at the study level. Next, we performed pairwise (PWMA) and network meta-analyses (NMA) using endpoint SMDs, change from baseline SMDs, or a mixture of the two. Study-specific SMDs calculated from endpoint and change from baseline data were largely similar, although for iCBT interventions 25% of the studies at 3 months were associated with important differences between study-specific SMDs (median 0.01, IQR −0.10, 0.13) especially in smaller trials with baseline imbalances. However, when pooled, the differences between endpoint and change SMDs were negligible. Pooling only the more favourable of the two SMDs did not materially affect meta-analyses, resulting in differences of pooled SMDs up to 0.05 and 0.13 in the pharmacological and iCBT datasets, respectively. Our findings have implications for meta-analyses in depression, where we showed that the choice between endpoint and change scores for estimating SMDs had immaterial impact on summary meta-analytic estimates. Future studies should replicate and extend our analyses to fields other than depression.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"15 5","pages":"758-768"},"PeriodicalIF":5.0,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/jrsm.1719","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140896375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}