Lord's Paradox and two network meta-analysis models.
Yu-Kang Tu, James S Hodges
Pub Date: 2026-01-01 | Epub Date: 2025-09-18 | DOI: 10.1017/rsm.2025.10036
The contrast-based model (CBM) is the most popular network meta-analysis (NMA) method, although alternative approaches, e.g., the baseline model (BM), have been proposed but seldom used. This article aims to illuminate the difference between the CBM and BM and to explore when they produce different results. These models differ in key assumptions: the CBM assumes treatment contrasts are exchangeable across trials and models the reference (baseline) treatment's outcome levels as fixed effects, while the BM further assumes that the baseline treatment's outcome levels are exchangeable across trials and treats them as random effects. We show algebraically and graphically that the difference between the CBM and BM is analogous to the difference between the two analyses in a statistical conundrum called Lord's Paradox, in which the t-test and analysis of covariance (ANCOVA) yield conflicting conclusions about the group difference in weight gain. This conflict arises because the t-test compares the observed weight change, whereas ANCOVA compares an adjusted weight change. In NMA, analogously, the CBM compares observed treatment contrasts, while the BM compares adjusted treatment contrasts. We demonstrate how the difference in modeling baseline effects can cause the CBM and BM to give different results. The analogy with Lord's Paradox provides insight into the two models' differing assumptions about the relationship between baseline effects and treatment contrasts. When the two models produce substantially different results, this may indicate a violation of the transitivity assumption, so results from either model should be interpreted with caution.
{"title":"Lord's Paradox and two network meta-analysis models.","authors":"Yu-Kang Tu, James S Hodges","doi":"10.1017/rsm.2025.10036","DOIUrl":"10.1017/rsm.2025.10036","url":null,"abstract":"<p><p>The contrast-based model (CBM) is the most popular network meta-analysis (NMA) method, although alternative approaches, e.g., the baseline model (BM), have been proposed but seldom used. This article aims to illuminate the difference between the CBM and BM and explores when they produce different results. These models differ in key assumptions: The CBM assumes treatment contrasts are exchangeable across trials and models the reference (baseline) treatment's outcome levels as fixed effects, while the BM further assumes that the baseline treatment's outcome levels are exchangeable across trials and treats them as random effects. We show algebraically and graphically that the difference between the CBM and BM is analogous to the difference between the two analyses in a statistical conundrum called Lord's Paradox, in which the <i>t</i>-test and analysis of covariance (ANCOVA) yield conflicting conclusions about the group difference in weight gain. We show that this conflict arises because the <i>t</i>-test compares the <i>observed</i> weight change, whereas ANCOVA compares an <i>adjusted</i> weight change. In NMA, analogously, the CBM compares observed treatment contrasts, while the BM compares adjusted treatment contrasts. We demonstrate how the difference in modeling baseline effects can cause the CBM and BM to give different results. The analogy of Lord's Paradox provides insights into the different assumptions of the CBM and BM regarding the relationship between baseline effects and treatment contrasts. When these two models produce substantially different results, it may indicate a violation of the transitivity assumption. Therefore, we should be cautious in interpreting the results from either model.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"17 1","pages":"111-122"},"PeriodicalIF":6.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12823209/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Examining covariate-specific treatment effects in individual participant data meta-analysis: Framing aggregation bias in terms of trial-level confounding and funnel plots.
Lianne K Siegel, Joseph S Koopmeiners, Jamie Hartmann-Boyce, Peter J Godolphin, Abdel G Babiker, Giota Touloumi, Kirk U Knowlton, Richard D Riley
Pub Date: 2026-01-01 | Epub Date: 2025-10-23 | DOI: 10.1017/rsm.2025.10043
To understand a treatment's potential impact at the individual level, it is crucial to explore whether the effect differs across patient subgroups and covariate values. Meta-analysis provides an important tool for detecting treatment-covariate interactions, as it can improve power compared to a single study. However, aggregation bias can occur when estimating individual-level treatment-covariate interactions in meta-analysis, due to trial-level confounding, which arises when the association between the covariate and treatment effect across trials (at the aggregate level) differs from that observed within trials (at the individual level). It is therefore recommended that heterogeneity in the treatment effect at the individual level be disentangled from that at the trial level, ideally using an individual participant data (IPD) meta-analysis. Here, we explain this issue and provide new intuition about how trial-level confounding is affected by differences in within-trial covariate distributions, and how, for categorical covariates, this corresponds to asymmetry in subgroup-specific funnel plots. We then propose a sensitivity analysis to assess the robustness of interaction estimates to potential trial-level confounding. We illustrate these concepts using simulated and real data from an IPD meta-analysis of trials conducted on the TICO/ACTIV-3 platform, which assessed passive immunotherapy treatments for inpatients with COVID-19.
{"title":"Examining covariate-specific treatment effects in individual participant data meta-analysis: Framing aggregation bias in terms of trial-level confounding and funnel plots.","authors":"Lianne K Siegel, Joseph S Koopmeiners, Jamie Hartmann-Boyce, Peter J Godolphin, Abdel G Babiker, Giota Touloumi, Kirk U Knowlton, Richard D Riley","doi":"10.1017/rsm.2025.10043","DOIUrl":"10.1017/rsm.2025.10043","url":null,"abstract":"<p><p>To understand a treatment's potential impact at the individual level, it is crucial to explore whether the effect differs across patient subgroups and covariate values. Meta-analysis provides an important tool for detecting treatment-covariate interactions, as it can improve power compared to a single study. However, aggregation bias can occur when estimating individual-level treatment-covariate interactions in meta-analysis, due to trial-level confounding. This refers to when the association between the covariate and treatment effect <i>across</i> trials (at the aggregate level) differs from that observed <i>within</i> trials (at the individual level). It is, thus, recommended that heterogeneity in the treatment effect at the individual level should be disentangled from that at the trial level, ideally using an individual participant data (IPD) meta-analysis. Here, we explain this issue and provide new intuition about how trial-level confounding is impacted by differences in within-trial distributions of covariates and how this corresponds to asymmetry in subgroup-specific funnel plots in the case of categorical covariates. We then propose a sensitivity analysis to assess the robustness of interaction estimates to potential trial-level confounding. We illustrate these concepts using simulated and real data from an IPD meta-analysis of trials conducted on the TICO/ACTIV-3 platform, which assessed passive immunotherapy treatments for inpatients with COVID-19.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"17 1","pages":"194-209"},"PeriodicalIF":6.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12823212/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data analysis and presentation methods in umbrella reviews/overviews of reviews in health care: A cross-sectional study.
Cindy Stern, Jiaoli Li, Jennifer Stone, Hanan Khalil, Kim Sears, Romy Menghao Jia, Patraporn Bhatarasakoon, Edoardo Aromataris, Ritin Fernandez
Pub Date: 2026-01-01 | Epub Date: 2025-10-14 | DOI: 10.1017/rsm.2025.10040
Umbrella reviews (URs) synthesize findings from multiple systematic reviews on a specific topic. Methodological approaches for analyzing and presenting UR results vary, and reviewers often adapt methods to align with research objectives. This study examined the characteristics of analysis and presentation methods used in healthcare-related URs. A systematic PubMed search identified URs published between 2023 and 2024. Inclusion criteria focused on healthcare URs using systematic reviews as the unit of analysis. A random sample of 100 eligible URs was included. Bibliographic, conduct, and reporting data were extracted independently using a customized, piloted data extraction form. Descriptive analysis and narrative synthesis summarized the findings. The most common terminology for eligible studies was "umbrella reviews" (65%) or "overviews" (30%). Question frameworks included PICO (43%) and PICOS (14%); most URs (98%) included quantitative systematic reviews, and 68% included randomized controlled trials. The most frequent source of methodological guidance was Cochrane (32%). Data analysis commonly used narrative synthesis and meta-analysis, with Stata, RevMan, and GRADEPro GDT employed for presentation. Information about study overlap and certainty assessment was rarely reported. Variation exists in how data are analyzed and presented in URs, with key elements often omitted. These findings highlight the need for clearer methodological guidance to enhance consistency and reporting in future URs.
{"title":"Data analysis and presentation methods in umbrella reviews/overviews of reviews in health care: A cross-sectional study.","authors":"Cindy Stern, Jiaoli Li, Jennifer Stone, Hanan Khalil, Kim Sears, Romy Menghao Jia, Patraporn Bhatarasakoon, Edoardo Aromataris, Ritin Fernandez","doi":"10.1017/rsm.2025.10040","DOIUrl":"10.1017/rsm.2025.10040","url":null,"abstract":"<p><p>Umbrella reviews (URs) synthesize findings from multiple systematic reviews on a specific topic. Methodological approaches for analyzing and presenting UR results vary, and reviewers often adapt methods to align with research objectives. This study examined the characteristics of analysis and presentation methods used in healthcare-related URs. A systematic PubMed search identified URs published between 2023 and 2024. Inclusion criteria focused on healthcare URs using systematic reviews as the unit of analysis. A random sample of 100 eligible URs was included. A customized, piloted data extraction form was used to collect bibliographic, conduct, and reporting data independently. Descriptive analysis and narrative synthesis summarized findings. The most common terminology for eligible studies was \"umbrella reviews\" (65%) or \"overviews\" (30%). Question frameworks included PICO (43%) and PICOS (14%), with quantitative systematic reviews included in most URs (98%), and 68% including randomized controlled trials. The most frequent methodological guidance source was Cochrane (32%). Data analysis commonly used narrative synthesis and meta-analysis, with Stata, RevMan, and GRADEPro GDT employed for presentation. Information about study overlap and certainty assessment was rarely reported.Variation exists in how data are analyzed and presented in URs, with key elements often omitted. These findings highlight the need for clearer methodological guidance to enhance consistency and reporting in future URs.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"17 1","pages":"210-224"},"PeriodicalIF":6.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12823197/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Making sense of conducting a critical interpretive synthesis: A scoping review.
Saritte Perlman, Eliana Ben-Sheleg, Moriah E Ellen
Pub Date: 2026-01-01 | Epub Date: 2025-10-08 | DOI: 10.1017/rsm.2025.10041
Critical interpretive synthesis was introduced in 2006 to address various shortcomings of systematic reviews, such as their limitations in synthesizing heterogeneous data, integrating diverse study types, and generating theoretical insights. This review sought to outline the methodological process of conducting critical interpretive syntheses by identifying the methods currently in use, mapping the processes that have been used to date, and highlighting directions for further research. To achieve this, a scoping review of critical interpretive syntheses published between 2006 and 2023 was conducted. Initial searches identified 1628 publications; after removal of duplicates and exclusions, 212 reviews were included in the study. Most reviews focused on health-related subjects. Authors chose the method for its iterative, inductive, and recursive nature. Both question-based and topic-based reviews were conducted. Literature searches relied on electronic databases and reference chaining. Mapping to the original six-phase model showed the most variability in the sampling and quality assessment phases, each performed in 50.7% of reviews. Data extraction typically used a data extraction table. Synthesis involved constant comparison, critique, consolidation of themes into constructs, and a synthesizing argument. Refining critical interpretive synthesis methodology and its best practices is important for optimizing utility and impact and for ensuring findings are relevant and actionable for informing policy, practice, and future research.
{"title":"Making sense of conducting a critical interpretive synthesis: A scoping review.","authors":"Saritte Perlman, Eliana Ben-Sheleg, Moriah E Ellen","doi":"10.1017/rsm.2025.10041","DOIUrl":"10.1017/rsm.2025.10041","url":null,"abstract":"<p><p>Critical interpretive synthesis was introduced in 2006 to address various shortcomings of systematic reviews such as their limitations in synthesizing heterogeneous data, integrating diverse study types, and generating theoretical insights. This review sought to outline the methodological process of conducting critical interpretive syntheses by identifying the methods currently in use, mapping the processes that have been used to date, and highlighting directions for further research. To achieve this, a scoping review of critical interpretive syntheses published between 2006 and 2023 was conducted. Initial searches identified 1628 publications and after removal of duplicates and exclusions, 212 reviews were included in the study. Most reviews focused on health-related subjects. Authors chose to utilize the method due to its iterative, inductive, and recursive nature. Both question-based and topic-based reviews were conducted. Literature searches relied on electronic databases and reference chaining. Mapping to the original six-phase model showed most variability in use of sampling and quality assessment phases, which were each done in 50.7% of reviews. Data extraction utilized a data extraction table. Synthesis involved constant comparison, critique, and consolidation of themes into constructs, and a synthesizing argument. Refining critical interpretive synthesis methodology and its best practices are important for optimizing the utility and impact and ensuring findings are relevant and actionable for informing policy, practice, and future research.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"17 1","pages":"30-41"},"PeriodicalIF":6.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12823206/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimands and their implications for evidence synthesis for oncology: A simulation study of treatment switching in meta-analysis.
Rebecca Kathleen Metcalfe, Antonio Remiro-Azócar, Quang Vuong, Anders Gorst-Rasmussen, Oliver Keene, Shomoita Alam, Jay J H Park
Pub Date: 2026-01-01 | Epub Date: 2025-10-16 | DOI: 10.1017/rsm.2025.10039
The ICH E9(R1) addendum provides guidelines on accounting for intercurrent events in clinical trials using the estimands framework. However, the estimands framework has received limited attention in meta-analysis. Using treatment switching, a well-known intercurrent event that occurs frequently in oncology, we conducted a simulation study to explore the bias introduced by pooling estimates that target different estimands in a meta-analysis of randomized clinical trials (RCTs) that allowed treatment switching. Under fixed effects and random effects models, we simulated overall survival data for a collection of RCTs in which patients in the control group could switch to the intervention treatment after disease progression. For each RCT, we calculated effect estimates for a treatment policy estimand that ignored treatment switching, and for a hypothetical estimand that accounted for treatment switching either by fitting rank-preserving structural failure time models or by censoring switchers. We then performed random effects and fixed effects meta-analyses to pool the RCT effect estimates while varying the proportions of trials providing treatment policy and hypothetical effect estimates, and compared the results of meta-analyses that pooled different types of effect estimates with those that pooled only treatment policy or only hypothetical estimates. We found that pooling estimates targeting different estimands yields pooled estimators that do not target any estimand of interest, and that such pooling can generate misleading results, even under a random effects model. Adopting the estimands framework for meta-analysis may improve alignment between meta-analytic results and the clinical research question of interest.
{"title":"Estimands and their implications for evidence synthesis for oncology: A simulation study of treatment switching in meta-analysis.","authors":"Rebecca Kathleen Metcalfe, Antonio Remiro-Azócar, Quang Vuong, Anders Gorst-Rasmussen, Oliver Keene, Shomoita Alam, Jay J H Park","doi":"10.1017/rsm.2025.10039","DOIUrl":"10.1017/rsm.2025.10039","url":null,"abstract":"<p><p>The ICH E9(R1) addendum provides guidelines on accounting for intercurrent events in clinical trials using the estimands framework. However, there has been limited attention to the estimands framework for meta-analysis. Using treatment switching, a well-known intercurrent event that occurs frequently in oncology, we conducted a simulation study to explore the bias introduced by pooling together estimates targeting different estimands in a meta-analysis of randomized clinical trials (RCTs) that allowed treatment switching. We simulated overall survival data of a collection of RCTs that allowed patients in the control group to switch to the intervention treatment after disease progression under fixed effects and random effects models. For each RCT, we calculated effect estimates for a treatment policy estimand that ignored treatment switching, and a hypothetical estimand that accounted for treatment switching either by fitting rank-preserving structural failure time models or by censoring switchers. Then, we performed random effects and fixed effects meta-analyses to pool together RCT effect estimates while varying the proportions of trials providing treatment policy and hypothetical effect estimates. We compared the results of meta-analyses that pooled different types of effect estimates with those that pooled only treatment policy or hypothetical estimates. We found that pooling estimates targeting different estimands results in pooled estimators that do not target any estimand of interest, and that pooling estimates of varying estimands can generate misleading results, even under a random effects model. Adopting the estimands framework for meta-analysis may improve alignment between meta-analytic results and the clinical research question of interest.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"17 1","pages":"170-193"},"PeriodicalIF":6.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12824772/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Developing an approach for assigning GRADE levels in a systematic overview of reviews of diagnostic test accuracy using general principles identified from current GRADE guidelines: A case study.
Andrew Dullea, Lydia O'Sullivan, Kirsty K O'Brien, Patricia Harrington, Marie Carrigan, Susan Ahern, Maeve McGarry, Karen Cardwell, Michelle O'Neill, Kieran A Walsh, Barbara Clyne, Susan M Smith, Mairin Ryan
Pub Date: 2026-01-01 | Epub Date: 2025-10-13 | DOI: 10.1017/rsm.2025.10047
Existing guidelines on overviews of reviews and umbrella reviews recommend an assessment of the certainty of evidence, but provide limited guidance on 'how to' apply the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) to such a complex evidence synthesis. We share our experience of developing a 'general principles' approach to applying GRADE to a complex overview of reviews. The approach was developed in an iterative and exploratory manner during the planning and conduct of an overview of reviews of a novel molecular imaging technique for the staging of prostate cancer, involving a formal review by a group of 11 methodologists/health services researchers. This approach was developed during the evidence synthesis process, piloted, and then applied to our ongoing overview of reviews. A 'general principles' approach of applying the domains of GRADE to an overview of reviews and arriving at an overall summary judgement for each outcome is presented. Our approach details additional factors to consider, including addressing both the primary study risk of bias as assessed by the included reviews and the risk of bias of the systematic reviews themselves, as well as the statistical heterogeneity observed in meta-analyses conducted within the included reviews. Our approach distilled key principles from the relevant GRADE guidelines and allowed us to apply GRADE to a complex body of evidence in a consistent and transparent way. The approach taken and the methods used to develop our approach may inform researchers working on overviews of reviews, umbrella reviews, or future methodological guidelines.
{"title":"Developing an approach for assigning GRADE levels in a systematic overview of reviews of diagnostic test accuracy using general principles identified from current GRADE guidelines: A case study.","authors":"Andrew Dullea, Lydia O'Sullivan, Kirsty K O'Brien, Patricia Harrington, Marie Carrigan, Susan Ahern, Maeve McGarry, Karen Cardwell, Michelle O'Neill, Kieran A Walsh, Barbara Clyne, Susan M Smith, Mairin Ryan","doi":"10.1017/rsm.2025.10047","DOIUrl":"10.1017/rsm.2025.10047","url":null,"abstract":"<p><p>Existing guidelines on overviews of reviews and umbrella reviews recommend an assessment of the certainty of evidence, but provide limited guidance on 'how to' apply the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) to such a complex evidence synthesis. We share our experience of developing a 'general principles' approach to applying GRADE to a complex overview of reviews. The approach was developed in an iterative and exploratory manner during the planning and conduct of an overview of reviews of a novel molecular imaging technique for the staging of prostate cancer, involving a formal review by a group of 11 methodologists/health services researchers. This approach was developed during the evidence synthesis process, piloted, and then applied to our ongoing overview of reviews. A 'general principles' approach of applying the domains of GRADE to an overview of reviews and arriving at an overall summary judgement for each outcome is presented. Our approach details additional factors to consider, including addressing both the primary study risk of bias as assessed by the included reviews and the risk of bias of the systematic reviews themselves, as well as the statistical heterogeneity observed in meta-analyses conducted within the included reviews. Our approach distilled key principles from the relevant GRADE guidelines and allowed us to apply GRADE to a complex body of evidence in a consistent and transparent way. The approach taken and the methods used to develop our approach may inform researchers working on overviews of reviews, umbrella reviews, or future methodological guidelines.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"17 1","pages":"225-236"},"PeriodicalIF":6.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12823199/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploring the methodological quality and risk of bias in 200 systematic reviews: A comparative study of ROBIS and AMSTAR-2 tools.
Carole Lunny, Nityanand Jain, Tina Nazari, Melodi Kosaner-Kließ, Lucas Santos, Ian Goodman, Alaa A M Osman, Stefano Berrone, Mohammad Najm Dadam, Connor T A Brenna, Heba Hussein, Gioia Dahdal, Diana Cespedes A, Nicola Ferri, Salmaan Kanji, Yuan Chi, Dawid Pieper, Beverly Shea, Amanda Parker, Dipika Neupane, Paul A Khan, Daniella Rangira, Kat Kolaski, Ben Ridley, Amina Berour, Kevin Sun, Radin Hamidi Rad, Zihui Ouyang, Emma K Reid, Iván Pérez-Neri, Sanabel O Barakat, Silvia Bargeri, Silvia Gianola, Greta Castellini, Sera Whitelaw, Adrienne Stevens, Shailesh B Kolekar, Kristy Wong, Paityn Major, Ebrahim Bagheri, Andrea C Tricco
Pub Date: 2026-01-01 | Epub Date: 2025-10-27 | DOI: 10.1017/rsm.2025.10032
AMSTAR-2 (A Measurement Tool to Assess Systematic Reviews, version 2) and ROBIS are tools used to assess the methodological quality and risk of bias of a systematic review (SR). We applied AMSTAR-2 and ROBIS to a sample of 200 published SRs and investigated the overlap in their methodological constructs, responses by item and overall, percentage agreement, direction of effect, and timing of assessments. AMSTAR-2 contains 16 items and ROBIS 24. Three items in AMSTAR-2 and nine in ROBIS did not overlap in construct. Of the 200 SRs, 73% were low or critically low quality using AMSTAR-2, and 81% had a high risk of bias using ROBIS. The median time to complete AMSTAR-2 and ROBIS was 51 and 64 minutes, respectively. When assessment times were calibrated to the number of items in each tool, AMSTAR-2 averaged 3.2 minutes per item compared to 2.7 minutes for ROBIS. Nine percent of SRs had opposing ratings (i.e., high quality on AMSTAR-2 but high risk of bias on ROBIS). In both tools, three-quarters of items showed more than 70% agreement between raters after extensive training and piloting. AMSTAR-2 and ROBIS provide complementary rather than interchangeable assessments of systematic reviews. AMSTAR-2 may be preferable when efficiency is prioritized and methodological rigour is the focus, whereas ROBIS offers a deeper examination of potential biases and external validity. Given the widespread reliance on systematic reviews for policy and practice, selecting the appropriate appraisal tool remains crucial. Future research should explore strategies to integrate the strengths of both instruments while minimizing the burden on assessors.
{"title":"Exploring the methodological quality and risk of bias in 200 systematic reviews: A comparative study of ROBIS and AMSTAR-2 tools.","authors":"Carole Lunny, Nityanand Jain, Tina Nazari, Melodi Kosaner-Kließ, Lucas Santos, Ian Goodman, Alaa A M Osman, Stefano Berrone, Mohammad Najm Dadam, Connor T A Brenna, Heba Hussein, Gioia Dahdal, Diana Cespedes A, Nicola Ferri, Salmaan Kanji, Yuan Chi, Dawid Pieper, Beverly Shea, Amanda Parker, Dipika Neupane, Paul A Khan, Daniella Rangira, Kat Kolaski, Ben Ridley, Amina Berour, Kevin Sun, Radin Hamidi Rad, Zihui Ouyang, Emma K Reid, Iván Pérez-Neri, Sanabel O Barakat, Silvia Bargeri, Silvia Gianola, Greta Castellini, Sera Whitelaw, Adrienne Stevens, Shailesh B Kolekar, Kristy Wong, Paityn Major, Ebrahim Bagheri, Andrea C Tricco","doi":"10.1017/rsm.2025.10032","DOIUrl":"10.1017/rsm.2025.10032","url":null,"abstract":"<p><p>AMSTAR-2 (A Measurement Tool to Assess Systematic Reviews, version 2) and ROBIS are tools used to assess the methodological quality and the risk of bias in a systematic review (SR). We applied AMSTAR-2 and ROBIS to a sample of 200 published SRs. We investigated the overlap in their methodological constructs, responses by item, and overall, percentage agreement, direction of effect, and timing of assessments. AMSTAR-2 contains 16 items and ROBIS 24 items. Three items in AMSTAR-2 and nine in ROBIS did not overlap in construct. Of the 200 SRs, 73% were low or critically low quality using AMSTAR-2, and 81% had a high risk of bias using ROBIS. The median time to complete AMSTAR-2 and ROBIS was 51 and 64 minutes, respectively. When assessment times were calibrated to the number of items in each tool, each item took an average of 3.2 minutes per item for AMSTAR-2 compared to 2.7 minutes for ROBIS. Nine percent of SRs had opposing ratings (i.e., AMSTAR-2 was high quality while ROBIS was high risk). In both tools, three-quarters of items showed more than 70% agreement between raters after extensive training and piloting. AMSTAR-2 and ROBIS provide complementary rather than interchangeable assessments of systematic reviews. AMSTAR-2 may be preferable when efficiency is prioritized and methodological rigour is the focus, whereas ROBIS offers a deeper examination of potential biases and external validity. Given the widespread reliance on systematic reviews for policy and practice, selecting the appropriate appraisal tool remains crucial. Future research should explore strategies to integrate the strengths of both instruments while minimizing the burden on assessors.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"17 1","pages":"63-92"},"PeriodicalIF":6.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12823211/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automating the data extraction process for systematic reviews using GPT-4o and o3.
Yuki Kataoka, Tomohiro Takayama, Keisuke Yoshimura, Ryuhei So, Yasushi Tsujimoto, Yosuke Yamagishi, Shiro Takagi, Yuki Furukawa, Masatsugu Sakata, Đorđe Bašić, Andrea Cipriani, Pim Cuijpers, Eirini Karyotaki, Mathias Harrer, Stefan Leucht, Ava Homiar, Edoardo G Ostinelli, Clara Miguel, Alessandro Rodolico, Toshi A Furukawa
Pub Date: 2026-01-01 | Epub Date: 2025-09-17 | DOI: 10.1017/rsm.2025.10030
Large language models have shown promise for automating data extraction (DE) in systematic reviews (SRs), but most existing approaches require manual interaction. We developed an open-source system using GPT-4o to extract data automatically, with no human intervention during the extraction process. We developed the system on a dataset of 290 randomized controlled trials (RCTs) from a published SR of cognitive behavioral therapy for insomnia. We evaluated the system on two other datasets: 5 RCTs from an updated search for the same review, and 10 RCTs used in a separate published study that had also evaluated automated DE. We developed the best approach across all variables in the development dataset using GPT-4o. On the updated-search dataset using o3, performance was 74.9% sensitivity, 76.7% specificity, 75.7% precision, 93.5% variable detection comprehensiveness, and 75.3% accuracy. In both datasets, accuracy was higher for string variables (e.g., country, study design, drug names, and outcome definitions) than for numeric variables. In the third, external validation dataset, GPT-4o showed lower performance, with a mean accuracy of 84.4% compared with the previous study. However, by adjusting our DE method while maintaining the same prompting technique, we achieved a mean accuracy of 96.3%, comparable to the previous manual extraction study. Our system shows potential for assisting the DE of string variables alongside a human reviewer. However, it cannot yet replace humans for numeric DE. Further evaluation across diverse review contexts is needed to establish broader applicability.
{"title":"Automating the data extraction process for systematic reviews using GPT-4o and o3.","authors":"Yuki Kataoka, Tomohiro Takayama, Keisuke Yoshimura, Ryuhei So, Yasushi Tsujimoto, Yosuke Yamagishi, Shiro Takagi, Yuki Furukawa, Masatsugu Sakata, Đorđe Bašić, Andrea Cipriani, Pim Cuijpers, Eirini Karyotaki, Mathias Harrer, Stefan Leucht, Ava Homiar, Edoardo G Ostinelli, Clara Miguel, Alessandro Rodolico, Toshi A Furukawa","doi":"10.1017/rsm.2025.10030","DOIUrl":"10.1017/rsm.2025.10030","url":null,"abstract":"<p><p>Large language models have shown promise for automating data extraction (DE) in systematic reviews (SRs), but most existing approaches require manual interaction. We developed an open-source system using GPT-4o to automatically extract data with no human intervention during the extraction process. We developed the system on a dataset of 290 randomized controlled trials (RCTs) from a published SR about cognitive behavioral therapy for insomnia. We evaluated the system on two other datasets: 5 RCTs from an updated search for the same review and 10 RCTs used in a separate published study that had also evaluated automated DE. We developed the best approach across all variables in the development dataset using GPT-4o. The performance in the updated-search dataset using o3 was 74.9% sensitivity, 76.7% specificity, 75.7 precision, 93.5% variable detection comprehensiveness, and 75.3% accuracy. In both datasets, accuracy was higher for string variables (e.g., country, study design, drug names, and outcome definitions) compared with numeric variables. In the third external validation dataset, GPT-4o showed a lower performance with a mean accuracy of 84.4% compared with the previous study. However, by adjusting our DE method, while maintaining the same prompting technique, we achieved a mean accuracy of 96.3%, which was comparable to the previous manual extraction study. Our system shows potential for assisting the DE of string variables alongside a human reviewer. However, it cannot yet replace humans for numeric DE. Further evaluation across diverse review contexts is needed to establish broader applicability.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"17 1","pages":"42-62"},"PeriodicalIF":6.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12823200/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
What can we learn from 1,000 meta-analyses across 10 different disciplines?
Weilun Wu, Jianhua Duan, W Robert Reed, Elizabeth Tipton
Pub Date: 2026-01-01 | Epub Date: 2025-10-02 | DOI: 10.1017/rsm.2025.10035
This study analyzes 1,000 meta-analyses drawn from 10 disciplines (including medicine, psychology, education, biology, and economics) to document and compare methodological practices across fields. We find large differences in the size of meta-analyses, the number of effect sizes per study, and the types of effect sizes used. Disciplines also vary in their use of unpublished studies, the frequency and type of tests for publication bias, and whether they attempt to correct for it. Notably, many meta-analyses include multiple effect sizes from the same study, yet fail to account for statistical dependence in their analyses. We document the limited use of advanced methods, such as multilevel models and cluster-adjusted standard errors, that can accommodate dependent data structures. Correlations are frequently used as effect sizes in some disciplines, yet researchers often fail to address the methodological issues this introduces, including biased weighting and misleading tests for publication bias. We also find that meta-regression is underutilized, even when sample sizes are large enough to support it. This work serves as a resource for researchers conducting their first meta-analyses, as a benchmark for researchers designing simulation experiments, and as a reference for applied meta-analysts aiming to improve their methodological practices.
{"title":"What can we learn from 1,000 meta-analyses across 10 different disciplines?","authors":"Weilun Wu, Jianhua Duan, W Robert Reed, Elizabeth Tipton","doi":"10.1017/rsm.2025.10035","DOIUrl":"10.1017/rsm.2025.10035","url":null,"abstract":"<p><p>This study analyzes 1,000 meta-analyses drawn from 10 disciplines-including medicine, psychology, education, biology, and economics-to document and compare methodological practices across fields. We find large differences in the size of meta-analyses, the number of effect sizes per study, and the types of effect sizes used. Disciplines also vary in their use of unpublished studies, the frequency and type of tests for publication bias, and whether they attempt to correct for it. Notably, many meta-analyses include multiple effect sizes from the same study, yet fail to account for statistical dependence in their analyses. We document the limited use of advanced methods-such as multilevel models and cluster-adjusted standard errors-that can accommodate dependent data structures. Correlations are frequently used as effect sizes in some disciplines, yet researchers often fail to address the methodological issues this introduces, including biased weighting and misleading tests for publication bias. We also find that meta-regression is underutilized, even when sample sizes are large enough to support it. This work serves as a resource for researchers conducting their first meta-analyses, as a benchmark for researchers designing simulation experiments, and as a reference for applied meta-analysts aiming to improve their methodological practices.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"17 1","pages":"123-156"},"PeriodicalIF":6.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12823205/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Incorporating the possibility of cure into network meta-analyses: A case study from resected Stage III/IV melanoma.
Keith Chan, Sarah Goring, Kabirraaj Toor, Murat Kurt, Andriy Moshyk, Jeroen Jansen
Pub Date: 2026-01-01 | Epub Date: 2025-10-15 | DOI: 10.1017/rsm.2025.10038
In many areas of oncology, cancer drugs are now associated with long-term survivorship, and mixture cure models (MCMs) are increasingly being used for survival analysis. The objective of this article was to propose a methodology for conducting network meta-analysis (NMA) of MCMs. The method is illustrated through a case study evaluating recurrence-free survival (RFS) with adjuvant therapy for stage III/IV resected melanoma. For the case study, the MCM NMA was conducted by: (1) fitting MCMs to each trial included within the network of evidence; and (2) incorporating the parameters of the MCMs into a multivariate NMA. Outputs included relative effect estimates from the MCM NMA as well as absolute estimates of survival (RFS), modeled within the Bayesian multivariate NMA by incorporating absolute baseline effects of the reference treatment. The case study is intended to illustrate the MCM NMA methodology and is not meant for clinical interpretation. It demonstrates the feasibility of conducting an MCM NMA and highlights key issues and considerations in such analyses, including the plausibility of cure, the maturity of the data, the model selection process, and the presentation and interpretation of results. MCM NMA provides a method for comparative survival analysis that acknowledges the benefit newer treatments may confer on a subset of patients, resulting in long-term survival that is reflected in extrapolations. In the future, this method may provide an additional metric for comparing treatments that is of value to patients.
{"title":"Incorporating the possibility of cure into network meta-analyses: A case study from resected Stage III/IV melanoma.","authors":"Keith Chan, Sarah Goring, Kabirraaj Toor, Murat Kurt, Andriy Moshyk, Jeroen Jansen","doi":"10.1017/rsm.2025.10038","DOIUrl":"10.1017/rsm.2025.10038","url":null,"abstract":"<p><p>In many areas of oncology, cancer drugs are now associated with long-term survivorship and mixture cure models (MCM) are increasingly being used for survival analysis. The objective of this article was to propose a methodology for conducting network meta-analysis (NMA) of MCM. This method was illustrated through a case study evaluating recurrence-free survival (RFS) with adjuvant therapy for stage III/IV resected melanoma. For the case study, the MCM NMA was conducted by: (1) fitting MCMs to each trial included within the network of evidence; and (2) incorporating the parameters of the MCMs into a multivariate NMA. Outputs included relative effect estimates for the MCM NMA as well as absolute estimates of survival (RFS), modeled within the Bayesian multivariate NMA, by incorporating absolute baseline effects of the reference treatment. The case study was intended for illustrative purposes of the MCM NMA methodology and is not meant for clinical interpretation. The case study demonstrated the feasibility of conducting an MCM NMA and highlighted key issues and considerations when conducting such analyses, including plausibility of cure, maturity of data, process for model selection, and the presentation and interpretation of results. MCM NMA provides a method of comparative survival that acknowledges the benefit newer treatments may confer on a subset of patients, resulting in long-term survival and reflection of this survival in extrapolation. In the future, this method may provide an additional metric to compare treatments that is of value to patients.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"17 1","pages":"157-169"},"PeriodicalIF":6.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12823198/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}