Grading of Recommendations, Assessment, Development, and Evaluation guidance 44: strategies to enhance the utilization of randomized and nonrandomized studies in evidence syntheses of health interventions
Pub Date: 2025-11-22 | DOI: 10.1016/j.jclinepi.2025.112086
Carlos A. Cuello-Garcia , Rebecca L. Morgan , Nancy Santesso , Pablo Alonso-Coello , Romina Brignardello-Petersen , Lukas Schwingshackl , Jan L. Brozek , Srinivasa Vittal Katikireddi , Zachary Munn , Hugh Sharma Waddington , Kevin C. Wilson , Joerg Meerpohl , Daniel Morales , Ignacio Neumann , Peter Tugwell , Gordon Guyatt , Holger J. Schünemann
Background and Objectives
Ideally, guideline developers and health technology assessment authors base intervention decisions on randomized controlled trials (RCTs). However, relying solely on RCTs is uncommon, especially for public health interventions and harms assessment. In these situations, nonrandomized studies of interventions (NRSIs) can provide valuable information. This article presents Grading of Recommendations Assessment, Development, and Evaluation (GRADE) guidance for integrating bodies of RCT and NRSI evidence in evidence syntheses of health interventions.
Methods
Following standard GRADE methods, we developed this guidance through iterative discussions and examples with experts from the GRADE NRSI project group in multiple dedicated meetings. We presented findings of the group discussions for feedback at GRADE Working Group meetings in September 2023 and May 2024.
Results
The resulting GRADE guidance outlines a structured approach: (1) assessing the certainty of evidence (CoE) after defining the number of decision thresholds and the target of the certainty rating; (2) evaluating the congruency of effect estimates between RCTs and NRSIs; (3) identifying which GRADE domains affect the certainty ratings, to inform the complementarity between RCTs and NRSIs and the overall CoE; and (4) deciding whether and how to use one or both types of studies.
Conclusion
This GRADE guidance offers a structured and practical approach for deciding whether and how to integrate RCTs and NRSIs in evidence syntheses. By addressing the interplay between affected GRADE domains and assessing the congruency of effects, it helps GRADE users determine when and how NRSIs can meaningfully complement or replace RCT evidence to inform certainty ratings and decision-making.
{"title":"Grading of Recommendations, Assessment, Development, and Evaluation guidance 44: strategies to enhance the utilization of randomized and nonrandomized studies in evidence syntheses of healthinterventions","authors":"Carlos A. Cuello-Garcia , Rebecca L. Morgan , Nancy Santesso , Pablo Alonso-Coello , Romina Brignardello-Petersen , Lukas Schwingshackl , Jan L. Brozek , Srinivasa Vittal Katikireddi , Zachary Munn , Hugh Sharma Waddington , Kevin C. Wilson , Joerg Meerpohl , Daniel Morales , Ignacio Neumann , Peter Tugwell , Gordon Guyatt , Holger J. Schünemann","doi":"10.1016/j.jclinepi.2025.112086","DOIUrl":"10.1016/j.jclinepi.2025.112086","url":null,"abstract":"<div><h3>Background and Objectives</h3><div>Ideally, guideline developers and health technology assessment authors base intervention decisions on randomized controlled trials (RCTs). However, relying solely on RCTs is uncommon, especially for public health interventions and harms assessment. In these situations, nonrandomized studies of interventions (NRSIs) can provide valuable information. This article presents Grading of Recommendations Assessment, Development, and Evaluation (GRADE) guidance for integrating bodies of evidence RCT and NRSI in evidence syntheses of health interventions.</div></div><div><h3>Methods</h3><div>Following standard GRADE methods, we developed this guidance through iterative discussions and examples with experts from the GRADE NRSI project group in multiple dedicated meetings. We presented findings of the group discussions for feedback at GRADE Working Group meetings in September 2023 and May 2024.</div></div><div><h3>Results</h3><div>The resulting GRADE guidance outlines a structured approach: (1) assessing the certainty of evidence (CoE) after defining the number of decision thresholds and the target of the certainty rating; (2) evaluating congruency of effect estimates between RCTs and NRSIs; (3) identifying which GRADE domains are affected by certainty ratings to inform complementariness between RCTs and NRSIs and the overall CoE; and (4) deciding whether and how to use one or both types of studies.</div></div><div><h3>Conclusion</h3><div>This GRADE guidance offers a structured and practical approach for integrating or not integrating RCTs and NRSIs in evidence syntheses. By addressing the interplay between affected GRADE domains and assessing the congruency of effects, it helps GRADE users determine when and how NRSIs can meaningfully complement or replace RCT evidence to inform certainty ratings and decision-making.</div></div>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"190 ","pages":"Article 112086"},"PeriodicalIF":5.2,"publicationDate":"2025-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145598039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hill's considerations are not causal criteria
Pub Date: 2025-11-22 | DOI: 10.1016/j.jclinepi.2025.112087
David A. Savitz , Neil Pearce , Kenneth J. Rothman
Hill's list of considerations for assessing causality, proposed 60 years ago, became a landmark in the interpretation of epidemiologic evidence. However, it has been and continues to be misused as a list of causal criteria to be scored and summed, despite causal inference being unattainable through the application of this or any other algorithm. Recognizing the distinction between statistical associations and causal effects was a key contribution of Hill. While he identified several clues for distinguishing between causal and noncausal associations, causal inference in epidemiology has since become much more explicit and effective. Rather than relying on Hill's indirect hints of potential bias by considering strength of association or dose-response gradients, newer methods such as quantitative bias analysis directly assess confounding and other candidate biases that compete with causal explanations, leading to more informed inferences. Similarly, the interpretation of consistency depends on variation in methods across studies; triangulation may be used to search for informative inconsistencies, strengthening causal inference. Most importantly, a causal connection is not a categorical property bestowed upon an association based on Hill's considerations or any other checklist. Causal inference is an inherently indirect process, with the inference gradually crystallizing by withstanding challenges from competing theories in which other explanations, including random error or biases, are found not to account for the measured association.
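To make the contrast with Hill's indirect clues concrete, the sketch below shows one simple form of quantitative bias analysis for unmeasured confounding: the bounding approach of Ding and VanderWeele, in which assumed confounder-exposure and confounder-outcome risk ratios give a maximum bias factor, and the E-value summarizes how strong such confounding would have to be to fully explain away an observed association. This is a generic, hypothetical illustration added for orientation, not code or data from the article.

```python
from math import sqrt


def bias_factor(rr_eu: float, rr_ud: float) -> float:
    """Maximum bias factor for an unmeasured confounder U (Ding-VanderWeele bound),
    given the exposure-confounder risk ratio rr_eu and the confounder-outcome risk
    ratio rr_ud, both expressed as risk ratios >= 1."""
    return (rr_eu * rr_ud) / (rr_eu + rr_ud - 1.0)


def adjusted_rr_bound(observed_rr: float, rr_eu: float, rr_ud: float) -> float:
    """Smallest risk ratio consistent with the observed RR (> 1) under confounding
    of the assumed strength."""
    return observed_rr / bias_factor(rr_eu, rr_ud)


def e_value(observed_rr: float) -> float:
    """E-value: the minimum strength of association, on the risk-ratio scale, that an
    unmeasured confounder would need with both exposure and outcome to explain away
    the observed RR (> 1)."""
    return observed_rr + sqrt(observed_rr * (observed_rr - 1.0))


# Hypothetical example: an observed RR of 1.8, with an assumed confounder twice as
# common among the exposed (rr_eu = 2) and doubling the outcome risk (rr_ud = 2).
if __name__ == "__main__":
    print(round(adjusted_rr_bound(1.8, 2.0, 2.0), 2))  # bound after bias adjustment: 1.35
    print(round(e_value(1.8), 2))                       # E-value: 3.0
```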
{"title":"Hill's considerations are not causal criteria","authors":"David A. Savitz , Neil Pearce , Kenneth J. Rothman","doi":"10.1016/j.jclinepi.2025.112087","DOIUrl":"10.1016/j.jclinepi.2025.112087","url":null,"abstract":"<div><div>Hill's list of considerations for assessing causality, proposed 60 years ago, became a landmark in the interpretation of epidemiologic evidence. However, it has been and continues to be misused as a list of causal criteria to be scored and summed, despite causal inference being unattainable through the application of this or any other algorithm. Recognizing the distinction between statistical associations and causal effects was a key contribution of Hill. While he identified several clues for distinguishing between causal and noncausal associations, causal inference in epidemiology has become much more explicit and effective. Rather than relying on Hill's indirect hints of potential bias by considering strength of association or dose-response gradients, newer methods such as quantitative bias analysis directly assess confounding and other candidate biases that compete with causal explanations, leading to more informed inferences. Similarly, the interpretation of consistency depends on variation in methods across studies; triangulation may be used to search for informative inconsistencies, strengthening causal inference. Most importantly, a causal connection is not a categorical property bestowed upon an association based on Hill's considerations or any other checklist. Causal inference is an inherently indirect process, with the inference gradually crystallizing by withstanding challenges from competing theories in which other explanations, including random error or biases, are found not to account for the measured association.</div></div>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"190 ","pages":"Article 112087"},"PeriodicalIF":5.2,"publicationDate":"2025-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145597988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discrimination, calibration, and variable importance in statistical and machine learning models for predicting overall survival in advanced non–small cell lung cancer patients treated with immune checkpoint inhibitors
Pub Date: 2025-11-21 | DOI: 10.1016/j.jclinepi.2025.112082
Lee X. Li , Ashley M. Hopkins , Richard Woodman , Ahmad Y. Abuhelwa , Yuan Gao , Natalie Parent , Andrew Rowland , Michael J. Sorich
Background and Objectives
Prognostic models can enhance clinician-patient communication and guide treatment decisions. Numerous machine learning (ML) algorithms are available and offer a novel approach to predicting survival in patients treated with immune checkpoint inhibitors. However, their performance, particularly calibration, has not been benchmarked at scale across multiple independent cohorts. This study aimed to develop, evaluate, and compare statistical and ML models regarding discrimination, calibration, and variable importance for predicting overall survival across seven clinical trial cohorts of patients with advanced non–small cell lung cancer (NSCLC) undergoing immune checkpoint inhibitor treatment.
Methods
This study included atezolizumab-treated patients with advanced NSCLC from seven clinical trials. We compared two statistical models, Cox proportional-hazards (Coxph) and accelerated failure time models, with six ML models: CoxBoost, extreme gradient-boosting (XGBoost), gradient-boosting machines (GBMs), random survival forest, regularized Coxph models (least absolute shrinkage and selection operator [LASSO]), and support vector machines (SVMs). Models were evaluated on discrimination and calibration using a leave-one-study-out nested cross-validation (nCV) framework. Discrimination was assessed using Harrell's concordance index (Cindex), while calibration was assessed using the integrated calibration index (ICI) and calibration plots. Variable importance was assessed using Shapley Additive exPlanations (SHAP) values.
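As a concrete illustration of the outer loop of this evaluation framework, the sketch below fits a Cox proportional-hazards model on all trial cohorts but one and computes Harrell's C-index on the held-out cohort. It is a minimal sketch rather than the authors' code: the lifelines-based implementation and the column names ('study', 'time', 'event') are assumptions, and the inner tuning loop of the nested cross-validation, the calibration assessment, and the SHAP analysis are omitted.

```python
# Illustrative sketch of leave-one-study-out evaluation of a Cox model (not the
# authors' code). Assumes a pandas DataFrame with hypothetical columns 'study',
# 'time', 'event', plus predictor columns such as 'nlr' and 'ecogps'.
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index


def leave_one_study_out_cindex(df: pd.DataFrame, predictors: list,
                               duration_col: str = "time", event_col: str = "event",
                               study_col: str = "study") -> dict:
    """Fit on all studies except one and compute Harrell's C-index on the held-out study."""
    results = {}
    for study in df[study_col].unique():
        train = df[df[study_col] != study]
        test = df[df[study_col] == study]

        cph = CoxPHFitter()
        cph.fit(train[[duration_col, event_col] + predictors],
                duration_col=duration_col, event_col=event_col)

        # concordance_index expects higher scores to predict longer survival, so the
        # predicted partial hazard (higher = higher risk) is negated.
        risk = cph.predict_partial_hazard(test[predictors])
        results[study] = concordance_index(test[duration_col], -risk, test[event_col])
    return results
```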
Results
In a cohort of 3203 patients, the two statistical models and five of the six ML models demonstrated comparable and moderate discrimination performance (aggregated Cindex: 0.69–0.70), while SVM exhibited poor discrimination (aggregated Cindex: 0.57). Regarding calibration, the models appeared largely comparable in aggregated plots, except for LASSO, although the XGBoost models demonstrated numerically superior calibration. Across the evaluation cohorts, individual performance measures varied, and no single model consistently outperformed the others. Pretreatment neutrophil-to-lymphocyte ratios (NLRs) and Eastern Cooperative Oncology Group Performance Status (ECOGPS) were ranked among the top five most important predictors across all models.
Conclusion
There was no clear best-performing model for either discrimination or calibration, although the XGBoost models showed numerically superior calibration. Performance of a given model varied across evaluation cohorts, highlighting the importance of model assessment using multiple independent datasets. All models identified pretreatment NLR and ECOGPS as the key prognostic factors.
{"title":"Discrimination, calibration, and variable importance in statistical and machine learning models for predicting overall survival in advanced non–small cell lung cancer patients treated with immune checkpoint inhibitors","authors":"Lee X. Li , Ashley M. Hopkins , Richard Woodman , Ahmad Y. Abuhelwa , Yuan Gao , Natalie Parent , Andrew Rowland , Michael J. Sorich","doi":"10.1016/j.jclinepi.2025.112082","DOIUrl":"10.1016/j.jclinepi.2025.112082","url":null,"abstract":"<div><h3>Background and Objectives</h3><div>Prognostic models can enhance clinician-patient communication and guide treatment decisions. Numerous machine learning (ML) algorithms are available and offer a novel approach to predicting survival in patients treated with immune checkpoint inhibitors. However, large-scale benchmarking of their performances—particularly in terms of calibration—has not been evaluated across multiple independent cohorts. This study aimed to develop, evaluate, and compare statistical and ML models regarding discrimination, calibration, and variable importance for predicting overall survival across seven clinical trial cohorts of advanced non–small cell lung cancer (NSCLC) undergoing immune checkpoint inhibitor treatment.</div></div><div><h3>Methods</h3><div>This study included atezolizumab-treated patients with advanced NSCLC from seven clinical trials. We compared two statistical models: Cox proportional-hazard (Coxph) and accelerated failure time models, and 6 ML models: CoxBoost, extreme gradient-boosting (XGBoost), gradient-boosting machines (GBMs), random survival forest, regularized Coxph models (least absolute shrinkage and selection operator [LASSO]), and support vector machines (SVMs). Models were evaluated on discrimination and calibration using a leave-one-study-out nested cross-validation (nCV) framework. Discrimination was assessed using Harrell's concordance index (Cindex), while calibration was assessed using integrated calibration index (ICI) and plot. Variable importance was assessed using Shapley Additive exPlanations (SHAP) values.</div></div><div><h3>Results</h3><div>In a cohort of 3203 patients, the two statistical models and 5 of the 6 ML models demonstrated comparable and moderate discrimination performances (aggregated Cindex: 0.69–0.70), while SVM exhibited poor discrimination (aggregated Cindex: 0.57). Regarding calibration, the models appeared largely comparable in aggregated plots, except for LASSO, although the XGBoost models demonstrated superior calibration numerically. Across the evaluation cohorts, individual performance measures varied and no single model consistently outperforming the others. Pretreatment neutrophil-to-lymphocyte ratios (NLRs) and Eastern Cooperative Oncology Group Performance Status (ECOGPS) were ranked among the top five most important predictors across all models.</div></div><div><h3>Conclusion</h3><div>There was no clear best-performing model for either discrimination or calibration, although XGBoost models showed possible superior calibration numerically. Performance of a given model varied across evaluation cohorts, highlighting the importance of model assessment using multiple independent datasets. 
All models identified pretreatment NLR and ECOGPS as the key prognostic factors.</div></div>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"190 ","pages":"Article 112082"},"PeriodicalIF":5.2,"publicationDate":"2025-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145589810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Including nonrandomized evidence in living systematic reviews: lessons learned from the COVID-NMA initiative
Pub Date: 2025-11-21 | DOI: 10.1016/j.jclinepi.2025.112071
Hillary Bonnet , Julian P.T. Higgins , Anna Chaimani , Theodoros Evrenoglou , Lina Ghosn , Carolina Graña , Elodie Perrodeau , Sally Yaacoub , Gabriel Rada , Hanna Bergman , Brian Buckley , Elise Cogo , Gemma Villanueva , Nicholas Henschke , Rouba Assi , Carolina Riveros , Rosie Cornish , Francesca Spiga , Silvia Minozzi , David Tovey , Isabelle Boutron
Background and Objectives
Randomized controlled trials (RCTs) are more likely to be included in evidence syntheses of health interventions due to their methodological rigor. However, the integration of nonrandomized studies (NRSs) may be necessary, as was seen during the COVID-19 pandemic due to the emergence of variants of concern. We aimed to examine the body of evidence, randomized and nonrandomized, on COVID-19 vaccine effectiveness (VE) during the emergence of the Delta variant and to share lessons learned from including nonrandomized evidence alongside randomized evidence in the COVID-NMA living systematic review.
Study Design and Setting
The COVID-NMA initiative is an international, living systematic review and meta-analysis that continually synthesized evidence on COVID-19 interventions. For this study, we identified all RCTs and comparative NRSs reporting on VE against the Delta variant from December 2020 (its initial detection) through November 2021 (date of last COVID-NMA NRS search). We conducted two parallel systematic reviews: one focusing on RCTs and the other on NRSs to compare available evidence on VE against the Delta variant. We also compared the publication timelines of the included studies with the global prevalence of the Delta variant, and documented the specific methodological challenges and solutions when including NRSs in living systematic reviews.
Results
From December 2020 to November 2021, only one RCT reported vaccine efficacy against Delta in a subgroup of 6325 participants, while, during the same period, 52 NRSs including 68,010,961 participants reported VE against this variant. Nevertheless, including NRSs in our living systematic review posed several challenges. We faced difficulties in identifying eligible studies, encountered overlapping studies (ie, NRSs using the same database), and found inconsistent definitions of Delta variant cases. Moreover, multiple analyses and metrics for the same outcome were reported without a pre-specified primary analysis in a registry or protocol. In addition, assessing the risk of bias required expertise, standardization, and training.
Conclusion
To remain responsive during public health emergencies, living systematic reviews should implement processes that enable the timely identification, evaluation, and integration of both randomized and nonrandomized evidence where appropriate.
Plain Language Summary
When new health treatments are tested, the best way to see how well they work is through randomized controlled trials (RCTs). These are carefully designed studies that help reduce bias. However, during the COVID-19 pandemic, scientists also had to rely on other types of studies called nonrandomized studies (NRS) based on real-world data because the virus was changing quickly and required urgent action. Our living systematic review examined how effective
{"title":"Including nonrandomized evidence in living systematic reviews: lessons learned from the COVID-NMA initiative","authors":"Hillary Bonnet , Julian P.T. Higgins , Anna Chaimani , Theodoros Evrenoglou , Lina Ghosn , Carolina Graña , Elodie Perrodeau , Sally Yaacoub , Gabriel Rada , Hanna Bergman , Brian Buckley , Elise Cogo , Gemma Villanueva , Nicholas Henschke , Rouba Assi , Carolina Riveros , Rosie Cornish , Francesca Spiga , Silvia Minozzi , David Tovey , Isabelle Boutron","doi":"10.1016/j.jclinepi.2025.112071","DOIUrl":"10.1016/j.jclinepi.2025.112071","url":null,"abstract":"<div><h3>Background and Objectives</h3><div>Randomized controlled trials (RCTs) are more likely to be included in evidence syntheses of health interventions due to their methodological rigor. However, the integration of nonrandomized studies (NRSs) may be necessary, as was seen during the COVID-19 pandemic due to the emergence of variants of concern. We aimed to examine the body of evidence, randomized and nonrandomized, on COVID-19 vaccine effectiveness (VE) during the emergence of the Delta variant and to share lessons learned from including nonrandomized evidence alongside randomized evidence in the COVID-NMA living systematic review.</div></div><div><h3>Study Design and Setting</h3><div>The COVID-NMA initiative is an international, living systematic review and meta-analysis that continually synthesized evidence on COVID-19 interventions. For this study, we identified all RCTs and comparative NRSs reporting on VE against the Delta variant from December 2020 (its initial detection) through November 2021 (date of last COVID-NMA NRS search). We conducted two parallel systematic reviews: one focusing on RCTs and the other on NRSs to compare available evidence on VE against the Delta variant. We also compared the publication timelines of the included studies with the global prevalence of the Delta variant, and documented the specific methodological challenges and solutions when including NRSs in living systematic reviews.</div></div><div><h3>Results</h3><div>From December 2020 to November 2021, only one RCT reported vaccine efficacy against Delta in a subgroup of 6325 participants, while, during the same period, 52 NRSs including 68,010,961 participants reported VE against this variant. Nevertheless, including NRSs in our living systematic review posed several challenges. We faced difficulties in identifying eligible studies, encountered overlapping studies (ie, NRSs using the same database), and inconsistent definitions of Delta variant cases. Moreover, multiple analyses and metrics for the same outcome were reported without a pre-specified primary analysis in a registry or protocol. In addition, assessing the risk of bias required expertise, standardization, and training.</div></div><div><h3>Conclusion</h3><div>To remain responsive during public health emergencies, living systematic reviews should implement processes that enable the timely identification, evaluation, and integration of both randomized and nonrandomized evidence where appropriate.</div></div><div><h3>Plain Language Summary</h3><div>When new health treatments are tested, the best way to see how well they work is through randomized controlled trials (RCTs). These are carefully designed studies that help reduce bias. However, during the COVID-19 pandemic, scientists also had to rely on other types of studies called nonrandomized studies (NRS) based on real-world data because the virus was changing quickly and required urgent action. 
Our living systematic review examined how effective","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"190 ","pages":"Article 112071"},"PeriodicalIF":5.2,"publicationDate":"2025-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145589867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Approaches for reporting and interpreting statistically nonsignificant findings in evidence syntheses: a systematic review
Pub Date: 2025-11-21 | DOI: 10.1016/j.jclinepi.2025.112083
Amin Sharifan , Andreea Dobrescu , Curtis Harrod , Irma Klerings , Ariel Yuhan Ong , Etienne Ngeh , Yu-Tian Xiao , Gerald Gartlehner
Objectives
To systematically review approaches for reporting and interpreting statistically nonsignificant findings with clinical relevance in evidence synthesis and to assess their methodological quality and the extent of their empirical validation.
Study Design and Setting
We searched Ovid MEDLINE ALL, Scopus, PsycInfo, Library of Guidance for Health Scientists, and MathSciNet for published studies in English from January 1, 2000, to January 30, 2025, for (1) best practices in guidance documents for evidence synthesis when interpreting clinically relevant nonsignificant findings, (2) statistical methods to support the interpretation, and (3) reporting practices. To identify relevant reporting guidelines, we also searched the Enhancing the QUAlity and Transparency Of health Research Network. The quality assessment applied the Mixed Methods Appraisal Tool, Appraisal tool for Cross-Sectional Studies, and checklists for expert opinion and systematic reviews from the Joanna Briggs Institute. At least two reviewers independently conducted all procedures, and a large language model facilitated data extraction and quality appraisal.
Results
Of the 5332 records, 37 were eligible for inclusion. Of these, 15 were editorials or opinion pieces, nine addressed methods, eight were cross-sectional or mixed-methods studies, four were journal guidance documents, and one was a systematic review. Twenty-seven records met the quality criteria of the appraisal tool relevant to their study design or publication type, while 10 records, comprising one systematic review, two editorials or opinion pieces, and seven cross-sectional studies, did not. Relevant methodological approaches to evidence synthesis included utilization of uncertainty intervals and their integration with various statistical measures (15 of 37, 41%), Bayes factors (six of 37, 16%), likelihood ratios (three of 37, 8%), effect conversion measures (two of 37, 5%), equivalence testing (two of 37, 5%), modified Fisher's test (one of 37, 3%), and reverse fragility index (one of 37, 3%). Reporting practices included problematic “null acceptance” language (14 of 37, 38%), with some records discouraging the inappropriate claim of no effect based on nonsignificant findings (nine of 37, 24%). None of the proposed methods were empirically tested with interest holders.
Conclusion
Although various approaches have been proposed to improve the presentation and interpretation of statistically nonsignificant findings, a widely accepted consensus has not emerged, as these approaches have yet to be systematically tested for their practicality and validity. This review provides a comprehensive overview of available methodological approaches spanning both the frequentist and Bayesian statistical frameworks and identifies critical gaps in empirical validation of some approaches, namely the lack of thresholds to guide the
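For readers unfamiliar with one of the approaches catalogued above, the sketch below shows how equivalence testing (two one-sided tests, TOST) could be applied to a pooled effect estimate on a log odds ratio scale: given the estimate, its standard error, and a prespecified equivalence margin, it tests directly whether the data are compatible only with effects smaller than the margin. This is an illustrative sketch only; the function, the margin, and the numbers are hypothetical and are not taken from the review.

```python
from scipy.stats import norm


def tost_equivalence(estimate: float, se: float, margin: float, alpha: float = 0.05) -> dict:
    """Two one-sided tests (TOST) for equivalence of an effect to zero within +/- margin.
    All quantities are on the same additive scale (eg, log odds ratio)."""
    p_lower = 1.0 - norm.cdf((estimate + margin) / se)  # H0: effect <= -margin
    p_upper = norm.cdf((estimate - margin) / se)        # H0: effect >= +margin
    p_tost = max(p_lower, p_upper)                       # reject both to claim equivalence
    return {"p_tost": p_tost, "equivalent": p_tost < alpha}


# Hypothetical example: a pooled log-OR of 0.05 (SE 0.08) with an equivalence margin of
# log(1.25) ~= 0.223. A nonsignificant difference test alone would not justify a claim
# of "no effect"; TOST tests the equivalence claim directly.
if __name__ == "__main__":
    print(tost_equivalence(estimate=0.05, se=0.08, margin=0.223))
```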
{"title":"Approaches for reporting and interpreting statistically nonsignificant findings in evidence syntheses: a systematic review","authors":"Amin Sharifan , Andreea Dobrescu , Curtis Harrod , Irma Klerings , Ariel Yuhan Ong , Etienne Ngeh , Yu-Tian Xiao , Gerald Gartlehner","doi":"10.1016/j.jclinepi.2025.112083","DOIUrl":"10.1016/j.jclinepi.2025.112083","url":null,"abstract":"<div><h3>Objectives</h3><div>To systematically review approaches for reporting and interpreting statistically nonsignificant findings with clinical relevance in evidence synthesis and to assess their methodological quality and the extent of their empirical validation.</div></div><div><h3>Study Design and Setting</h3><div>We searched Ovid MEDLINE ALL, Scopus, PsycInfo, Library of Guidance for Health Scientists, and MathSciNet for published studies in English from January 1, 2000, to January 30, 2025, for (1) best practices in guidance documents for evidence synthesis when interpreting clinically relevant nonsignificant findings, (2) statistical methods to support the interpretation, and (3) reporting practices. To identify relevant reporting guidelines, we also searched the Enhancing the QUAlity and Transparency Of health Research Network. The quality assessment applied the Mixed Methods Appraisal Tool, Appraisal tool for Cross-Sectional Studies, and checklists for expert opinion and systematic reviews from the Joanna Briggs Institute. At least two reviewers independently conducted all procedures, and a large language model facilitated data extraction and quality appraisal.</div></div><div><h3>Results</h3><div>Of the 5332 records, 37 were eligible for inclusion. Of these, 15 were editorials or opinion pieces, nine addressed methods, eight were cross-sectional or mixed-methods studies, four were journal guidance documents, and one was a systematic review. Twenty-seven records met the quality criteria of the appraisal tool relevant to their study design or publication type, while 10 records, comprising one systematic review, two editorials or opinion pieces, and seven cross-sectional studies, did not. Relevant methodological approaches to evidence synthesis included utilization of uncertainty intervals and their integration with various statistical measures (15 of 37, 41%), Bayes factors (six of 37, 16%), likelihood ratios (three of 37, 8%), effect conversion measures (two of 37, 5%), equivalence testing (two of 37, 5%), modified Fisher's test (one of 37, 3%), and reverse fragility index (one of 37, 3%). Reporting practices included problematic “null acceptance” language (14 of 37, 38%), with some records discouraging the inappropriate claim of no effect based on nonsignificant findings (nine of 37, 24%). None of the proposed methods were empirically tested with interest holders.</div></div><div><h3>Conclusion</h3><div>Although various approaches have been proposed to improve the presentation and interpretation of statistically nonsignificant findings, a widely accepted consensus has not emerged, as these approaches have yet to be systematically tested for their practicality and validity. 
This review provides a comprehensive review of available methodological approaches spanning both the frequentist and Bayesian statistical frameworks and identifies critical gaps in empirical validation of some approaches, namely the lack of thresholds to guide the ","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"190 ","pages":"Article 112083"},"PeriodicalIF":5.2,"publicationDate":"2025-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145589791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Use of structured tools by peer reviewers of systematic reviews: a cross-sectional study reveals high familiarity with Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) but limited use of other tools
Pub Date: 2025-11-20 | DOI: 10.1016/j.jclinepi.2025.112084
Livia Puljak , Sara Pintur , Tanja Rombey , Craig Lockwood , Dawid Pieper
Objectives
Systematic reviews (SRs) are pivotal to evidence-based medicine. Structured tools exist to guide their reporting and appraisal, such as Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and A Measurement Tool to Assess Systematic Reviews (AMSTAR). However, there are limited data on whether peer reviewers of SRs use such tools when assessing manuscripts. This study aimed to investigate the use of structured tools by peer reviewers when assessing SRs of interventions, identify which tools are used, and explore perceived needs for structured tools to support the peer-review process.
Study Design and Setting
In 2025, we conducted a cross-sectional study targeting individuals who peer-reviewed at least 1 SR of interventions in the past year. The online survey collected data on demographics, use, and familiarity with structured tools, as well as open-ended responses on potential needs.
Results
Two hundred seventeen peer reviewers took part in the study. PRISMA was the most familiar tool (99% familiar or very familiar) and most frequently used during peer review (53% always used). The use of other tools such as AMSTAR, Peer Review of Electronic Search Strategies (PRESS), A Risk of Bias Assessment Tool for Systematic Reviews (ROBIS), and JBI checklist was infrequent. Seventeen percent reported using other structured tools beyond those listed. Most participants indicated that journals rarely required use of structured tools, except PRISMA. A notable proportion (55%) expressed concerns about time constraints, and 25% noted the lack of a comprehensive tool. Nearly half (45%) expressed a need for a dedicated structured tool for SR peer review, with checklists in PDF or embedded formats preferred. Participants expressed both advantages and concerns related to such tools.
Conclusion
Most peer reviewers used PRISMA when assessing SRs, while other structured tools were seldom applied. Only a few journals provided or required such tools, revealing inconsistent editorial practices. Participants reported barriers, including time constraints and a lack of suitable instruments. These findings highlight the need for a practical, validated tool, built upon existing instruments and integrated into editorial workflows. Such a tool could make peer review of SRs more consistent and transparent.
Plain Language Summary
Systematic reviews (SRs) are a type of research that synthesizes results from primary studies. Several structured tools, such as PRISMA for reporting and AMSTAR 2 for methodological quality, exist to guide how SRs are written and appraised. When manuscripts that report SRs are submitted to scholarly journals, editors invite expert peer reviewers to assess these SRs. In this study, researchers aimed to analyze which tools peer reviewers actually use when evaluating SR manuscripts, their percep
{"title":"Use of structured tools by peer reviewers of systematic reviews: a cross-sectional study reveals high familiarity with Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) but limited use of other tools","authors":"Livia Puljak , Sara Pintur , Tanja Rombey , Craig Lockwood , Dawid Pieper","doi":"10.1016/j.jclinepi.2025.112084","DOIUrl":"10.1016/j.jclinepi.2025.112084","url":null,"abstract":"<div><h3>Objectives</h3><div>Systematic reviews (SRs) are pivotal to evidence-based medicine. Structured tools exist to guide their reporting and appraisal, such as Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and A Measurement Tool to Assess Systematic Reviews (AMSTAR). However, there are limited data on whether peer reviewers of SRs use such tools when assessing manuscripts. This study aimed to investigate the use of structured tools by peer reviewers when assessing SRs of interventions, identify which tools are used, and explore perceived needs for structured tools to support the peer-review process.</div></div><div><h3>Study Design and Setting</h3><div>In 2025, we conducted a cross-sectional study targeting individuals who peer-reviewed at least 1 SR of interventions in the past year. The online survey collected data on demographics, use, and familiarity with structured tools, as well as open-ended responses on potential needs.</div></div><div><h3>Results</h3><div>Two hundred seventeen peer reviewers took part in the study. PRISMA was the most familiar tool (99% familiar or very familiar) and most frequently used during peer review (53% always used). The use of other tools such as AMSTAR, Peer Review of Electronic Search Strategies (PRESS), A Risk of Bias Assessment Tool for Systematic Reviews (ROBIS), and JBI checklist was infrequent. Seventeen percent reported using other structured tools beyond those listed. Most participants indicated that journals rarely required use of structured tools, except PRISMA. A notable proportion (55%) expressed concerns about time constraints, and 25% noted the lack of a comprehensive tool. Nearly half (45%) expressed a need for a dedicated structured tool for SR peer review, with checklists in PDF or embedded formats preferred. Participants expressed both advantages and concerns related to such tools.</div></div><div><h3>Conclusion</h3><div>Most peer reviewers used PRISMA when assessing SRs, while other structured tools were seldom applied. Only a few journals provided or required such tools, revealing inconsistent editorial practices. Participants reported barriers, including time constraints and a lack of suitable instruments. These findings highlight the need for a practical, validated tool, built upon existing instruments and integrated into editorial workflows. Such a tool could make peer review of SRs more consistent and transparent.</div></div><div><h3>Plain Language Summary</h3><div>Systematic reviews (SRs) are a type of research that synthesizes results from primary studies. Several structured tools, such as PRISMA for reporting and AMSTAR 2 for methodological quality, exist to guide how SRs are written and appraised. When manuscripts that report SRs are submitted to scholarly journals, editors invite expert peer reviewers to assess these SRs. 
In this study, researchers aimed to analyze which tools peer reviewers actually use when evaluating SR manuscripts, their percep","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"190 ","pages":"Article 112084"},"PeriodicalIF":5.2,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145582736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A scoping review of critical appraisal tools and user guides for systematic reviews with network meta-analysis: methodological gaps and directions for tool development
Pub Date: 2025-11-20 | DOI: 10.1016/j.jclinepi.2025.112056
K.M. Mondragon , C.S. Tan-Lim , R. Velasco Jr. , C.P. Cordero , H.M. Strebel , L. Palileo-Villanueva , J.V. Mantaring
Background
Systematic reviews (SRs) with network meta-analyses (NMAs) are increasingly used to inform guidelines, health technology assessments (HTAs), and policy decisions. Their methodological complexity, as well as the difficulty in assessing the exchangeability assumption and the large volume of results, makes appraisal more challenging than for SRs with pairwise meta-analyses. Numerous SR- and NMA-specific appraisal tools exist, but they vary in scope, intended users, and methodological guidance, and few have been validated.
Objectives
To identify and describe appraisal instruments and interpretive guides for SRs and NMAs specifically, summarizing their characteristics, domain coverage, development methods, and measurement-property evaluations.
Methods
We conducted a methodological scoping review which included structured appraisal instruments or interpretive guides for SRs with or without NMA-specific domains, aimed at review authors, clinicians, guideline developers, or HTA assessors from published or gray literature in English. Searches (inception–August 2025) covered major databases, registries, organizational websites, and reference lists. Two reviewers independently screened records; data were extracted by one and checked by a second. We synthesized the findings narratively. First, we classified tools as either structured instruments or interpretive guides. Second, we grouped them according to their intended audience and scope. Third, we assessed available measurement-property data using relevant COnsensus-based Standards for the selection of health Measurement INstruments items.
Results
Thirty-four articles described 22 instruments (11 NMA-specific, nine specific to systematic reviews with meta-analysis, and two encompassing both systematic reviews with meta-analysis and NMA). NMA tools added domains such as network geometry, transitivity, and coherence, but guidance on transitivity evaluation, publication bias, and ranking was either limited or ineffective. Reviewer-focused tools were structured with explicit response options, whereas clinician-oriented guides posed appraisal questions with explanations but no prescribed response. Nine instruments reported measurement-property data, with validity and reliability varying widely.
Conclusion
This first comprehensive map of appraisal resources for systematic reviews with meta-analysis and NMA highlights the need for clearer operational criteria, structured decision rules, and integrated rater training to improve reliability and align foundational SR domains with NMA-specific content.
Plain Language Summary
NMA is a way to compare many treatments at once by combining results from multiple studies—even when some treatments have not been directly compared head-to-head. Because NMAs are complex, users need clear tools to judge whether an analysis is tru
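As background for the coherence (consistency) domain that several NMA-specific tools add, the sketch below shows the simplest version of a consistency check in a closed loop of three treatments: the Bucher method forms an indirect estimate of A versus C from the A-versus-B and C-versus-B comparisons and contrasts it with the direct A-versus-C estimate. This is a generic teaching sketch with hypothetical numbers, not part of any appraisal tool described in the review.

```python
from math import sqrt
from scipy.stats import norm


def bucher_consistency(d_ab: float, se_ab: float, d_cb: float, se_cb: float,
                       d_ac: float, se_ac: float) -> dict:
    """Compare the direct A-vs-C estimate with the indirect estimate obtained from
    A-vs-B and C-vs-B (all effects on an additive scale, eg, log odds ratios)."""
    indirect_ac = d_ab - d_cb                        # Bucher indirect estimate of A vs C
    se_indirect = sqrt(se_ab ** 2 + se_cb ** 2)
    inconsistency = d_ac - indirect_ac               # direct minus indirect
    se_inconsistency = sqrt(se_ac ** 2 + se_indirect ** 2)
    z = inconsistency / se_inconsistency
    p = 2.0 * (1.0 - norm.cdf(abs(z)))               # two-sided test of consistency
    return {"indirect_ac": indirect_ac, "inconsistency": inconsistency, "p_value": p}


# Hypothetical log-OR inputs for a single A-B-C loop.
if __name__ == "__main__":
    print(bucher_consistency(d_ab=-0.40, se_ab=0.15, d_cb=-0.10, se_cb=0.20,
                             d_ac=-0.25, se_ac=0.18))
```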
{"title":"A scoping review of critical appraisal tools and user guides for systematic reviews with network meta-analysis: methodological gaps and directions for tool development","authors":"K.M. Mondragon , C.S. Tan-Lim , R. Velasco Jr. , C.P. Cordero , H.M. Strebel , L. Palileo-Villanueva , J.V. Mantaring","doi":"10.1016/j.jclinepi.2025.112056","DOIUrl":"10.1016/j.jclinepi.2025.112056","url":null,"abstract":"<div><h3>Background</h3><div>Systematic reviews (SRs) with network meta-analyses (NMAs) are increasingly used to inform guidelines, health technology assessments (HTAs), and policy decisions. Their methodological complexity, as well as the difficulty in assessing the exchangeability assumption and the large amount of results, makes appraisal more challenging than for SRs with pairwise NMAs. Numerous SR- and NMA-specific appraisal tools exist, but they vary in scope, intended users, and methodological guidance, and few have been validated.</div></div><div><h3>Objectives</h3><div>To identify and describe appraisal instruments and interpretive guides for SRs and NMAs specifically, summarizing their characteristics, domain coverage, development methods, and measurement-property evaluations.</div></div><div><h3>Methods</h3><div>We conducted a methodological scoping review which included structured appraisal instruments or interpretive guides for SRs with or without NMA-specific domains, aimed at review authors, clinicians, guideline developers, or HTA assessors from published or gray literature in English. Searches (inception–August 2025) covered major databases, registries, organizational websites, and reference lists. Two reviewers independently screened records; data were extracted by one and checked by a second. We synthesized the findings narratively. First, we classified tools as either structured instruments or interpretive guides. Second, we grouped them according to their intended audience and scope. Third, we assessed available measurement-property data using relevant COnsensus-based Standards for the selection of health Measurement INstruments items.</div></div><div><h3>Results</h3><div>Thirty-four articles described 22 instruments (11 NMA-specific, nine systematic reviews with meta-analysis-specific, 2 encompassing both systematic reviews with meta-analysis and NMA). NMA tools added domains such as network geometry, transitivity, and coherence, but guidance on transitivity evaluation, publication bias, and ranking was either limited or ineffective. Reviewer-focused tools were structured with explicit response options, whereas clinician-oriented guides posed appraisal questions with explanations but no prescribed response. Nine instruments reported measurement-property data, with validity and reliability varying widely.</div></div><div><h3>Conclusion</h3><div>This first comprehensive map of systematic reviews with meta-analysis and NMA appraisal resources highlights the need for clearer operational criteria, structured decision rules, and integrated rater training to improve reliability and align foundational SR domains with NMA-specific content.</div></div><div><h3>Plain Language Summary</h3><div>NMA is a way to compare many treatments at once by combining results from multiple studies—even when some treatments have not been directly compared head-to-head. 
Because NMAs are complex, users need clear tools to judge whether an analysis is tru","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"190 ","pages":"Article 112056"},"PeriodicalIF":5.2,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145582728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guideline organizations’ guidance documents paper 4: interest-holder engagement
Pub Date: 2025-11-20 | DOI: 10.1016/j.jclinepi.2025.112085
Joanne Khabsa , Vanessa Helou , Hussein A. Noureldine , Reem Hoteit , Aya Hassoun , Ali H. Dakroub , Lea Assaf , Ahmed Mohamed , Tala Chehaitly , Leana Ellaham , Elie A. Akl
Background and Objectives
Interest-holder engagement is increasingly recognized as essential to the relevance and uptake of practice guidelines. “Interest-holders” are groups with legitimate interests in the health issue under consideration. The interests' legitimacy arises from the fact that these groups are responsible for or affected by health-related decisions. The objective of this study was to describe interest-holder engagement approaches for practice guideline development as described in guidance documents by guideline-producing organizations.
Methods
We compiled a list of guideline-producing organizations and searched for their guidance documents on guideline development. We abstracted data on interest-holder engagement details for each subtopic in the Guidelines International Network (GIN)-McMaster Guideline Development Checklist (a total of 23 subtopics following the division of some original checklist topics).
Results
Of the 133 identified organizations, 129 (97%) describe in their guidance documents engaging at least one interest-holder group in at least one GIN-McMaster checklist subtopic. The subtopics with the most engagement are “developing recommendations and determining their strength” (96%) and “peer review” (81%), while the subtopics with the least engagement are “establishing guideline group processes” (3%) and “training” (2%). The interest-holder groups with the highest engagement in at least one of the subtopics are providers (95%), principal investigators (78%), and patient representatives (64%), while the interest-holder groups with lower engagement are program managers (3%) and peer-reviewed journal editors (1%). Across most subtopics, engagement occurs mostly through panel membership and at the decision-making level.
Conclusion
A high proportion of organizations engaged at least one interest-holder group in at least one subtopic of guideline development, with panel membership being the most common approach. However, this engagement was concentrated in a few interest-holder groups and a few subtopics.
{"title":"Guideline organizations’ guidance documents paper 4: interest-holder engagement","authors":"Joanne Khabsa , Vanessa Helou , Hussein A. Noureldine , Reem Hoteit , Aya Hassoun , Ali H. Dakroub , Lea Assaf , Ahmed Mohamed , Tala Chehaitly , Leana Ellaham , Elie A. Akl","doi":"10.1016/j.jclinepi.2025.112085","DOIUrl":"10.1016/j.jclinepi.2025.112085","url":null,"abstract":"<div><h3>Background and Objectives</h3><div>Interest-holder engagement is increasingly recognized as essential to the relevance and uptake of practice guidelines. “Interest-holders” are groups with legitimate interests in the health issue under consideration. The interests' legitimacy arises from the fact that these groups are responsible for or affected by health-related decisions. The objective of this study was to describe interest-holder engagement approaches for practice guideline development as described in guidance documents by guideline-producing organizations.</div></div><div><h3>Methods</h3><div>We compiled a list of guideline-producing organizations and searched for their guidance documents on guideline development. We abstracted data on interest-holder engagement details for each subtopic in the Guidelines International Network (GIN)-McMaster Guideline Development Checklist (a total of 23 subtopics following the division of some original checklist topics).</div></div><div><h3>Results</h3><div>Of the 133 identified organizations, 129 (97%) describe in their guidance documents engaging at least 1 interest-holder group in at least 1 GIN-McMaster checklist subtopic. The subtopics with most engagement are “developing recommendations and determining their strength” (96%) and “peer review” (81%), while the subtopics with the least engagement are “establishing guideline group processes” (3%) and “training” (2%). The interest-holder groups with the highest engagement in at least one of the subtopics are providers (95%), principal investigators (78%) and patient representatives (64%), while interest-holder groups with lower engagement are program managers (3%), and peer-reviewed journal editors (1%). Across most subtopics, engagement occurs mostly through panel membership and decision-making level.</div></div><div><h3>Conclusion</h3><div>A high proportion of organizations engaged at least 1 interest-holder group in at least 1 subtopic of guideline development, with panel membership being the most common approach. However, this engagement was limited to a few interest-holder groups, and to a few subtopics with highest engagement.</div></div>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"189 ","pages":"Article 112085"},"PeriodicalIF":5.2,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145582742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guideline organizations' guidance documents paper 1: Introduction
Pub Date: 2025-11-19 | DOI: 10.1016/j.jclinepi.2025.112063
Joanne Khabsa , Mariam Nour Eldine , Sally Yaacoub , Rayane El-Khoury , Noha El Yaman , Wojtek Wiercioch , Holger J. Schünemann , Elie A. Akl
Background and Objectives
Given the role of practice guidelines in impacting practice and health outcomes, it is important that their development follows rigorous methodology. We present a series of papers exploring various aspects of practice guideline development based on a descriptive summary of guidance documents from guideline-producing organizations. The overall aim is to describe the methods employed by these organizations in developing practice guidelines. This first paper of the series aims to (1) describe the methodology followed in the descriptive summary, including the identification process of a sample of guideline-producing organizations with publicly available guidance documents on guideline development; (2) characterize the included guideline-producing organizations and their guidance documents; and (3) assess the extent to which these organizations cover the topics of the GIN-McMaster Guideline Development Checklist in their guidance documents.
Methods
We conducted a descriptive summary of guideline-producing organizations' publicly available guidance documents on guideline development (eg, guideline handbooks). We exhaustively sampled a list of guideline-producing organizations from multiple sources and searched their websites and the peer-reviewed literature for publicly available guidance documents on their guideline development process. We abstracted data in duplicate and independently on both the organizations and the documents' general characteristics and on whether the organizations covered the topics of the GIN-McMaster Guideline Development Checklist in their guidance documents. We subdivided some of the 18 main topics of the checklist to disaggregate key concepts. Based on a discussion between the lead authors, this resulted in 27 examined subtopics. We conducted descriptive statistical analyses.
Results
Our final sample consisted of 133 guideline-producing organizations. The majority were professional associations (59%), based in North America (51%), and from the clinical field (84%). Out of the 27 GIN-McMaster Guideline Development Checklist subtopics, the median number covered was 20 (interquartile range (IQR): 15–24). The subtopics most frequently covered were “consumer and stakeholder engagement” (97%), “conflict of interest considerations” (92%), and “guideline group membership” (92%). The subtopics least covered were “training” (40%) and “considering additional information” (42%).
Conclusion
The number of GIN-McMaster Guideline Development Checklist subtopics covered by a sample of guideline-producing organizations in their guidance documents is both variable and suboptimal.
{"title":"Guideline organizations' guidance documents paper 1: Introduction","authors":"Joanne Khabsa , Mariam Nour Eldine , Sally Yaacoub , Rayane El-Khoury , Noha El Yaman , Wojtek Wiercioch , Holger J. Schünemann , Elie A. Akl","doi":"10.1016/j.jclinepi.2025.112063","DOIUrl":"10.1016/j.jclinepi.2025.112063","url":null,"abstract":"<div><h3>Background and Objectives</h3><div>Given the role of practice guidelines in impacting practice and health outcomes, it is important that their development follows rigorous methodology. We present a series of papers exploring various aspects of practice guideline development based on a descriptive summary of guidance documents from guideline-producing organizations. The overall aim is to describe the methods employed by these organizations in developing practice guidelines. This first paper of the series aims to (1) describe the methodology followed in the descriptive summary, including the identification process of a sample of guideline-producing organizations with publicly available guidance documents on guideline development; (2) characterize the included guideline-producing organizations and their guidance documents; and (3) assess the extent to which these organizations cover the topics of the GIN-McMaster Guideline Development Checklist in their guidance documents.</div></div><div><h3>Methods</h3><div>We conducted a descriptive summary of guideline-producing organizations' publicly available guidance documents on guideline development (eg, guideline handbooks). We exhaustively sampled a list of guideline-producing organizations from multiple sources and searched their websites and the peer-reviewed literature for publicly available guidance documents on their guideline development process. We abstracted data in duplicate and independently on both the organizations and the documents' general characteristics and on whether the organizations covered the topics of the GIN-McMaster Guideline Development Checklist in their guidance documents. We subdivided some of 18 main topics of the checklist to disaggregate key concepts. Based on a discussion between the lead authors, this resulted in 27 examined subtopics. We conducted descriptive statistical analyses.</div></div><div><h3>Results</h3><div>Our final sample consisted of 133 guideline-producing organizations. The majority were professional associations (59%), based in North America (51%), and from the clinical field (84%). Out of the 27 GIN-McMaster Guideline Development Checklist subtopics, the median number covered was 20 (interquartile range (IQR): 15–24). The subtopics most frequently covered were “consumer and stakeholder engagement” (97%), “conflict of interest considerations” (92%), and “guideline group membership” (92%). 
The subtopics least covered were “training” (40%) and “considering additional information” (42%).</div></div><div><h3>Conclusion</h3><div>The number of GIN-McMaster Guideline Development Checklist subtopics covered by a sample of guideline-producing organizations in their guidance documents is both variable and suboptimal.</div></div>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"189 ","pages":"Article 112063"},"PeriodicalIF":5.2,"publicationDate":"2025-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145574865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}