
Journal of Clinical Epidemiology: Latest Articles

Comparison between risk of bias-1 and risk of bias-2 tool and impact on network meta-analysis results—A case study from a living Cochrane review on psoriasis
IF 5.2 | Zone 2, Medicine | Q1 HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-12-04 | DOI: 10.1016/j.jclinepi.2025.112097
R. Guelimi, C. Choudhary, C. Ollivier, Q. Beytout, Q. Samaran, A. Mubuangankusu, A. Chaimani, E. Sbidian, S. Afach, L. Le Cleach

Objectives

This study was conducted within a large Cochrane living systematic review (SR) of psoriasis treatments. Its aims were to evaluate the inter-rater agreement of the Cochrane risk of bias tool, version 2 (RoB-2), to compare its RoB judgments with those of the original tool (RoB-1), and to explore how changes in RoB judgments between the two tools affected the results of the Cochrane network meta-analysis (NMA).

Study Design and Setting

This study was conducted within the 2025 update of a living Cochrane review on systemic treatments for psoriasis. Four pairs of assessors used RoB-2 to evaluate the RoB of 193 randomized controlled trials for two primary outcomes: Psoriasis Area and Severity Index (PASI) 90 (reflecting clear or almost clear skin) and serious adverse events (SAEs). Inter-rater reliability (IRR) was calculated using Cohen's kappa. RoB-2 judgments for 147 trials (PASI 90) and 154 trials (SAEs) were compared with the previous RoB-1 assessments from the 2023 Cochrane update. The impact of using RoB-2 vs. RoB-1 judgments on the NMA results was explored through sensitivity analyses, calculating the ratio of risk ratios (RRR) between analyses for each treatment effect.
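Cohen's kappa corrects raw agreement between two raters for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed share of identical judgments and p_e is the share expected from each rater's marginal frequencies. The sketch below computes an unweighted kappa for two assessors' RoB-2 judgments (Low risk, Some concerns, High risk). It assumes an unweighted kappa, which the abstract does not specify, and it maps values to the Landis and Koch benchmarks, which we assume (the abstract does not say so) are the source of the "fair" and "moderate" labels. The example judgments are made up.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters judging the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same judgment by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal proportions, summed over categories.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if p_expected == 1:
        # Both raters used one identical category throughout: kappa is 0/0, i.e. undefined.
        return float("nan")
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch_label(kappa):
    """Verbal label from the Landis and Koch (1977) benchmarks (assumed scale)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical overall judgments from two assessors for four trial results.
rater_1 = ["Low risk", "Low risk", "High risk", "High risk"]
rater_2 = ["Low risk", "High risk", "High risk", "High risk"]
k = cohen_kappa(rater_1, rater_2)
print(k, landis_koch_label(k))  # 0.5 moderate
```

In practice one would compute kappa separately for each RoB-2 domain and for the overall judgment, which is how domain-level ranges such as those reported in the results arise.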

Results

For the RoB-2 overall judgment, the IRR was fair for PASI 90 (kappa = 0.37) and moderate for SAEs (kappa = 0.46). IRR varied between domains (from kappa = 0.33 to kappa = 0.65), with lower IRR for domains 2, 3, and 5. Significant discrepancies were found between RoB-1 and RoB-2 judgments. Compared with RoB-1, RoB-2 rated a smaller proportion of results as low risk for both PASI 90 (36% vs 58%) and SAEs (13% vs 58%), and a higher proportion as high risk for SAEs (55% vs 29%). For PASI 90, 66/147 (45%) studies showed switches between judgments, including 18 extreme switches either from low to high or from high to low RoB. For SAEs, 93/154 (60%) studies showed switches between judgments, with all 32 extreme switches occurring from low to high RoB. Sensitivity analyses excluding high-risk trials showed a moderate impact on the NMA efficacy results (median RRR = 0.92; interquartile range [IQR], 0.91–0.92) but wider changes for SAEs (median RRR = 1.07; IQR, 0.97–1.15).
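The ratio of risk ratios compares, for each treatment comparison, the risk ratio from the sensitivity analysis with the one from the main analysis; the per-comparison RRRs are then summarized by their median and IQR. This minimal sketch assumes RRR = RR(sensitivity) / RR(main); the abstract does not state which analysis is the numerator, and the risk ratios below are hypothetical. Quartiles use Python's "inclusive" method, one of several conventions that can give slightly different IQRs on small samples.

```python
import statistics


def ratio_of_risk_ratios(rr_sensitivity, rr_main):
    """Per-comparison RRR = RR(sensitivity analysis) / RR(main analysis) (assumed direction)."""
    if len(rr_sensitivity) != len(rr_main):
        raise ValueError("need one RR per treatment comparison in each analysis")
    return [s / m for s, m in zip(rr_sensitivity, rr_main)]


def median_iqr(values):
    """Median and (Q1, Q3) of the RRRs, using 'inclusive' quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), (q1, q3)


# Hypothetical risk ratios for five treatment comparisons.
rr_main = [2.0, 4.0, 1.5, 3.0, 2.5]         # all trials
rr_sensitivity = [1.8, 3.6, 1.5, 3.3, 2.5]  # high-risk trials excluded
rrrs = ratio_of_risk_ratios(rr_sensitivity, rr_main)
median, (q1, q3) = median_iqr(rrrs)
print(f"median RRR = {median:.2f} (IQR {q1:.2f}-{q3:.2f})")
```

Under this convention an RRR below 1 means the sensitivity analysis produced a smaller risk ratio than the main analysis, and an RRR of 1 means no change.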

Conclusion

In a large Cochrane SR, the transition to RoB-2 revealed fair-to-moderate inter-rater agreement, underscoring the need for consensus among reviewers. The shift from RoB-1 to RoB-2 changed risk-of-bias judgments in our review. Although the impact on the NMA results was pronounced for SAEs, the changes were limited for our efficacy outcome, PASI 90.
Journal of Clinical Epidemiology, Volume 190, Article 112097
Citations: 0
Omission of main effects from regression models with a ratio variable as the focal exposure can result in bias and inflated type I error rates
IF 5.2 | Zone 2, Medicine | Q1 HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-12-03 | DOI: 10.1016/j.jclinepi.2025.112092
Matthew J. Valente, Biwei Cao, Daniëlle D.B. Holthuijsen, Martijn J.L. Bours, Simone J.P.M. Eussen, Matty P. Weijenberg, Judith J.M. Rijnhart

Objectives

Ratio variables (eg, body mass index [BMI], cholesterol ratios, and metabolite ratios) are widely used as exposure variables in epidemiologic studies of cause and effect. While statisticians have emphasized the importance of including in regression models the main effects of the variables that make up a ratio variable, these main effects are still often omitted in practice. The objective of this study is to demonstrate how omitting main effects from regression models with a ratio variable as the focal exposure affects bias in effect estimates and type I error rates.

Study Design and Setting

We demonstrated the impact of omitting main effects in three steps. First, we showed the connection between regression models with ratio variables and regression models with product terms, which are well understood by epidemiologists. Second, we estimated models with and without the main effects of a ratio variable using a real-life data example. Third, we performed a simulation study to demonstrate the impact of omitting main effects on bias and type I error rates.
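A ratio X/Z can be written as the product X × (1/Z), which is why a model containing only the ratio behaves like an interaction model with its main effects left out. The simulation idea can be illustrated with a minimal sketch in Python with NumPy; it is not the authors' simulation design, and the variable names, distributions, sample size, and true coefficients are all hypothetical. The outcome depends only on the numerator, so the ratio has no effect of its own once its components are in the model.

```python
import numpy as np


def ols(y, *columns):
    """Ordinary least squares with an intercept; returns (coefficients, standard errors)."""
    X = np.column_stack([np.ones_like(y), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])  # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se


def simulate(n=5000, seed=1):
    """One simulated data set in which the ratio has no effect of its own (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    numerator = rng.normal(10.0, 2.0, n)
    denominator = rng.uniform(3.0, 7.0, n)  # strictly positive, so the ratio is always defined
    ratio = numerator / denominator
    # True model: the outcome depends on the numerator only (coefficient 0.5).
    y = 0.5 * numerator + rng.normal(0.0, 1.0, n)

    b_ratio_only, se_ratio_only = ols(y, ratio)               # main effects omitted
    b_full, se_full = ols(y, numerator, denominator, ratio)  # main effects included
    # The ratio slope is column 1 in the first model and column 3 in the second.
    return b_ratio_only[1], se_ratio_only[1], b_full[3], se_full[3]


if __name__ == "__main__":
    b_ro, se_ro, b_full, se_full = simulate()
    print(f"ratio only:        b = {b_ro:.3f} (SE {se_ro:.3f})")
    print(f"with main effects: b = {b_full:.3f} (SE {se_full:.3f})")
```

With the main effects included, the ratio coefficient should sit close to its true value of zero; without them, the ratio absorbs the numerator's effect and appears strongly "significant". Estimating a type I error rate would mean repeating this over many seeds under the null and counting rejections, a loop left out here to keep the sketch short.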

Results

We showed that omitting main effects has the same impact in regression models with ratio terms as in regression models with product terms. In the real-life example, the ratio term was statistically significantly associated with the outcome only when main effects were omitted. The simulation study results indicated that omitting main effects often leads to biased effect estimates and inflated type I error rates.

Conclusion

Regression models with a ratio term as an exposure variable need to include main effects to avoid biased effect estimates and inflated type I error rates.
Journal of Clinical Epidemiology, Volume 190, Article 112092
Citations: 0
Integrating and standardizing functioning outcomes in rheumatoid arthritis pharmacological trials: a scoping review informed by the International Classification of Functioning, Disability and Health (ICF)
IF 5.2 | Zone 2, Medicine | Q1 HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-12-03 | DOI: 10.1016/j.jclinepi.2025.112093
Adrian Martinez-De la Torre, Polina Leshetkina, Ogie Ahanor, Roxanne Maritz

Background

This scoping review aimed to examine how functioning-related outcomes in Phase III pharmacological clinical trials for rheumatoid arthritis (RA) align with the International Classification of Functioning, Disability and Health (ICF) Brief Core Set, and to identify which domains of functioning are most frequently represented.

Study Design and Setting

RA is a chronic autoimmune disease and a major cause of disability worldwide. While Phase III randomized controlled trials (RCTs) remain the gold standard for evaluating pharmacological treatments, they often rely on clinical and laboratory endpoints and overlook how therapies affect patients’ functioning. The International Classification of Functioning, Disability and Health (ICF) provides a standardized, patient-centered framework to assess functioning across key domains. A scoping review was conducted in accordance with the JBI methodology for scoping reviews and reported following PRISMA-ScR guidelines. Literature was searched in MEDLINE, EMBASE, and ClinicalTrials.gov from 2010 to 2025. Phase III RCTs evaluating pharmacological interventions in adult patients with RA were included. Functioning-related outcomes were extracted and mapped to ICF categories using standardized linking rules.

Results

Of 852 records screened, 91 met the inclusion criteria. Functioning was frequently assessed through patient-reported outcomes and composite clinical measures. The most commonly linked ICF categories were related to pain and joint mobility within the body functions domain, walking and carrying out daily activities within the activities and participation domain, and joint structures of the shoulder, upper, and lower limbs within body structures. Despite the broad representation, none of the studies explicitly used the ICF framework.
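Once each trial's functioning outcomes have been linked to ICF category codes, the frequencies reported above are simple counts. The sketch below assumes the linking itself has already been done by reviewers following the ICF linking rules; the trial identifiers and their links are invented. It counts each category at most once per trial and groups categories by the ICF component encoded in the first letter of the code (b, s, d, e). The codes used are real ICF categories (b280 sensation of pain, b710 mobility of joint functions, d230 carrying out daily routine, d450 walking, s730 structure of upper extremity, s750 structure of lower extremity), but their assignment to trials is hypothetical.

```python
from collections import Counter

# The first letter of an ICF code identifies its component.
ICF_COMPONENTS = {
    "b": "Body functions",
    "s": "Body structures",
    "d": "Activities and participation",
    "e": "Environmental factors",
}


def tally_links(links):
    """Count linked ICF categories (once per trial) and sum them by ICF component.

    `links` maps a trial identifier to the ICF codes its functioning outcomes were linked to.
    """
    by_category = Counter(code for codes in links.values() for code in set(codes))
    by_component = Counter()
    for code, count in by_category.items():
        by_component[ICF_COMPONENTS[code[0]]] += count
    return by_category, by_component


# Hypothetical links for three made-up trials, for illustration only.
example_links = {
    "trial-A": ["b280", "b710", "d450"],          # pain, joint mobility, walking
    "trial-B": ["b280", "d230", "s730"],          # pain, daily routine, upper extremity
    "trial-C": ["b280", "b710", "s750", "d450"],  # pain, joint mobility, lower extremity, walking
}

categories, components = tally_links(example_links)
print(categories["b280"], dict(components))
```

Counting once per trial keeps a trial that reports several pain measures from dominating the frequencies; counting every linked outcome instead is an equally defensible choice if the question is about measurement volume.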

Conclusion

Functioning is often assessed in RA phase III RCTs, but only implicitly and without reference to the ICF framework. Explicitly integrating the ICF could bring greater standardization, comparability, and patient-centeredness in outcome measurement in pharmacological trials, not only in RA but across chronic conditions.
Journal of Clinical Epidemiology, Volume 190, Article 112093
Citations: 0
Editors’ Choice December 2025
IF 5.2 | Zone 2, Medicine | Q1 HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-12-01 | DOI: 10.1016/j.jclinepi.2025.112072
David Tovey, Andrea C. Tricco
Journal of Clinical Epidemiology, Volume 188, Article 112072
Citations: 0
The many roles of decision thresholds for primary research, evidence synthesis, and health decision-making
IF 5.2 | Zone 2, Medicine | Q1 HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-11-26 | DOI: 10.1016/j.jclinepi.2025.112090
Holger J. Schünemann, Bernardo Sousa-Pinto, Samuel G. Schumacher, Jessie McGowan, David Tovey, Gian Paolo Morgano, Wojtek Wiercioch, Stephanie Chang, Ignacio Neumann
Background

A decision threshold (DT) is the point at which a decision or judgment changes, leading to the selection of an action or a commitment to one of several alternatives. Thresholds have always played a role in decision-making. Very small effects may achieve statistical significance yet remain unimportant to patients or the public. Judgments shift, for instance, from “no or trivial effect” to “small, moderate, and large benefit,” with direct implications for decision-making. However, in guideline panels and other clinical or policy decisions, these thresholds are often applied subconsciously when interpreting effect estimates from studies, and they are likely to vary across panel members.

Study Design and Setting

In this commentary, inspired by the concepts underlying recent publications by the Grading of Recommendations Assessment, Development and Evaluation (GRADE) working group and its members, we argue that the use of DTs has many advantages.

Results

DTs are the basis for an interpretation of results that is not centered on “statistical significance.” In addition, DTs are useful for other aspects of evidence synthesis. Certainty-of-evidence ratings using the GRADE approach (https://book.gradepro.org/) are centered on DTs, including the determination of the target of the certainty rating, with advantages for transparency, objectivity, and simplicity. For example, judgments about imprecision are informed by DTs. Specifically, the number of DTs crossed by the plausible effect sizes, as indicated by the confidence interval, helps determine the degree of uncertainty assigned in a GRADE assessment of imprecision, including the number of levels by which a user rates certainty down. DTs have also changed how users can transparently integrate bodies of evidence from both nonrandomized and randomized studies. Once determined, DTs can be used to validate automated judgments about the certainty of evidence. Beyond these developments, DTs can be useful for designing primary research. For example, sample size calculations could use standardized DTs for large effects when there are known harms that the intended benefits need to outweigh.

Conclusions

DTs have many roles in interpretation, certainty assessment, and research planning and design.

Plain Language Summary

Decision thresholds are the points where a decision changes, for example, when evidence shifts our judgment from “moderate benefit” to “large benefit.” Unlike statistical significance, decision thresholds focus on what matters to citizens and decision-makers. In health guidelines, these thresholds often influence judgments unconsciously, but making them explicit improves transparency and consistency. The GRADE approach uses decision thresholds to judge how certain we are about evidence, helping to make these judgments …
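Counting how many decision thresholds a confidence interval crosses, and placing an estimate into a threshold-bounded category, is mechanical once the thresholds have been set. This sketch assumes a benefit outcome expressed as events avoided per 1000 patients and uses invented thresholds (10, 25, and 50 per 1000) and an invented estimate; real thresholds are chosen by the guideline panel or review team. It only counts thresholds and labels effect sizes; how many levels to rate down for each threshold crossed remains a GRADE judgment and is not encoded here.

```python
from bisect import bisect_right


def thresholds_crossed(ci_lower, ci_upper, thresholds):
    """Decision thresholds lying strictly inside the confidence interval."""
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    return [t for t in thresholds if ci_lower < t < ci_upper]


def effect_category(value, thresholds, labels):
    """Label of the category containing `value`; thresholds sorted ascending.

    A value exactly on a threshold is assigned to the higher category.
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    return labels[bisect_right(thresholds, value)]


# Hypothetical thresholds for an absolute benefit, in events avoided per 1000 patients.
THRESHOLDS = [10, 25, 50]
LABELS = ["trivial or no effect", "small benefit", "moderate benefit", "large benefit"]

estimate, lower, upper = 30, 12, 55  # made-up point estimate and 95% CI
crossed = thresholds_crossed(lower, upper, THRESHOLDS)
print(effect_category(estimate, THRESHOLDS, LABELS))  # moderate benefit
print(len(crossed), "thresholds crossed:", crossed)   # 2 thresholds crossed: [25, 50]
```

Here the point estimate suggests a moderate benefit, but the interval spans small, moderate, and large benefit, which is the situation in which the text says imprecision would weigh on the certainty rating.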
Journal of Clinical Epidemiology, Volume 190, Article 112090
Citations: 0
Methodological guidance for individual participant data meta-analyses: a systematic review
IF 5.2 | Zone 2, Medicine | Q1 HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-11-25 | DOI: 10.1016/j.jclinepi.2025.112089
Edith Ginika Otalike, Mike Clarke, Farjana Akhter, Areti Angeliki Veroniki, Ngianga-Bakwin Kandala, Joel J. Gagnier
<div><h3>Objectives</h3><div>To systematically identify and synthesize methodological guidance for conducting individual participant data meta-analyses (IPD-MAs) of randomized trials and observational studies, to inform the development of a critical appraisal tool for reports of IPD-MAs.</div></div><div><h3>Study Design and Setting</h3><div>We searched nine major electronic databases and gray literature sources through June 2025 using a strategy developed with a health sciences librarian. To be eligible, articles had to report empirical, simulation-based, consensus-based, or narrative research and offer guidance on the methodology of IPD-MA. Study selection and data extraction were performed independently by two reviewers. Quality was assessed using tools tailored to study design (eg, Aims, Data generating mechanism, Estimands, Methods, and Performance measures, Risk of Bias in Systematic Reviews, Appraisal of Guidelines for Research & Evaluation using Delphi, Scale for the Assessment of Narrative Review Articles). Extracted guidance was categorized thematically and mapped to appraisal domains.</div></div><div><h3>Results</h3><div>After screening 14,736 records, we included 141 studies. These encompassed simulation (38%), empirical (21%), and methodological guidance (12%), among others. Key themes included IPD-MA planning, data access and harmonization, analytical strategies, and other statistical issues, as well as reporting. While there was robust guidance for IPD-MA of randomized trials, recommendations for observational studies are sparse. Across all study types, 63% were rated high quality.</div></div><div><h3>Conclusion</h3><div>This review synthesizes previously fragmented guidance into an integrative synthesis, highlighting best practices and critical domains for evaluating IPD-MAs. 
These findings formed the evidence base for a Delphi consensus process to develop a dedicated IPD-MA critical appraisal tool.</div></div><div><h3>Plain Language Summary</h3><div>Meta-analyses often pool published summaries from many studies. That approach can miss important details and introduce bias. An IPD-MA instead reanalyses the original, participant-level data across studies. IPD-MAs are powerful but complex, and practical guidance is scattered, especially for observational studies. We wanted to bring these recommendations together in one place and identify candidate items for a tool to assess the quality of a completed IPD-MA. We systematically searched eight databases from their inception to 2025 to identify papers offering practical guidance on conducting IPD-MAs for health interventions. We organized guidance across the full project life cycle, from planning and finding and accessing data, to preparing and checking data, analyzing results, and reporting. We highlighted where experts broadly agree and where gaps remain. We found 141 relevant papers published between 1995 and 2025. Among these, we identified 25 key topic areas and several smaller subtopics. Many papers covered more than one topic, so we allowed them to appear in several categories rather than forcing a single classification. Based on this mapping, we developed a clear set of recommendations organized around four themes: IPD-MA planning; identifying and accessing studies and individual participant data; meta-analysis methods; and other special considerations, such as methods for observational studies. We also found notable gaps for observational IPD-MAs. Our synthesis offers a practical checklist for teams planning or reviewing an IPD-MA. It also lays the groundwork for a consensus-based appraisal tool to help editors, funders, and guideline developers judge quality and improve practice.</div></div>
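The summary describes an IPD-MA as reanalysing participant-level data rather than pooling published summaries. As a minimal illustration, assuming a binary outcome and a two-stage, common-effect (fixed-effect) inverse-variance approach, the sketch below first estimates a log odds ratio within each study from its participant rows and then pools those estimates. The row layout, the function names `stage_one` and `stage_two`, the 0.5 continuity correction, and the example counts are all hypothetical choices, not recommendations taken from the review; one-stage and random-effects models are common alternatives.

```python
import math
from collections import defaultdict


def stage_one(rows):
    """Stage 1: per-study log odds ratio and its variance from participant rows.

    Each row is a hypothetical (study_id, treated, event) tuple, with treated and
    event coded 0/1. A 0.5 continuity correction is added to every cell so that
    zero cells do not break the logarithm.
    """
    # study -> [[control_no_event, control_event], [treated_no_event, treated_event]]
    counts = defaultdict(lambda: [[0, 0], [0, 0]])
    for study, treated, event in rows:
        counts[study][treated][event] += 1
    estimates = {}
    for study, ((c0, c1), (t0, t1)) in counts.items():
        a, b, c, d = t1 + 0.5, t0 + 0.5, c1 + 0.5, c0 + 0.5
        log_or = math.log((a * d) / (b * c))  # odds in treated / odds in control
        var = 1 / a + 1 / b + 1 / c + 1 / d   # Woolf variance of the log OR
        estimates[study] = (log_or, var)
    return estimates


def stage_two(estimates):
    """Stage 2: common-effect inverse-variance pooling of the per-study log ORs."""
    weights = {s: 1 / v for s, (_, v) in estimates.items()}
    total_w = sum(weights.values())
    pooled = sum(weights[s] * est for s, (est, _) in estimates.items()) / total_w
    return pooled, math.sqrt(1 / total_w)


# Hypothetical participant-level data from two trials
rows = ([("A", 1, 1)] * 10 + [("A", 1, 0)] * 40 + [("A", 0, 1)] * 20 + [("A", 0, 0)] * 30
        + [("B", 1, 1)] * 5 + [("B", 1, 0)] * 35 + [("B", 0, 1)] * 10 + [("B", 0, 0)] * 30)
per_study = stage_one(rows)
pooled_log_or, pooled_se = stage_two(per_study)
print(f"pooled OR = {math.exp(pooled_log_or):.2f} (SE of log OR = {pooled_se:.3f})")
```

Having the raw rows is what makes stage 1 possible at all: with IPD, the same model and covariate handling can be applied in every study before pooling, instead of accepting each publication's own analysis.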
The Risk of Bias in Vaccine Effectiveness (RoB-VE) project: introduction to a methodological initiative to improve risk-of-bias assessment and reporting in vaccine effectiveness research
IF 5.2 Zone 2, Medicine Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-11-24 DOI: 10.1016/j.jclinepi.2025.112088
Cassandra Laurie , Pablo Alonso Coello , Ivan D. Florez , Maxime Lê , David Moher , Manish Sadarangani , Maria E. Sundaram , George Wells , Krista Wilkinson , Kerry Dwan , Scott A. Halperin , Stuart G. Nicholls , Barnaby C. Reeves , Hugh Sharma Waddington , Beverley Shea , Melissa Brouwers , Giorgia Sulis
<div><h3>Background and Objective</h3><div>Vaccine effectiveness (VE) studies are essential for informing immunization policy and public health decision-making. However, the observational nature of most VE studies introduces unique methodological challenges, including biases that are not adequately addressed by existing risk-of-bias (RoB) tools. The Risk of Bias in Vaccine Effectiveness (RoB-VE) project is an international, multiphase methodological research initiative aimed at improving the quality, transparency, interpretability, and reporting of VE research.</div></div><div><h3>Discussion</h3><div>Funded by the Canadian Institutes of Health Research and supported by many global partners, the project seeks to generate a comprehensive toolkit for VE studies. This includes an RoB assessment resource tailored to VE study designs and a complementary reporting guideline to enhance consistency in VE study reporting. The project follows an evidence-informed approach, beginning with a review of the literature to inform tool development, and progressing through interest holder engagement, modified Delphi consensus, usability testing, and beta validation. This introductory paper outlines the rationale, scope, and methodology of the RoB-VE project. These efforts aim to strengthen the methodological foundation of VE research and support more reliable evidence synthesis and policy development.</div></div><div><h3>Plain Language Summary</h3><div>VE studies measure how well vaccines work in real-world scenarios. These studies are essential for shaping vaccination recommendations. To assess the validity of VE studies, it is necessary to carry out an RoB assessment, which involves looking at different aspects of the study (eg, data collection methods, how participants are recruited, etc.) that have the potential to yield misleading results. 
Existing RoB assessment tools do not fully capture issues particularly relevant to VE studies, and inconsistent reporting limits their usefulness. To address this, we are conducting the RoB-VE project. This project aims to improve the quality, transparency, interpretability, and reporting of VE research through the development, validation, and dissemination of a robust and user-friendly RoB assessment tool, specifically tailored for assessing VE studies. Our methodology involves a comprehensive multistep process based on established approaches. A broad range of international participants with diverse expertise and profiles will be engaged along the way to refine and finalize the tool. After pilot testing the beta version of the tool and making further refinements, we aim to deliver version 1 of the tool, which will undergo a large-scale application phase to assess its reliability and usefulness. Additionally, we will develop a reporting guideline to enhance the completeness of reporting of VE studies. This introductory paper outlines the rationale, scope, and methodology of the RoB-VE project. This project will raise the standards of evidence synthesis, ultimately contributing to more reliable, transparent, and impactful research in the critical area of VE evaluation.</div></div>
Grading of Recommendations, Assessment, Development, and Evaluation guidance 44: strategies to enhance the utilization of randomized and nonrandomized studies in evidence syntheses of health interventions
IF 5.2 Zone 2, Medicine Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-11-22 DOI: 10.1016/j.jclinepi.2025.112086
Carlos A. Cuello-Garcia , Rebecca L. Morgan , Nancy Santesso , Pablo Alonso-Coello , Romina Brignardello-Petersen , Lukas Schwingshackl , Jan L. Brozek , Srinivasa Vittal Katikireddi , Zachary Munn , Hugh Sharma Waddington , Kevin C. Wilson , Joerg Meerpohl , Daniel Morales , Ignacio Neumann , Peter Tugwell , Gordon Guyatt , Holger J. Schünemann

Background and Objectives

Ideally, guideline developers and health technology assessment authors base intervention decisions on randomized controlled trials (RCTs). However, relying solely on RCTs is uncommon, especially for public health interventions and harms assessment. In these situations, nonrandomized studies of interventions (NRSIs) can provide valuable information. This article presents Grading of Recommendations Assessment, Development, and Evaluation (GRADE) guidance for integrating bodies of evidence from RCTs and NRSIs in evidence syntheses of health interventions.

Methods

Following standard GRADE methods, we developed this guidance through iterative discussions, supported by examples, with experts from the GRADE NRSI project group across multiple dedicated meetings. We presented the findings of the group discussions for feedback at GRADE Working Group meetings in September 2023 and May 2024.

Results

The resulting GRADE guidance outlines a structured approach: (1) assessing the certainty of evidence (CoE) after defining the number of decision thresholds and the target of the certainty rating; (2) evaluating the congruency of effect estimates between RCTs and NRSIs; (3) identifying which GRADE domains affect the certainty ratings, to inform the complementarity of RCTs and NRSIs and the overall CoE; and (4) deciding whether and how to use one or both types of studies.
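Step (2) asks reviewers to judge whether RCT and NRSI effect estimates are congruent. One simple quantitative input to that judgment, offered here as an assumption rather than as part of the GRADE guidance itself, is the ratio of the pooled NRSI risk ratio to the pooled RCT risk ratio, with a z-test on the difference of their logarithms. It is valid only if the two bodies of evidence are independent (no shared participants) and each CI is symmetric on the log scale. The function names and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def log_se_from_ci(lower, upper, level=0.95):
    """Standard error of a log ratio recovered from its confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (math.log(upper) - math.log(lower)) / (2 * z)


def ratio_of_risk_ratios(rr_rct, ci_rct, rr_nrsi, ci_nrsi, level=0.95):
    """Ratio of the NRSI risk ratio to the RCT risk ratio, its CI, and a two-sided p-value.

    Assumes the two pooled estimates are statistically independent.
    """
    se_rct = log_se_from_ci(*ci_rct, level=level)
    se_nrsi = log_se_from_ci(*ci_nrsi, level=level)
    diff = math.log(rr_nrsi) - math.log(rr_rct)       # log of the ratio of risk ratios
    se_diff = math.sqrt(se_rct ** 2 + se_nrsi ** 2)   # variances add under independence
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (math.exp(diff - z * se_diff), math.exp(diff + z * se_diff))
    p = 2 * (1 - NormalDist().cdf(abs(diff / se_diff)))
    return math.exp(diff), ci, p


# Hypothetical pooled estimates for one outcome
rrr, (lower, upper), p = ratio_of_risk_ratios(0.80, (0.70, 0.91), 0.75, (0.60, 0.94))
print(f"RRR = {rrr:.3f}, 95% CI {lower:.2f} to {upper:.2f}, p = {p:.2f}")
```

A ratio near 1 with a CI that excludes clinically important differences would support congruency; a nonsignificant test alone does not, because wide NRSI intervals can hide real discrepancies.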

Conclusion

This GRADE guidance offers a structured and practical approach for deciding whether to integrate RCTs and NRSIs in evidence syntheses. By addressing the interplay between affected GRADE domains and assessing the congruency of effects, it helps GRADE users determine when and how NRSIs can meaningfully complement or replace RCT evidence to inform certainty ratings and decision-making.
Hill's considerations are not causal criteria
IF 5.2 Zone 2, Medicine Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-11-22 DOI: 10.1016/j.jclinepi.2025.112087
David A. Savitz , Neil Pearce , Kenneth J. Rothman
Hill's list of considerations for assessing causality, proposed 60 years ago, became a landmark in the interpretation of epidemiologic evidence. However, it has been and continues to be misused as a list of causal criteria to be scored and summed, even though causal inference cannot be reached by applying this or any other algorithm. Recognizing the distinction between statistical associations and causal effects was a key contribution of Hill. Although he identified several clues for distinguishing causal from noncausal associations, causal inference in epidemiology has since become much more explicit and effective. Rather than relying on Hill's indirect hints of potential bias, such as strength of association or dose-response gradients, newer methods such as quantitative bias analysis directly assess confounding and other candidate biases that compete with causal explanations, leading to more informed inferences. Similarly, the interpretation of consistency depends on variation in methods across studies; triangulation may be used to search for informative inconsistencies, strengthening causal inference. Most importantly, a causal connection is not a categorical property bestowed upon an association on the basis of Hill's considerations or any other checklist. Causal inference is an inherently indirect process: the inference gradually crystallizes as it withstands challenges from competing explanations, including random error and biases, that are found not to account for the measured association.
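As one illustration of the kind of quantitative bias analysis contrasted here with Hill's indirect hints, the classic external-adjustment formula for a single unmeasured binary confounder U can be sketched as follows. The prevalences of U among exposed and unexposed people and the U-outcome risk ratio are user-supplied assumptions; the formula also assumes that U's effect on the outcome is the same in both exposure groups. The function names and scenario values are hypothetical, and this is not presented as the specific method the essay recommends.

```python
def bias_factor(p_exposed, p_unexposed, rr_confounder_outcome):
    """Confounding bias factor for an unmeasured binary confounder U.

    p_exposed / p_unexposed: assumed prevalence of U among exposed / unexposed.
    rr_confounder_outcome: assumed risk ratio linking U to the outcome, taken to be
    the same in both exposure groups (no effect modification by exposure).
    """
    numerator = p_exposed * (rr_confounder_outcome - 1) + 1
    denominator = p_unexposed * (rr_confounder_outcome - 1) + 1
    return numerator / denominator


def adjusted_rr(rr_observed, p_exposed, p_unexposed, rr_confounder_outcome):
    """Observed risk ratio divided by the assumed bias factor."""
    return rr_observed / bias_factor(p_exposed, p_unexposed, rr_confounder_outcome)


# Hypothetical scenario grid: how strong would U need to be to explain away RR = 1.8?
rr_obs = 1.8
for rr_ud in (1.5, 2.0, 3.0, 5.0):
    for p1, p0 in ((0.3, 0.2), (0.5, 0.2), (0.7, 0.2)):
        corrected = adjusted_rr(rr_obs, p1, p0, rr_ud)
        print(f"RR_UD={rr_ud}, p1={p1}, p0={p0}: bias-adjusted RR = {corrected:.2f}")
```

Scanning a grid of plausible values, instead of a single point, makes explicit which confounding scenarios would and would not account for the observed association, which is the competing-explanation reasoning the essay describes.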
Discrimination, calibration, and variable importance in statistical and machine learning models for predicting overall survival in advanced non–small cell lung cancer patients treated with immune checkpoint inhibitors
IF 5.2 Zone 2, Medicine Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-11-21 DOI: 10.1016/j.jclinepi.2025.112082
Lee X. Li , Ashley M. Hopkins , Richard Woodman , Ahmad Y. Abuhelwa , Yuan Gao , Natalie Parent , Andrew Rowland , Michael J. Sorich

Background and Objectives

Prognostic models can enhance clinician-patient communication and guide treatment decisions. Numerous machine learning (ML) algorithms are available and offer a novel approach to predicting survival in patients treated with immune checkpoint inhibitors. However, their performance, particularly their calibration, has not been benchmarked at scale across multiple independent cohorts. This study aimed to develop, evaluate, and compare statistical and ML models for predicting overall survival, in terms of discrimination, calibration, and variable importance, across seven clinical trial cohorts of patients with advanced non–small cell lung cancer (NSCLC) receiving immune checkpoint inhibitor treatment.

Methods

This study included atezolizumab-treated patients with advanced NSCLC from seven clinical trials. We compared two statistical models, Cox proportional hazards (Coxph) and accelerated failure time models, with six ML models: CoxBoost, extreme gradient boosting (XGBoost), gradient-boosting machines (GBMs), random survival forest, regularized Coxph models (least absolute shrinkage and selection operator [LASSO]), and support vector machines (SVMs). Discrimination and calibration were evaluated within a leave-one-study-out nested cross-validation (nCV) framework. Discrimination was assessed using Harrell's concordance index (Cindex), and calibration using the integrated calibration index (ICI) and calibration plots. Variable importance was assessed using Shapley Additive exPlanations (SHAP) values.
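To make the discrimination metric concrete, here is a minimal sketch of Harrell's C-index for right-censored data: among comparable pairs (the subject with the shorter follow-up had an observed death), it is the fraction in which the model assigned the higher risk to the subject who died first. A value of 0.5 corresponds to ordering no better than chance and 1.0 to perfect ordering. The function name and toy data are hypothetical, and for simplicity this version skips pairs with tied times; established implementations (for example in scikit-survival or lifelines) handle ties more carefully.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: observed follow-up times; events: 1 if death observed, 0 if censored;
    risk_scores: model output where HIGHER means higher predicted risk.
    Tied risk scores count as half-concordant; pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # shorter time was censored, so the true ordering is unknown
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four patients, the third censored at 12 months
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.4, 0.1]
print(harrell_c_index(times, events, risk))  # → 1.0
```

Because only rank order matters, the C-index says nothing about whether predicted survival probabilities are accurate, which is why the study assesses calibration separately.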

Results

In a cohort of 3203 patients, the two statistical models and 5 of the 6 ML models showed comparable, moderate discrimination (aggregated Cindex: 0.69–0.70), whereas SVM discriminated poorly (aggregated Cindex: 0.57). In aggregated calibration plots, all models except LASSO appeared largely comparable, although XGBoost models showed numerically superior calibration. Across the evaluation cohorts, individual performance measures varied, and no single model consistently outperformed the others. Pretreatment neutrophil-to-lymphocyte ratio (NLR) and Eastern Cooperative Oncology Group Performance Status (ECOGPS) ranked among the top five most important predictors in all models.

Conclusion

There was no clear best-performing model for either discrimination or calibration, although the numerically better calibration of XGBoost models suggests a possible advantage. The performance of a given model varied across evaluation cohorts, highlighting the importance of assessing models on multiple independent datasets. All models identified pretreatment NLR and ECOGPS as key prognostic factors.
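The cohort-to-cohort variation reported here comes from a leave-one-study-out design, in which each trial in turn serves as the external evaluation cohort. The sketch below shows only the splitting logic, under the assumption (not stated in the abstract) that the inner tuning loop also leaves out one study at a time; the function names and trial labels are hypothetical, and model fitting and scoring are left out.

```python
def leave_one_study_out(study_ids):
    """Yield (held_out_study, train_idx, test_idx), holding out each study once."""
    for held_out in dict.fromkeys(study_ids):  # unique labels in first-seen order
        train_idx = [i for i, s in enumerate(study_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(study_ids) if s == held_out]
        yield held_out, train_idx, test_idx


def nested_leave_one_study_out(study_ids):
    """Outer loop for evaluation; inner loop over the remaining studies for tuning.

    Inner folds are expressed as indices into the ORIGINAL dataset, so the
    outer held-out study can never leak into hyperparameter selection.
    """
    for held_out, train_idx, test_idx in leave_one_study_out(study_ids):
        inner_ids = [study_ids[i] for i in train_idx]
        inner_folds = [
            (inner_study, [train_idx[k] for k in tr], [train_idx[k] for k in te])
            for inner_study, tr, te in leave_one_study_out(inner_ids)
        ]
        yield held_out, train_idx, test_idx, inner_folds


# Hypothetical patient-to-trial labels
ids = ["trial_A", "trial_A", "trial_B", "trial_C", "trial_C", "trial_C"]
for held_out, train_idx, test_idx, inner in nested_leave_one_study_out(ids):
    print(held_out, "test:", test_idx, "inner folds:", [s for s, _, _ in inner])
```

Reporting a metric per held-out trial, rather than only a pooled value, is what exposes the between-cohort variability the authors emphasize.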
背景与目的:预后模型可以加强临床与患者的沟通,指导治疗决策。许多机器学习(ML)算法可用,并提供了一种新的方法来预测接受免疫检查点抑制剂治疗的患者的生存。然而,对它们的性能进行大规模基准测试——特别是在校准方面——尚未在多个独立队列中进行评估。本研究旨在开发、评估和比较7个接受免疫检查点抑制剂治疗的晚期非小细胞肺癌(NSCLC)临床试验队列中关于区分、校准和可变重要性的统计和ML模型,以预测总生存期。患者和方法:本研究纳入了七项临床试验的atezolizumab治疗的晚期非小细胞肺癌患者。我们比较了两种统计模型:Cox比例风险(Coxph)和加速失效时间模型,以及6种ML模型:Cox boost、极端梯度增强(XGBoost)、梯度增强机(GBM)、随机生存森林、正则化Coxph模型(LASSO)和支持向量机(SVM)。使用留一项研究的嵌套交叉验证(nCV)框架对模型进行判别和校准评估。判别采用Harrell’s concordance index (Cindex)评价,校正采用integrated calibration index (ICI)和plot评价。变量重要性采用Shapley加性解释(SHAP)值进行评估。结果:在3203例患者队列中,两种统计模型和6种ML模型中的5种具有可比性和中等的判别性能(综合判别指数:0.69-0.70),而SVM的判别性能较差(综合判别指数:0.57)。在校准方面,除了LASSO之外,这些模型在汇总图中表现出很大的可比性,尽管XGBoost模型在数值上显示出更好的校准。在整个评估队列中,个人绩效指标各不相同,没有一个模型始终优于其他模型。治疗前中性粒细胞与淋巴细胞比率(NLR)和东部肿瘤合作组表现状态(ECOGPS)在所有模型中排名前五。结论:虽然XGBoost模型在数值上可能具有更好的校准效果,但在鉴别和校准方面没有明确的最佳模型。给定模型的性能在评估队列中有所不同,突出了使用多个独立数据集进行模型评估的重要性。所有模型均将治疗前NLR和ECOGPS确定为关键预后因素。
Discrimination, calibration, and variable importance in statistical and machine learning models for predicting overall survival in advanced non–small cell lung cancer patients treated with immune checkpoint inhibitors
IF 5.2 Zone 2 (Medicine) Journal of Clinical Epidemiology, Vol. 190, Article 112082 Pub Date: 2025-11-21 DOI: 10.1016/j.jclinepi.2025.112082
Lee X. Li, Ashley M. Hopkins, Richard Woodman, Ahmad Y. Abuhelwa, Yuan Gao, Natalie Parent, Andrew Rowland, Michael J. Sorich

Background and Objectives

Prognostic models can enhance clinician-patient communication and guide treatment decisions. Numerous machine learning (ML) algorithms are available and offer a novel approach to predicting survival in patients treated with immune checkpoint inhibitors. However, their performance, particularly their calibration, has not been benchmarked at scale across multiple independent cohorts. This study aimed to develop, evaluate, and compare statistical and ML models in terms of discrimination, calibration, and variable importance for predicting overall survival across seven clinical trial cohorts of patients with advanced non–small cell lung cancer (NSCLC) receiving immune checkpoint inhibitor treatment.

Methods

This study included atezolizumab-treated patients with advanced NSCLC from seven clinical trials. We compared two statistical models (Cox proportional hazards [Coxph] and accelerated failure time models) and six ML models: CoxBoost, extreme gradient boosting (XGBoost), gradient boosting machines (GBMs), random survival forests, regularized Coxph models (least absolute shrinkage and selection operator [LASSO]), and support vector machines (SVMs). Models were evaluated on discrimination and calibration within a leave-one-study-out nested cross-validation (nCV) framework. Discrimination was assessed using Harrell's concordance index (Cindex), and calibration using the integrated calibration index (ICI) and calibration plots. Variable importance was assessed using Shapley Additive exPlanations (SHAP) values.

Results

In a cohort of 3203 patients, the two statistical models and five of the six ML models showed comparable, moderate discrimination (aggregated Cindex 0.69–0.70), whereas SVM discriminated poorly (aggregated Cindex 0.57). In aggregated calibration plots, all models except LASSO appeared largely comparable, although the XGBoost models showed numerically superior calibration. Across the evaluation cohorts, individual performance measures varied, and no single model consistently outperformed the others. Pretreatment neutrophil-to-lymphocyte ratio (NLR) and Eastern Cooperative Oncology Group Performance Status (ECOGPS) ranked among the five most important predictors in every model.

Conclusion

There was no clear best-performing model for either discrimination or calibration, although the XGBoost models showed a possible numerical advantage in calibration. The performance of any given model varied across evaluation cohorts, highlighting the importance of assessing models on multiple independent datasets. All models identified pretreatment NLR and ECOGPS as the key prognostic factors.
Citations: 0
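The discrimination metric named in the abstract, Harrell's concordance index, and the outer loop of a leave-one-study-out evaluation can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' code: the record layout (`study`, `time`, `event` keys) and the `fit`/`predict` callables are hypothetical, pairs with tied observed times are simply skipped, and the inner tuning loop of the nested cross-validation is assumed to live inside `fit`. How the per-study scores were aggregated in the paper is not shown here.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[bool],
                    risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable when the subject with the shorter observed time
    had the event; it is concordant when that subject also has the higher
    predicted risk. Equal predicted risks count as half-concordant.
    Simplification (assumption): pairs with identical observed times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times skipped in this sketch
        # a = subject with the shorter observed time, b = the other one
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # a was censored first, so the true ordering is unknown
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def leave_one_study_out(
    records: List[dict],
    fit: Callable[[List[dict]], object],
    predict: Callable[[object, List[dict]], List[float]],
) -> Dict[str, float]:
    """Outer loop only: hold out each study in turn and score it.

    `fit` and `predict` are hypothetical user-supplied callables; any
    inner (nested) loop for hyperparameter tuning is assumed to run
    inside `fit`, using only the training studies.
    """
    scores: Dict[str, float] = {}
    for study in sorted({r["study"] for r in records}):
        train = [r for r in records if r["study"] != study]
        test = [r for r in records if r["study"] == study]
        model = fit(train)
        scores[study] = harrell_c_index([r["time"] for r in test],
                                        [r["event"] for r in test],
                                        predict(model, test))
    return scores


if __name__ == "__main__":
    # Toy example: higher risk always fails earlier, so concordance is perfect.
    print(harrell_c_index([1, 2, 3, 4], [True] * 4, [4, 3, 2, 1]))  # 1.0
```

Because concordance depends only on how the risk scores rank subjects, any model that outputs a risk score (a Cox linear predictor, the negated predicted survival time of an accelerated failure time model, and so on) could be plugged into `predict` without changing the metric.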