
Journal of Clinical Epidemiology: Latest Publications

Predictors of citation rates and the problem of citation bias: a scoping review
IF 5.2 | CAS Tier 2 (Medicine) | Q1 HEALTH CARE SCIENCES & SERVICES | Pub Date: 2026-02-01 | Epub Date: 2025-11-19 | DOI: 10.1016/j.jclinepi.2025.112057
Birgitte Nørgaard , Karen E. Lie , Hans Lund
<div><h3>Objectives</h3><div>To systematically map the factors associated with citation rates, to categorize the types of studies evaluating these factors, and to obtain an overall status of citation bias in scientific health literature.</div></div><div><h3>Study Design and Setting</h3><div>A scoping review was reported following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses scoping review extension checklist. Four electronic databases were searched, and the reference lists of all included articles were screened. Empirical meta-research studies reporting any source of predictors of citation rates and/or citation bias within health care were included. Data are presented using descriptive statistics such as frequencies, proportions, and percentages.</div></div><div><h3>Results</h3><div>A total of 165 studies were included. Fifty-four distinct factors of citation rates were evaluated in 786 quantitative analyses. Despite using the same basic methodological approach to calculate citation rates, 78 studies (48%) aimed to examine citation bias, whereas 79 studies (48%) aimed to optimize article characteristics to enhance authors’ own citation rates. The remaining seven studies (4%) analyzed infrastructural characteristics at the publication level to make all studies more accessible.</div></div><div><h3>Conclusion</h3><div>Seventy-nine of the 165 included studies (48%) explicitly recommended modifying paper characteristics—such as title length or author count—to boost citations rather than prioritizing scientific contribution. Such recommendations may conflict with principles of scientific integrity, which emphasize relevance and methodological rigor over strategic citation practices. Given the high proportion of analyses identifying a significant increase in citation rates, publication bias cannot be ruled out.</div></div><div><h3>Plain Language Summary</h3><div>Why was the study done? 
Within scientific research, it is important to cite previous research. This is done for specific reasons, including crediting earlier authors and providing a credible and trustworthy background for conducting the study. However, findings suggest that citations are not always chosen for their intended purpose. This is known as citation bias. What did the researchers do? The researchers searched for all existing studies evaluating predictors of citation rate, ie, how often a specific study is referred to by other researchers. They systematically mapped these studies to find out both the level of citation bias and the types of citation bias present in scientific health literature. To find these studies, the researchers searched four electronic databases and screened the reference lists of all included studies to be sure to include as many studies as possible. What did the researchers find? The researchers found a total of 165 studies that evaluated predictors of citation rate in no less than 786 analyses. However, the researchers found that the studie
Citations: 0
Use of structured tools by peer reviewers of systematic reviews: a cross-sectional study reveals high familiarity with Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) but limited use of other tools
IF 5.2 | CAS Tier 2 (Medicine) | Q1 HEALTH CARE SCIENCES & SERVICES | Pub Date: 2026-02-01 | Epub Date: 2025-11-20 | DOI: 10.1016/j.jclinepi.2025.112084
Livia Puljak , Sara Pintur , Tanja Rombey , Craig Lockwood , Dawid Pieper
<div><h3>Objectives</h3><div>Systematic reviews (SRs) are pivotal to evidence-based medicine. Structured tools exist to guide their reporting and appraisal, such as Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and A Measurement Tool to Assess Systematic Reviews (AMSTAR). However, there are limited data on whether peer reviewers of SRs use such tools when assessing manuscripts. This study aimed to investigate the use of structured tools by peer reviewers when assessing SRs of interventions, identify which tools are used, and explore perceived needs for structured tools to support the peer-review process.</div></div><div><h3>Study Design and Setting</h3><div>In 2025, we conducted a cross-sectional study targeting individuals who had peer-reviewed at least one SR of interventions in the past year. The online survey collected data on demographics, use of and familiarity with structured tools, as well as open-ended responses on potential needs.</div></div><div><h3>Results</h3><div>Two hundred seventeen peer reviewers took part in the study. PRISMA was the most familiar tool (99% familiar or very familiar) and the most frequently used during peer review (53% always used). The use of other tools such as AMSTAR, Peer Review of Electronic Search Strategies (PRESS), A Risk of Bias Assessment Tool for Systematic Reviews (ROBIS), and the JBI checklist was infrequent. Seventeen percent reported using other structured tools beyond those listed. Most participants indicated that journals rarely required the use of structured tools, except PRISMA. A notable proportion (55%) expressed concerns about time constraints, and 25% noted the lack of a comprehensive tool. Nearly half (45%) expressed a need for a dedicated structured tool for SR peer review, with checklists in PDF or embedded formats preferred. 
Participants expressed both advantages and concerns related to such tools.</div></div><div><h3>Conclusion</h3><div>Most peer reviewers used PRISMA when assessing SRs, while other structured tools were seldom applied. Only a few journals provided or required such tools, revealing inconsistent editorial practices. Participants reported barriers, including time constraints and a lack of suitable instruments. These findings highlight the need for a practical, validated tool, built upon existing instruments and integrated into editorial workflows. Such a tool could make peer review of SRs more consistent and transparent.</div></div><div><h3>Plain Language Summary</h3><div>Systematic reviews (SRs) are a type of research that synthesizes results from primary studies. Several structured tools, such as PRISMA for reporting and AMSTAR 2 for methodological quality, exist to guide how SRs are written and appraised. When manuscripts that report SRs are submitted to scholarly journals, editors invite expert peer reviewers to assess these SRs. In this study, researchers aimed to analyze which tools peer reviewers actually use when evaluating SR manuscripts, their percep
Citations: 0
Comparison between risk of bias-1 and risk of bias-2 tool and impact on network meta-analysis results—A case study from a living Cochrane review on psoriasis
IF 5.2 | CAS Tier 2 (Medicine) | Q1 HEALTH CARE SCIENCES & SERVICES | Pub Date: 2026-02-01 | Epub Date: 2025-12-04 | DOI: 10.1016/j.jclinepi.2025.112097
R. Guelimi , C. Choudhary , C. Ollivier , Q. Beytout , Q. Samaran , A. Mubuangankusu , A. Chaimani , E. Sbidian , S. Afach , L. Le Cleach

Objectives

This study was conducted within a large Cochrane living systematic review (SR) on psoriasis treatments, with the aims of evaluating the inter-rater agreement of the Cochrane risk of bias 2 (RoB-2) tool, comparing its RoB judgments with those of the original RoB-1, and exploring the impact of changes in RoB judgment between the two tools on the results of the Cochrane network meta-analysis (NMA).

Study Design and Setting

This study was conducted within the 2025 update of a living Cochrane review on systemic treatments for psoriasis. Four pairs of assessors used RoB-2 to evaluate the RoB of 193 randomized controlled trials for two primary outcomes: Psoriasis Area Severity Index (PASI) 90 (reflecting clear or almost clear skin) and serious adverse events (SAEs). Inter-rater reliability (IRR) was calculated using Cohen's kappa. RoB-2 judgments for 147 trials (PASI 90) and 154 trials (SAEs) were compared to the previous RoB-1 assessments from the Cochrane 2023 update. The impact of using RoB-2 vs. RoB-1 judgments on the NMA's results was explored through sensitivity analyses, with calculation of the ratio of risk ratios (RRR) between the analyses for each treatment effect.
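The ratio of risk ratios used in these sensitivity analyses can be sketched in a few lines. All numbers below are hypothetical and are not data from the review:

```python
# Hypothetical sketch of the ratio of risk ratios (RRR), comparing a
# treatment effect from the main NMA with the same effect re-estimated
# in a sensitivity analysis excluding high-RoB trials.

def ratio_of_risk_ratios(rr_main: float, rr_sensitivity: float) -> float:
    """RRR < 1 means the sensitivity analysis yields a smaller risk ratio."""
    return rr_sensitivity / rr_main

rr_main = 2.50         # risk ratio from all trials (made-up value)
rr_sensitivity = 2.30  # risk ratio after excluding high-RoB trials (made-up value)
print(round(ratio_of_risk_ratios(rr_main, rr_sensitivity), 2))  # → 0.92
```

An RRR close to 1 across treatment comparisons suggests that excluding high-RoB trials changes the estimated effects little.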

Results

For the RoB-2 overall judgment, the IRR was fair for PASI 90 (kappa = 0.37) and moderate for SAEs (kappa = 0.46). IRR varied between domains (from kappa = 0.33 to kappa = 0.65), with lower IRR found for domains 2, 3, and 5. Significant discrepancies were found between RoB-1 and RoB-2 judgments. Compared to RoB-1, RoB-2 rated a smaller proportion of results as low risk for both PASI 90 (36% vs 58%) and SAEs (13% vs 58%) and a higher proportion as high risk for SAEs (55% vs 29%). For PASI 90, 66/147 (45%) studies showed switches between different judgments, including 18 extreme switches either from low to high or from high to low RoB. For SAEs, 93/154 (60%) studies underwent switches between different judgments, with 32 extreme switches occurring exclusively from low to high RoB. Sensitivity analyses excluding high-risk trials showed moderate impact on the NMA efficacy results (median RRR = 0.92, interquartile range (IQR), 0.91–0.92), but wider changes for SAEs (median RRR = 1.07, IQR, 0.97–1.15).
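As an illustration of how inter-rater agreement figures like those above are obtained, here is a minimal sketch of unweighted Cohen's kappa for two raters' overall RoB-2 judgments. The ratings are invented for illustration; they are not the study's data:

```python
# Minimal sketch: unweighted Cohen's kappa for two raters assigning
# categorical RoB-2 overall judgments. Ratings below are invented.
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical ratings."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed proportion of agreement between the two raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(ratings_a) | set(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)

rater_1 = ["low", "some", "high", "low", "high", "some", "low", "high"]
rater_2 = ["low", "high", "high", "some", "high", "some", "low", "low"]
print(round(cohens_kappa(rater_1, rater_2), 2))  # → 0.43
```

Kappa corrects the raw percentage agreement (here 5 of 8) for the agreement expected by chance alone; values around 0.4, as in this sketch, are conventionally read as fair to moderate, the range reported in the abstract.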

Conclusion

The transition to RoB-2 in a large Cochrane SR revealed fair-to-moderate inter-rater agreement, underscoring the need for consensus among reviewers. The shift from RoB-1 to RoB-2 led to changes in risk-of-bias judgments in our review. Although the impact on the NMA results was pronounced for SAEs, the changes in results were limited for our efficacy outcome PASI 90.
Citations: 0
Methodological guidance for individual participant data meta-analyses: a systematic review
IF 5.2 | CAS Tier 2 (Medicine) | Q1 HEALTH CARE SCIENCES & SERVICES | Pub Date: 2026-02-01 | Epub Date: 2025-11-25 | DOI: 10.1016/j.jclinepi.2025.112089
Edith Ginika Otalike , Mike Clarke , Farjana Akhter , Areti Angeliki Veroniki , Ngianga-Bakwin Kandala , Joel J. Gagnier
<div><h3>Objectives</h3><div>To systematically identify and synthesize methodological guidance for conducting individual participant data meta-analyses (IPD-MAs) of randomized trials and observational studies, to inform the development of a critical appraisal tool for reports of IPD-MAs.</div></div><div><h3>Study Design and Setting</h3><div>We searched nine major electronic databases and gray literature sources through June 2025 using a strategy developed with a health sciences librarian. To be eligible, articles had to report empirical, simulation-based, consensus-based, or narrative research and offer guidance on the methodology of IPD-MA. Study selection and data extraction were performed independently by two reviewers. Quality was assessed using tools tailored to study design (eg, Aims, Data generating mechanism, Estimands, Methods, and Performance measures, Risk of Bias in Systematic Reviews, Appraisal of Guidelines for Research & Evaluation using Delphi, Scale for the Assessment of Narrative Review Articles). Extracted guidance was categorized thematically and mapped to appraisal domains.</div></div><div><h3>Results</h3><div>After screening 14,736 records, we included 141 studies. These encompassed simulation (38%), empirical (21%), and methodological guidance (12%), among others. Key themes included IPD-MA planning, data access and harmonization, analytical strategies, and other statistical issues, as well as reporting. While there was robust guidance for IPD-MA of randomized trials, recommendations for observational studies are sparse. Across all study types, 63% were rated high quality.</div></div><div><h3>Conclusion</h3><div>This review synthesizes previously fragmented guidance into an integrative synthesis, highlighting best practices and critical domains for evaluating IPD-MAs. 
These findings formed the evidence base for a Delphi consensus process to develop a dedicated IPD-MA critical appraisal tool.</div></div><div><h3>Plain Language Summary</h3><div>Meta-analyses often pool published summaries from many studies. That approach can miss important details and introduce bias. An IPD-MA instead reanalyses the original, participant-level data across studies. IPD-MAs are powerful but complex, and practical guidance is scattered, especially for observational studies. We wanted to bring these recommendations together in one place and identify candidate items for a tool to assess the quality of a completed IPD-MA. We systematically searched eight databases from their inception to 2025 to identify papers offering practical guidance on conducting IPD-MAs for health interventions. We organized guidance across the full project life cycle, from planning, finding and accessing data, to preparing and checking data, analyzing results, and reporting. We highlighted where experts broadly agree and where gaps remain. We found 141 relevant papers published between 1995 and 2025. Among these, we identified 25 key topic areas and several smaller subt
Citations: 0
The measurement properties reliability and measurement error explained – a COSMIN perspective
IF 5.2 Tier 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-02-01 Epub Date: 2025-11-19 DOI: 10.1016/j.jclinepi.2025.112058
Lidwine B. Mokkink , Iris Eekhout
Reliability and measurement error are related but distinct measurement properties. They are connected because both can be evaluated using the same data, typically collected from studies involving repeated measurements in individuals who are stable on the outcome of interest. However, they are calculated using different statistical methods and refer to different quality aspects of measurement instruments. We explain that measurement error refers to the precision of a measurement, that is, how similar or close the scores are across repeated measurements in a stable individual (variation within individuals). In contrast, reliability indicates an instrument's ability to distinguish between individuals, which depends both on the variation between individuals (ie, heterogeneity in the outcome being measured in the population) and on the precision of the score, ie, the measurement error. Evaluating reliability helps to understand whether a particular source of variation (eg, occasion, type of machine, or rater) influences the score, and whether the measurement can be improved by better standardizing this source. Intraclass correlation coefficients, standard errors of measurement, and variance components are explained and illustrated with an example.
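The quantities named in this abstract can be made concrete with a short sketch (ours, not the authors'; it assumes a complete subjects-by-repeats score matrix and a one-way random-effects model):

```python
import numpy as np

def icc_and_sem(scores):
    """One-way random-effects ICC and standard error of measurement (SEM).

    scores: (n_subjects, k_repeats) array of repeated measurements taken
    while each subject is assumed stable on the outcome of interest.
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    subj_means = scores.mean(axis=1)
    grand_mean = scores.mean()
    # One-way ANOVA mean squares
    ms_between = k * ((subj_means - grand_mean) ** 2).sum() / (n - 1)
    ms_within = ((scores - subj_means[:, None]) ** 2).sum() / (n * (k - 1))
    # Variance components
    var_between = max((ms_between - ms_within) / k, 0.0)  # heterogeneity between subjects
    var_error = ms_within                                 # variation within a stable subject
    icc = var_between / (var_between + var_error)         # ability to distinguish subjects
    sem = np.sqrt(var_error)                              # precision of a single score
    return icc, sem
```

With this decomposition, noisier repeated scores raise the SEM and lower the ICC even when the spread between subjects is unchanged, which is exactly the distinction the abstract draws.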
Citations: 0
Developing a framework for assessing the applicability of the target condition in diagnostic research
IF 5.2 Tier 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-02-01 Epub Date: 2025-11-15 DOI: 10.1016/j.jclinepi.2025.112059
Eve Tomlinson , Jude Holmes , Anne W.S. Rutjes , Clare Davenport , Mariska Leeflang , Bada Yang , Sue Mallett , Penny Whiting
<div><h3>Objectives</h3><div>Assessment of the applicability of primary studies is an essential but often challenging aspect of systematic reviews of diagnostic test accuracy studies (DTA reviews). We explored review authors’ applicability assessments for the QUADAS-2 reference standard domain within Cochrane DTA reviews. We highlight applicability concerns, identify potential issues with assessment, and develop a framework for assessing the applicability of the target condition as defined by the reference standard.</div></div><div><h3>Study Design and Setting</h3><div>Methodological review. DTA reviews in the Cochrane Library that used QUADAS-2 and judged applicability for the reference standard domain as “high concern” for at least one study were eligible. One reviewer extracted the rationale for the “high concern” judgment, and this was checked by a second reviewer. Two reviewers categorized the rationale inductively into themes, and a third reviewer verified these. Discussion of the extracted information informed framework development.</div></div><div><h3>Results</h3><div>We identified 50 eligible reviews. Five themes emerged: study uses a different reference standard threshold to define the target condition (6 reviews), misclassification by the reference standard in the study such that the target condition in the study does not match the review question (11 reviews), reference standard could not be applied to all participants resulting in a different target condition (5 reviews), misunderstanding of QUADAS-2 applicability (7 reviews), and insufficient information (21 reviews). 
Our framework for researchers outlines four potential applicability concerns for the assessment of the target condition as defined by the reference standard: different sub-categories of the target condition, different threshold used to define the target condition, reference standard not applied to the full study group, and misclassification of the target condition by the reference standard.</div></div><div><h3>Conclusion</h3><div>Clear sources of applicability concerns are identifiable, but several Cochrane review authors struggle to adequately identify and report them. We have developed an applicability framework to guide review authors in their assessment of applicability concerns for the QUADAS reference standard domain.</div></div><div><h3>Plain Language Summary</h3><div>What is the problem? Doctors use tests to help decide if a person has a certain condition. They want to know how accurate the test is before they use it. This means how well it can tell people who have the condition from people who do not have it. This information can be found in “diagnostic systematic reviews”. Diagnostic systematic reviews start with a research question. They bring together findings from studies that have already been done to try to answer this question. It is important for researchers to check that the studies match the review question. This is called an “applicability assessment”.
For example, if the review looks at children, it is important to check that the studies also focus on children. A tool called “QUADAS-2” can be used to check how well studies match the review question. This can be hard to do, and there are not many examples to help people. What did we do? We wanted to learn more about how people use the QUADAS-2 tool to judge applicability. We also wanted to write guidance to support judgments about applicability. What did we find? We found examples of how people carry out applicability assessments. Many reviews did not get it right, and we explain why this happens. We also produced guidance to help people carry out applicability assessments.
Citations: 0
A scoping review of critical appraisal tools and user guides for systematic reviews with network meta-analysis: methodological gaps and directions for tool development
IF 5.2 Tier 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-02-01 Epub Date: 2025-11-20 DOI: 10.1016/j.jclinepi.2025.112056
K.M. Mondragon , C.S. Tan-Lim , R. Velasco Jr. , C.P. Cordero , H.M. Strebel , L. Palileo-Villanueva , J.V. Mantaring
<div><h3>Background</h3><div>Systematic reviews (SRs) with network meta-analyses (NMAs) are increasingly used to inform guidelines, health technology assessments (HTAs), and policy decisions. Their methodological complexity, the difficulty of assessing the exchangeability assumption, and the large volume of results make their appraisal more challenging than that of SRs with pairwise meta-analyses. Numerous SR- and NMA-specific appraisal tools exist, but they vary in scope, intended users, and methodological guidance, and few have been validated.</div></div><div><h3>Objectives</h3><div>To identify and describe appraisal instruments and interpretive guides specifically for SRs and NMAs, summarizing their characteristics, domain coverage, development methods, and measurement-property evaluations.</div></div><div><h3>Methods</h3><div>We conducted a methodological scoping review which included structured appraisal instruments or interpretive guides for SRs, with or without NMA-specific domains, aimed at review authors, clinicians, guideline developers, or HTA assessors, drawn from published or gray literature in English. Searches (inception–August 2025) covered major databases, registries, organizational websites, and reference lists. Two reviewers independently screened records; data were extracted by one and checked by a second. We synthesized the findings narratively. First, we classified tools as either structured instruments or interpretive guides. Second, we grouped them according to their intended audience and scope. Third, we assessed available measurement-property data using relevant COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) items.</div></div><div><h3>Results</h3><div>Thirty-four articles described 22 instruments: 11 NMA-specific, 9 specific to systematic reviews with meta-analysis, and 2 encompassing both. 
NMA tools added domains such as network geometry, transitivity, and coherence, but guidance on transitivity evaluation, publication bias, and ranking was either limited or ineffective. Reviewer-focused tools were structured with explicit response options, whereas clinician-oriented guides posed appraisal questions with explanations but no prescribed response. Nine instruments reported measurement-property data, with validity and reliability varying widely.</div></div><div><h3>Conclusion</h3><div>This first comprehensive map of appraisal resources for systematic reviews with meta-analysis and NMA highlights the need for clearer operational criteria, structured decision rules, and integrated rater training to improve reliability and to align foundational SR domains with NMA-specific content.</div></div><div><h3>Plain Language Summary</h3><div>NMA is a way to compare many treatments at once by combining results from multiple studies—even when some treatments have not been directly compared head-to-head. Because NMAs are complex, users need clear tools to judge whether an analysis is trustworthy.
背景:系统评价(SR)与网络荟萃分析(NMA)越来越多地用于指导方针、卫生技术评估和政策决策。它们的方法复杂性,以及评估可交换性假设和大量结果的难度,使得评估比两两荟萃分析的SRs更具挑战性。存在许多特定于SR和nma的评估工具,但它们在范围、预期用户和方法指导上各不相同,并且很少得到验证。目的:明确和描述SRs和nma的评估工具和解释指南,总结它们的特征、领域覆盖、开发方法和测量属性评估。方法:我们进行了方法学范围评价,包括有或没有nma特定领域的系统评价的结构化评价工具或解释性指南,目标是来自已发表或灰色英文文献的综述作者、临床医生、指南制定者或HTA评估者。搜索(启动至2025年8月)涵盖主要数据库、注册表、组织网站和参考列表。两名审稿人独立筛选记录;数据由一个人提取,另一个人检查。我们以叙述的方式综合了这些发现。首先,我们将工具分为结构化工具和解释性指南。其次,我们根据目标受众和范围对它们进行分组。第三,我们使用相关的COSMIN项目评估可用的测量属性数据。结果:36篇文章描述了21种仪器(11种nma专用仪器,9种SR/ ma专用仪器,1种通用仪器)。NMA工具增加了网络几何、及物性和连贯性等领域,但对及物性评估、发表偏倚和排名的指导要么有限,要么无效。以审稿人为中心的工具具有明确的回答选项,而以临床医生为导向的指南提出了带有解释的评估问题,但没有规定的回答。9种仪器报告了测量性能数据,其有效性和可靠性差异很大。结论:这是第一张综合SR/MA和NMA评估资源的地图,强调需要更清晰的操作标准、结构化的决策规则和综合的评估师培训,以提高可靠性,并使基础SR领域与NMA特定内容保持一致。简单的语言总结:网络荟萃分析(NMA)是一种通过结合多个研究的结果来同时比较许多治疗方法的方法,即使有些治疗方法没有直接进行正面比较。由于nma很复杂,用户需要明确的工具来判断分析是否可信。我们回顾并绘制了过去三十年中发表的22种用于评估或解释系统评价(SRs)和nma的工具。大约一半是专门为NMAS设计的;其余的是适用于nma的一般SR工具。大多数工具涵盖了良好评论的基础(明确的问题、公平的搜索、偏见评估和透明的综合)。nma专用工具还解决了网络特有的问题,例如网络如何连接,间接和直接证据是否一致(一致性),以及如何解释治疗排名。然而,重要的差距仍然存在。很少有工具对传递性/一致性、网络级发表偏差或排名不确定性进行逐步检查,并且评级者之间报告的可靠性不一致。报告核对表(例如,PRISMA-NMA)规定了应该报告什么信息,但没有规定应该如何呈现。确定性框架(例如GRADE或CINeMA)概述了如何跨领域评估结果的可信度,例如不一致或不精确,但它们没有解释或标准化这些领域评估的不同方式。这意味着:指南制定者、HTA评估人员和临床医生应该同时使用SR和NMA工具,寻求与在NMA方面经验丰富的统计学家合作,并支持具有明确决策规则和用户培训的工具。经过更好测试、更清晰的工具将使NMA评估更加一致和可信。
Citations: 0
Omission of main effects from regression models with a ratio variable as the focal exposure can result in bias and inflated type I error rates
IF 5.2 Tier 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-02-01 Epub Date: 2025-12-03 DOI: 10.1016/j.jclinepi.2025.112092
Matthew J. Valente , Biwei Cao , Daniëlle D.B. Holthuijsen , Martijn J.L. Bours , Simone J.P.M. Eussen , Matty P. Weijenberg , Judith J.M. Rijnhart

Objectives

Ratio variables (eg, body mass index (BMI), cholesterol ratios, and metabolite ratios) are widely used as exposure variables in epidemiologic studies of cause and effect. While statisticians have emphasized the importance of including the main effects of the variables that make up a ratio variable in regression models, main effects are still often omitted in practice. The objective of this study is to demonstrate the impact of omitting main effects from regression models with a ratio variable as the focal exposure on bias in the effect estimates and on type I error rates.

Study Design and Setting

We demonstrated the impact of omitting main effects in three steps. First, we showed the connection between regression models with ratio variables and regression models with product terms, which are well-understood by epidemiologists. Second, we estimated models with and without main effects of a ratio variable using a real-life data example. Third, we performed a simulation study to demonstrate the impact of omitting main effects on bias and type I error rates.

Results

We showed the consequences of omitting main effects from regression models with ratio terms. In the real-life example, the ratio term was statistically significantly associated with the outcome only when the main effects were omitted. The simulation study indicated that omitting main effects often leads to biased effect estimates and inflated type I error rates.

Conclusion

Regression models with a ratio term as an exposure variable need to include main effects to avoid bias in the effect estimates and inflated type I error rates.
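The mechanism behind this bias can be reproduced in a few lines. The following simulation is illustrative only (our construction, not the authors' code; the variable names, effect sizes, and sample size are invented): the outcome depends on x1 and x2 separately, yet a model containing only the ratio x1/x2 appears to find a "ratio effect".

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
x1 = rng.normal(10.0, 2.0, n)
x2 = rng.normal(10.0, 2.0, n)
# The outcome depends on x1 and x2 separately; there is NO true ratio effect.
y = 0.5 * x1 - 0.3 * x2 + rng.normal(0.0, 1.0, n)
ratio = x1 / x2

def ols(X, y):
    """Ordinary least squares: coefficients and residual sum of squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    ss_res = float(((y - X @ beta) ** 2).sum())
    return beta, ss_res

X_ratio = np.column_stack([np.ones(n), ratio])         # main effects omitted
X_full = np.column_stack([np.ones(n), ratio, x1, x2])  # main effects included

beta_ratio, ss_ratio = ols(X_ratio, y)
beta_full, ss_full = ols(X_full, y)
# Without main effects, the ratio coefficient absorbs the separate
# associations of x1 and x2 with y, producing a large spurious estimate;
# with main effects included, the ratio coefficient moves toward zero.
```

Because the ratio-only design matrix is nested in the full one, the full model always fits at least as well; the point is that the apparent ratio effect largely disappears once the components' main effects are allowed to explain the outcome.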
Citations: 0
Integrating and standardizing functioning outcomes in rheumatoid arthritis pharmacological trials: a scoping review informed by the International Classification of Functioning, Disability and Health (ICF)
IF 5.2 Tier 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-02-01 Epub Date: 2025-12-03 DOI: 10.1016/j.jclinepi.2025.112093
Adrian Martinez-De la Torre , Polina Leshetkina , Ogie Ahanor , Roxanne Maritz

Background

To examine how functioning-related outcomes in Phase III pharmacological clinical trials for rheumatoid arthritis (RA) align with the International Classification of Functioning, Disability and Health (ICF) Brief Core Set, and to identify which domains of functioning are most frequently represented.

Study Design and Setting

RA is a chronic autoimmune disease and a major cause of disability worldwide. While Phase III randomized controlled trials (RCTs) remain the gold standard for evaluating pharmacological treatments, they often rely on clinical and laboratory endpoints and overlook how therapies affect patients’ functioning. The International Classification of Functioning, Disability and Health (ICF) provides a standardized, patient-centered framework to assess functioning across key domains. A scoping review was conducted in accordance with the JBI methodology for scoping reviews and reported following PRISMA-ScR guidelines. Literature was searched in MEDLINE, EMBASE, and ClinicalTrials.gov from 2010 to 2025. Phase III RCTs evaluating pharmacological interventions in adult patients with RA were included. Functioning-related outcomes were extracted and mapped to ICF categories using standardized linking rules.

Results

Of 852 records screened, 91 met the inclusion criteria. Functioning was frequently assessed through patient-reported outcomes and composite clinical measures. The most commonly linked ICF categories were related to pain and joint mobility within the body functions domain, walking and carrying out daily activities within the activities and participation domain, and joint structures of the shoulder, upper, and lower limbs within body structures. Despite the broad representation, none of the studies explicitly used the ICF framework.

Conclusion

Functioning is often assessed in RA phase III RCTs, but only implicitly and without reference to the ICF framework. Explicitly integrating the ICF could bring greater standardization, comparability, and patient-centeredness in outcome measurement in pharmacological trials, not only in RA but across chronic conditions.
Citation count: 0
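The abstract above says outcomes were "mapped to ICF categories using standardized linking rules." Real ICF linking is a manual, rule-guided process, but the basic idea can be sketched as a keyword lookup against ICF category codes. This is a minimal illustration, not the review's actual procedure; the keyword rules and `link_outcome` helper are hypothetical, while the ICF codes (b280, b710, d450, d230, s720) are real categories.

```python
# Hypothetical keyword-based linking rules. Genuine ICF linking is a
# manual, consensus-driven process; this lookup only illustrates the
# mapping from trial outcomes to ICF category codes.
ICF_RULES = [
    ("pain",           "b280", "Sensation of pain"),
    ("joint mobility", "b710", "Mobility of joint functions"),
    ("walking",        "d450", "Walking"),
    ("daily",          "d230", "Carrying out daily routine"),
    ("shoulder",       "s720", "Structure of shoulder region"),
]

def link_outcome(outcome: str):
    """Return (code, title) for the first matching rule, else None."""
    text = outcome.lower()
    for keyword, code, title in ICF_RULES:
        if keyword in text:
            return code, title
    return None

print(link_outcome("Patient-reported pain (VAS)"))  # ('b280', 'Sensation of pain')
print(link_outcome("6-minute walking test"))        # ('d450', 'Walking')
print(link_outcome("Grip strength"))                # None (no rule matches)
```

Outcomes that match no rule return `None`; in the actual linking methodology such outcomes would be flagged for manual review rather than discarded.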
Discrimination, calibration, and variable importance in statistical and machine learning models for predicting overall survival in advanced non–small cell lung cancer patients treated with immune checkpoint inhibitors
IF 5.2 CAS Tier 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-02-01 Epub Date: 2025-11-21 DOI: 10.1016/j.jclinepi.2025.112082
Lee X. Li , Ashley M. Hopkins , Richard Woodman , Ahmad Y. Abuhelwa , Yuan Gao , Natalie Parent , Andrew Rowland , Michael J. Sorich

Background and Objectives

Prognostic models can enhance clinician-patient communication and guide treatment decisions. Numerous machine learning (ML) algorithms are available and offer a novel approach to predicting survival in patients treated with immune checkpoint inhibitors. However, large-scale benchmarking of their performances—particularly in terms of calibration—has not been evaluated across multiple independent cohorts. This study aimed to develop, evaluate, and compare statistical and ML models regarding discrimination, calibration, and variable importance for predicting overall survival across seven clinical trial cohorts of advanced non–small cell lung cancer (NSCLC) undergoing immune checkpoint inhibitor treatment.

Methods

This study included atezolizumab-treated patients with advanced NSCLC from seven clinical trials. We compared two statistical models: Cox proportional-hazard (Coxph) and accelerated failure time models, and 6 ML models: CoxBoost, extreme gradient-boosting (XGBoost), gradient-boosting machines (GBMs), random survival forest, regularized Coxph models (least absolute shrinkage and selection operator [LASSO]), and support vector machines (SVMs). Models were evaluated on discrimination and calibration using a leave-one-study-out nested cross-validation (nCV) framework. Discrimination was assessed using Harrell's concordance index (Cindex), while calibration was assessed using integrated calibration index (ICI) and plot. Variable importance was assessed using Shapley Additive exPlanations (SHAP) values.

Results

In a cohort of 3203 patients, the two statistical models and 5 of the 6 ML models demonstrated comparable and moderate discrimination performances (aggregated Cindex: 0.69–0.70), while SVM exhibited poor discrimination (aggregated Cindex: 0.57). Regarding calibration, the models appeared largely comparable in aggregated plots, except for LASSO, although the XGBoost models demonstrated numerically superior calibration. Across the evaluation cohorts, individual performance measures varied and no single model consistently outperformed the others. Pretreatment neutrophil-to-lymphocyte ratios (NLRs) and Eastern Cooperative Oncology Group Performance Status (ECOGPS) were ranked among the top five most important predictors across all models.

Conclusion

There was no clear best-performing model for either discrimination or calibration, although XGBoost models showed possible superior calibration numerically. Performance of a given model varied across evaluation cohorts, highlighting the importance of model assessment using multiple independent datasets. All models identified pretreatment NLR and ECOGPS as the key prognostic factors.
Journal of Clinical Epidemiology, Volume 190, Article 112082. Published 2026-02-01.
Citation count: 0
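The abstract above evaluates models with Harrell's concordance index under a leave-one-study-out scheme. As a minimal sketch of what that metric measures, the snippet below implements a simplified Harrell's C-index (tied event times are skipped) and scores each held-out cohort in turn. The `studies` data and risk scores are invented for illustration; the model-fitting step on the remaining studies is omitted.

```python
from itertools import combinations

def harrell_c_index(times, events, risks):
    """Simplified Harrell's C-index: fraction of comparable pairs in which
    the patient who fails earlier carries the higher predicted risk.
    Pairs with tied event times are skipped for brevity."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] > times[j]:
            i, j = j, i  # reorder so patient i fails (or is censored) first
        if times[i] == times[j] or events[i] == 0:
            continue  # pair not comparable: tie, or earlier time is censored
        comparable += 1
        if risks[i] > risks[j]:
            concordant += 1
        elif risks[i] == risks[j]:
            concordant += 0.5
    return concordant / comparable

# Hypothetical per-study cohorts: (survival times, event indicators, risk scores).
studies = {
    "trial_A": ([5, 8, 12, 20], [1, 1, 0, 1], [0.9, 0.7, 0.4, 0.1]),
    "trial_B": ([3, 6, 9],      [1, 0, 1],    [0.8, 0.5, 0.2]),
}

# Leave-one-study-out evaluation: each study is held out and scored with
# predictions from a model trained on the others (training step omitted;
# the risk scores above stand in for those predictions).
for held_out, (t, e, r) in studies.items():
    print(held_out, round(harrell_c_index(t, e, r), 3))
```

A C-index of 1.0 means risk ordering perfectly matches failure ordering, 0.5 is chance level; the abstract's 0.69 to 0.70 range sits in the moderate band. Note this sketch covers discrimination only; calibration (the ICI mentioned in the abstract) needs predicted survival probabilities, not just rankings.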