Journal of Clinical Epidemiology: Latest Articles

The interplay between PROM score distributions and treatment effect detection likelihood in randomized controlled trials–a metaepidemiologic study
IF 5.2 | CAS Zone 2 (Medicine) | Q1 HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-12-19 | DOI: 10.1016/j.jclinepi.2025.112114
Valtteri Panula, Antti Saarinen, Matias Vaajala, Rasmus Liukkonen, Oskari Pakarinen, Juho Laaksonen, Ville Ponkilainen, Ilari Kuitunen, Mikko Uimonen

Objectives

We hypothesized that, in musculoskeletal randomized controlled trials (RCTs) using patient-reported outcome measures (PROMs), higher baseline scores and the clustering of follow-up scores near the upper bound (ie, ceiling effect) compress variability and attenuate measurable between-group differences, thereby lowering the likelihood of observing a statistically significant effect. We therefore examined how score distributions at pretreatment and follow-up influence the likelihood of detecting between-group differences.

Study Design and Setting

We conducted a metaepidemiologic study of RCTs, published between 2015 and 2024, that compared treatment effects on musculoskeletal disorders between two study groups using PROMs. The observed distributions of the PROM scores at baseline and follow-up were collected from the included studies. All PROM scores were rescaled to 0-100 with higher scores indicating better health. The likelihood of observing a statistically significant difference in PROM scores between the study groups was examined by calculating the score difference required to achieve a P value <.05.
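The abstract does not give the exact formulas behind the rescaling or the "score difference required to achieve a P value <.05". A minimal sketch, assuming a linear min-max rescaling and a two-sided, normal-approximation comparison of two group means with unpooled SDs, might look like this. The function names and the example trial (two arms of 50, SD 20) are hypothetical.

```python
import math
from statistics import NormalDist


def rescale_0_100(score, scale_min, scale_max, higher_is_better=True):
    """Map a raw PROM score linearly onto 0-100, where 100 is the best health state."""
    scaled = (score - scale_min) / (scale_max - scale_min) * 100.0
    # Reverse instruments on which a higher raw score means worse health
    return scaled if higher_is_better else 100.0 - scaled


def required_difference(sd1, n1, sd2, n2, alpha=0.05):
    """Between-group mean difference beyond which the two-sided P falls below alpha.

    Normal (z) approximation to a two-sample comparison with unpooled SDs;
    a t-based test would need a slightly larger difference in small trials.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)        # about 1.96 for alpha = 0.05
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # SE of the difference in means
    return z * se


# Hypothetical trial: two arms of 50, follow-up SD of 20 points in each arm
threshold = required_difference(20, 50, 20, 50)
```

Here the threshold is roughly 7.8 points on the 0-100 scale. Because it scales with the SDs, compressed variability lowers it, but a ceiling effect can also shrink the achievable difference; the abstract's "likelihood" metric itself is not specified and is not reproduced here.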

Results

A total of 255 RCTs were included. PROM scores improved from baseline to follow-up in most studies (98%), with a mean change of +28 points. The correlation coefficient between the mean baseline score and the mean score change was −0.66 (95% CI −0.72 to −0.59), indicating that higher baseline scores were associated with smaller score changes. In addition, there was a moderate negative correlation between the mean and SD of PROM scores at follow-up (−0.39; 95% CI −0.48 to −0.28). The mean likelihood of detecting a difference was 65% (SD 11%) at baseline and 65% (SD 11%) at follow-up. The likelihood reached the 80% benchmark in only 8.5% and 8.1% of the studies at baseline and follow-up, respectively.
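The abstract does not state how the correlation CIs were computed. One common approach is Fisher's z-transformation; the sketch below assumes that method and uses the reported r = −0.66 with n = 255 studies purely as an illustration.

```python
import math
from statistics import NormalDist


def pearson_r_ci(r, n, level=0.95):
    """Approximate CI for a Pearson correlation via Fisher's z-transformation."""
    z = math.atanh(r)                            # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)                  # approximate SE on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    # Build the interval on the z scale, then back-transform with tanh
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


low, high = pearson_r_ci(-0.66, 255)
```

This yields roughly (−0.72, −0.58), close to the published interval; the authors may have used a different method or a slightly different number of studies per analysis.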

Conclusion

The concentration of PROM score distributions toward the high end of the scale, especially when higher baseline scores are present, diminishes the likelihood of detecting significant differences between study groups, particularly at follow-up assessments in studies analyzing musculoskeletal complaints. This underscores the importance of critically evaluating the conclusions drawn from these studies.
Journal of Clinical Epidemiology, Volume 191, Article 112114
Citations: 0
The use of guidelines in multimorbidity related practice: An exploratory questionnaire survey.
IF 5.2 | CAS Zone 2 (Medicine) | Q1 HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-12-19 | DOI: 10.1016/j.jclinepi.2025.112115
Zijun Wang, Hongfeng He, Sergey K Zyryanov, Liliya E Ziganshina, Akihiko Ozaki, Natalia Dorofeeva, Myeong Soo Lee, Ivan D Florez, Etienne Ngeh, Abhilasha Sharma, Ekaterina V Yudina, Barbara C van Munster, Jako S Burgers, Opeyemi O Babatunde, Yaolong Chen, Janne Estill

Introduction: The use of guidelines in multimorbidity-related practice has not yet been extensively investigated. We aimed to explore how healthcare professionals use guidelines when managing individuals with multimorbidity.

Method: We conducted an exploratory survey among a convenience sample of medical professionals with clinical experience. The questionnaire addressed whether and how different types of guidelines are used in multimorbidity-related practice, the reasons for not using specific types of guidelines, and other approaches to inform multimorbidity practice. It was distributed through the investigators' contact networks. The results were presented descriptively.

Result: We received 311 valid responses: 136 from the WHO European Region, 137 from the Western Pacific Region, and 38 from other regions. Most participants were familiar with the concept of multimorbidity (n=245, 79%). Among the 269 respondents who reported using guidelines in multimorbidity practice, 124 (46%) used guidelines specifically focusing on combinations of diseases, and 148 (55%) used multiple single-disease guidelines together. Lack of availability was the main reason for not using guidelines that address multimorbidity itself, whereas the large number of guidelines (n=76, 40%) and possible interactions between conditions or treatments (n=62, 38%) were the main reasons for not using single-disease guidelines. When existing guidelines did not meet their needs, respondents frequently consulted experts or referred to systematic reviews and primary studies. The development of a tool or method to guide the use of multiple guidelines ranked highest among possible actions to improve multimorbidity practice.

Conclusion: Although the medical professionals in our sample were generally familiar with the use of guidelines, there are many unmet needs and tool gaps in guideline-informed, multimorbidity-related practice.

Journal of Clinical Epidemiology, Article 112115
Citations: 0
Sequential sample size calculations and learning curves safeguard the robust development of a clinical prediction model for individuals
IF 5.2 | CAS Zone 2 (Medicine) | Q1 HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-12-19 | DOI: 10.1016/j.jclinepi.2025.112117
Amardeep Legha, Joie Ensor, Rebecca Whittle, Lucinda Archer, Ben Van Calster, Evangelia Christodoulou, Kym I.E. Snell, Mohsen Sadatsafavi, Gary S. Collins, Richard D. Riley

Background and Objectives

When recruiting participants to a new study developing a clinical prediction model (CPM), sample size calculations are typically conducted before data collection based on sensible assumptions. This leads to a fixed sample size, but if the assumptions are inaccurate, the actual sample size required to develop a reliable model may be higher or even lower. To safeguard against this, adaptive sample size approaches have been proposed, based on sequential evaluation of (changes in) a model's predictive performance. The objective of the study was to illustrate and extend sequential sample size calculations for CPM development by (i) proposing stopping rules for prospective data collection based on minimizing uncertainty (instability) and misclassification of individual-level predictions and (ii) showcasing how it safeguards against inaccurate fixed sample size calculations.

Methods

The sequential approach repeats the predefined model development strategy every time a chosen number (eg, 100) of participants have been recruited and adequately followed up. At each stage, CPM performance is evaluated using bootstrapping, yielding prediction and classification stability statistics and plots, alongside optimism-adjusted measures of calibration and discrimination. Learning curves display the trend of results against sample size, and recruitment is stopped when a chosen stopping rule is met.
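The recruit-refit-check loop described above can be sketched as follows. This is an illustrative skeleton, not the authors' software: the batch size of 100 and the calibration-slope target come from the abstract, while `develop_and_evaluate` (which in practice would refit the prespecified model and run the bootstrap) and the toy learning curve are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Sequence


@dataclass
class Stage:
    n: int                     # cumulative number of followed-up participants
    metrics: Dict[str, float]  # bootstrap-based performance at this stage


def sequential_development(
    batches: Iterable[Sequence],
    develop_and_evaluate: Callable[[List], Dict[str, float]],
    stop_rule: Callable[[Dict[str, float]], bool],
) -> List[Stage]:
    """Rerun a fixed development strategy after each batch; return the learning curve."""
    data: List = []
    curve: List[Stage] = []
    for batch in batches:
        data.extend(batch)                    # add newly followed-up participants
        metrics = develop_and_evaluate(data)  # refit and evaluate on all data so far
        curve.append(Stage(len(data), metrics))
        if stop_rule(metrics):                # stop recruiting once the rule is met
            break
    return curve


def toy_evaluate(data):
    """Hypothetical stand-in: calibration slope creeping toward 1 as n grows."""
    return {"calibration_slope": 1 - 25 / len(data)}


curve = sequential_development(
    ([0] * 100 for _ in range(20)),           # up to 20 batches of 100 participants
    toy_evaluate,
    lambda m: m["calibration_slope"] >= 0.9,
)
```

A real stopping rule could combine several criteria (for example, calibration slope plus individual-level instability and misclassification thresholds) inside `stop_rule`; the list of `Stage` records is what a learning-curve plot would display.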

Results

We illustrate our approach by developing (penalized) logistic regression CPMs for acute kidney injury. Before recruitment, a fixed sample size calculation based on perceived sensible assumptions suggests recruiting 342 patients to minimize overfitting; however, during data collection, the sequential approach reveals that a much larger sample size of 1100 is required to minimize overfitting (targeting a bootstrap-corrected calibration slope ≥0.9). If the stopping rule criteria also target small uncertainty and misclassification probability of individual predictions, the sequential approach suggests an even larger sample size of about 1800.
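The abstract does not say which fixed calculation produced 342. One widely used criterion from Riley and colleagues' sample size guidance for binary-outcome prediction models targets an expected shrinkage factor S (commonly 0.9), giving n = p / ((S − 1) ln(1 − R²_CS / S)), where p is the number of predictor parameters and R²_CS the anticipated Cox-Snell R². The sketch below assumes that criterion; the inputs are hypothetical and are not the study's.

```python
import math


def n_for_target_shrinkage(p, r2_cs, shrinkage=0.9):
    """Minimum n so that expected uniform shrinkage is at least `shrinkage`.

    p: number of candidate predictor parameters
    r2_cs: anticipated Cox-Snell R-squared of the model
    """
    n = p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))
    return math.ceil(n)  # round up to a whole participant


n_planned = n_for_target_shrinkage(10, 0.10)   # hypothetical planning assumptions
n_if_wrong = n_for_target_shrinkage(10, 0.05)  # same model, weaker true signal
```

Halving the assumed R²_CS roughly doubles the required n, which illustrates why a fixed pre-recruitment calculation can be badly off when its assumptions are wrong, the risk the sequential approach is meant to guard against.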

Conclusion

For CPM development studies involving prospective data collection, a sequential sample size approach allows users to dynamically monitor individual-level prediction and classification instability. This helps determine when enough participants have been recruited and safeguards against using inaccurate assumptions in a sample size calculation before data collection. Engagement with patients and other stakeholders is crucial to identify sensible context-specific stopping rules for robust individual predictions.
Journal of Clinical Epidemiology, Volume 191, Article 112117
Citations: 0
Text message incentives increased patient-reported outcomes survey response in emergency care: SWAT findings
IF 5.2 | CAS Zone 2 (Medicine) | Q1 HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-12-19 | DOI: 10.1016/j.jclinepi.2025.112116
Gemma Altinger, Sweekriti Sharma, Qiang Li, Anthony Devaux, Samantha Darby, Aidan van Wyk, Caitlin M.P. Jones, Chris G. Maher, Adrian C. Traeger

Objectives

To determine if text message–based behavioral interventions could increase response rates to a patient-reported outcomes survey in the emergency department (ED).

Study Design and Setting

We conducted a study within a trial (SWAT) nested in the NUDG-ED trial, which aimed to reduce low-value care for patients with back pain presenting to eight EDs in Sydney, Australia. The SWAT was a 3-arm randomized controlled trial (RCT). After discharge from the ED, patients were randomized to receive one of three text message invitations to complete a follow-up patient-reported outcome survey: a standard control message, or one of two behaviorally informed messages featuring either a prize draw incentive or prosocial framing. Our primary outcome measure was the response rate in each study group. We fitted a linear mixed-effects model controlling for hospital heterogeneity and patient characteristics to estimate the mean difference (MD) in proportions with 95% CI, and so determine the effectiveness of the behavioral interventions.

Results

A total of 1494 patients were randomized between May 15, 2024 and January 29, 2025. Of these, 52% were women, the median age was 46 years (IQR 35, 62), 43% were from disadvantaged areas, and 51% were triaged with a clinically urgent condition. Baseline characteristics were balanced across all groups. Our primary analysis found that, compared with the control, the prize draw incentive increased response rates (n = 997 patients, MD = 6.9%, 95% CI: 1.8% to 11.9%, P = .007). Our adjusted mixed-effects model also found a significant increase in response rates (n = 979 patients, MD = 6.4%, 95% CI 1.3% to 11.4%, P = .013). Compared with the control, the prosocial framing message may have slightly increased response rates, but the difference was not statistically significant (n = 996 patients, 17.2% vs 21.1%, MD = 3.9%, 95% CI: −1.1% to 8.9%).

Conclusion

In this randomized trial, a prize draw incentive modestly improved response rates to a patient-reported outcomes survey in routine emergency care settings. Prosocial framing may have slightly increased response rates, but the effect was uncertain. Both behavioral approaches warrant further testing in routine care settings.

Plain Language Summary

Patient-reported outcomes, such as surveys about how people feel and recover after care, are important for understanding what matters most to patients. However, response rates to these surveys are often very low, especially in real clinical settings. This makes it difficult to draw strong conclusions about whether treatments are helping patients. So, researchers and health services need to find ways to improve response rates. This study looked at whether simple text message strategies could encourage more patients to com…
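The trial estimated mean differences in response proportions with a linear mixed-effects model that adjusted for hospital and patient characteristics. As a simpler, unadjusted illustration of the same quantity, the sketch below computes a difference in proportions with a Wald 95% CI. It ignores clustering by hospital, and the counts in the example are hypothetical, not trial data.

```python
import math
from statistics import NormalDist


def risk_difference_ci(x1, n1, x0, n0, level=0.95):
    """Unadjusted difference in proportions (intervention minus control) with a Wald CI."""
    p1, p0 = x1 / n1, x0 / n0
    diff = p1 - p0
    # Independent-groups SE of the difference; no adjustment for hospital clustering
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical arms: 60/300 responders with the incentive, 45/300 with the control
diff, lo, hi = risk_difference_ci(60, 300, 45, 300)
```

In this made-up example a 5-percentage-point difference has a CI spanning zero, showing how arm sizes of a few hundred can leave modest effects uncertain, as with the prosocial framing arm.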
Journal of Clinical Epidemiology, Volume 191, Article 112116
Citations: 0
Paper 1: a semi-automated approach facilitated the assessment of the certainty of evidence in a network meta-analysis: part 1 – Direct comparisons
IF 5.2 | CAS Zone 2 (Medicine) | Q1 HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-12-18 | DOI: 10.1016/j.jclinepi.2025.112109
Mohammed Mujaab Kamso, Samuel L. Whittle, Jordi Pardo Pardo, Rachelle Buchbinder, George Wells, Rob Deardon, Tolulope Sajobi, George Tomlinson, Jesse Elliott, Jocelyn Thomas, Shannon E. Kelly, Romina Brignardello-Petersen, Glen S. Hazlewood

Objectives

To implement and evaluate a semi-automated approach to facilitate rating the Grading of Recommendations Assessment, Development and Evaluation (GRADE) certainty of evidence (CoE) for direct comparisons within two living network meta-analyses.

Methods

For each of three GRADE domains (study limitations, indirectness, and inconsistency), decision rules were developed and used to generate automated judgments for each domain and for the overall certainty. Inputs included risk-of-bias and indirectness ratings for each study and measures of heterogeneity. Indirectness ratings were made by two independent reviewers, with disagreements resolved through consensus. Using an online tool customized to our project, two independent raters viewed forest plots and additional data and could confirm or modify the suggested rating. Disagreements were resolved by consensus. We evaluated inter-rater reliability and accuracy.
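The abstract does not publish the decision rules themselves. The sketch below only illustrates the general shape of such rules: per-domain downgrades derived from study-level ratings and heterogeneity, applied to a starting rating of "high". Every threshold (more than half of the weight at high risk of bias or seriously indirect, I² above 50%) and the one-level-per-domain simplification are hypothetical choices, not the authors'.

```python
from dataclasses import dataclass
from typing import Dict, List

LEVELS = ["very low", "low", "moderate", "high"]


@dataclass
class Study:
    weight: float               # share of the pooled estimate (0-1)
    high_risk_of_bias: bool     # study-level risk-of-bias rating
    serious_indirectness: bool  # study-level indirectness rating


def rate_direct_comparison(studies: List[Study], i_squared: float) -> Dict:
    """Suggest a GRADE rating for one direct comparison (hypothetical rules)."""
    total = sum(s.weight for s in studies)
    rob_share = sum(s.weight for s in studies if s.high_risk_of_bias) / total
    ind_share = sum(s.weight for s in studies if s.serious_indirectness) / total
    downgrades = {
        "study limitations": 1 if rob_share > 0.5 else 0,
        "indirectness": 1 if ind_share > 0.5 else 0,
        # Inconsistency cannot be judged when only one study is available
        "inconsistency": 1 if len(studies) > 1 and i_squared > 50 else 0,
    }
    level = max(0, 3 - sum(downgrades.values()))  # start at "high" (index 3)
    return {"downgrades": downgrades, "overall": LEVELS[level]}
```

In the workflow described above, output like this would only be a suggested rating that two raters confirm or modify in the online tool.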

Results

Across 374 direct comparisons, there was perfect agreement (100%) between the automated judgment and reviewer consensus when only a single study was available (n = 292), and near-perfect agreement when more than one study was available (99%–100% for the three GRADE domains and 96% for the overall rating). Inter-rater reliability was near perfect (Gwet's AC1 ranging from 96% to 100%).
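Inter-rater reliability is reported as Gwet's AC1. For two raters and q categories, AC1 = (p_a − p_e) / (1 − p_e), where p_a is the observed agreement and p_e = Σ_k π_k(1 − π_k) / (q − 1), with π_k the average share of ratings in category k across both raters. A minimal sketch follows; the function name and example ratings are illustrative only.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional, Sequence


def gwet_ac1(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable],
             categories: Optional[Iterable[Hashable]] = None) -> float:
    """Gwet's AC1 for two raters assigning each item to one nominal category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Use the full rating scale if given, otherwise the categories actually observed
    cats = set(categories) if categories is not None else set(rater_a) | set(rater_b)
    q = len(cats)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    pa = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement
    counts = Counter(rater_a) + Counter(rater_b)
    # pi_k: share of all 2n ratings that fall in category k
    pe = sum((counts[k] / (2 * n)) * (1 - counts[k] / (2 * n)) for k in cats) / (q - 1)
    return (pa - pe) / (1 - pe)


# Hypothetical ratings of 10 items (1 = downgrade, 0 = no downgrade)
ac1 = gwet_ac1([1] * 8 + [0, 0], [1] * 9 + [0])
```

With 9 of 10 items in agreement this gives an AC1 of about 0.87. Unlike Cohen's kappa, AC1 stays high when one category dominates, which is common for GRADE domain judgments.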

Conclusion

Automated judgments using established decision rules agreed with expert judgment for the vast majority of GRADE CoE ratings.
Journal of Clinical Epidemiology, vol. 191, Article 112109 (published 2025-12-18).
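Paper 1 reports inter-rater reliability as Gwet's AC1 (96% to 100%). As a rough illustration of why AC1 is often preferred over Cohen's kappa when nearly all ratings fall in one category, the sketch below computes AC1 for two raters assigning nominal ratings, using Gwet's chance-agreement term. This is not the authors' code; the function name, the GRADE-style category labels, and the example ratings are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import Optional


def gwet_ac1(
    rater_a: Sequence[Hashable],
    rater_b: Sequence[Hashable],
    categories: Optional[Sequence[Hashable]] = None,
) -> float:
    """Gwet's AC1 for two raters classifying the same items into nominal categories."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same, non-empty set of items")
    n = len(rater_a)
    # Use the full set of possible categories if supplied, else the observed ones.
    if categories is not None:
        cats = list(categories)
    else:
        cats = sorted(set(rater_a) | set(rater_b), key=str)
    q = len(cats)
    if q < 2:
        raise ValueError("AC1 needs at least two possible categories")

    # Observed agreement: share of items on which both raters chose the same category.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # pi_k: mean of the two raters' marginal proportions for category k.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    pis = [(count_a[k] + count_b[k]) / (2 * n) for k in cats]

    # Gwet's chance agreement: (1 / (q - 1)) * sum_k pi_k * (1 - pi_k); never exceeds 0.5.
    p_e = sum(pi * (1 - pi) for pi in pis) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


a = ["not serious", "not serious", "serious", "not serious"]
b = ["not serious"] * 4
print(round(gwet_ac1(a, b), 2))  # 0.68

levels = ["not serious", "serious", "very serious"]
print(gwet_ac1(["not serious"] * 5, ["not serious"] * 5, categories=levels))  # 1.0
```

In the second call both raters always choose the same single category; Cohen's kappa would be undefined there (0/0), whereas AC1 returns 1.0 because its chance term stays small.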
Citations: 0
The opacity and exemption of artificial intelligence or the epic of explainable artificial intelligence, reply to commentary by Rattanapitoon et al
IF 5.2 · Zone 2 (Medicine) · Q1 HEALTH CARE SCIENCES & SERVICES · Pub Date: 2025-12-17 · DOI: 10.1016/j.jclinepi.2025.112111
Manuel Marques-Cruz, Rafael José Vieira, Sara Gil Mata, Bernardo Sousa-Pinto
Journal of Clinical Epidemiology, vol. 191, Article 112111 (published 2025-12-17).
Citations: 0
Health experiences and inequalities across intersecting social identities in health research: a scoping review.
IF 5.2 · Zone 2 (Medicine) · Q1 HEALTH CARE SCIENCES & SERVICES · Pub Date: 2025-12-16 · DOI: 10.1016/j.jclinepi.2025.112112
Azar Alexander-Sefre, Frances Sherratt, Heidi Green, Shaun Treweek, Victoria Shepherd

Background: Intersectionality provides a framework for thinking critically about how sociodemographic factors interact. There is currently limited evidence on whether having multiple sociodemographic factors typically associated with underrepresentation in research (e.g., minority ethnicity, lower socioeconomic status) affects health conditions and outcomes. Given the essential role of clinical trials in developing effective treatments, this gap makes it difficult to judge whether intersectionality should be considered in trials.

Aim/objectives: This scoping review aimed to map the existing literature on the impact of intersectionality on health inequalities and outcomes in developed economies and to identify how intersecting sociodemographic factors affect health.

Methods: Following the Arksey and O'Malley Framework and Joanna Briggs Institute methodology, the review adhered to PRISMA-ScR guidelines. Databases searched included Medline, Embase, Web of Science, International Bibliography of the Social Sciences, and Sociological Abstracts. Selection criteria were based on the Population-Concept-Context mnemonic, targeting studies that explicitly referenced intersecting sociodemographic factors and their impact on health experiences. Data were extracted from the Discussion sections of the included studies, specifically any reports of the effects of intersecting sociodemographic factors, such as ethnicity, sex, gender, and socioeconomic status, on health conditions and outcomes.

Results: Thirty-three studies met the inclusion criteria. The review found that people who belong to more than one sociodemographic group typically under-served in research (e.g., people from minoritised ethnic groups who also experience socioeconomic disadvantage) tend to have poorer health. The review also found that context matters: depending on the context, some traditionally privileged groups (e.g., white, male, and from a high socioeconomic background) had relatively poorer health outcomes.

Conclusion: Overall, possessing greater intersectionality is likely to lead to poorer health; however, the relationship is not simple, and context plays a role. These findings emphasise the need for inclusive clinical trials that account for multiple sociodemographic factors and for research designs that reflect diverse populations.

Journal of Clinical Epidemiology, Article 112112 (published 2025-12-16).
Citations: 0
Beautiful weights, misinterpreted effects: the use and misuse of overlap weighting in major medical journals, 2020–2025
IF 5.2 · Zone 2 (Medicine) · Q1 HEALTH CARE SCIENCES & SERVICES · Pub Date: 2025-12-16 · DOI: 10.1016/j.jclinepi.2025.112113
John G. Rizk , Giuseppe Lippi , Carl J. Lavie

Objectives

To evaluate the implementation and reporting practices of overlap weighting in major medical journals.

Study Design and Setting

We reviewed observational studies published from January 2020 to September 2025 in five major medical journals (Annals of Internal Medicine, The British Medical Journal [BMJ], Journal of the American Medical Association [JAMA], JAMA Internal Medicine, and The New England Journal of Medicine [NEJM]) that used overlap weighting as a primary or sensitivity adjustment method. Reporting quality was assessed for estimand specification, definition of the overlap population, justification of the method, acknowledgment of advantages, and discussion of interpretability.

Results

Seventeen eligible studies were identified. Four studies (24%) correctly named the estimand as the average treatment effect in the overlap population; two (12%) misreported the estimand as average treatment effect, and the remainder did not specify an estimand. Ten studies (59%) described the overlap population at least partially. Sixteen studies (94%) highlighted at least one statistical advantage of overlap weighting, yet none acknowledged that results apply only to the overlap population. These results point to a notable gap in estimand reporting.

Conclusion

Clearer specification of the estimand and its target population is essential to prevent misinterpretation. Strengthening reporting standards will support more transparent and appropriate use of overlap weighting in medical research.

Plain Language Summary

This study examined how major medical journals (from 2020–2025) report studies using overlap weighting, a method that focuses on patients who could receive either treatment. Most studies acknowledged advantages of using overlap weighting but did not clearly state that results apply only to this “overlap” group. Clear reporting of the target population and effect estimate is needed to avoid misleading interpretations.
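To make the estimand concrete: given an estimated propensity score e(x) = P(treated | x), overlap weighting gives each treated patient weight 1 − e(x) and each control patient weight e(x). Patients with e(x) near 0.5 count most, while those who would almost always (or almost never) be treated count little, so the weighted difference in outcome means estimates the average treatment effect in the overlap population (ATO), not the ATE. The minimal sketch below assumes propensity scores have already been estimated (e.g., by logistic regression) and ignores variance estimation; the function names and toy data are hypothetical.

```python
from collections.abc import Sequence


def overlap_weights(treated: Sequence[int], ps: Sequence[float]) -> list[float]:
    """Overlap weights: 1 - e(x) for treated units (z == 1), e(x) for controls."""
    weights = []
    for z, e in zip(treated, ps):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 - e if z == 1 else e)
    return weights


def ato_estimate(treated: Sequence[int], ps: Sequence[float], y: Sequence[float]) -> float:
    """Overlap-weighted difference in means; the target is the ATO, not the ATE."""
    w = overlap_weights(treated, ps)
    num_t = sum(wi * yi for wi, zi, yi in zip(w, treated, y) if zi == 1)
    den_t = sum(wi for wi, zi in zip(w, treated) if zi == 1)
    num_c = sum(wi * yi for wi, zi, yi in zip(w, treated, y) if zi == 0)
    den_c = sum(wi for wi, zi in zip(w, treated) if zi == 0)
    return num_t / den_t - num_c / den_c


z = [1, 1, 0, 0]
e = [0.8, 0.2, 0.8, 0.2]
y = [10.0, 0.0, 4.0, 2.0]
print(round(ato_estimate(z, e, y), 6))  # -1.6
```

Note that the treated patient with e(x) = 0.8 receives weight 0.2: someone who would usually be treated anyway contributes little, which is exactly why the resulting effect describes the overlap population rather than everyone.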
Journal of Clinical Epidemiology, vol. 191, Article 112113 (published 2025-12-16).
Citations: 0
Paper 2: a semi-automated approach facilitated the assessment of the certainty of evidence in a network meta-analysis: part 2 – indirect and mixed comparisons
IF 5.2 · Zone 2 (Medicine) · Q1 HEALTH CARE SCIENCES & SERVICES · Pub Date: 2025-12-16 · DOI: 10.1016/j.jclinepi.2025.112110
Mohammed Mujaab Kamso , Samuel L. Whittle , Jordi Pardo Pardo , Rachelle Buchbinder , George Wells , Rob Deardon , Tolulope Sajobi , George Tomlinson , Jesse Elliott , Jocelyn Thomas , Shannon E. Kelly , Romina Brignardello-Petersen , Glen S. Hazlewood

Objectives

To implement a semiautomated approach to facilitate rating the Grading of Recommendations Assessment, Development and Evaluation (GRADE) certainty of evidence (CoE) for indirect and network meta-analysis (NMA) estimates.

Methods

We developed and implemented algorithms for generating automated ratings for the CoE for indirect and network estimates in two living NMAs of rheumatoid arthritis treatment. At the indirect stage, inputs included CoE ratings for direct estimates and the contribution matrix. Intransitivity ratings were assigned based on the indirectness ratings of the two direct estimates with the highest percent contribution. An online tool (customized to our project) facilitated assessment of imprecision on the network estimate. Automated ratings were reviewed by two independent experts.

Results

Across 1306 indirect comparisons, the contribution matrix identified the dominant branches of evidence regardless of whether a single first-order loop was present (80%) or not. The reviewers agreed with all automated CoE ratings for incoherence (n = 34), network estimates (n = 34), and imprecision (n = 1447). They agreed with the automated intransitivity algorithm except when the total contribution of the top two direct estimates was low (eg, <50%, which occurred in 38% of the estimates).

Conclusion

Automated approaches facilitated CoE ratings for indirect and network estimates. Further work is required to define appropriate algorithms for intransitivity.
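The Methods assign intransitivity from the indirectness ratings of the two direct estimates with the highest percent contribution, and the Results note that reviewers disagreed mainly when those two contributed little in total (eg, <50%). The hypothetical sketch below shows one way such a rule could be coded. The abstract does not say how the two ratings are combined; taking the more serious of the two is an assumption made here for illustration, as are the function name, the 0/1/2 rating scale, and the manual-review flag.

```python
from collections.abc import Mapping

# Hypothetical ordinal scale for indirectness and intransitivity concerns.
LEVELS = {0: "not serious", 1: "serious", 2: "very serious"}


def rate_intransitivity(
    contributions: Mapping[str, float],
    indirectness: Mapping[str, int],
    min_top_two_share: float = 50.0,
) -> tuple[int, bool]:
    """Return (rating, needs_review) for one indirect or network estimate.

    contributions: percent contribution of each direct comparison, i.e. the
        row of the contribution matrix for this estimate.
    indirectness: indirectness rating (0, 1 or 2) of each direct comparison.
    """
    # The two direct comparisons contributing most to this estimate.
    top_two = sorted(contributions, key=contributions.get, reverse=True)[:2]
    # Assumption: intransitivity takes the worse of the two indirectness ratings.
    rating = max(indirectness[c] for c in top_two)
    # Flag for human review when the top two explain little of the estimate.
    needs_review = sum(contributions[c] for c in top_two) < min_top_two_share
    return rating, needs_review


row = {"A-B": 45.0, "B-C": 30.0, "A-C": 25.0}
ratings = {"A-B": 0, "B-C": 1, "A-C": 2}
print(rate_intransitivity(row, ratings))  # (1, False)
```

A rule like this is cheap to run across all indirect comparisons; the review flag reflects the abstract's finding that the top-two heuristic is least trustworthy when evidence is spread across many direct comparisons.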
Journal of Clinical Epidemiology, vol. 191, Article 112110 (published 2025-12-16).
Citations: 0
Comparison of AI-assisted and human-generated plain language summaries for Cochrane reviews: a randomised non-inferiority trial (HIET-1) [Registered Report - stage II]
IF 5.2 · Zone 2 (Medicine) · Q1 HEALTH CARE SCIENCES & SERVICES · Pub Date: 2025-12-15 · DOI: 10.1016/j.jclinepi.2025.112102
Declan Devane , Johanna Pope , Paula Byrne , Evan Forde , Isabel O'Byrne , Steven Woloshin , Eileen Culloty , Darren Dahly , Ingeborg Hess Elgersma , Heather Munthe-Kaas , Conor Judge , Martin O'Donnell , Finn Krewer , Sandra Galvin , Nikita N. Burke , Theresa Tierney , KM Saif-Ur-Rahman , Tom Conway , James Thomas
Objectives

To compare the comprehension, readability, quality, safety, and trustworthiness of artificial intelligence (AI)-assisted vs human-generated plain language summaries (PLSs) for Cochrane systematic reviews.

Study Design

Randomized, parallel-group, two-arm, noninferiority trial (ISRCTN85699985).

Setting

Online survey platform, September 2025.

Participants

Adults aged 18 years or older with a minimum English reading proficiency of 7 out of 10, recruited via Prolific. Of the 500 individuals screened, 465 were randomized and 453 were included in the per-protocol analysis.

Interventions

Participants were randomly assigned to read either three AI-assisted PLSs developed with ChatGPT and human-in-the-loop verification or the three published human-generated Cochrane PLSs for the same reviews.

Outcomes

Primary: comprehension (10-item questionnaire, noninferiority margin 10%). Secondary: readability, quality and safety, trustworthiness, and authorship perception.

Results

Mean comprehension scores were 88.9% (n = 228) in the AI-assisted group and 89.0% (n = 225) in the human-generated group (mean difference −0.03 percentage points, 95% CI −1.9 to 2.0 percentage points); the upper CI bound (2.0 percentage points) did not exceed the +10 percentage-point noninferiority margin, demonstrating noninferiority. Flesch-Kincaid Grade Level showed no significant difference (8.20 vs 8.38, P = .722), although formal noninferiority was not demonstrated (the upper 95% CI bound of 1.72 exceeded the 1.0 grade-level margin). AI-assisted summaries scored higher on Flesch Reading Ease (63.33 vs 50.00, P = .008) and lower on the Coleman-Liau Index. All summaries met prespecified quality and safety standards (100% in both groups). Trustworthiness scores were comparable (3.98 vs 3.91; difference 0.068, 95% CI −0.043 to 0.179), meeting noninferiority. Participants showed limited ability to identify authorship, correctly identifying AI-assisted summaries in 56.3% of cases and human-generated summaries in 34.7% (approximately chance for a three-option question), and 55.4% of human-generated summaries were misattributed as AI-assisted. Exploratory subgroup analysis showed an age interaction (P = .023), although it was based on a small subgroup (n = 14, 3%).

Conclusion

AI-assisted PLSs with human oversight achieved comprehension levels noninferior to those of human-generated Cochrane summaries, with comparable quality, safety, and trust ratings. AI-assisted summaries were largely indistinguishable from human-generated ones. Pretrial verification identified and corrected numerical errors, confirming the need for human oversight. These findings support human-in-the-loop AI workflows for PLS production, although formal evaluation of the time and resource implications is needed to establish efficiency gains over traditional manual approaches.
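Two kinds of calculation drive the HIET-1 results: a noninferiority decision that compares a confidence-interval bound with a prespecified margin, and readability scores computed from standard formulas. The sketch below reproduces the first using the bounds and margins reported in the abstract (it simply asks whether the reported upper bound stays below the margin, which is how the abstract frames both comparisons) and implements the standard Flesch-Kincaid Grade Level and Flesch Reading Ease formulas from word, sentence, and syllable counts. Syllable counting is left to the caller, and the trial's own software and counting rules are not described in the abstract, so scores from this sketch may differ from the published ones.

```python
def noninferior(upper_ci_bound: float, margin: float) -> bool:
    """Noninferiority is shown when the CI bound on the 'worse' side stays below the margin."""
    return upper_ci_bound < margin


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level (higher means harder to read)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease (higher means easier to read; roughly 0 to 100)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Upper CI bounds and margins as reported in the abstract.
print(noninferior(2.0, 10.0))  # comprehension, percentage points: True
print(noninferior(1.72, 1.0))  # Flesch-Kincaid grade level: False

# Hypothetical passage: 100 words, 5 sentences, 150 syllables.
print(round(flesch_kincaid_grade(100, 5, 150), 2))  # 9.91
```

Both Flesch formulas depend only on average sentence length and average syllables per word, which is why shorter sentences and plainer vocabulary move the two scores in opposite directions.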
Journal of Clinical Epidemiology, vol. 191, Article 112102 (published 2025-12-15).
Citations: 0