
Journal of Clinical Epidemiology: Latest Articles

Psychometric properties of instruments for assessing the methodological quality and risk of bias in non-randomized studies of interventions: a systematic review and meta-analysis in accordance with the COSMIN guideline.
IF 5.2 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2026-03-16 DOI: 10.1016/j.jclinepi.2026.112230
María Sánchez-Marco, Néstor Montoro-Pérez, María Rubio-Aparicio, María José Cabañero-Martínez, Mar Lozano-Casanova, Silvia Escribano

Objectives: Non-randomized studies of interventions provide valuable evidence in health sciences but are prone to biases that affect validity. Multiple instruments have been developed for critical appraisal, although their measurement properties remain insufficiently established. These instruments typically aim to evaluate two key theoretical constructs: methodological quality and risk of bias, which reflect different but complementary dimensions of study rigor and internal validity. The COSMIN framework offers internationally recognized standards for evaluating instrument robustness, thus enhancing transparency and comparability in their selection. This systematic review specifically focused on identifying and critically evaluating studies that have empirically tested the measurement properties of instruments developed to assess methodological quality and/or risk of bias in non-randomized studies of interventions.

Study design and setting: This systematic review and meta-analysis was conducted in accordance with the COSMIN initiative, which is structured in three parts: (1) a literature search reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and the PRISMA search extension; (2) an assessment of methodological quality and measurement properties using the "COSMIN Risk of Bias" tool, together with an evaluation of the certainty of evidence following the "Grading of Recommendations Assessment, Development, and Evaluation" (GRADE) approach; and (3) a meta-analysis of measurement properties when sufficient quantitative data were available across validation studies for a given instrument.

Results: Eleven instruments for critical appraisal of non-randomized studies of interventions were identified. None were evaluated for all COSMIN measurement properties; instrument development, content validity, and reliability were most consistently reported. MINORS, MMERSQI, and ASSESS demonstrated the highest quality evidence for methodological quality, while ROBINS-I provided the strongest evidence for risk of bias assessment. For instruments with sufficient comparable data, such as ROBINS-I, a meta-analysis of inter-rater reliability coefficients was conducted, showing moderate agreement for selection and exposure domains, with substantial heterogeneity across studies.

Conclusion: MINORS emerged as the most robust instrument for critical appraisal of methodological quality, whereas ROBINS-I stood out for risk of bias. ASSESS and MMERSQI provided adequate evidence of content validity, but further assessment of their other psychometric properties is needed. Only a small subset of the available tools for non-randomized studies of interventions (NRSI) has undergone formal psychometric validation. Further research is needed to strengthen the measurement evidence base for these instruments.

Citations: 0
A systematic review to identify how children and young people were included in the development of pediatric core outcome sets.
IF 5.2 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2026-03-12 DOI: 10.1016/j.jclinepi.2026.112220
Zelpha D'Souza, Andrada Ciucǎ, Jack Wilkinson, Ramona Moldovan, Jamie J Kirkham

Objectives: A core outcome set (COS) is an agreed-upon set of outcomes that should be evaluated in clinical trials, allowing data from similar areas of healthcare to be compared and pooled in meta-analyses that inform whether treatments work and/or cause harm. Pediatric COS have been developed by identifying outcomes that are important to relevant stakeholders, and the importance of including children and young people (CYP) as stakeholders has recently been recognized. While research on the overall development of pediatric COS exists, no comprehensive review has examined the methods COS developers use to include CYP in the development process. This systematic review aims to establish current practice and identify issues in including CYP when developing a pediatric COS, particularly in the context of rare conditions.

Study design: We conducted a systematic review of studies in the Core Outcome Measures in Effectiveness Trials (COMET) Initiative database that developed pediatric core outcome sets (COS) for children and young people (CYP; 0-18 years). Data were extracted on methods used to include CYP, barriers to their inclusion and stakeholder participation. Authors were contacted when methodological details were unclear or insufficiently reported.

Results: Seventy articles corresponding to 53 pediatric COS were analyzed; 18 COS (34%) included CYP, and 16 (30%) were developed for a rare condition or a condition with a rare form. CYP were included in advisory groups (6/18 COS), pilot surveys/questionnaires (3/18 COS), focus groups (2/18 COS), workshops (1/18 COS), qualitative interviews (8/18 COS), Delphi (and other) surveys (15/18 COS) and consensus meetings (5/18 COS). COS developers adapted methods for patient and public involvement and engagement work to improve participation in COS development, to include CYP in generating outcomes, and to reach consensus on the most important outcomes to include in the COS. Barriers to including CYP were age and the condition/area of COS development.

Conclusion: Our findings show that with appropriate adaptations, CYP can be included in COS development, regardless of age, communication needs or whether conditions are rare or common. This review proposes adaptations to refine current pediatric COS methodology to increase CYP inclusion.

Citations: 0
Large language models show promising performance for some systematic review tasks but call for cautious implementation: a systematic review.
IF 5.2 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2026-03-12 DOI: 10.1016/j.jclinepi.2026.112221
Florian Laignelot, Guillaume L Martin, Mohamad Ossman, Ophélie Pingeon, Amine Boubaker, Emma Picovschi, Jia Kim, Xavier Tannier, Jérémie F Cohen, Agnès Dechartres

Objectives: With the exponential growth of biomedical literature, conducting systematic reviews is becoming increasingly burdensome. We aimed to evaluate the performance of large language models (LLMs) in automating some or all steps of systematic reviews and meta-analyses.

Study design and setting: In this systematic review, we searched PubMed, Embase, the Cochrane Library and preprint platforms up to 14 January 2025. We included any study assessing the performance of LLMs (e.g., GPT, Claude, Mistral) in any step of the systematic review process. Pairs of reviewers independently extracted data and assessed risk of bias. We summarized agreement between LLMs and human reviewers using the median (IQR) of positive percent agreement (PPA) and negative percent agreement (NPA), which are analogous to sensitivity and specificity, respectively.
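
To make the agreement metrics concrete, here is a minimal Python sketch of how PPA and NPA could be computed from a 2x2 table of LLM versus human screening decisions and then summarized as a median (IQR). The function names, the counts, and the choice of the "inclusive" quartile method are illustrative assumptions, not details taken from the review.

```python
from statistics import quantiles


def percent_agreement(tp, fp, fn, tn):
    """Return (PPA, NPA), treating the human decision as the reference.

    tp: both include; fp: LLM includes, human excludes;
    fn: LLM excludes, human includes; tn: both exclude.
    """
    ppa = tp / (tp + fn)  # analogous to sensitivity
    npa = tn / (tn + fp)  # analogous to specificity
    return ppa, npa


def median_iqr(values):
    """Return (median, Q1, Q3) using inclusive quartiles (an assumed convention)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q2, q1, q3


# Hypothetical counts for three LLM assessments (not data from the review).
assessments = [
    (90, 40, 10, 860),
    (45, 100, 5, 850),
    (70, 20, 30, 880),
]

ppas = [percent_agreement(*counts)[0] for counts in assessments]
npas = [percent_agreement(*counts)[1] for counts in assessments]

print("PPA median (Q1, Q3):", median_iqr(ppas))
print("NPA median (Q1, Q3):", median_iqr(npas))
```

Different quartile conventions give slightly different IQR bounds on small samples, so the method used should be reported alongside the summary.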

Results: From 3,889 unique references, we included 63 studies, of which 52 reported performance metrics, for a total of 148 LLM performance assessments. Most assessments concerned GPT models (n=114, 77%). The most frequently evaluated tasks were title and abstract screening (n=78, 53%), data extraction (n=23, 16%), and full-text screening (n=20, 14%). For title and abstract screening, the overall median PPA was 0.92 (IQR 0.69-0.98) and the median NPA was 0.89 (0.72-0.95). For full-text screening, the overall median PPA was 0.93 (0.87-1.00) and the median NPA was 0.92 (0.78-0.97). LLMs released after GPT-4 seemed to perform better than earlier models. For other tasks, authors reported good overall performance, but variability in performance metrics precluded a complete quantitative synthesis. Overall accuracy for data extraction tasks ranged from 0.36 to 1.00, with a median of 0.95 (IQR 0.91-0.97, n=11). For risk of bias assessment, accuracy ranged from 0.44 to 0.90 (median 0.62, IQR 0.53-0.76, n=6).

Conclusion: The performance of LLMs, particularly newer generations, shows promise in automating some repetitive steps of systematic reviews such as screening. However, their successful integration will require appropriate safeguards and careful implementation.

Citations: 0
Trustworthiness and Transparency Features Were Less Frequent in Randomized Trials Presenting Large Effects for Continuous Outcomes in Abstracts.
IF 5.2 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2026-03-11 DOI: 10.1016/j.jclinepi.2026.112215
Jonathan F Henssler, Joana Reis-Pardal, Lina Koppel, John P A Ioannidis

Objectives: Large effect sizes (ESs), especially when prominently presented in trial abstracts, draw considerable attention, but it is important to understand whether they are trustworthy. We aimed to assess indicators of transparency and trustworthiness in randomized controlled trials (RCTs) reporting some large ES for continuous outcomes in their abstract, in comparison with RCTs presenting only non-large ESs in their abstract.

Study design and setting: We included RCTs indexed in MEDLINE between January 1, 2024 and March 18, 2025, presenting at least one standardized mean difference of absolute value 0.8 or higher (large ES) versus those presenting only smaller absolute standardized mean differences in their abstract. Trial characteristics and methodological features were extracted systematically in large ES and non-large ES trials. The primary outcome was pre-specified protocol registration; secondary outcomes were any protocol registration and public availability or repository placement of raw data (pre-registered protocol: https://osf.io/8xasw).

Results: We evaluated 152 trials with large ESs in their abstract and 175 trials with only non-large ESs in their abstract. Large ES trials had suggestively lower rates of pre-registered protocols (45% versus 61%, p=0.0054) and significantly lower rates of any protocol registration (74% versus 87%, p=0.0028) than non-large ES trials. There was no difference in raw data public availability or repository placement (6% versus 7%). Large ES trials were also less likely to be multicenter (p=0.0042), to have a corresponding author from a high-income country (p=0.0001), to be conducted in high-income country site(s) (p=0.0003), to have a published statistical analysis plan (p=0.0216), and to result from between-group comparisons (p<0.0001), and they tended to report allocation concealment less frequently (p=0.0351). Large effects were significantly more likely to involve non-drug/non-psychological interventions (p=0.0001).

Conclusions: RCTs presenting large ESs in their abstracts are more likely to lack transparency and trustworthiness features and may operate with higher risk of lack of credibility.

Plain language summary: When a medical study reports that a treatment has a big effect, that naturally grabs attention. However, are impressive-sounding results actually reliable? This study investigated whether clinical trials claiming large treatment effects in their abstracts are as trustworthy as those reporting only more modest results. We compared two groups of recently published randomized controlled trials: 152 trials that reported large effects in their abstracts, and 175 trials that reported only smaller effects. We looked at several markers of good scientific practice, particularly whether researchers had registered their study plans in advance and whether they made their data openly available. We found that trials reporting large effects were less likely to have registered their methods in advance and more likely to have no registered protocol at all. The two groups also differed in other aspects of trustworthiness. Large-effect trials were less likely to involve multiple research centers, less likely to have researchers from high-income countries, and less likely to have a published statistical analysis plan. They also tended to test different types of treatments: large effects appeared more often in studies of interventions other than drugs or psychological therapies. Both groups were equally poor at making raw data publicly available (only about 6-7%). Overall, when a trial claims to have found an impressive treatment effect, some extra scrutiny may be warranted. Before trusting striking results, readers should assess whether the trial followed appropriate scientific checks and transparency safeguards.

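
The inclusion rule in this study turns on whether an abstract reports a standardized mean difference with an absolute value of 0.8 or higher. The sketch below shows one common way to compute such a value (Cohen's d with a pooled standard deviation) and apply the 0.8 cutoff. The abstract does not say which SMD estimator the trials or the authors used, so the formula choice, the function names, and the example numbers are assumptions for illustration only.

```python
import math


def standardized_mean_difference(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d using the pooled standard deviation of the two groups."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def is_large_effect(smd, threshold=0.8):
    """Classify by absolute value, so the sign of the effect does not matter."""
    return abs(smd) >= threshold


# Hypothetical trial arms (not data from the study).
d_large = standardized_mean_difference(10.0, 5.0, 50, 14.0, 5.0, 50)
d_small = standardized_mean_difference(10.0, 5.0, 50, 12.0, 5.0, 50)

print(d_large, is_large_effect(d_large))
print(d_small, is_large_effect(d_small))
```

Because the classification uses the absolute value, a strongly negative difference counts as a large effect just as a strongly positive one does.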
Citations: 0
SPICE-GRADE: Simultaneous Processing of Indirect Causal Evidence in Complex Pathways Using GRADE - An Exploratory Case Study.
IF 5.2 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2026-03-07 DOI: 10.1016/j.jclinepi.2026.112219
Prashanti Eachempati, Gordon Guyatt, Per Olav Vandvik, Philippe J Guerin, Prabin Dahal, Karen I Barnes, Thomas Agoritsas

Background: Genetic mutations often result in antimicrobial resistance. Early identification of emerging resistance due to genetic mutations often relies on multiple co-existing pathways of indirect evidence. The GRADE approach, widely used to rate certainty in treatment comparisons, offers little guidance for such situations. Motivated by the possibility of emerging antimalarial drug resistance, we provide an example of how GRADE guidance might be adapted to address multiple pathways of indirect evidence, an approach we term SPICE-GRADE (Simultaneous Processing of Indirect Sources of Causal Evidence using GRADE).

Purpose: We developed and here illustrate a structured approach for integrating direct and indirect causal evidence within GRADE, using antimalarial drug resistance mediated by Plasmodium falciparum Kelch13 mutations as a case study.

Methods: Confronted by the problem of applying GRADE to the question of emerging antimalarial drug resistance, we simultaneously considered low-certainty direct evidence and two pathways of indirect evidence addressing a possible causal relation between Kelch13 mutations and malaria recrudescence. Links between Kelch13 mutations and ring-stage survival and between Kelch13 mutations and delayed parasite clearance constitute the two indirect pathways. We addressed each link independently, and for the two indirect links the extent of indirectness informed the final judgment.

Results: All three links relied on observational studies and therefore started as low certainty evidence. For both the direct link between Kelch13 mutations and recrudescence, and the indirect link between Kelch13 mutations and ring-stage survival, we rated the certainty up two levels for a very strong association, and then down two levels due to imprecision from small sample size in the direct link, and due to very serious indirectness in the indirect link. The third link, between Kelch13 mutations and delayed parasite clearance, was rated up one level for a strong association and down one for serious indirectness. Although each link remained at low certainty, the consistent direction of effect and coherence across all links strengthened the overall causal inference. Situating the entire body of evidence on a continuum allowed us to rate the overall certainty toward the higher end of low certainty, providing a more robust conclusion than relying on direct evidence alone.
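
The per-link ratings described in these results follow simple arithmetic on the four GRADE certainty levels, which the short Python sketch below reproduces. The level ordering is standard GRADE; the function name, the dictionary layout, and the clamping rule are illustrative assumptions, and the final step of placing the whole body of evidence "toward the higher end of low certainty" is a judgment the code does not attempt to model.

```python
LEVELS = ["very low", "low", "moderate", "high"]


def rate_certainty(start, up=0, down=0):
    """Apply net rating-up and rating-down steps, clamped to the GRADE scale."""
    index = LEVELS.index(start) + up - down
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


# Steps as described in the abstract; observational evidence starts at "low".
links = {
    "Kelch13 -> recrudescence (direct)": {"up": 2, "down": 2},
    "Kelch13 -> ring-stage survival (indirect)": {"up": 2, "down": 2},
    "Kelch13 -> delayed parasite clearance (indirect)": {"up": 1, "down": 1},
}

ratings = {name: rate_certainty("low", **steps) for name, steps in links.items()}

for name, rating in ratings.items():
    print(f"{name}: {rating}")
```

Each link ends at low certainty because the upgrades and downgrades cancel, which matches the statement that every link remained at low certainty.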

Conclusion: In this case study, we illustrate SPICE-GRADE as a structured way of mapping and assessing multiple causal links within the GRADE framework. Establishing SPICE-GRADE as a robust methodology for GRADE assessment will require formal methodological development with application across multiple contexts.
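
The rating arithmetic described in the Results can be sketched in a few lines of Python. This is only an illustration: the four-level certainty scale is standard GRADE, but the function name `rate_certainty`, the net application of upgrades and downgrades, and the clamping at the ends of the scale are assumptions of this sketch, not part of the authors' method.

```python
# Illustrative sketch only. The four levels are the standard GRADE scale;
# the helper name and the clamping rule are assumptions made for this example.
LEVELS = ["very low", "low", "moderate", "high"]


def rate_certainty(start: str, up: int = 0, down: int = 0) -> str:
    """Apply rating-up and rating-down steps to a starting certainty level."""
    index = LEVELS.index(start) + up - down
    # Clamp the result to the ends of the scale.
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


# Each link starts at "low" because it rests on observational studies.
links = {
    "Kelch13 -> recrudescence (direct)": rate_certainty("low", up=2, down=2),
    "Kelch13 -> ring-stage survival (indirect)": rate_certainty("low", up=2, down=2),
    "Kelch13 -> delayed parasite clearance (indirect)": rate_certainty("low", up=1, down=1),
}

for name, level in links.items():
    print(f"{name}: {level}")
```

The sketch reproduces only the per-link tally. The judgment that the whole body of evidence sits toward the higher end of low certainty is qualitative and is not captured by this arithmetic.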

Citations: 0
The importance and need for SWAT coordination.
IF 5.2 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2026-03-07 DOI: 10.1016/j.jclinepi.2026.112216
C E Arundel, L C Clark, E Coleman, A Parker, S Treweek

Background: Studies Within A Trial (SWATs) provide evidence for trial process decisions by evaluating alternative trial processes or exploring why processes are undertaken. SWATs can be undertaken as 'individual', 'individually with coordination', or 'coordinated simultaneous'. Using these approaches efficiently can facilitate timely and definitive identification of effective and ineffective trial processes; however, this rarely happens. This paper compares the three approaches and offers guidance on how to use them efficiently.

Methods: Using direct experience of undertaking SWATs and data collected during the PROMETHEUS programme, the advantages and disadvantages of each SWAT approach were identified and summarised.

Results: 'Individual' SWATs are best for simple strategies or where flexibility is needed. They do not usually reach a definitive conclusion, meaning replications and meta-analysis are required. This enables replication in a range of populations but requires significant time to reach a conclusion. SWATs conducted 'individually with coordination' share features of both 'individual' and 'coordinated simultaneous' SWATs. Coordination helps evidence accumulate more quickly, reduces intervention heterogeneity, and enables replications in specific populations; however, coordination can be difficult to facilitate. For 'coordinated simultaneous' SWATs, the same intervention is evaluated at the same time in multiple trials, enabling ascertainment of effectiveness in a single evaluation. Data must be combined and analysed promptly, requiring a dedicated team to manage SWAT conduct.

Conclusion: Whether to undertake SWATs as 'individual', 'individually with coordination' or 'coordinated simultaneous' depends on the host trial, research team preferences, resources, and the proposed SWAT. A combination of approaches may be required to reach a definitive conclusion.

Plain language summary: The best way to find out which health or social care treatment is best is to do a randomised controlled trial ('trial'). In trials, some patients receive the treatment being tested and others receive a different treatment or no treatment at all. This allows researchers to compare what happens to the different groups and to see which treatment is best. Trials are the best way to test this; however, we do not currently have good information on the best ways to plan and run trials, for example how best to ask people to take part in trials and how to keep them engaged. To understand how best to plan and run a trial, researchers can do a study within a trial ('SWAT'). These SWATs need to be done several times so that we can be sure that the process we are testing is the best one. Lots of SWATs have been done by researchers, but often these are done on their own and not repeated quickly. This means it takes a long time to find out what process does or does not work. If researchers work together and do these SWATs at the same or similar times, this can help us find answers more quickly. There are many benefits to doing this, but also some difficulties. This paper provides information about the advantages and disadvantages of doing SWATs at the same or similar times, so that researchers can better plan how to do SWATs more efficiently.
Citations: 0
Summarizing an approach for tailoring the GRADE EtD framework for new contexts. Commentary on GRADE Guidance 40: The GRADE evidence-to-decision framework for environmental and occupational health.
IF 5.2 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2026-03-07 DOI: 10.1016/j.jclinepi.2026.112217
Emily Senerth, Paul Whaley, Elie Akl, Brandy Beverly, Pablo Alonso-Coello, Andrew Rooney, Holger J Schünemann, Katya Tsaioun, Rebecca L Morgan

This commentary provides contextual information to support interpretation of new guidance for the use of the Grading of Recommendations Assessment, Development and Evaluation (GRADE) Evidence-to-Decision (EtD) framework for environmental and occupational health (EOH). Based on a systematic review and narrative synthesis of EOH decision frameworks, and input from subject matter experts in environmental health and GRADE, we developed and pilot-tested an EOH framework through a series of virtual workshops. The resulting guidance was approved by the GRADE Working Group in May 2023. The new EtD framework for EOH follows the same structure as existing EtD frameworks, including a scoping and contextualization process and twelve assessment criteria. Modifications to improve the framework's applicability to the EOH context include changes to terminology, tailoring of the framework's detailed judgments (e.g., broadening the scope of the equity decision criterion to include non-health equity issues), and guidance for users (e.g., emphasis on the scoping phase of the guideline development process).

Citations: 0
Harm Outcomes Applicable for Most Meta-Analyses of Randomized Trials of Biomedical Interventions: A Key Concept in Clinical Epidemiology.
IF 5.2 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2026-03-07 DOI: 10.1016/j.jclinepi.2026.112218
Robin Christensen, Dorthe B Berthelsen, Peter Tugwell, Su Golder, Riaz Qureshi, Evan Mayo-Wilson, Lee S Simon, Joey Kwong, Paula R Williamson, Sunita Vohra

Randomized trials are used to evaluate healthcare interventions because they minimize confounding and selection bias through randomization. Harms reporting in trials remains suboptimal, despite established guidelines. To improve transparency and clinical relevance, trialists should share information about harms - whether assessed systematically or non-systematically. Non-systematically assessed adverse events warrant greater attention, as they are often underreported or inconsistently documented. Trialists should specify what was measured, when, and by whom. For each study arm, tables or data should be available that include all harms observed. For dichotomous outcomes, tables or data should include the number of people who experienced each harm in each group: the number of participants at risk of harms (i.e., randomized individuals); the number of deaths; participants with one or more adverse events; withdrawals (discontinuations) due to harms; and the total number of events, if appropriate. Thresholds should not be used to limit the sharing of information about harms. Zero events should be included for harms systematically assessed. For combined adverse events, such as the proportion of participants with one or more serious adverse events (SAEs), researchers should report or share data for all component events (e.g., deaths, major cardiovascular events, cancers, infections, psychiatric events). Better harms reporting could improve evidence synthesis, enhance interpretability, and support informed clinical decision-making as well as patient safety.
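
One possible per-arm record for the dichotomous harms data listed above is sketched below in Python. The class and field names are hypothetical, and the counts in the usage example are invented purely to show the shape of the data; they do not come from any trial.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ArmHarms:
    """Harms summary for one study arm (all field names are hypothetical)."""

    arm: str
    n_randomized: int               # participants at risk of harms
    deaths: int
    n_with_any_ae: int              # participants with one or more adverse events
    withdrawals_due_to_harms: int
    total_events: Optional[int] = None   # total number of events, if appropriate
    # Number of participants who experienced each harm. Systematically
    # assessed harms with no events are stored explicitly as 0.
    participants_by_harm: Dict[str, int] = field(default_factory=dict)

    def risk(self, harm: str) -> Optional[float]:
        """Return the proportion of the arm with this harm, or None if not recorded."""
        if harm not in self.participants_by_harm:
            return None  # not recorded is different from zero events
        return self.participants_by_harm[harm] / self.n_randomized


# Invented numbers, for illustration only.
example_arm = ArmHarms(
    arm="intervention",
    n_randomized=100,
    deaths=1,
    n_with_any_ae=30,
    withdrawals_due_to_harms=4,
    total_events=45,
    participants_by_harm={"nausea": 12, "stroke": 0},
)
print(example_arm.risk("nausea"), example_arm.risk("stroke"), example_arm.risk("rash"))
```

Keeping zero counts in the record, and returning `None` only for harms that were never recorded, mirrors the distinction the text draws between a harm that was assessed with no events and one that was not assessed at all.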

Citations: 0
Cluster separation outperforms other metrics in validating multimorbidity patterns: Statistical simulation study.
IF 5.2 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2026-03-06 DOI: 10.1016/j.jclinepi.2026.112209
Thamer Ba Dhafari, Alexander Pate, Glen P Martin, James Rafferty, Farideh Jalali-Najafabadi, Marlous Hall, Niels Peek

Background and objectives: Multimorbidity, defined as the presence of multiple long-term health conditions within an individual, remains a growing challenge in healthcare. Identifying frequently occurring multimorbidity clusters may help to develop targeted interventions and optimise care pathways. However, the validation of multimorbidity clusters derived from real-world data is complicated by the lack of a known "ground truth". We conducted a statistical simulation study that aimed to evaluate the performance of three common validation approaches (cluster separation, clustering stability, and strength of association with health outcomes) in assessing the quality of multimorbidity clusters, where performance was measured by agreement with known ground truth clusters.

Methods: Simulated datasets with predefined clusters were generated across 25 scenarios, varying parameters such as disease prevalence, sample size, and noise levels. Latent class analysis was applied to derive clusters from the simulated data, which were compared to the predefined clusters using the Adjusted Rand Index (ARI). The ARI served as our gold standard quality assessment of derived clusters.
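
As a rough illustration of the comparison step, the Adjusted Rand Index can be computed from two label vectors with the standard pair-counting formula. The function name and the toy labels are assumptions of this sketch; the abstract does not say which software the authors used.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, derived):
    """Chance-corrected agreement between two clusterings of the same items."""
    if len(truth) != len(derived):
        raise ValueError("label vectors must have the same length")
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: agreement is trivially perfect

    # Pairs of items placed together in both clusterings, in the truth only,
    # and in the derived clustering only.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(truth, derived)).values())
    pairs_truth = sum(comb(c, 2) for c in Counter(truth).values())
    pairs_derived = sum(comb(c, 2) for c in Counter(derived).values())

    expected = pairs_truth * pairs_derived / total_pairs
    maximum = (pairs_truth + pairs_derived) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both clusterings have one cluster
    return (pairs_both - expected) / (maximum - expected)


# Toy labels: the cluster names differ but the partition is identical.
predefined = [0, 0, 1, 1]
recovered = [1, 1, 0, 0]
print(adjusted_rand_index(predefined, recovered))
```

Because the index depends only on which items are grouped together, relabelled but otherwise identical clusterings score 1, while values near 0 indicate agreement no better than chance.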

Results: Cluster separation, measured by the Calinski-Harabasz Index, showed the strongest agreement with our gold standard in most scenarios (median correlation: 0.641, IQR: 0.505 to 0.728). Clustering stability, assessed using resampling, had mixed performance, with a median correlation of 0.421 (IQR: 0.127 to 0.526). The strength of association with health outcomes, assessed using Nagelkerke's R², consistently showed poor agreement with the ARI (median correlation: -0.424, IQR: -0.543 to -0.173).
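
For readers unfamiliar with the Calinski-Harabasz Index, a minimal pure-Python sketch follows. It uses the usual definition, the ratio of between-cluster to within-cluster dispersion with each scaled by its degrees of freedom. How the authors computed it on their simulated disease data is not described in the abstract, so the function name, the input layout, and the toy points are assumptions.

```python
def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion (higher = better separated)."""
    n = len(points)
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("need more than one cluster and fewer clusters than points")

    def centroid(rows):
        return [sum(column) / len(rows) for column in zip(*rows)]

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = centroid(points)
    between = 0.0
    within = 0.0
    for rows in clusters.values():
        center = centroid(rows)
        between += len(rows) * squared_distance(center, overall)
        within += sum(squared_distance(row, center) for row in rows)

    if within == 0:
        return float("inf")  # every point coincides with its cluster centroid
    return (between / (k - 1)) / (within / (n - k))


# Toy data: two tight groups far apart on the first coordinate.
toy_points = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = calinski_harabasz(toy_points, [0, 0, 1, 1])
poor = calinski_harabasz(toy_points, [0, 1, 0, 1])
print(good, poor)
```

The labelling that matches the two visible groups scores far higher than the one that cuts across them, which is the behaviour a separation-based validation metric relies on.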

Conclusion: Cluster separation seems to be the most reliable approach to validate multimorbidity clusters. Clustering stability can sometimes be used for validation but has limitations. Assessing the strength of association of multimorbidity clusters with health outcomes, though valuable for understanding clinical relevance, does not appear to validate cluster quality despite being commonly used in published literature.

Citations: 0
Key considerations for planning adaptive platform trials: part 2.
IF 5.2 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2026-03-06 DOI: 10.1016/j.jclinepi.2026.112213
Anders Granholm, Morten Hylander Møller, Aldana Rosso, Maj-Brit Nørregaard Kjær, Benjamin Skov Kaas-Hansen, Rikke Faebo Larsen, Inger Katrine Dahl-Petersen, Jeanett Friis Rohde, Anders Perner

Background: There is increased interest in adaptive platform trials (APTs), i.e., complex randomised clinical trials (RCTs) focusing on a population or a setting, conducted according to a core protocol, with continuous addition of interventions during trial conduct. APTs may run perpetually, assessing many different interventions. APTs come with benefits regarding increased flexibility, efficiency, and cost-effectiveness compared to conventional, stand-alone RCTs. However, planning APTs is complex, and limited guidance is available.

Methods: In this two-part series, we provide a narrative overview of key considerations for the planning of APTs in a clinical setting, based on our own experiences of planning and initiating APTs.

Results: The first part covered the following 5 overall considerations: stakeholder involvement; population and setting; interventions and comparisons; clinical outcomes; and adaptations. This second part covers 7 overall considerations: statistics and performance; other design features; data infrastructure; operations and organisation; funding; dissemination; and implementation.

Conclusion: The provided guidance on key considerations when planning APTs will aid researchers considering or planning APTs.

Citations: 0