
Journal of Clinical Epidemiology: latest articles

Machine learning approaches to evaluate heterogeneous treatment effects in randomized controlled trials: a scoping review
IF 7.3 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2024-09-19 DOI: 10.1016/j.jclinepi.2024.111538
Kosuke Inoue, Motohiko Adomi, Orestis Efthimiou, Toshiaki Komura, Kenji Omae, Akira Onishi, Yusuke Tsutsumi, Tomoko Fujii, Naoki Kondo, Toshi A. Furukawa

Background and Objectives

Estimating heterogeneous treatment effects (HTEs) in randomized controlled trials (RCTs) has received substantial attention recently. This has led to the development of several statistical and machine learning (ML) algorithms to assess HTEs through identifying individualized treatment effects. However, a comprehensive review of these algorithms is lacking. We thus aimed to catalog and outline currently available statistical and ML methods for identifying HTEs via effect modeling using clinical RCT data and summarize how they have been applied in practice.

Study Design and Setting

We performed a scoping review using prespecified search terms in MEDLINE and Embase, aiming to identify studies that assessed HTEs using advanced statistical and ML methods in RCT data published from 2010 to 2022.

Results

Among the 32 studies identified in the review, 17 applied existing algorithms to RCT data, and 15 extended existing algorithms or proposed new ones. Applied algorithms included penalized regression, causal forest, Bayesian causal forest, and other metalearner frameworks. Of these methods, causal forest was the most frequently used (7 studies), followed by Bayesian causal forest (4 studies). Most applications were in cardiology (6 studies), followed by psychiatry (4 studies). We provide example R code on simulated data to illustrate how to implement these algorithms.

Conclusion

This review identified and outlined various algorithms currently used to identify HTEs and individualized treatment effects in RCT data. Given the increasing availability of new algorithms, analysts should carefully select them after examining model performance and considering how the models will be used in practice.
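To make the idea of effect modeling concrete, the following is a minimal sketch of a T-learner, one of the simplest metalearners, applied to simulated trial data. It is not the authors' R code: the data-generating model, the single covariate `x`, and the helper names `fit_line`, `simulate_rct`, and `t_learner` are all assumptions made for illustration, and a real analysis would use flexible learners and many covariates.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one covariate)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def simulate_rct(n, seed=0):
    """Hypothetical trial: 1:1 randomization, true effect tau(x) = 1 + 2x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 1)
        t = 1 if rng.random() < 0.5 else 0
        tau = 1.0 + 2.0 * x
        y = 0.5 * x + tau * t + rng.gauss(0, 0.5)
        rows.append((x, t, y))
    return rows

def t_learner(rows):
    """Fit one outcome model per arm; the effect estimate is their difference."""
    treated = [(x, y) for x, t, y in rows if t == 1]
    control = [(x, y) for x, t, y in rows if t == 0]
    a1, b1 = fit_line(*zip(*treated))
    a0, b0 = fit_line(*zip(*control))
    return lambda x: (a1 + b1 * x) - (a0 + b0 * x)

rows = simulate_rct(4000, seed=1)
cate = t_learner(rows)
# Estimated individualized effects at x = 0 and x = 1 (true values: 1.0 and 3.0).
print(round(cate(0.0), 2), round(cate(1.0), 2))
```

Because treatment is randomized, the two arm-specific models are fitted on comparable populations, which is why a simple difference of predictions can be read as an individualized treatment effect in this toy setting.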
Citations: 0
Using causal diagrams within the Grading of Recommendations, Assessment, Development and Evaluation framework to evaluate confounding adjustment in observational studies
IF 7.3 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2024-09-18 DOI: 10.1016/j.jclinepi.2024.111532
Kevin J. McIntyre, Karina N. Tassiopoulos, Curtis Jeffrey, Saverio Stranges, Janet Martin

Background and Objectives

The current Grading of Recommendations, Assessment, Development and Evaluation (GRADE) system instructs appraisers to evaluate whether individual observational studies have sufficiently adjusted for confounding. However, it does not provide an explicit, transparent, or reproducible method for doing so. This article explores how implementing causal graphs into the GRADE framework can help appraisers and end-users of GRADE products to evaluate the adequacy of confounding control from observational studies.

Methods

Using modern epidemiological theory, we propose a system for incorporating causal diagrams into the GRADE process to assess confounding control.

Results

Integrating causal graphs into the GRADE framework enables appraisers to provide a theoretically grounded rationale for their evaluations of confounding control in observational studies. Additionally, the inclusion of causal graphs in GRADE may assist appraisers in demonstrating evidence for their appraisals in other domains of quality of evidence beyond confounding control. To support practical application, a worked example is included in the supplemental material to guide users through this approach.

Conclusion

GRADE calls for the explicit and transparent appraisal of evidence in the process of evidence synthesis. Incorporating causal diagrams into the evaluation of confounding control in observational studies aligns with the core principles of the GRADE framework, providing a clear, theory-based method for assessing the adequacy of that control.
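One standard check from causal-diagram theory that an appraiser could apply is the back-door criterion: an adjustment set is adequate if it contains no descendant of the exposure and blocks every path into the exposure that also reaches the outcome. The sketch below is an illustration under that assumption, not the system proposed in the article; the function names and the example graph (`age`, `statin`, `ldl`, `mi`) are hypothetical.

```python
def descendants(edges, node):
    """All nodes reachable from `node` by following arrows forward."""
    children = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
    seen, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def backdoor_paths(edges, x, y):
    """Simple paths from x to y whose first edge points into x."""
    edge_set = set(edges)
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    paths = []

    def walk(path):
        last = path[-1]
        if last == y:
            paths.append(path)
            return
        for n in sorted(nbrs.get(last, ())):
            if n in path:
                continue
            if len(path) == 1 and (n, x) not in edge_set:
                continue  # first step must go to a parent of x
            walk(path + [n])

    walk([x])
    return paths

def is_open(edges, path, z):
    """A path is open unless some intermediate node blocks it given z."""
    edge_set = set(edges)
    for prev, v, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, v) in edge_set and (nxt, v) in edge_set
        if collider:
            # A collider blocks unless it or one of its descendants is adjusted for.
            if v not in z and not (descendants(edges, v) & z):
                return False
        elif v in z:
            return False  # an adjusted non-collider blocks the path
    return True

def satisfies_backdoor(edges, x, y, z):
    z = set(z)
    if descendants(edges, x) & z:
        return False  # adjusting for a descendant of the exposure
    return not any(is_open(edges, p, z) for p in backdoor_paths(edges, x, y))

# Hypothetical diagram: age confounds statin use and myocardial infarction (mi);
# ldl is a mediator on the causal path.
dag = [("age", "statin"), ("age", "mi"), ("statin", "ldl"), ("ldl", "mi")]
print(satisfies_backdoor(dag, "statin", "mi", []))              # unadjusted
print(satisfies_backdoor(dag, "statin", "mi", ["age"]))         # adjusts for the confounder
print(satisfies_backdoor(dag, "statin", "mi", ["age", "ldl"]))  # also adjusts for a mediator
```

Writing the diagram down as an explicit edge list is the point of the exercise: the appraiser's judgment about which variables a study should have adjusted for becomes something another reader can inspect and rerun.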
Citations: 0
Corrigendum to 'Pilot and feasibility trials in surgery are incompletely reported according to the CONSORT checklist: a meta-research study' [Journal of Clinical Epidemiology 170 (2024)].
IF 7.3 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2024-09-17 DOI: 10.1016/j.jclinepi.2024.111510
Tyler McKechnie, Tania Kazi, Austine Wang, Sophia Zhang, Alex Thabane, Keean Nanji, Phillip Staibano, Lily J Park, Aristithes Doumouras, Cagla Eskicioglu, Lehana Thabane, Sameer Parpia, Mohit Bhandari
Citations: 0
Comments, suggestions, and criticisms of the Pragmatic Explanatory Continuum Indicator Summary-2 design tool: a citation analysis
IF 7.3 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2024-09-14 DOI: 10.1016/j.jclinepi.2024.111534
Andrew Willis, Frances Shiely, Shaun Treweek, Monica Taljaard, Kirsty Loudon, Alison Howie, Merrick Zwarenstein

Introduction

The pragmatic explanatory continuum indicator summary (PRECIS) tool, initially published in 2009 and revised in 2015, was created to help trialists align their design choices with the intended purpose of their randomised controlled trial (RCT): either to guide real-world decisions between alternative interventions (pragmatic) or to test hypotheses about intervention mechanisms by minimising sources of variation (explanatory). There have been many comments, suggestions, and criticisms of PRECIS-2. This summary of them will be used to facilitate the development of the next revision, PRECIS-3.

Methods

We used Web of Science to identify all publication types citing PRECIS-2, published between May 2015 and July 2023. Citations were eligible if they contained ‘substantive’ suggestions, comments, or criticism of the PRECIS-2 tool. We defined ‘substantive’ as comments explicitly referencing at least one PRECIS-2 domain or a concept directly linked to an existing or newly proposed domain.
Two reviewers independently extracted comments, suggestions, and criticisms, noting their implications for the update. These were discussed among authors to achieve consensus on the interpretation of each comment and its implications for PRECIS-3.

Results

The search yielded 885 publications, and after full-text review, 89 articles met the inclusion criteria. Comments pertained to new domains, changes in existing domains, or were relevant across several or all domains. Proposed new domains included assessment of the comparator arm and a domain to describe blinding. There were concerns about scoring eligibility and recruitment domains for cluster trials. Suggested areas for improvement across domains included the need for more scoring guidance for explanatory design choices.

Discussion

Published comments recognise PRECIS-2's success in aiding trialists with pragmatic or explanatory design choices. Enhancing its implementation and widespread use will involve adding new domains, refining domain definitions, and addressing overall tool issues. This citation review offers valuable user feedback, pivotal for shaping the upcoming version of the PRECIS tool, PRECIS-3.
Citations: 0
Implementation of a dynamic model updating pipeline provides a systematic process for maintaining performance of prediction models
IF 7.3 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2024-09-12 DOI: 10.1016/j.jclinepi.2024.111531
Kamaryn T. Tanner, Karla Diaz-Ordaz, Ruth H. Keogh

Objectives

We describe the steps for implementing a dynamic updating pipeline for clinical prediction models and illustrate the proposed methods in an application of 5-year survival prediction in cystic fibrosis.

Study Design and Setting

Dynamic model updating refers to the process of repeated updating of a clinical prediction model with new information to counter performance degradation. We describe 2 types of updating pipeline: “proactive updating” where candidate model updates are tested any time new data are available, and “reactive updating” where updates are only made when performance of the current model declines or the model structure changes. Methods for selecting the best candidate updating model are based on measures of predictive performance under the 2 pipelines. The methods are illustrated in our motivating example of a 5-year survival prediction model in cystic fibrosis. Over a dynamic updating period of 10 years, we report the updating decisions made and the performance of the prediction models selected under each pipeline.

Results

Both the proactive and reactive updating pipelines produced survival prediction models that overall had better performance in terms of calibration and discrimination than a model that was not updated. Further, use of the dynamic updating pipelines ensured that the prediction model’s performance was consistently and frequently reviewed in new data.

Conclusion

Implementing a dynamic updating pipeline will help guard against model performance degradation while ensuring that the updating process is principled and data-driven.
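The difference between the two pipelines can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes models are plain functions returning a predicted probability, uses the Brier score as the single performance measure, and lets the reactive pipeline respond only to a performance decline (the abstract also mentions changes in model structure). The names `brier`, `proactive_update`, `reactive_update`, and the `tolerance` value are hypothetical.

```python
def brier(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def proactive_update(current, candidates, new_x, new_y):
    """Test every candidate whenever new data arrive; keep the best scorer."""
    best = current
    best_score = brier([current(x) for x in new_x], new_y)
    for cand in candidates:
        score = brier([cand(x) for x in new_x], new_y)
        if score < best_score:  # ties keep the current model
            best, best_score = cand, score
    return best

def reactive_update(current, candidates, new_x, new_y,
                    baseline_score, tolerance=0.02):
    """Consider candidates only if the current model has degraded."""
    score = brier([current(x) for x in new_x], new_y)
    if score <= baseline_score + tolerance:
        return current  # no meaningful decline, so no update
    return proactive_update(current, candidates, new_x, new_y)

# Toy example: the event rate in the new data is 50%, but the current
# model still predicts 20% for everyone.
current = lambda x: 0.2
recalibrated = lambda x: 0.5
new_x, new_y = [0, 0, 0, 0], [1, 0, 1, 0]

chosen_proactive = proactive_update(current, [recalibrated], new_x, new_y)
kept = reactive_update(current, [recalibrated], new_x, new_y, baseline_score=0.34)
replaced = reactive_update(current, [recalibrated], new_x, new_y, baseline_score=0.16)
```

In this toy case the proactive pipeline switches to the recalibrated model as soon as it scores better, while the reactive pipeline switches only when the current model's score has worsened by more than the tolerance relative to its earlier baseline.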
Citations: 0
Benchmarking Human–AI collaboration for common evidence appraisal tools
IF 7.3 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2024-09-12 DOI: 10.1016/j.jclinepi.2024.111533
Tim Woelfle, Julian Hirt, Perrine Janiaud, Ludwig Kappos, John P.A. Ioannidis, Lars G. Hemkens

Background and Objective

It is unknown whether large language models (LLMs) may facilitate time- and resource-intensive text-related processes in evidence appraisal. The objective was to quantify the agreement of LLMs with human consensus in appraisal of scientific reporting (Preferred Reporting Items for Systematic reviews and Meta-Analyses [PRISMA]) and methodological rigor (A MeaSurement Tool to Assess systematic Reviews [AMSTAR]) of systematic reviews and design of clinical trials (PRagmatic Explanatory Continuum Indicator Summary 2 [PRECIS-2]) and to identify areas where collaboration between humans and artificial intelligence (AI) would outperform the traditional consensus process of human raters in efficiency.

Study Design and Setting

Five LLMs (Claude-3-Opus, Claude-2, GPT-4, GPT-3.5, Mixtral-8x22B) assessed 112 systematic reviews applying the PRISMA and AMSTAR criteria and 56 randomized controlled trials applying PRECIS-2. We quantified the agreement between human consensus and (1) individual human raters; (2) individual LLMs; (3) combined LLMs approach; (4) human–AI collaboration. Ratings were marked as deferred (undecided) in case of inconsistency between combined LLMs or between the human rater and the LLM.

Results

Individual human rater accuracy was 89% for PRISMA and AMSTAR, and 75% for PRECIS-2. Individual LLM accuracy ranged from 63% (GPT-3.5) to 70% (Claude-3-Opus) for PRISMA, 53% (GPT-3.5) to 74% (Claude-3-Opus) for AMSTAR, and 38% (GPT-4) to 55% (GPT-3.5) for PRECIS-2. Combined LLM ratings led to accuracies of 75%–88% for PRISMA (4%–74% deferred), 74%–89% for AMSTAR (6%–84% deferred), and 64%–79% for PRECIS-2 (29%–88% deferred). Human–AI collaboration resulted in the best accuracies: 89%–96% for PRISMA (25/35% deferred), 91%–95% for AMSTAR (27/30% deferred), and 80%–86% for PRECIS-2 (76/71% deferred).

Conclusion

Current LLMs alone appraised evidence worse than humans. Human–AI collaboration may reduce workload for the second human rater for the assessment of reporting (PRISMA) and methodological rigor (AMSTAR) but not for complex tasks such as PRECIS-2.
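The deferral rule described in the study design can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: it assumes that an item is deferred whenever the human rater and the LLM disagree, and that accuracy is computed against the human consensus over the non-deferred items only. The function name `collaborate` and the toy ratings are hypothetical.

```python
def collaborate(human, llm, consensus):
    """Return (accuracy on decided items, fraction of items deferred)."""
    decided = correct = 0
    for h, m, c in zip(human, llm, consensus):
        if h != m:
            continue  # disagreement: the rating is deferred (undecided)
        decided += 1
        correct += (h == c)
    deferred = 1 - decided / len(consensus)
    accuracy = correct / decided if decided else float("nan")
    return accuracy, deferred

# Toy ratings for four checklist items.
human = ["yes", "no", "yes", "yes"]
llm = ["yes", "no", "no", "yes"]
consensus = ["yes", "no", "yes", "no"]

accuracy, deferred = collaborate(human, llm, consensus)
print(accuracy, deferred)
```

The two numbers have to be read together: a high accuracy on decided items is only useful if the deferred fraction, which still requires a second human rater, stays small. That trade-off is why the abstract reports a deferred percentage next to every accuracy figure.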
Citations: 0
A machine learning model for early diagnosis of type 1 Gaucher disease using real-life data
IF 7.3 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2024-09-06 DOI: 10.1016/j.jclinepi.2024.111517
Avraham Tenenbaum, Shoshana Revel-Vilk, Sivan Gazit, Michael Roimi, Aidan Gill, Dafna Gilboa, Ora Paltiel, Orly Manor, Varda Shalev, Gabriel Chodick

Objective

The diagnosis of Gaucher disease (GD) presents a major challenge due to the high variability and low specificity of its clinical characteristics, along with limited physician awareness of the disease’s early symptoms. Early and accurate diagnosis is important to enable effective treatment decisions, prevent unnecessary testing, and facilitate genetic counseling. This study aimed to develop a machine learning (ML) model for GD screening and early GD diagnosis based on real-world clinical data using the Maccabi Healthcare Services electronic database, which contains 20 years of longitudinal data on approximately 2.6 million patients.

Study Design and Setting

We screened the Maccabi Healthcare Services database for patients with GD between January 1998 and May 2022. Eligible controls were matched by year of birth, sex, and socioeconomic status in a 1:13 ratio. The data were partitioned into 75% training and 25% test sets, and models were trained to predict GD using features obtained from medical and laboratory records. Model performance was evaluated using the area under the receiver operating characteristic curve and the area under the precision-recall curve.

Results

We detected 264 confirmed patients with GD, to whom we matched 3,429 controls. The best-performing model (which included known GD signs and symptoms, previously unknown clinical features, and administrative codes) achieved, on the test set, an area under the receiver operating characteristic curve of 0.95 ± 0.03 and an area under the precision-recall curve of 0.80 ± 0.08, and identified GD a median of 2.78 years earlier than the clinical diagnosis (25th–75th percentile: 1.29–4.53).

Conclusion

Using an ML approach on real-world data led to excellent discrimination between GD patients and controls, with the ability to detect GD significantly earlier than the time of actual diagnosis. Hence, this approach might be useful as a screening tool for GD and lead to earlier diagnosis and treatment. Furthermore, advanced ML analytics may highlight previously unrecognized features associated with GD, including clinical diagnoses and health-seeking behaviors.

Plain Language Summary

Diagnosing Gaucher disease is difficult, which often leads to late or incorrect diagnoses. As a result, patients may undergo unnecessary tests and treatments and experience health deterioration despite the availability of medications for Gaucher disease. In this study, we used electronic health data to develop machine learning models for early diagnosis of Gaucher disease type 1. Our models, which included known Gaucher disease signs and symptoms, previously unknown clinical features, and administrative codes, were able to significantly outperform other models and expert opinions, detecting type 1 Gaucher disease 3 years on average before actual diagnosis. Our models also revealed new features
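
The evaluation design described in this abstract (a 75%/25% split and scoring by the area under the ROC and precision-recall curves) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the data are simulated with a 1:13 case-to-control ratio, the scores merely stand in for the output of a fitted model, the function names are hypothetical, and the area under the precision-recall curve is approximated by average precision. It is not the authors' pipeline and does not reproduce their results.

```python
import random


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx), keeping the case/control ratio in both parts."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def auroc(y_true, scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Average precision: mean of the precision values at each true case, by rank."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Simulated cohort (assumption): 100 cases and 1,300 controls, i.e. a 1:13 ratio.
labels = [1] * 100 + [0] * 1300
rng = random.Random(42)
# Simulated risk scores standing in for predictions of a model fitted on the training part.
scores = [rng.gauss(1.5, 1.0) if y == 1 else rng.gauss(0.0, 1.0) for y in labels]

train_idx, test_idx = stratified_split(labels, test_fraction=0.25)
y_test = [labels[i] for i in test_idx]
s_test = [scores[i] for i in test_idx]

print(f"test size = {len(test_idx)}")
print(f"AUROC = {auroc(y_test, s_test):.2f}")
print(f"AUPRC (average precision) = {average_precision(y_test, s_test):.2f}")
```

With rare outcomes such as this one, the precision-recall summary is usually more sensitive to false positives than the ROC summary, which is one plausible reason for reporting both.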
Citations: 0
Harmonization of data collection to improve clinical and public health evidence-based decision making.
IF 7.3 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2024-09-06 DOI: 10.1016/j.jclinepi.2024.111530
Jorge Arias-de la Torre, Jordi Alonso, Jose M Valderas
Citations: 0
Key concepts in rapid reviews: an overview
IF 7.3 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2024-09-06 DOI: 10.1016/j.jclinepi.2024.111518
Declan Devane , Candyce Hamel , Gerald Gartlehner , Barbara Nussbaumer-Streit , Ursula Griebler , Lisa Affengruber , KM Saif-Ur-Rahman , Chantelle Garritty

Background and Objective

Rapid reviews have gained popularity as a pragmatic approach to synthesize evidence in a timely manner to inform decision-making in healthcare. This article provides an overview of the key concepts and methodological considerations in conducting rapid reviews, drawing from a series of recently published guidance papers by the Cochrane Rapid Reviews Methods Group.

Study Design and Setting

We discuss the definition, characteristics, and potential applications of rapid reviews and the trade-offs between speed and rigor. We present a practical example of a rapid review and highlight the methodological considerations outlined in the updated Cochrane guidance, including recommendations for literature searching, study selection, data extraction, risk of bias assessment, synthesis, and assessing the certainty of evidence.

Results

Rapid reviews can be a valuable tool for evidence-based decision-making, but it is essential to understand their limitations and adhere to methodological standards to ensure their validity and reliability.

Conclusion

As the demand for rapid evidence synthesis continues to grow, further research is needed to refine and standardize the methods and reporting of rapid reviews.

Plain Language Summary

Rapid reviews are a type of research method designed to quickly gather and summarize evidence to support decision-making in healthcare. They are particularly useful when timely information is needed, such as during a public health emergency. This article explains the key aspects of how rapid reviews are conducted, based on the latest guidance from experts. Rapid reviews involve several steps, including searching for relevant studies, selecting which studies to include, and carefully examining the quality of the evidence. Although rapid reviews are faster to complete than full systematic reviews, they still follow rigorous processes to ensure that the findings are reliable. This article also provides an example of a rapid review in action, demonstrating how these reviews can be applied in real-world situations. While rapid reviews are a powerful tool for making quick, evidence-based decisions, it is important to be aware of their limitations. Researchers must follow established methods to make sure the results are as accurate and useful as possible. As more people use rapid reviews, ongoing research is needed to improve and standardize how they are done.
Citations: 0
Data quality assessment of interventional trials in public trial databases
IF 7.3 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2024-09-05 DOI: 10.1016/j.jclinepi.2024.111516
Annabelle R. Iken , Rudolf W. Poolman , Maaike G.J. Gademan

Objective

High-quality data entry in clinical trial databases is crucial to the usefulness, validity, and replicability of research findings, as it influences evidence-based medical practice and future research. Our aim is to assess the quality of self-reported data in trial registries and present practical and systematic methods for identifying and evaluating data quality.

Study Design and Setting

We searched ClinicalTrials.Gov (CTG) for interventional total knee arthroplasty (TKA) trials between 2000 and 2015. We extracted required and optional trial information elements and used CTG's variable definitions. We performed a literature review on data quality reporting on frameworks, checklists, and overviews of irregularities in healthcare databases. We identified and assessed the following data quality attributes: consistency, accuracy, completeness, and timeliness.

Results

We included 816 interventional TKA trials. The frequency of data irregularities varied widely, from 0% to 100%. Inconsistency ranged from 0% to 36%; most often, an allocation labeled nonrandomized was combined with a “single-group” assignment trial design. Inaccuracy ranged from 0% to 100%. Incompleteness ranged from 0% to 61%; 61% of finished TKA trials did not report their outcome. With regard to irregularities in timeliness, 49% of the trials were registered more than 3 months after the start date.

Conclusion

We found significant variations in the data quality of registered clinical TKA trials. Trial sponsors should be committed to ensuring that the information they provide is reliable, consistent, up-to-date, transparent, and accurate. CTG's users need to be critical when drawing conclusions based on the registered data. We believe this awareness will increase well-informed decisions about published articles and treatment protocols, including replicating and improving trial designs.
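
Three of the irregularities named in this abstract (a nonrandomized allocation combined with a single-group design, finished trials without reported outcomes, and registration more than 3 months after the start date) lend themselves to rule-based checks. The Python sketch below is only an illustration under stated assumptions: the record layout, field names, and status values are hypothetical rather than the registry's actual schema, "3 months" is approximated as 90 days, and the accuracy attribute is left out because the abstract does not say how it was assessed.

```python
from datetime import date


def check_trial(trial, max_delay_days=90):
    """Return a list of data-quality flags for one registry record (a dict)."""
    flags = []

    # Consistency: a single-group design paired with a "non-randomized" allocation label.
    if trial["allocation"] == "Non-Randomized" and trial["assignment"] == "Single Group":
        flags.append("inconsistent: non-randomized allocation with single-group assignment")

    # Completeness: a finished trial that has not reported its outcome.
    if trial["status"] == "Completed" and not trial["has_results"]:
        flags.append("incomplete: completed trial without reported results")

    # Timeliness: registered more than max_delay_days after the start date.
    delay = (trial["registered"] - trial["start"]).days
    if delay > max_delay_days:
        flags.append(f"late: registered {delay} days after start")

    return flags


# Hypothetical records (assumption): field names and values are illustrative only.
trials = [
    {"id": "T1", "allocation": "Randomized", "assignment": "Parallel",
     "status": "Completed", "has_results": True,
     "start": date(2010, 1, 1), "registered": date(2009, 12, 1)},
    {"id": "T2", "allocation": "Non-Randomized", "assignment": "Single Group",
     "status": "Completed", "has_results": False,
     "start": date(2012, 3, 1), "registered": date(2012, 9, 1)},
]

for trial in trials:
    print(trial["id"], check_trial(trial))

# Share of records with at least one flag.
flagged_share = sum(1 for t in trials if check_trial(t)) / len(trials)
print(f"flagged: {flagged_share:.0%}")
```

Keeping each attribute as a separate rule makes it straightforward to report a per-attribute irregularity rate, in the spirit of the ranges quoted in the Results.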
Citations: 0