Background and Objectives
Estimating heterogeneous treatment effects (HTEs) in randomized controlled trials (RCTs) has received substantial attention recently. This has led to the development of several statistical and machine learning (ML) algorithms that assess HTEs by identifying individualized treatment effects. However, a comprehensive review of these algorithms is lacking. We thus aimed to catalog and outline currently available statistical and ML methods for identifying HTEs via effect modeling using clinical RCT data, and to summarize how they have been applied in practice.
Study Design and Setting
We performed a scoping review using prespecified search terms in MEDLINE and Embase, aiming to identify studies that assessed HTEs using advanced statistical and ML methods in RCT data published from 2010 to 2022.
Results
Among the 32 studies identified in the review, 17 applied existing algorithms to RCT data, and 15 extended existing algorithms or proposed new ones. Applied algorithms included penalized regression, causal forest, Bayesian causal forest, and other metalearner frameworks. Of these, causal forest was the most frequently used (7 studies), followed by Bayesian causal forest (4 studies). Most applications were in cardiology (6 studies), followed by psychiatry (4 studies). We provide example R code, applied to simulated data, to illustrate how to implement these algorithms.
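To give a flavor of what such code looks like, the hedged sketch below simulates a simple two-arm RCT and fits a causal forest, the most frequently applied method above, with the R package grf. It is an independent illustration on invented data, not the authors' published code.

```r
# Minimal illustrative sketch (simulated data, not the review's own code).
library(grf)

set.seed(2024)
n <- 2000; p <- 5
X <- matrix(rnorm(n * p), n, p)                    # baseline covariates
W <- rbinom(n, 1, 0.5)                             # 1:1 randomized treatment assignment
tau_true <- 1 + 0.5 * X[, 1]                       # simulated heterogeneous effect (unknown in practice)
Y <- X[, 2] + tau_true * W + rnorm(n)              # continuous outcome

cf <- causal_forest(X, Y, W, W.hat = rep(0.5, n))  # propensity fixed at 0.5, as in a 1:1 RCT
tau_hat <- predict(cf)$predictions                 # individualized treatment effect estimates
average_treatment_effect(cf)                       # overall ATE with standard error
```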
Conclusion
This review identified and outlined various algorithms currently used to identify HTEs and individualized treatment effects in RCT data. Given the increasing availability of new algorithms, analysts should carefully select them after examining model performance and considering how the models will be used in practice.
{"title":"Machine learning approaches to evaluate heterogeneous treatment effects in randomized controlled trials: a scoping review","authors":"Kosuke Inoue , Motohiko Adomi , Orestis Efthimiou , Toshiaki Komura , Kenji Omae , Akira Onishi , Yusuke Tsutsumi , Tomoko Fujii , Naoki Kondo , Toshi A. Furukawa","doi":"10.1016/j.jclinepi.2024.111538","DOIUrl":"10.1016/j.jclinepi.2024.111538","url":null,"abstract":"<div><h3>Background and Objectives</h3><div>Estimating heterogeneous treatment effects (HTEs) in randomized controlled trials (RCTs) has received substantial attention recently. This has led to the development of several statistical and machine learning (ML) algorithms to assess HTEs through identifying individualized treatment effects. However, a comprehensive review of these algorithms is lacking. We thus aimed to catalog and outline currently available statistical and ML methods for identifying HTEs via effect modeling using clinical RCT data and summarize how they have been applied in practice.</div></div><div><h3>Study Design and Setting</h3><div>We performed a scoping review using prespecified search terms in MEDLINE and Embase, aiming to identify studies that assessed HTEs using advanced statistical and ML methods in RCT data published from 2010 to 2022.</div></div><div><h3>Results</h3><div>Among a total of 32 studies identified in the review, 17 studies applied existing algorithms to RCT data, and 15 extended existing algorithms or proposed new algorithms. Applied algorithms included penalized regression, causal forest, Bayesian causal forest, and other metalearner frameworks. Of these methods, causal forest was the most frequently used (7 studies) followed by Bayesian causal forest (4 studies). Most applications were in cardiology (6 studies), followed by psychiatry (4 studies). We provide example R codes in simulated data to illustrate how to implement these algorithms.</div></div><div><h3>Conclusion</h3><div>This review identified and outlined various algorithms currently used to identify HTEs and individualized treatment effects in RCT data. Given the increasing availability of new algorithms, analysts should carefully select them after examining model performance and considering how the models will be used in practice.</div></div>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"176 ","pages":"Article 111538"},"PeriodicalIF":7.3,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142300198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-18. DOI: 10.1016/j.jclinepi.2024.111532
Kevin J. McIntyre, Karina N. Tassiopoulos, Curtis Jeffrey, Saverio Stranges, Janet Martin
Background and Objectives
The current Grading of Recommendations, Assessment, Development and Evaluation (GRADE) system instructs appraisers to evaluate whether individual observational studies have sufficiently adjusted for confounding. However, it does not provide an explicit, transparent, or reproducible method for doing so. This article explores how incorporating causal graphs into the GRADE framework can help appraisers and end-users of GRADE products evaluate the adequacy of confounding control in observational studies.
Methods
Using modern epidemiological theory, we propose a system for incorporating causal diagrams into the GRADE process to assess confounding control.
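As a minimal sketch of what this can look like in practice, the hypothetical example below encodes an assumed causal structure with the R package dagitty and asks which covariate sets would suffice to control confounding; the variables are invented for illustration and are not taken from the article (its worked example is in the supplemental material).

```r
# Hypothetical DAG; variable names are assumptions for illustration only.
library(dagitty)

g <- dagitty("dag {
  Age -> Treatment
  Age -> Outcome
  Severity -> Treatment
  Severity -> Outcome
  Treatment -> Outcome
}")

# Covariate sets sufficient to control confounding under the assumed DAG:
adjustmentSets(g, exposure = "Treatment", outcome = "Outcome")
# An appraiser could then check whether the study actually adjusted for such a set
# (here, { Age, Severity }) before rating its confounding control as adequate.
```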
Results
Integrating causal graphs into the GRADE framework enables appraisers to provide a theoretically grounded rationale for their evaluations of confounding control in observational studies. Including causal graphs in GRADE may also help appraisers document the evidence behind their appraisals in quality-of-evidence domains beyond confounding control. To support practical application, a worked example is included in the supplemental material to guide users through this approach.
Conclusion
GRADE calls for the explicit and transparent appraisal of evidence in the process of evidence synthesis. Incorporating causal diagrams into the evaluation of confounding control in observational studies aligns with the core principles of the GRADE framework, providing a clear, theory-based method for assessing the adequacy of confounding control.
{"title":"Using causal diagrams within the Grading of Recommendations, Assessment, Development and Evaluation framework to evaluate confounding adjustment in observational studies","authors":"Kevin J. McIntyre , Karina N. Tassiopoulos , Curtis Jeffrey , Saverio Stranges , Janet Martin","doi":"10.1016/j.jclinepi.2024.111532","DOIUrl":"10.1016/j.jclinepi.2024.111532","url":null,"abstract":"<div><h3>Background and Objectives</h3><div>The current Grading of Recommendations, Assessment, Development and Evaluation (GRADE) system instructs appraisers to evaluate whether individual observational studies have sufficiently adjusted for confounding. However, it does not provide an explicit, transparent, or reproducible method for doing so. This article explores how implementing causal graphs into the GRADE framework can help appraisers and end-users of GRADE products to evaluate the adequacy of confounding control from observational studies.</div></div><div><h3>Methods</h3><div>Using modern epidemiological theory, we propose a system for incorporating causal diagrams into the GRADE process to assess confounding control.</div></div><div><h3>Results</h3><div>Integrating causal graphs into the GRADE framework enables appraisers to provide a theoretically grounded rationale for their evaluations of confounding control in observational studies. Additionally, the inclusion of causal graphs in GRADE may assist appraisers in demonstrating evidence for their appraisals in other domains of quality of evidence beyond confounding control. To support practical application, a worked example is included in the supplemental material to guide users through this approach.</div></div><div><h3>Conclusion</h3><div>GRADE calls for the explicit and transparent appraisal of evidence in the process of evidence synthesis. Incorporating causal diagrams into the evaluation of confounding control in observational studies aligns with the core principles of the GRADE framework, providing a clear, theory-based method for the adequacy of confounding control in observational studies.</div></div>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"175 ","pages":"Article 111532"},"PeriodicalIF":7.3,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142300202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corrigendum to 'Pilot and feasibility trials in surgery are incompletely reported according to the CONSORT checklist: a meta-research study' [Journal of Clinical Epidemiology 170 (2024)].","authors":"Tyler McKechnie, Tania Kazi, Austine Wang, Sophia Zhang, Alex Thabane, Keean Nanji, Phillip Staibano, Lily J Park, Aristithes Doumouras, Cagla Eskicioglu, Lehana Thabane, Sameer Parpia, Mohit Bhandari","doi":"10.1016/j.jclinepi.2024.111510","DOIUrl":"https://doi.org/10.1016/j.jclinepi.2024.111510","url":null,"abstract":"","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":" ","pages":"111510"},"PeriodicalIF":7.3,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142300296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Introduction
The pragmatic explanatory continuum indicator summary (PRECIS) tool, initially published in 2009 and revised in 2015, was created to assist trialists in aligning their design choices with the intended purpose of their randomised controlled trial (RCT): either to guide real-world decisions between alternative interventions (pragmatic) or to test hypotheses about intervention mechanisms by minimising sources of variation (explanatory). There have been many comments, suggestions, and criticisms of PRECIS-2. This summary will be used to facilitate the development of the next revision, PRECIS-3.
Methods
We used Web of Science to identify all publication types citing PRECIS-2, published between May 2015 and July 2023. Citations were eligible if they contained ‘substantive’ suggestions, comments, or criticism of the PRECIS-2 tool. We defined ‘substantive’ as comments explicitly referencing at least one PRECIS-2 domain or a concept directly linked to an existing or newly proposed domain.
Two reviewers independently extracted comments, suggestions, and criticisms, noting their implications for the update. These were discussed among authors to achieve consensus on the interpretation of each comment and its implications for PRECIS-3.
Results
The search yielded 885 publications, and after full-text review, 89 articles met the inclusion criteria. Comments pertained to new domains, changes in existing domains, or were relevant across several or all domains. Proposed new domains included assessment of the comparator arm and a domain to describe blinding. There were concerns about scoring eligibility and recruitment domains for cluster trials. Suggested areas for improvement across domains included the need for more scoring guidance for explanatory design choices.
Discussion
Published comments recognise PRECIS-2's success in aiding trialists with pragmatic or explanatory design choices. Enhancing its implementation and widespread use will involve adding new domains, refining domain definitions, and addressing overall tool issues. This citation review offers valuable user feedback, pivotal for shaping the upcoming version of the PRECIS tool, PRECIS-3.
{"title":"Comments, suggestions, and criticisms of the Pragmatic Explanatory Continuum Indicator Summary-2 design tool: a citation analysis","authors":"Andrew Willis , Frances Shiely , Shaun Treweek , Monica Taljaard , Kirsty Loudon , Alison Howie , Merrick Zwarenstein","doi":"10.1016/j.jclinepi.2024.111534","DOIUrl":"10.1016/j.jclinepi.2024.111534","url":null,"abstract":"<div><h3>Introduction</h3><div>The pragmatic explanatory continuum indicator summary (PRECIS) tool, initially published in 2009 and revised in 2015, was created to assist trialists to align their design choices with the intended purpose of their randomised controlled trial (RCT): either to guide real-world decisions between alternative interventions (pragmatic) or to test hypotheses about intervention mechanisms by minimising sources of variation (explanatory). There have been many comments, suggestions, and criticisms of PRECIS-2. This summary will be used to facilitate the development of to the next revision, which is PRECIS-3.</div></div><div><h3>Methods</h3><div>We used Web of Science to identify all publication types citing PRECIS-2, published between May 2015 and July 2023. Citations were eligible if they contained ‘substantive’ suggestions, comments, or criticism of the PRECIS-2 tool. We defined ‘substantive’ as comments explicitly referencing at least one PRECIS-2 domain or a concept directly linked to an existing or newly proposed domain.</div><div>Two reviewers independently extracted comments, suggestions, and criticisms, noting their implications for the update. These were discussed among authors to achieve consensus on the interpretation of each comment and its implications for PRECIS-3.</div></div><div><h3>Results</h3><div>The search yielded 885 publications, and after full-text review, 89 articles met the inclusion criteria. Comments pertained to new domains, changes in existing domains, or were relevant across several or all domains. Proposed new domains included assessment of the comparator arm and a domain to describe blinding. There were concerns about scoring eligibility and recruitment domains for cluster trials. Suggested areas for improvement across domains included the need for more scoring guidance for explanatory design choices.</div></div><div><h3>Discussion</h3><div>Published comments recognise PRECIS-2's success in aiding trialists with pragmatic or explanatory design choices. Enhancing its implementation and widespread use will involve adding new domains, refining domain definitions, and addressing overall tool issues. This citation review offers valuable user feedback, pivotal for shaping the upcoming version of the PRECIS tool, PRECIS-3.</div></div>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"176 ","pages":"Article 111534"},"PeriodicalIF":7.3,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142300295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-12. DOI: 10.1016/j.jclinepi.2024.111531
Kamaryn T. Tanner, Karla Diaz-Ordaz, Ruth H. Keogh
Objectives
We describe the steps for implementing a dynamic updating pipeline for clinical prediction models and illustrate the proposed methods in an application of 5-year survival prediction in cystic fibrosis.
Study Design and Setting
Dynamic model updating refers to the process of repeated updating of a clinical prediction model with new information to counter performance degradation. We describe 2 types of updating pipeline: “proactive updating” where candidate model updates are tested any time new data are available, and “reactive updating” where updates are only made when performance of the current model declines or the model structure changes. Methods for selecting the best candidate updating model are based on measures of predictive performance under the 2 pipelines. The methods are illustrated in our motivating example of a 5-year survival prediction model in cystic fibrosis. Over a dynamic updating period of 10 years, we report the updating decisions made and the performance of the prediction models selected under each pipeline.
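As a hedged, generic sketch of the "reactive" idea (not the authors' pipeline), the example below refits a logistic prediction model only when discrimination on newly accrued data drops below a chosen threshold; the outcome column name, the threshold, and the pooled-refit strategy are all assumptions for illustration.

```r
# Reactive updating sketch: monitor AUC on new data, refit only if it degrades.
library(pROC)

update_if_degraded <- function(model, old_data, new_data, auc_threshold = 0.70) {
  p       <- predict(model, newdata = new_data, type = "response")
  auc_new <- as.numeric(auc(new_data$event, p, quiet = TRUE))   # discrimination on new data
  if (auc_new < auc_threshold) {
    # Degraded performance: one possible update is full re-estimation on pooled data.
    model <- glm(formula(model), data = rbind(old_data, new_data), family = binomial)
  }
  list(model = model, auc = auc_new)
}

# Toy usage: an initial model, then a batch of new data from a drifted population.
set.seed(7)
x_old <- rnorm(500)
old_data <- data.frame(x = x_old, event = rbinom(500, 1, plogis(x_old)))
x_new <- rnorm(300)
new_data <- data.frame(x = x_new, event = rbinom(300, 1, plogis(-x_new)))  # reversed association (drift)
fit0 <- glm(event ~ x, data = old_data, family = binomial)
res  <- update_if_degraded(fit0, old_data, new_data)
res$auc   # low AUC on new data triggers the refit
```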
Results
Both the proactive and reactive updating pipelines produced survival prediction models that overall had better performance in terms of calibration and discrimination than a model that was not updated. Further, use of the dynamic updating pipelines ensured that the prediction model’s performance was consistently and frequently reviewed in new data.
Conclusion
Implementing a dynamic updating pipeline will help guard against model performance degradation while ensuring that the updating process is principled and data-driven.
{"title":"Implementation of a dynamic model updating pipeline provides a systematic process for maintaining performance of prediction models","authors":"Kamaryn T. Tanner , Karla Diaz-Ordaz , Ruth H. Keogh","doi":"10.1016/j.jclinepi.2024.111531","DOIUrl":"10.1016/j.jclinepi.2024.111531","url":null,"abstract":"<div><h3>Objectives</h3><div>We describe the steps for implementing a dynamic updating pipeline for clinical prediction models and illustrate the proposed methods in an application of 5-year survival prediction in cystic fibrosis.</div></div><div><h3>Study Design and Setting</h3><div>Dynamic model updating refers to the process of repeated updating of a clinical prediction model with new information to counter performance degradation. We describe 2 types of updating pipeline: “proactive updating” where candidate model updates are tested any time new data are available, and “reactive updating” where updates are only made when performance of the current model declines or the model structure changes. Methods for selecting the best candidate updating model are based on measures of predictive performance under the 2 pipelines. The methods are illustrated in our motivating example of a 5-year survival prediction model in cystic fibrosis. Over a dynamic updating period of 10 years, we report the updating decisions made and the performance of the prediction models selected under each pipeline.</div></div><div><h3>Results</h3><div>Both the proactive and reactive updating pipelines produced survival prediction models that overall had better performance in terms of calibration and discrimination than a model that was not updated. Further, use of the dynamic updating pipelines ensured that the prediction model’s performance was consistently and frequently reviewed in new data.</div></div><div><h3>Conclusion</h3><div>Implementing a dynamic updating pipeline will help guard against model performance degradation while ensuring that the updating process is principled and data-driven.</div></div>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"175 ","pages":"Article 111531"},"PeriodicalIF":7.3,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142300197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-12. DOI: 10.1016/j.jclinepi.2024.111533
Tim Woelfle, Julian Hirt, Perrine Janiaud, Ludwig Kappos, John P.A. Ioannidis, Lars G. Hemkens
Background and Objective
It is unknown whether large language models (LLMs) can facilitate time- and resource-intensive text-related processes in evidence appraisal. The objective was to quantify the agreement of LLMs with human consensus when appraising the reporting (Preferred Reporting Items for Systematic reviews and Meta-Analyses [PRISMA]) and methodological rigor (A MeaSurement Tool to Assess systematic Reviews [AMSTAR]) of systematic reviews and the design of clinical trials (PRagmatic Explanatory Continuum Indicator Summary 2 [PRECIS-2]), and to identify areas where collaboration between humans and artificial intelligence (AI) would outperform the traditional consensus process of human raters in efficiency.
Study Design and Setting
Five LLMs (Claude-3-Opus, Claude-2, GPT-4, GPT-3.5, Mixtral-8x22B) assessed 112 systematic reviews applying the PRISMA and AMSTAR criteria and 56 randomized controlled trials applying PRECIS-2. We quantified the agreement between human consensus and (1) individual human raters; (2) individual LLMs; (3) a combined LLM approach; and (4) human–AI collaboration. Ratings were marked as deferred (undecided) in case of inconsistency between the combined LLMs or between the human rater and the LLM.
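As a hedged illustration of the deferral logic described above (invented toy ratings, not the study data), combining two LLM raters can work like this: keep a rating when the models agree and defer to a human when they disagree.

```r
# Toy example of the combined-LLM rule: agreement -> accept; disagreement -> defer.
llm_a           <- c("yes", "no",  "yes", "no", "yes")
llm_b           <- c("yes", "yes", "yes", "no", "no")
human_consensus <- c("yes", "no",  "no",  "no", "yes")

combined <- ifelse(llm_a == llm_b, llm_a, NA)                 # NA marks deferred items
deferred <- mean(is.na(combined))                             # proportion deferred to humans
accuracy <- mean(combined == human_consensus, na.rm = TRUE)   # accuracy among decided items
c(deferred = deferred, accuracy = accuracy)
```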
Results
Individual human rater accuracy was 89% for PRISMA and AMSTAR, and 75% for PRECIS-2. Individual LLM accuracy ranged from 63% (GPT-3.5) to 70% (Claude-3-Opus) for PRISMA, 53% (GPT-3.5) to 74% (Claude-3-Opus) for AMSTAR, and 38% (GPT-4) to 55% (GPT-3.5) for PRECIS-2. Combined LLM ratings led to accuracies of 75%–88% for PRISMA (4%–74% deferred), 74%–89% for AMSTAR (6%–84% deferred), and 64%–79% for PRECIS-2 (29%–88% deferred). Human–AI collaboration resulted in the best accuracies: 89%–96% for PRISMA (25%/35% deferred), 91%–95% for AMSTAR (27%/30% deferred), and 80%–86% for PRECIS-2 (76%/71% deferred).
Conclusion
Current LLMs alone appraised evidence worse than humans. Human–AI collaboration may reduce workload for the second human rater for the assessment of reporting (PRISMA) and methodological rigor (AMSTAR) but not for complex tasks such as PRECIS-2.
{"title":"Benchmarking Human–AI collaboration for common evidence appraisal tools","authors":"Tim Woelfle , Julian Hirt , Perrine Janiaud , Ludwig Kappos , John P.A. Ioannidis , Lars G. Hemkens","doi":"10.1016/j.jclinepi.2024.111533","DOIUrl":"10.1016/j.jclinepi.2024.111533","url":null,"abstract":"<div><h3>Background and Objective</h3><div>It is unknown whether large language models (LLMs) may facilitate time- and resource-intensive text-related processes in evidence appraisal. The objective was to quantify the agreement of LLMs with human consensus in appraisal of scientific reporting (Preferred Reporting Items for Systematic reviews and Meta-Analyses [PRISMA]) and methodological rigor (A MeaSurement Tool to Assess systematic Reviews [AMSTAR]) of systematic reviews and design of clinical trials (PRagmatic Explanatory Continuum Indicator Summary 2 [PRECIS-2]) and to identify areas where collaboration between humans and artificial intelligence (AI) would outperform the traditional consensus process of human raters in efficiency.</div></div><div><h3>Study Design and Setting</h3><div>Five LLMs (Claude-3-Opus, Claude-2, GPT-4, GPT-3.5, Mixtral-8x22B) assessed 112 systematic reviews applying the PRISMA and AMSTAR criteria and 56 randomized controlled trials applying PRECIS-2. We quantified the agreement between human consensus and (1) individual human raters; (2) individual LLMs; (3) combined LLMs approach; (4) human–AI collaboration. Ratings were marked as deferred (undecided) in case of inconsistency between combined LLMs or between the human rater and the LLM.</div></div><div><h3>Results</h3><div>Individual human rater accuracy was 89% for PRISMA and AMSTAR, and 75% for PRECIS-2. Individual LLM accuracy was ranging from 63% (GPT-3.5) to 70% (Claude-3-Opus) for PRISMA, 53% (GPT-3.5) to 74% (Claude-3-Opus) for AMSTAR, and 38% (GPT-4) to 55% (GPT-3.5) for PRECIS-2. Combined LLM ratings led to accuracies of 75%–88% for PRISMA (4%–74% deferred), 74%–89% for AMSTAR (6%–84% deferred), and 64%–79% for PRECIS-2 (29%–88% deferred). Human–AI collaboration resulted in the best accuracies from 89% to 96% for PRISMA (25/35% deferred), 91%–95% for AMSTAR (27/30% deferred), and 80%–86% for PRECIS-2 (76/71% deferred).</div></div><div><h3>Conclusion</h3><div>Current LLMs alone appraised evidence worse than humans. Human–AI collaboration may reduce workload for the second human rater for the assessment of reporting (PRISMA) and methodological rigor (AMSTAR) but not for complex tasks such as PRECIS-2.</div></div>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"175 ","pages":"Article 111533"},"PeriodicalIF":7.3,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142300294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-06. DOI: 10.1016/j.jclinepi.2024.111517
Avraham Tenenbaum, Shoshana Revel-Vilk, Sivan Gazit, Michael Roimi, Aidan Gill, Dafna Gilboa, Ora Paltiel, Orly Manor, Varda Shalev, Gabriel Chodick
Objective
The diagnosis of Gaucher disease (GD) presents a major challenge due to the high variability and low specificity of its clinical characteristics, along with limited physician awareness of the disease's early symptoms. Early and accurate diagnosis is important to enable effective treatment decisions, prevent unnecessary testing, and facilitate genetic counseling. This study aimed to develop a machine learning (ML) model for GD screening and early diagnosis based on real-world clinical data from the Maccabi Healthcare Services electronic database, which contains 20 years of longitudinal data on approximately 2.6 million patients.
Study Design and Setting
We screened the Maccabi Healthcare Services database for patients with GD between January 1998 and May 2022. Eligible controls were matched by year of birth, sex, and socioeconomic status in a 1:13 ratio. The data were partitioned into 75% training and 25% test sets, and a model was trained to predict GD using features obtained from medical and laboratory records. Model performance was evaluated using the area under the receiver operating characteristic curve and the area under the precision-recall curve.
Results
We detected 264 confirmed patients with GD, to whom we matched 3,429 controls. The best-performing model (which included known GD signs and symptoms, previously unknown clinical features, and administrative codes) had an area under the receiver operating characteristic curve of 0.95 ± 0.03 and an area under the precision-recall curve of 0.80 ± 0.08 on the test set, identifying GD a median of 2.78 years earlier than the clinical diagnosis (25th–75th percentile: 1.29–4.53).
Conclusion
Using an ML approach on real-world data led to excellent discrimination between GD patients and controls, with the ability to detect GD significantly earlier than the time of actual diagnosis. Hence, this approach might be useful as a screening tool for GD and lead to earlier diagnosis and treatment. Furthermore, advanced ML analytics may highlight previously unrecognized features associated with GD, including clinical diagnoses and health-seeking behaviors.
Plain Language Summary
Diagnosing Gaucher disease is difficult, which often leads to late or incorrect diagnoses. As a result, patients may undergo unnecessary tests and treatments and experience health deterioration despite the availability of medications for Gaucher disease. In this study, we used electronic health data to develop machine learning models for early diagnosis of type 1 Gaucher disease. Our models, which included known Gaucher disease signs and symptoms, previously unknown clinical features, and administrative codes, significantly outperformed other models and expert opinion, detecting type 1 Gaucher disease on average 3 years before the actual diagnosis. Our models also revealed new features
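To make the evaluation design concrete, the sketch below reproduces the general recipe (75/25 split, area under the ROC curve, area under the precision-recall curve) on simulated data; the feature names, the PRROC package choice, and the simple logistic model are assumptions for illustration and bear no relation to the study's actual features or model.

```r
# Illustrative evaluation sketch on simulated data (not the study's features or model).
library(PRROC)

set.seed(1)
n <- 3693                                           # roughly 264 cases + 3,429 matched controls
d <- data.frame(case      = rbinom(n, 1, 264 / 3693),
                ferritin  = rnorm(n),               # hypothetical laboratory features
                platelets = rnorm(n))

tr  <- sample(seq_len(n), size = round(0.75 * n))   # 75% training / 25% test split
fit <- glm(case ~ ferritin + platelets, data = d[tr, ], family = binomial)
p   <- predict(fit, newdata = d[-tr, ], type = "response")
y   <- d$case[-tr]

roc.curve(scores.class0 = p[y == 1], scores.class1 = p[y == 0])$auc          # AUROC
pr.curve(scores.class0 = p[y == 1], scores.class1 = p[y == 0])$auc.integral  # AUPRC
```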
{"title":"A machine learning model for early diagnosis of type 1 Gaucher disease using real-life data","authors":"Avraham Tenenbaum , Shoshana Revel-Vilk , Sivan Gazit , Michael Roimi , Aidan Gill , Dafna Gilboa , Ora Paltiel , Orly Manor , Varda Shalev , Gabriel Chodick","doi":"10.1016/j.jclinepi.2024.111517","DOIUrl":"10.1016/j.jclinepi.2024.111517","url":null,"abstract":"<div><h3>Objective</h3><div>The diagnosis of Gaucher disease (GD) presents a major challenge due to the high variability and low specificity of its clinical characteristics, along with limited physician awareness of the disease’s early symptoms. Early and accurate diagnosis is important to enable effective treatment decisions, prevent unnecessary testing, and facilitate genetic counseling. This study aimed to develop a machine learning (ML) model for GD screening and GD early diagnosis based on real-world clinical data using the Maccabi Healthcare Services electronic database, which contains 20 years of longitudinal data on approximately 2.6 million patients.</div></div><div><h3>Study Design and Setting</h3><div>We screened the Maccabi Healthcare Services database for patients with GD between January 1998 and May 2022. Eligible controls were matched by year of birth, sex, and socioeconomic status in a 1:13 ratio. The data were partitioned into 75% training and 25% test sets and trained to predict GD using features obtained from medical and laboratory records. Model performances were evaluated using the area under the receiver operating characteristic curve and the area under the precision-recall curve.</div></div><div><h3>Results</h3><div>We detected 264 confirmed patients with GD to which we matched 3,429 controls. The best model performance (which included known GD signs and symptoms, previously unknown clinical features, and administrative codes) on the test set had an area under the receiver operating characteristic curve = 0.95 ± 0.03 and area under the precision-recall curve = 0.80 ± 0.08, which yielded a median GD identification of 2.78 years earlier than the clinical diagnosis (25th–75th percentile: 1.29–4.53).</div></div><div><h3>Conclusion</h3><div>Using an ML approach on real-world data led to excellent discrimination between GD patients and controls, with the ability to detect GD significantly earlier than the time of actual diagnosis. Hence, this approach might be useful as a screening tool for GD and lead to earlier diagnosis and treatment. Furthermore, advanced ML analytics may highlight previously unrecognized features associated with GD, including clinical diagnoses and health-seeking behaviors.</div></div><div><h3>Plain Language Summary</h3><div>Diagnosing Gaucher disease is difficult, which often leads to late or incorrect diagnoses. As a result, patients may undergo unnecessary tests and treatments and experience health deterioration despite medications availability for Gaucher disease. In this study, we used electronic health data to develop machine learning models for early diagnosis of Gaucher disease type 1. Our models, which included known Gaucher disease signs and symptoms, previously unknown clinical features, and administrative codes, were able to significantly outperform other models and expert opinions, detecting type 1 Gaucher disease 3 years on average before actual diagnosis. 
Our models also revealed new features ","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"175 ","pages":"Article 111517"},"PeriodicalIF":7.3,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142156574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-06. DOI: 10.1016/j.jclinepi.2024.111530
Jorge Arias-de la Torre, Jordi Alonso, Jose M Valderas
{"title":"Harmonization of data collection to improve clinical and public health evidence-based decision making.","authors":"Jorge Arias-de la Torre, Jordi Alonso, Jose M Valderas","doi":"10.1016/j.jclinepi.2024.111530","DOIUrl":"10.1016/j.jclinepi.2024.111530","url":null,"abstract":"","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":" ","pages":"111530"},"PeriodicalIF":7.3,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142146796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-06. DOI: 10.1016/j.jclinepi.2024.111518
Declan Devane, Candyce Hamel, Gerald Gartlehner, Barbara Nussbaumer-Streit, Ursula Griebler, Lisa Affengruber, KM Saif-Ur-Rahman, Chantelle Garritty
Background and Objective
Rapid reviews have gained popularity as a pragmatic approach to synthesize evidence in a timely manner to inform decision-making in healthcare. This article provides an overview of the key concepts and methodological considerations in conducting rapid reviews, drawing from a series of recently published guidance papers by the Cochrane Rapid Reviews Methods Group.
Study Design and Setting
We discuss the definition, characteristics, and potential applications of rapid reviews and the trade-offs between speed and rigor. We present a practical example of a rapid review and highlight the methodological considerations outlined in the updated Cochrane guidance, including recommendations for literature searching, study selection, data extraction, risk of bias assessment, synthesis, and assessing the certainty of evidence.
Results
Rapid reviews can be a valuable tool for evidence-based decision-making, but it is essential to understand their limitations and adhere to methodological standards to ensure their validity and reliability.
Conclusion
As the demand for rapid evidence synthesis continues to grow, further research is needed to refine and standardize the methods and reporting of rapid reviews.
Plain Language Summary
Rapid reviews are a type of research method designed to quickly gather and summarize evidence to support decision-making in healthcare. They are particularly useful when timely information is needed, such as during a public health emergency. This article explains the key aspects of how rapid reviews are conducted, based on the latest guidance from experts. Rapid reviews involve several steps, including searching for relevant studies, selecting which studies to include, and carefully examining the quality of the evidence. Although rapid reviews are faster to complete than full systematic reviews, they still follow rigorous processes to ensure that the findings are reliable. This article also provides an example of a rapid review in action, demonstrating how these reviews can be applied in real-world situations. While rapid reviews are a powerful tool for making quick, evidence-based decisions, it is important to be aware of their limitations. Researchers must follow established methods to make sure the results are as accurate and useful as possible. As more people use rapid reviews, ongoing research is needed to improve and standardize how they are done.
{"title":"Key concepts in rapid reviews: an overview","authors":"Declan Devane , Candyce Hamel , Gerald Gartlehner , Barbara Nussbaumer-Streit , Ursula Griebler , Lisa Affengruber , KM Saif-Ur-Rahman , Chantelle Garritty","doi":"10.1016/j.jclinepi.2024.111518","DOIUrl":"10.1016/j.jclinepi.2024.111518","url":null,"abstract":"<div><h3>Background and Objective</h3><div>Rapid reviews have gained popularity as a pragmatic approach to synthesize evidence in a timely manner to inform decision-making in healthcare. This article provides an overview of the key concepts and methodological considerations in conducting rapid reviews, drawing from a series of recently published guidance papers by the Cochrane Rapid Reviews Methods Group.</div></div><div><h3>Study Design and Setting</h3><div>We discuss the definition, characteristics, and potential applications of rapid reviews and the trade-offs between speed and rigor. We present a practical example of a rapid review and highlight the methodological considerations outlined in the updated Cochrane guidance, including recommendations for literature searching, study selection, data extraction, risk of bias assessment, synthesis, and assessing the certainty of evidence.</div></div><div><h3>Results</h3><div>Rapid reviews can be a valuable tool for evidence-based decision-making, but it is essential to understand their limitations and adhere to methodological standards to ensure their validity and reliability.</div></div><div><h3>Conclusion</h3><div>As the demand for rapid evidence synthesis continues to grow, further research is needed to refine and standardize the methods and reporting of rapid reviews.</div></div><div><h3>Plain Language Summary</h3><div>Rapid reviews are a type of research method designed to quickly gather and summarize evidence to support decision-making in healthcare. They are particularly useful when timely information is needed, such as during a public health emergency. This article explains the key aspects of how rapid reviews are conducted, based on the latest guidance from experts. Rapid reviews involve several steps, including searching for relevant studies, selecting which studies to include, and carefully examining the quality of the evidence. Although rapid reviews are faster to complete than full systematic reviews, they still follow rigorous processes to ensure that the findings are reliable. This article also provides an example of a rapid review in action, demonstrating how these reviews can be applied in real-world situations. While rapid reviews are a powerful tool for making quick, evidence-based decisions, it is important to be aware of their limitations. Researchers must follow established methods to make sure the results are as accurate and useful as possible. As more people use rapid reviews, ongoing research is needed to improve and standardize how they are done.</div></div>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"175 ","pages":"Article 111518"},"PeriodicalIF":7.3,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142156575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-05. DOI: 10.1016/j.jclinepi.2024.111516
Annabelle R. Iken, Rudolf W. Poolman, Maaike G.J. Gademan
Objective
High-quality data entry in clinical trial databases is crucial to the usefulness, validity, and replicability of research findings, as it influences evidence-based medical practice and future research. Our aim is to assess the quality of self-reported data in trial registries and present practical and systematic methods for identifying and evaluating data quality.
Study Design and Setting
We searched ClinicalTrials.gov (CTG) for interventional total knee arthroplasty (TKA) trials between 2000 and 2015. We extracted required and optional trial information elements and used CTG's variable definitions. We reviewed the literature on data quality reporting, covering frameworks, checklists, and overviews of irregularities in healthcare databases. We identified and assessed the following data quality attributes: consistency, accuracy, completeness, and timeliness.
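A hedged sketch of how such attribute checks can be operationalized (hypothetical column names, not the actual CTG export): flag, for each registered trial, a timeliness irregularity (registration more than 3 months after the start date) and incompleteness (no reported results), then summarize the proportions.

```r
# Toy data-quality flags for registered trials (column names are assumptions).
trials <- data.frame(
  nct_id            = c("NCT00000001", "NCT00000002", "NCT00000003"),
  start_date        = as.Date(c("2010-01-01", "2011-12-01", "2014-05-20")),
  registration_date = as.Date(c("2010-06-01", "2011-11-15", "2015-01-10")),
  results_posted    = c(FALSE, TRUE, FALSE)
)

trials$late_registration <- as.numeric(trials$registration_date - trials$start_date) > 90
trials$missing_results   <- !trials$results_posted

colMeans(trials[, c("late_registration", "missing_results")]) * 100   # % of trials flagged
```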
Results
We included 816 interventional TKA trials. Data irregularities varied widely: 0%–100%. Inconsistency ranged from 0% to 36%; most often, an allocation labeled as nonrandomized was combined with a "single-group" assignment design. Inaccuracy ranged from 0% to 100%. Incompleteness ranged from 0% to 61%; 61% of finished TKA trials did not report their outcome. With regard to irregularities in timeliness, 49% of the trials were registered more than 3 months after the start date.
Conclusion
We found significant variations in the data quality of registered clinical TKA trials. Trial sponsors should be committed to ensuring that the information they provide is reliable, consistent, up-to-date, transparent, and accurate. CTG's users need to be critical when drawing conclusions based on the registered data. We believe this awareness will increase well-informed decisions about published articles and treatment protocols, including replicating and improving trial designs.
{"title":"Data quality assessment of interventional trials in public trial databases","authors":"Annabelle R. Iken , Rudolf W. Poolman , Maaike G.J. Gademan","doi":"10.1016/j.jclinepi.2024.111516","DOIUrl":"10.1016/j.jclinepi.2024.111516","url":null,"abstract":"<div><h3>Objective</h3><div>High-quality data entry in clinical trial databases is crucial to the usefulness, validity, and replicability of research findings, as it influences evidence-based medical practice and future research. Our aim is to assess the quality of self-reported data in trial registries and present practical and systematic methods for identifying and evaluating data quality.</div></div><div><h3>Study Design and Setting</h3><div>We searched ClinicalTrials.Gov (CTG) for interventional total knee arthroplasty (TKA) trials between 2000 and 2015. We extracted required and optional trial information elements and used the CTG's variables' definitions. We performed a literature review on data quality reporting on frameworks, checklists, and overviews of irregularities in healthcare databases. We identified and assessed data quality attributes as follows: consistency, accuracy, completeness, and timeliness.</div></div><div><h3>Results</h3><div>We included 816 interventional TKA trials. Data irregularities varied widely: 0%–100%. Inconsistency ranged from 0% to 36%, and most often nonrandomized labeled allocation was combined with a “single-group” assignment trial design. Inaccuracy ranged from 0% to 100%. Incompleteness ranged from 0% to 61%; 61% of finished TKA trials did not report their outcome. With regard to irregularities in timeliness, 49% of the trials were registered more than 3 months after the start date.</div></div><div><h3>Conclusion</h3><div>We found significant variations in the data quality of registered clinical TKA trials. Trial sponsors should be committed to ensuring that the information they provide is reliable, consistent, up-to-date, transparent, and accurate. CTG's users need to be critical when drawing conclusions based on the registered data. We believe this awareness will increase well-informed decisions about published articles and treatment protocols, including replicating and improving trial designs.</div></div>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"175 ","pages":"Article 111516"},"PeriodicalIF":7.3,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142146795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}