John A Spanias, Robbie Buderi, Pierre-Louis Bourlon, Christopher Tso, Caleb Strait, David Saunders, Kayleen Ports, Weixi Chen, Rahul Jain, Bhargav Koduru, Danielle Gerome, Eric Yang, Silvy Saltzmann, Aniketh Talwai, Tanmay Jain, Jacob Aptekar
Machine learning approach for detection of MACE events within clinical trial data
Journal of Biopharmaceutical Statistics, pp. 1-16. Journal Article. Published 2024-11-16.
DOI: 10.1080/10543406.2024.2420640 (https://doi.org/10.1080/10543406.2024.2420640)
Abstract
Randomized controlled trials (RCTs) are the gold standard for clinical research but may not accurately reflect the impact of medicines in real-world settings. Supplementing RCTs with insights from real-world data (RWD) can address known limitations by including more diverse patient populations, additional types of sites-of-care, and practices more representative of the care most people receive. One current challenge in using RWD is the lack of an algorithmic approach to identifying outcomes. To address this, machine learning models for identifying a frequently used outcome, Major Adverse Cardiovascular Events (MACE), were developed in Clinical Trial Data (CTD). Anonymized CTD sourced from the Medidata Enterprise Data Store were used to develop model features on the condition that they would be useful for labelling MACE events and that they could also be found in RWD. These features were used to train three random forest models to identify each component of 3-point MACE in a patient's clinical trial journey. Performance metrics for the models are presented (recall = 0.72 [0.07], precision = 0.68 [0.12] - mean, [SD]) along with the top contributing features. We show that the models can be tuned specifically to replicate the adjudication panels' results and present a cost-benefit analysis for deploying such models in clinical trial settings. We demonstrate the viability of using advanced algorithms for identifying clinical outcomes in prospective clinical trials. Deployment of such models could reduce the resources required to conduct RCTs. Extending such models to RWD would facilitate approval of pragmatic clinical trials for regulatory submissions.
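The evaluation summarized in the abstract (precision and recall per MACE component, reported as mean [SD], with models tuned toward the adjudication panels' results) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy labels and scores, the function names, the 0.5 and 0.3 thresholds, the use of a score threshold as the tuning knob, and the choice of sample SD. The component names follow the common definition of 3-point MACE (cardiovascular death, non-fatal myocardial infarction, non-fatal stroke), which the abstract does not spell out. The random forest training itself is not shown, and none of this is the authors' data or pipeline.

```python
from statistics import mean, stdev


def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 marks an event."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def apply_threshold(scores, threshold):
    """Turn model scores into 0/1 predictions at the given cutoff."""
    return [1 if s >= threshold else 0 for s in scores]


def summarize(data, threshold):
    """Mean and sample SD of precision and recall across component models."""
    precisions, recalls = [], []
    for y_true, scores in data.values():
        p, r = precision_recall(y_true, apply_threshold(scores, threshold))
        precisions.append(p)
        recalls.append(r)
    return {
        "precision": (mean(precisions), stdev(precisions)),
        "recall": (mean(recalls), stdev(recalls)),
    }


# Hypothetical toy data: (adjudicated labels, model scores) per component.
TOY = {
    "cv_death": ([1, 0, 1, 0, 0, 1], [0.9, 0.2, 0.6, 0.4, 0.1, 0.3]),
    "nonfatal_mi": ([0, 1, 1, 0, 1, 0], [0.7, 0.8, 0.55, 0.2, 0.45, 0.1]),
    "nonfatal_stroke": ([0, 0, 1, 1, 0, 0], [0.1, 0.6, 0.9, 0.7, 0.3, 0.2]),
}

if __name__ == "__main__":
    # Lowering the cutoff trades precision for recall on this toy data.
    for threshold in (0.5, 0.3):
        summary = summarize(TOY, threshold)
        for metric, (m, sd) in summary.items():
            print(f"threshold={threshold} {metric} = {m:.2f} [{sd:.2f}]")
```

On this toy data, lowering the cutoff raises recall at the expense of precision. That is the kind of trade-off a tuning step would navigate when trying to match adjudicated outcomes. Which operating point is appropriate depends on the relative cost of missed events versus false alerts, which the sketch does not model.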
About the journal:
The Journal of Biopharmaceutical Statistics, a rapid publication journal, discusses quality applications of statistics in biopharmaceutical research and development. Now publishing six times per year, it includes expositions of statistical methodology with immediate applicability to biopharmaceutical research in the form of full-length and short manuscripts, review articles, selected/invited conference papers, short articles, and letters to the editor. Addressing timely and provocative topics important to the biostatistical profession, the journal covers:
Drug, device, and biological research and development;
Drug screening and drug design;
Assessment of pharmacological activity;
Pharmaceutical formulation and scale-up;
Preclinical safety assessment;
Bioavailability, bioequivalence, and pharmacokinetics;
Phase I, II, and III clinical development including complex innovative designs;
Premarket approval assessment of clinical safety;
Postmarketing surveillance;
Big data and artificial intelligence and applications.