A comparison of alternative ranking methods in two-stage clinical trials with multiple interventions: An application to the anxiolysis for laceration repair in children trial.
Nam-Anh Tran, Abigail McGrory, Naveen Poonai, Anna Heath
{"title":"A comparison of alternative ranking methods in two-stage clinical trials with multiple interventions: An application to the anxiolysis for laceration repair in children trial.","authors":"Nam-Anh Tran, Abigail McGrory, Naveen Poonai, Anna Heath","doi":"10.1177/17407745241251812","DOIUrl":null,"url":null,"abstract":"<p><strong>Background/aims: </strong>Multi-arm, multi-stage trials frequently include a standard care to which all interventions are compared. This may increase costs and hinders comparisons among the experimental arms. Furthermore, the standard care may not be evident, particularly when there is a large variation in standard practice. Thus, we aimed to develop an adaptive clinical trial that drops ineffective interventions following an interim analysis before selecting the best intervention at the final stage without requiring a standard care.</p><p><strong>Methods: </strong>We used Bayesian methods to develop a multi-arm, two-stage adaptive trial and evaluated two different methods for ranking interventions, the probability that each intervention was optimal (P<sub><i>best</i></sub>) and using the surface under the cumulative ranking curve (SUCRA), at both the interim and final analysis. The proposed trial design determines the maximum sample size for each intervention using the Average Length Criteria. The interim analysis takes place at approximately half the pre-specified maximum sample size and aims to drop interventions for futility if either P<sub><i>best</i></sub> or the SUCRA is below a pre-specified threshold. The final analysis compares all remaining interventions at the maximum sample size to conclude superiority based on either P<sub><i>best</i></sub> or the SUCRA. The two ranking methods were compared across 12 scenarios that vary the number of interventions and the assumed differences between the interventions. 
The thresholds for futility and superiority were chosen to control type 1 error, and then the predictive power and expected sample size were evaluated across scenarios. A trial comparing three interventions that aim to reduce anxiety for children undergoing a laceration repair in the emergency department was then designed, known as the Anxiolysis for Laceration Repair in Children Trial (ALICE) trial.</p><p><strong>Results: </strong>As the number of interventions increases, the SUCRA results in a higher predictive power compared with P<sub><i>best</i></sub>. Using P<sub><i>best</i></sub> results in a lower expected sample size when there is an effective intervention. Using the Average Length Criterion, the ALICE trial has a maximum sample size for each arm of 100 patients. This sample size results in a 86% and 85% predictive power using P<sub><i>best</i></sub> and the SUCRA, respectively. Thus, we chose P<sub><i>best</i></sub> as the ranking method for the ALICE trial.</p><p><strong>Conclusion: </strong>Bayesian ranking methods can be used in multi-arm, multi-stage trials with no clear control intervention. When more interventions are included, the SUCRA results in a higher power than P<sub><i>best</i></sub>. 
Future work should consider whether other ranking methods may also be relevant for clinical trial design.</p>","PeriodicalId":10685,"journal":{"name":"Clinical Trials","volume":" ","pages":"17407745241251812"},"PeriodicalIF":2.2000,"publicationDate":"2024-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11528845/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Clinical Trials","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1177/17407745241251812","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MEDICINE, RESEARCH & EXPERIMENTAL","Score":null,"Total":0}
Citations: 0
Abstract
Background/aims: Multi-arm, multi-stage trials frequently include a standard-care arm to which all interventions are compared. This increases costs and hinders comparisons among the experimental arms. Furthermore, the appropriate standard care may not be evident, particularly when there is large variation in standard practice. Thus, we aimed to develop an adaptive clinical trial that drops ineffective interventions following an interim analysis before selecting the best intervention at the final stage, without requiring a standard-care arm.
Methods: We used Bayesian methods to develop a multi-arm, two-stage adaptive trial and evaluated two different methods for ranking interventions at both the interim and final analyses: the probability that each intervention is optimal (Pbest) and the surface under the cumulative ranking curve (SUCRA). The proposed trial design determines the maximum sample size for each intervention using the Average Length Criterion. The interim analysis takes place at approximately half the pre-specified maximum sample size and drops an intervention for futility if either Pbest or the SUCRA falls below a pre-specified threshold. The final analysis compares all remaining interventions at the maximum sample size and concludes superiority based on either Pbest or the SUCRA. The two ranking methods were compared across 12 scenarios that vary the number of interventions and the assumed differences between them. The thresholds for futility and superiority were chosen to control the type I error rate, and the predictive power and expected sample size were then evaluated across scenarios. A trial comparing three interventions that aim to reduce anxiety for children undergoing laceration repair in the emergency department was then designed: the Anxiolysis for Laceration Repair in Children Trial (ALICE).
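Both ranking quantities can be computed directly from posterior draws of the treatment effects. The following is a minimal sketch, not the authors' implementation: it assumes a matrix of posterior samples (draws × arms) in which higher values are better, and the simulated effect sizes and the futility threshold shown are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical posterior draws of the effect for 3 arms (10,000 draws each);
# the means 0.2, 0.5, 0.8 are illustrative, not values from the trial.
samples = rng.normal(loc=[0.2, 0.5, 0.8], scale=0.3, size=(10_000, 3))

def p_best(samples):
    """Probability that each arm is optimal: the share of draws it wins."""
    winners = samples.argmax(axis=1)
    return np.bincount(winners, minlength=samples.shape[1]) / samples.shape[0]

def sucra(samples):
    """Surface under the cumulative ranking curve for each arm.

    Equivalent to (J - mean rank) / (J - 1), where rank 1 is best.
    """
    n_draws, n_arms = samples.shape
    # Double argsort gives each arm's 0-based ascending rank per draw;
    # convert so that rank 1 = best (largest effect).
    ranks = n_arms - samples.argsort(axis=1).argsort(axis=1)
    mean_rank = ranks.mean(axis=0)
    return (n_arms - mean_rank) / (n_arms - 1)

pb = p_best(samples)
su = sucra(samples)
# Illustrative interim rule: drop arms whose Pbest falls below a threshold
# (the 0.10 value here is hypothetical, not the trial's calibrated threshold).
dropped = np.flatnonzero(pb < 0.10)
```

Both quantities agree on the ordering of arms here; they differ in that Pbest rewards only first place, while the SUCRA credits an arm for every rank it beats, which is why their operating characteristics diverge as the number of arms grows.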
Results: As the number of interventions increases, the SUCRA yields a higher predictive power than Pbest. Using Pbest results in a lower expected sample size when there is an effective intervention. Using the Average Length Criterion, the ALICE trial has a maximum sample size of 100 patients per arm. This sample size yields a predictive power of 86% and 85% using Pbest and the SUCRA, respectively. Thus, we chose Pbest as the ranking method for the ALICE trial.
Conclusion: Bayesian ranking methods can be used in multi-arm, multi-stage trials with no clear control intervention. When more interventions are included, the SUCRA results in a higher power than Pbest. Future work should consider whether other ranking methods may also be relevant for clinical trial design.
Journal description:
Clinical Trials is dedicated to advancing knowledge of research methodologies for the design and conduct of clinical trials. Covering the design, conduct, analysis, synthesis, and evaluation of key methodologies, the journal remains at the forefront of the latest topics, including ethics, regulation, and policy impact.