Comment on: Shaping the future of clinical trials through strategic foresight.
Pub Date: 2026-02-26 | DOI: 10.1177/17407745251414681
Brian L Wiens
{"title":"Comment on: Shaping the future of clinical trials through strategic foresight.","authors":"Brian L Wiens","doi":"10.1177/17407745251414681","DOIUrl":"https://doi.org/10.1177/17407745251414681","url":null,"abstract":"","PeriodicalId":10685,"journal":{"name":"Clinical Trials","volume":" ","pages":"17407745251414681"},"PeriodicalIF":2.2,"publicationDate":"2026-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147289390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Response-adaptive randomization with imperfect intermediate endpoints.
Pub Date: 2026-02-25 | DOI: 10.1177/17407745261415792
Yousra Kherabi, Michael A Proschan, Lori E Dodd
Background: Response-adaptive randomization is controversial even in the best circumstances when based on a quickly determined primary outcome. In disease settings in which the primary outcome requires long follow-up, an intermediate endpoint may be chosen to update randomization allocations. The aim of our study is to evaluate the impact of response-adaptive randomization applied to an imperfect intermediate endpoint. We use tuberculosis trials as the motivating example.
Methods: We simulated a response-adaptive randomization design, adapting randomization allocations using an imperfect intermediate endpoint, in a superiority trial of two experimental regimens and one control arm. The primary study outcome was treatment success after 73 weeks from randomization; the intermediate endpoint was culture conversion at 8 weeks. We compared different sensitivity (Se) and specificity (Spe) scenarios for the intermediate endpoint, while varying the true treatment efficacy. We evaluated the performance of response-adaptive randomization to achieve its primary goal of allocating more participants to the better arm and the impact of time-trends on type I error rate.
Results: Even in an ideal state of perfect accuracy (i.e. intermediate endpoint with Se = 100% and Spe = 100%), response-adaptive randomization did not always live up to its main purpose of allocating more patients to the better arm. Lower accuracy of the intermediate endpoint leads to greater divergence from the goal of more allocations to the better arm. The larger the difference in treatment efficacy between the arms, the more striking the impact of an intermediate endpoint with poor diagnostic accuracy. Time-trends inflate the type I error rate, and while stratified tests can correct this, they do so at the cost of a power loss. Allocating more patients to the worst arm increases power for comparisons with this arm but reduces power for comparisons of the best arm to control.
Conclusion: Given the objective of evaluating several new therapeutic regimens in a timely manner, response-adaptive randomization is tempting. However, it requires, at a minimum, highly accurate intermediate endpoints, and even these do not guarantee the trustworthiness of response-adaptive randomization.
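To make the adaptation step concrete, the sketch below simulates one version of the design: each patient's 73-week success is drawn from their arm's true efficacy, the 8-week intermediate endpoint is an imperfect reading of that outcome with the stated sensitivity and specificity, and allocation probabilities are refreshed after each block in proportion to observed intermediate response rates. All parameter values, the block schedule, and the proportional update rule (with the control allocation held fixed) are illustrative assumptions, not the authors' simulation design.

```python
import random

# Illustrative parameters (assumptions, not the authors' design): true
# probability of 73-week treatment success per arm, accuracy of the
# 8-week intermediate endpoint, and the adaptation schedule.
P_SUCCESS = {"control": 0.70, "exp1": 0.70, "exp2": 0.85}
SE, SPE = 0.80, 0.90          # sensitivity/specificity of the intermediate endpoint
BLOCK, N_BLOCKS = 60, 10      # re-estimate allocations after every 60 patients

alloc = {arm: 1 / 3 for arm in P_SUCCESS}   # start with equal allocation
obs = {arm: [0, 0] for arm in P_SUCCESS}    # [intermediate responses, n randomized]

for _ in range(N_BLOCKS):
    for _ in range(BLOCK):
        arm = random.choices(list(alloc), weights=list(alloc.values()))[0]
        final = random.random() < P_SUCCESS[arm]              # 73-week outcome
        # The 8-week endpoint is an imperfect reading of the final outcome:
        inter = random.random() < (SE if final else 1 - SPE)
        obs[arm][0] += inter
        obs[arm][1] += 1
    # Update allocations in proportion to observed intermediate response
    # rates among the experimental arms; the control arm is held at 1/3
    # here (a common, but assumed, choice when the control is kept fixed).
    rates = {a: (s + 1) / (n + 2) for a, (s, n) in obs.items() if a != "control"}
    total = sum(rates.values())
    alloc = {"control": 1 / 3,
             **{a: (2 / 3) * r / total for a, r in rates.items()}}

print({arm: n for arm, (_, n) in obs.items()})   # patients randomized per arm
```

With a poorly accurate intermediate endpoint (low SE or SPE), the observed rates track the final outcome only weakly, which is the mechanism behind the divergence reported in the Results.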
{"title":"Response-adaptive randomization with imperfect intermediate endpoints.","authors":"Yousra Kherabi, Michael A Proschan, Lori E Dodd","doi":"10.1177/17407745261415792","DOIUrl":"https://doi.org/10.1177/17407745261415792","url":null,"abstract":"<p><strong>Background: </strong>Response-adaptive randomization is controversial even in the best circumstances when based on a quickly determined primary outcome. In disease settings in which the primary outcome requires long follow-up, an intermediate endpoint may be chosen to update randomization allocations. The aim of our study is to evaluate the impact of response-adaptive randomization applied to an imperfect intermediate endpoint. We use tuberculosis trials as the motivating example.</p><p><strong>Methods: </strong>We simulated a response-adaptive randomization design, adapting randomization allocations using an imperfect intermediate endpoint, in a superiority trial of two experimental regimens and one control arm. The primary study outcome was treatment success after 73 weeks from randomization; the intermediate endpoint was culture conversion at 8 weeks. We compared different sensitivity (Se) and specificity (Spe) scenarios for the intermediate endpoint, while varying the true treatment efficacy. We evaluated the performance of response-adaptive randomization to achieve its primary goal of allocating more participants to the better arm and the impact of time-trends on type I error rate.</p><p><strong>Results: </strong>Even in an ideal state of perfect accuracy (i.e. intermediate endpoint with Se = 100% and Spe = 100%), response-adaptive randomization did not always live up to its main purpose of allocating more patients to the better arm. Lower accuracy of the intermediate endpoint leads to greater divergence from the goal of more allocations to the better arm. The larger the difference in treatment efficacy between the arms, the more striking the impact of an intermediate endpoint with poor diagnostic accuracy. Time-trends inflate the type I error rate, and while stratified tests can correct this, they do so at the cost of a power loss. Allocating more patients to the worst arm increases power for comparisons with this arm but reduces power for comparisons of the best arm to control.</p><p><strong>Conclusion: </strong>Given the objective of evaluating several new therapeutic regimens in a timely manner, response-adaptive randomization is tempting. However, it requires at least reliance on highly accurate intermediate endpoints, which are still no guarantee of response-adaptive randomization's trustworthiness.</p>","PeriodicalId":10685,"journal":{"name":"Clinical Trials","volume":" ","pages":"17407745261415792"},"PeriodicalIF":2.2,"publicationDate":"2026-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147282663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stakeholder views about the responsibilities of principal investigators in multicenter randomized controlled trials.
Pub Date: 2026-02-23 | DOI: 10.1177/17407745261417337
Steven Joffe, Elizabeth F Bair, Katharine A Gleason, Deborah E Sellers, Sarah A McGraw, Cary P Gross, Donna T Chen, Eric G Campbell, Michelle M Mello
<p><strong>Background/aims: </strong>Clinical trials are commonly believed to benefit from the involvement of an academic principal investigator who accepts responsibility for design, conduct, and reporting. Little evidence exists, however, about the importance that diverse stakeholders assign to the principal investigator's role in leadership of trials. Furthermore, few studies have examined whether and how beliefs about the role of the principal investigator might vary by funding source.</p><p><strong>Methods: </strong>We conducted parallel Delphi panel surveys with seven stakeholder groups (principal investigators, patient advocates, journal editors, public funders, industry representatives, United States Food and Drug Administration officials, and clinical trial cooperative-group chairs) to assess the extent to which respondents believed leadership of a multicenter randomized controlled trial by an academic principal investigator to be important, considering publicly and industry-funded trials separately. We then surveyed an international sample of principal investigators (N = 92) who had recently published a multicenter randomized controlled trial in a high-impact general medical, oncology, cardiovascular, or psychiatry journal to assess their normative views on the importance of the academic principal investigator in leading both publicly and industry-funded trials.</p><p><strong>Results: </strong>Several patterns emerged from the Delphi panel surveys. First, panelists viewed involvement of an identified academic principal investigator as most important at the design and planning and the interpretation and dissemination phases of a trial, as compared with the implementation and data collection phase. Second, panelists generally viewed involvement of an identified academic principal investigator as more important in publicly funded than in industry-funded trials. Finally, panelists representing industry stakeholders and United States Food and Drug Administration officials viewed involvement of an identified academic principal investigator as less important, especially for industry-funded trials, than did other groups. Respondents to the normative principal investigator survey generally endorsed the importance of academic principal investigators in leading multicenter randomized controlled trials, both overall (median rating 6 on the 0-6 point scale) and for trial-specific tasks. Both overall and with respect to specific tasks, however, respondents viewed an academic principal investigator's leadership as more important when considering publicly funded as compared with industry-funded trials.</p><p><strong>Conclusion: </strong>Although members of most stakeholder groups participating in Delphi surveys view involvement of an academic principal investigator with overall responsibility for a multicenter randomized controlled trial as very important, there are notable differences depending on the respondent's perspective, the specific trial-relat
{"title":"Stakeholder views about the responsibilities of principal investigators in multicenter randomized controlled trials.","authors":"Steven Joffe, Elizabeth F Bair, Katharine A Gleason, Deborah E Sellers, Sarah A McGraw, Cary P Gross, Donna T Chen, Eric G Campbell, Michelle M Mello","doi":"10.1177/17407745261417337","DOIUrl":"10.1177/17407745261417337","url":null,"abstract":"<p><strong>Background/aims: </strong>Clinical trials are commonly believed to benefit from the involvement of an academic principal investigator who accepts responsibility for design, conduct, and reporting. Little evidence exists, however, about the importance that diverse stakeholders assign to the principal investigator's role in leadership of trials. Furthermore, few studies have examined whether and how beliefs about the role of the principal investigator might vary by funding source.</p><p><strong>Methods: </strong>We conducted parallel Delphi panel surveys with seven stakeholder groups (principal investigators, patient advocates, journal editors, public funders, industry representatives, United States Food and Drug Administration officials, and clinical trial cooperative-group chairs) to assess the extent to which respondents believed leadership of a multicenter randomized controlled trial by an academic principal investigator to be important, considering publicly and industry-funded trials separately. We then surveyed an international sample of principal investigators (N = 92) who had recently published a multicenter randomized controlled trial in a high-impact general medical, oncology, cardiovascular, or psychiatry journal to assess their normative views on the importance of the academic principal investigator in leading both publicly and industry-funded trials.</p><p><strong>Results: </strong>Several patterns emerged from the Delphi panel surveys. First, panelists viewed involvement of an identified academic principal investigator as most important at the design and planning and the interpretation and dissemination phases of a trial, as compared with the implementation and data collection phase. Second, panelists generally viewed involvement of an identified academic principal investigator as more important in publicly funded than in industry-funded trials. Finally, panelists representing industry stakeholders and United States Food and Drug Administration officials viewed involvement of an identified academic principal investigator as less important, especially for industry-funded trials, than did other groups. Respondents to the normative principal investigator survey generally endorsed the importance of academic principal investigators in leading multicenter randomized controlled trials, both overall (median rating 6 on the 0-6 point scale) and for trial-specific tasks. 
Both overall and with respect to specific tasks, however, respondents viewed an academic principal investigator's leadership as more important when considering publicly funded as compared with industry-funded trials.</p><p><strong>Conclusion: </strong>Although members of most stakeholder groups participating in Delphi surveys view involvement of an academic principal investigator with overall responsibility for a multicenter randomized controlled trial as very important, there are notable differences depending on the respondent's perspective, the specific trial-relat","PeriodicalId":10685,"journal":{"name":"Clinical Trials","volume":" ","pages":"17407745261417337"},"PeriodicalIF":2.2,"publicationDate":"2026-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12944536/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147275788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Confidence interval estimation for the win probability in cluster randomized trials with hierarchical composite endpoints using win fractions.
Pub Date: 2026-02-23 | DOI: 10.1177/17407745261417308
Emma Davies Smith, Yun-Hee Choi, Vipul Jairath, Guangyong Zou
Background/Aims: Cluster randomized trials with multiple endpoints feature complex correlation structures. Estimating treatment effects in a meaningful way that respects differences in scale and clinical importance is challenging. Pairwise comparison methods address these challenges by constructing all pairs featuring one treatment and one control participant, then evaluating endpoints in hierarchical order. For cluster randomized trials featuring such a "hierarchical composite endpoint," we develop large-sample confidence interval estimators and hypothesis tests for the nonparametric treatment effect referred to here as the "win probability."
Methods: For each pair of participants (one treated and one control), responses on each endpoint are compared in order of descending clinical importance until it can be determined which participant responded better ("won") or all endpoints are exhausted. Dividing the number of wins attributed to the treatment arm by the total number of "pairwise comparisons" yields a point estimate of the win probability. The win probability can be transformed into alternative effect measures, including the "win difference" and "win odds." A two-stage procedure, or "win fraction" approach, is used to obtain variance estimators for the win probability. Each participant's multivariate response is transformed into a univariate "win fraction," which quantifies the proportion of times they won when compared to all participants in the comparison arm. A working linear mixed model is applied to the win fractions to obtain cluster-adjusted point estimates of the win probability and its variance. Inference proceeds by the central limit theorem. Simulation is used to assess the performance of the proposed estimators for a hierarchical composite endpoint comprised of one binary component (more important) and one continuous component (less important) across a range of cluster trial designs. Performance of an empirical bootstrap estimator is also investigated. A case study using data from the REACT cluster trial demonstrates application of the methods, and corresponding SAS and R code is provided.
Results: Simulation suggests that the nominal 95% coverage probability is well maintained and type I error is controlled. Due to the large-sample nature of our method, confidence intervals may be conservative (over-coverage) for fewer than 30 clusters. In comparison, the empirical bootstrap estimator is liberal (under-coverage) for all numbers of randomized clusters (up to 50).
Conclusion: Our win fraction method uses a working linear mixed model to obtain confidence intervals and hypothesis tests which respect coverage and type I error. It is faster than the bootstrap, applicable to multiple components on different scales, bypasses specification of complex correlation matrices, permits adjustment, and can be implemented in existing software.
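The pairwise-comparison machinery is compact enough to sketch. The code below builds the win matrix for a two-component hierarchical endpoint (the binary component compared first, the continuous component breaking ties), then derives the win probability and the per-participant win fractions; the toy data and the convention that larger values win are assumptions for illustration.

```python
import numpy as np

def win_matrix(treat, ctrl):
    """w[i, j] = 1 if treated participant i beats control participant j,
    0.5 for a tie, 0 for a loss. Rows are (binary, continuous) responses;
    the binary component (more important) is compared first and the
    continuous component only breaks ties. Larger values are assumed better."""
    tb, tc = treat[:, 0][:, None], treat[:, 1][:, None]
    cb, cc = ctrl[:, 0][None, :], ctrl[:, 1][None, :]
    return np.where(tb > cb, 1.0, np.where(tb < cb, 0.0,           # binary decides
           np.where(tc > cc, 1.0, np.where(tc < cc, 0.0, 0.5))))   # then continuous

# Toy responses, assumed for illustration: (binary, continuous) per participant.
treat = np.array([[1, 2.3], [0, 1.1], [1, 0.4]])
ctrl = np.array([[0, 1.9], [1, 0.2]])

w = win_matrix(treat, ctrl)
win_prob = w.mean()              # point estimate of the win probability (2/3 here)
treat_fracs = w.mean(axis=1)     # win fraction of each treated participant
ctrl_fracs = 1 - w.mean(axis=0)  # win fraction of each control participant
```

In the two-stage procedure the abstract describes, these per-participant win fractions would then be passed to a working linear mixed model with a random cluster effect to obtain cluster-adjusted estimates of the win probability and its variance.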
{"title":"Confidence interval estimation for the win probability in cluster randomized trials with hierarchical composite endpoints using win fractions.","authors":"Emma Davies Smith, Yun-Hee Choi, Vipul Jairath, Guangyong Zou","doi":"10.1177/17407745261417308","DOIUrl":"10.1177/17407745261417308","url":null,"abstract":"<p><p>Background/AimsCluster randomized trials with multiple endpoints feature complex correlation structures. Estimating treatment effects in a meaningful way that respects differences in scale and clinical importance is challenging. Pairwise comparison methods address these challenges by constructing all pairs featuring one treatment and one control participant, then evaluating endpoints in hierarchical order. For cluster randomized trials featuring such a \"hierarchical composite endpoint,\" we develop large-sample confidence interval estimators and hypothesis tests for the nonparametric treatment effect referred to here as the \"win probability.\"MethodsFor each pair of participants (one treated and one control), responses on each endpoint are compared in order of descending clinical importance until it can be determined which participant responded better (\"won\") or all endpoints are exhausted. Dividing the number of wins attributed to the treatment arm by the total number of \"pairwise comparisons\" yields a point estimate of the win probability. The win probability can be transformed into alternative effect measures, including the \"win difference\" and \"win odds.\" A two-stage procedure, or \"win fraction\" approach, is used to obtain variance estimators for the win probability. Each participant's multivariate response is transformed into a univariate \"win fraction,\" which quantifies the proportion of times they won when compared to all participants in the comparison arm. A working linear mixed model is applied to the win fractions to obtain cluster-adjusted point estimates of the win probability and its variance. Inference proceeds by the central limit theorem. Simulation is used to assess the performance of the proposed estimators for a hierarchical composite endpoint comprised of one binary component (more important) and one continuous component (less important) across a range of cluster trial designs. Performance of an empirical bootstrap estimator is also investigated. A case study using data from the REACT cluster trial demonstrates application of the methods, and corresponding SAS and R code is provided.ResultsSimulation suggests that the nominal 95% coverage probability is well maintained and type I error is controlled. Due to the large-sample nature of our method, confidence intervals may be conservative (over coverage) for fewer than 30 clusters. In comparison, the empirical bootstrap estimator is liberal (under coverage) for all numbers of randomized clusters (up to 50).ConclusionOur win fraction method uses a working linear mixed model to obtain confidence intervals and hypothesis tests which respect coverage and type I error. 
It is faster than the bootstrap, applicable to multiple components on different scales, bypasses specification of complex correlation matrices, permits adjustment, and can be implemented in existing software.</p>","PeriodicalId":10685,"journal":{"name":"Clinical Trials","volume":" ","pages":"17407745261417308"},"PeriodicalIF":2.2,"publicationDate":"2026-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12931659/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147269734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Use of estimands in cluster randomised trials: A review.
Pub Date: 2026-02-17 | DOI: 10.1177/17407745251415538
Dongquan Bi, Andrew Copas, Brennan C Kahan
Background: An estimand is a clear description of the treatment effect a study aims to quantify. The ICH E9(R1) addendum lists five attributes that should be described as part of the estimand definition. However, the addendum was primarily developed for individually randomised trials. Cluster randomised trials, in which groups of individuals are randomised, have additional considerations for defining estimands (e.g. how individuals and clusters are weighted, how cluster-level intercurrent events are handled). Yet it is currently unknown whether estimands are being used in cluster randomised trials, or whether the considerations specific to cluster randomised trials are being described.
Methods: We reviewed 73 cluster randomised trials published between October 2023 and January 2024 that were indexed in MEDLINE. For each trial, we assessed whether the estimand for the primary outcome was described, or if not, whether it could be inferred from the statistical methods. We also assessed whether considerations specific to cluster randomised trials were described or inferable, how trials were analysed and whether key assumptions being made in the analysis (e.g. 'no informative cluster size') could be identified.
Results: No trials attempted to describe the estimand for their primary outcome. We were able to infer the five attributes outlined in ICH E9(R1) in only 49% of trials, and when including additional considerations specific to cluster randomised trials, this figure dropped to 21%. Key drivers of this ambiguity were lack of clarity around whether individual- or cluster-average effects were of interest (unclear in 63% of trials), and how cluster-level intercurrent events were handled (unclear in 21% of trials for which this was applicable). Over half of trials used mixed-effects models or generalised estimating equations with an exchangeable correlation structure, which assume that there is no informative cluster size; however, only one of these trials performed sensitivity analyses to evaluate robustness of results to deviations from this assumption. Fourteen percent of trials used independence estimating equations or analysis of cluster-level summaries; however, because none stated whether they were targeting the individual- or cluster-average effect, it was impossible to determine whether these methods implemented the appropriate weighting scheme and were thus unbiased.
Conclusion: The uptake of estimands in published cluster randomised trial articles is low, making it difficult to ascertain which questions were being investigated or whether statistical estimators were appropriate for those questions. This highlights an urgent need to develop guidelines on defining estimands that cover unique aspects of cluster randomised trials to ensure clarity of research questions in these trials.
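The weighting ambiguity flagged above is easy to see numerically: when cluster size is informative, weighting individuals equally and weighting clusters equally target different estimands. A toy sketch with hypothetical numbers:

```python
# Hypothetical single-arm data: three clusters of different sizes whose
# outcome rates differ with size (informative cluster size).
clusters = [   # (n_individuals, n_with_outcome)
    (100, 60), # large cluster, 60% outcome rate
    (10, 2),   # small cluster, 20%
    (10, 3),   # small cluster, 30%
]

# Individual-average estimand: every individual counts equally.
ind_avg = sum(y for _, y in clusters) / sum(n for n, _ in clusters)
# Cluster-average estimand: every cluster counts equally.
clus_avg = sum(y / n for n, y in clusters) / len(clusters)

print(f"individual-average = {ind_avg:.3f}")   # 65/120 = 0.542
print(f"cluster-average    = {clus_avg:.3f}")  # (0.60+0.20+0.30)/3 = 0.367
```

An analysis method implicitly picks one of these targets through its weighting, which is why the review treats unstated estimands as a barrier to judging whether an estimator was unbiased for the question of interest.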
{"title":"Use of estimands in cluster randomised trials: A review.","authors":"Dongquan Bi, Andrew Copas, Brennan C Kahan","doi":"10.1177/17407745251415538","DOIUrl":"https://doi.org/10.1177/17407745251415538","url":null,"abstract":"<p><strong>Background: </strong>An estimand is a clear description of the treatment effect a study aims to quantify. The ICH E9(R1) addendum lists five attributes that should be described as part of the estimand definition. However, the addendum was primarily developed for individually randomised trials. Cluster randomised trials, in which groups of individuals are randomised, have additional considerations for defining estimands (e.g. how individuals and clusters are weighted, how cluster-level intercurrent events are handled). However, it is currently unknown if estimands are being used in cluster randomised trials, or whether the considerations specific to cluster randomised trials are being described.</p><p><strong>Methods: </strong>We reviewed 73 cluster randomised trials published between October 2023 and January 2024 that were indexed in MEDLINE. For each trial, we assessed whether the estimand for the primary outcome was described, or if not, whether it could be inferred from the statistical methods. We also assessed whether considerations specific to cluster randomised trials were described or inferable, how trials were analysed and whether key assumptions being made in the analysis (e.g. 'no informative cluster size') could be identified.</p><p><strong>Results: </strong>No trials attempted to describe the estimand for their primary outcome. We were able to infer the five attributes outlined in ICH E9(R1) in only 49% of trials, and when including additional considerations specific to cluster randomised trials, this figure dropped to 21%. Key drivers of this ambiguity were lack of clarity around whether individual- or cluster-average effects were of interest (unclear in 63% of trials), and how cluster-level intercurrent events were handled (unclear in 21% of trials for which this was applicable). Over half of trials used mixed-effects models or generalising estimating equations with an exchangeable correlation structure, which make the assumption that there is no informative cluster size; however, only one of these trials performed sensitivity analyses to evaluate robustness of results to deviations from this assumption. There were 14% of trials that used independence estimating equations or the analysis of cluster-level summaries; however, because no trials stated whether they were targeting the individual- or cluster-average effect, it was impossible to determine whether these methods implemented the appropriate weighting scheme and were thus unbiased.</p><p><strong>Conclusion: </strong>The uptake of estimands in published cluster randomised trial articles is low, making it difficult to ascertain which questions were being investigated or whether statistical estimators were appropriate for those questions. 
This highlights an urgent need to develop guidelines on defining estimands that cover unique aspects of cluster randomised trials to ensure clarity of research questions in these trials.</p>","PeriodicalId":10685,"journal":{"name":"Clinical Trials","volume":" ","pages":"17407745251415538"},"PeriodicalIF":2.2,"publicationDate":"2026-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146212420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Event-driven planning of two-armed trials with a binary endpoint.
Pub Date: 2026-02-16 | DOI: 10.1177/17407745251415535
Erica H Brittain, Raphaël N Morsomme, Michael A Proschan
Background/aims: In randomized two-armed clinical trials with binary endpoints, there may be uncertainty about the event probability, which is needed for sample size calculation. Survival trials are powered based on number of events rather than people, and this is advantageous because the number of events needed to achieve a desired power is less sensitive to an unknown parameter than is the number of people needed. We investigate and quantify this relative stability of number of events compared to number of people in the context of a randomized two-armed trial with equal sample sizes and a binary endpoint. In binary endpoint settings with such relative stability, we consider (1) enhancement of traditional adaptive trial design and (2) potential benefits of a simple event-driven strategy.
Methods: Using sample size formulas, we determine the relative stability of the expected number of events compared to the sample size for binary outcome trials using the relative risk, odds ratio, or risk difference. Simulations consider a simple event-driven design when there is relative stability; we evaluate type I error rate and power under various analysis methods and approaches to halting the trial.
Results: We find that the number of events needed to achieve a specified power is at least three times more stable than the corresponding sample size for the relative risk when the overall event probability is less than 1/3, and for the odds ratio when the overall event probability is less than 0.20. We show that this relative stability is independent of the type I and type II error rates and the magnitude of the treatment effect. In a setting where the overall event probability is consistent with relative stability, simulations of an event-driven design show that asymptotic methods may have modestly high type I error rates, but that other approaches appear to have good operating characteristics.
Conclusion: In settings with moderately low event probabilities, thinking in terms of the number of events instead of sample size may (1) facilitate the planning of clinical trials and help determine whether a trial is futile, and (2) lead to a simple event-driven design for binary endpoints that may be feasible and appealing.
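The stability claim can be checked with the standard normal-approximation sample size formula for two proportions (a sketch only; the paper's exact formulas may differ). Holding the relative risk at 0.5 while halving the control event probability roughly doubles the required sample size, but moves the expected number of events comparatively little:

```python
from math import ceil
from statistics import NormalDist

z = NormalDist().inv_cdf   # standard normal quantile function

def per_arm_n(p1, p2, alpha=0.05, power=0.9):
    """Normal-approximation sample size per arm for a two-sided test of
    two proportions with equal allocation."""
    num = (z(1 - alpha / 2) + z(power)) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))
    return ceil(num / (p1 - p2) ** 2)

for p_control in (0.20, 0.10, 0.05):   # successively halve the event probability
    p_treat = 0.5 * p_control           # fixed relative risk of 0.5
    n = per_arm_n(p_control, p_treat)
    events = n * (p_control + p_treat)  # expected events across both arms
    print(f"p_control={p_control:.2f}: n per arm={n:5d}, expected events={events:5.1f}")
```

With these illustrative inputs, the per-arm sample size climbs from roughly 263 to 1209 as the control event probability falls from 0.20 to 0.05, while the expected number of events only drifts from about 79 to 91.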
{"title":"Event-driven planning of two-armed trials with a binary endpoint.","authors":"Erica H Brittain, Raphaël N Morsomme, Michael A Proschan","doi":"10.1177/17407745251415535","DOIUrl":"https://doi.org/10.1177/17407745251415535","url":null,"abstract":"<p><strong>Background/aims: </strong>In randomized two-armed clinical trials with binary endpoints, there may be uncertainty about the event probability, which is needed for sample size calculation. Survival trials are powered based on number of events rather than people, and this is advantageous because the number of events needed to achieve a desired power is less sensitive to an unknown parameter than is the number of people needed. We investigate and quantify this relative stability of number of events compared to number of people in the context of a randomized two-armed trial with equal sample sizes and a binary endpoint. In binary endpoint settings with such relative stability, we consider (1) enhancement of traditional adaptive trial design and (2) potential benefits of a simple event-driven strategy.</p><p><strong>Methods: </strong>Using sample size formulas, we determine the relative stability of the expected number of events compared to the sample size for binary outcome trials using the relative risk, odds ratio, or risk difference. Simulations consider a simple event-driven design when there is relative stability; we evaluate type I error rate and power under various analysis methods and approaches to halting the trial.</p><p><strong>Results: </strong>We find that the number of events is at least three times more stable than the sample size to achieve a specified power for the relative risk when the overall event probability is less than 1/3, and for the odds ratio when the overall event probability is less than 0.20. We show that this relative stability is independent of the type 1 and type 2 error rates and magnitude of the treatment effect. In a setting where the overall event probability is consistent with relative stability, simulations of an event-driven design show that asymptotic methods may have modestly high type I error rates, but that other approaches appear to have good operating characteristics.</p><p><strong>Conclusion: </strong>In settings with moderately low event probabilities, thinking in terms of the number of events instead of sample size may (1) facilitate the planning of clinical trials and help determine whether a trial is futile, and (2) lead to a simple event-driven design for binary endpoints that may be feasible and appealing.</p>","PeriodicalId":10685,"journal":{"name":"Clinical Trials","volume":" ","pages":"17407745251415535"},"PeriodicalIF":2.2,"publicationDate":"2026-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146206838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A modular pipeline for natural language processing-screened human abstraction of a pragmatic trial outcome from electronic health records.
Pub Date: 2026-02-12 | DOI: 10.1177/17407745251405386
Robert Y Lee, Kevin S Li, James Sibley, Trevor Cohen, William B Lober, Janaki O'Brien, Nicole LeDuc, Kasey Mallon Andrews, Anna Ungar, Jessica Walsh, Elizabeth L Nielsen, Danae G Dotolo, Erin K Kross
Background: Natural language processing allows efficient extraction of clinical variables and outcomes from electronic health records (EHRs). However, measuring pragmatic clinical trial outcomes may demand accuracy that exceeds natural language processing performance. Combining natural language processing with human adjudication can address this gap, yet few software solutions support such workflows. We developed a modular, scalable system for natural language processing-screened human abstraction to measure the primary outcomes of two clinical trials.
Methods: In two clinical trials of hospitalized patients with serious illness, a deep-learning natural language processing model screened electronic health record passages for documented goals-of-care discussions. Screen-positive passages were referred for human adjudication using a REDCap-based system to measure the trial outcomes. Dynamic pooling of passages using structured query language within the REDCap database reduced unnecessary abstraction while ensuring data completeness.
Results: In the first trial (N = 2512), natural language processing identified 22,187 screen-positive passages (0.8%) from 2.6 million electronic health record passages. Human reviewers adjudicated 7494 passages over 34.3 abstractor-hours to measure the cumulative incidence and time to first documented goals-of-care discussion for all patients with 92.6% patient-level sensitivity. In the second trial (N = 617), natural language processing identified 8952 screen-positive passages (1.6%) from 559,596 passages at a threshold with near-100% sensitivity. Human reviewers adjudicated 3509 passages over 27.9 abstractor-hours to measure the same outcome for all patients.
Discussion: We present the design and source code for a scalable and efficient pipeline for measuring complex electronic health record-derived outcomes using natural language processing-screened human abstraction. This implementation is adaptable to diverse research needs, and its modular pipeline represents a practical middle ground between custom software and commercial platforms.
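A minimal sketch of the screen-and-pool logic in plain Python (the authors' implementation uses a deep-learning model, REDCap, and SQL; the field names, threshold value, and stopping rule below are illustrative assumptions):

```python
from collections import defaultdict

# Hypothetical record layout: each passage carries a patient id, a note
# timestamp, and the NLP model's score. THRESHOLD stands in for the
# cut-off chosen to give near-100% patient-level sensitivity.
THRESHOLD = 0.35

def build_adjudication_queue(passages):
    """Keep screen-positive passages and pool them by patient in
    chronological order, mirroring the dynamic pooling step."""
    pooled = defaultdict(list)
    for p in passages:
        if p["score"] >= THRESHOLD:               # NLP screening step
            pooled[p["patient_id"]].append(p)
    for items in pooled.values():
        items.sort(key=lambda p: p["timestamp"])  # earliest notes first
    return pooled

def first_event_times(pooled, adjudicate):
    """`adjudicate` is the human review step (True if the passage truly
    documents a goals-of-care discussion). Review stops per patient at
    the first confirmed passage, since the outcome is time to first
    documented discussion."""
    events = {}
    for patient, items in pooled.items():
        for p in items:
            if adjudicate(p):
                events[patient] = p["timestamp"]
                break                             # later passages need no review
    return events
```

The early `break` is what makes screening pay off for a time-to-first-event outcome: once a passage is confirmed for a patient, later passages for that patient need no human review.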
{"title":"A modular pipeline for natural language processing-screened human abstraction of a pragmatic trial outcome from electronic health records.","authors":"Robert Y Lee, Kevin S Li, James Sibley, Trevor Cohen, William B Lober, Janaki O'Brien, Nicole LeDuc, Kasey Mallon Andrews, Anna Ungar, Jessica Walsh, Elizabeth L Nielsen, Danae G Dotolo, Erin K Kross","doi":"10.1177/17407745251405386","DOIUrl":"10.1177/17407745251405386","url":null,"abstract":"<p><strong>Background: </strong>Natural language processing allows efficient extraction of clinical variables and outcomes from electronic health records (EHRs). However, measuring pragmatic clinical trial outcomes may demand accuracy that exceeds natural language processing performance. Combining natural language processing with human adjudication can address this gap, yet few software solutions support such workflows. We developed a modular, scalable system for natural language processing-screened human abstraction to measure the primary outcomes of two clinical trials.</p><p><strong>Methods: </strong>In two clinical trials of hospitalized patients with serious illness, a deep-learning natural language processing model screened electronic health record passages for documented goals-of-care discussions. Screen-positive passages were referred for human adjudication using a REDCap-based system to measure the trial outcomes. Dynamic pooling of passages using structured query language within the REDCap database reduced unnecessary abstraction while ensuring data completeness.</p><p><strong>Results: </strong>In the first trial (N = 2512), natural language processing identified 22,187 screen-positive passages (0.8%) from 2.6 million electronic health record passages. Human reviewers adjudicated 7494 passages over 34.3 abstractor-hours to measure the cumulative incidence and time to first documented goals-of-care discussion for all patients with 92.6% patient-level sensitivity. In the second trial (N = 617), natural language processing identified 8952 screen-positive passages (1.6%) from 559,596 passages at a threshold with near-100% sensitivity. Human reviewers adjudicated 3509 passages over 27.9 abstractor-hours to measure the same outcome for all patients.</p><p><strong>Discussion: </strong>We present the design and source code for a scalable and efficient pipeline for measuring complex electronic health record-derived outcomes using natural language processing-screened human abstraction. This implementation is adaptable to diverse research needs, and its modular pipeline represents a practical middle ground between custom software and commercial platforms.</p>","PeriodicalId":10685,"journal":{"name":"Clinical Trials","volume":" ","pages":"17407745251405386"},"PeriodicalIF":2.2,"publicationDate":"2026-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12912770/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146178017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Premarket and postmarket real-world evidence studies supporting U.S. Food and Drug Administration regulatory decision-making, 2016-2024.
Pub Date: 2026-02-05 | DOI: 10.1177/17407745251415190
Louis Y Li, Reshma Ramachandran, Joseph S Ross, Joshua D Wallach
<p><strong>Background/aims: </strong>There is growing interest in leveraging real-world data, such as electronic health records, administrative claims data, and patient registries, to generate real-world evidence studies that support the U.S. Food and Drug Administration's premarket and postmarket regulatory determinations of effectiveness and/or safety for novel therapeutics. We examined the frequency and characteristics of real-world evidence studies used by the U.S. Food and Drug Administration to support premarket determinations of effectiveness and/or safety, as well as those required or requested by the U.S. Food and Drug Administration to be conducted postmarket after approval.</p><p><strong>Methods: </strong>We identified all novel therapeutics approved by the U.S. Food and Drug Administration between 2016 and 2024, using action packages from the Drugs@FDA database. Product labels, approval letters, and review documents were used to identify real-world evidence studies supporting premarket determinations of effectiveness and/or safety, as well as all postmarketing requirements or commitments outlined at the time of approval. Outcomes included the number of novel therapeutics approved with premarket and/or postmarket real-world evidence studies and characteristics of these studies, including study design, data source, and primary objectives.</p><p><strong>Results: </strong>From 2016 to 2024, the U.S. Food and Drug Administration approved 400 novel therapeutics for 543 indications, of which 43 (10.8%) had at least one real-world evidence study that supported premarket determinations of effectiveness and/or safety (64 unique studies), and 138 (34.5%) had at least one real-world evidence study required or requested by the U.S. Food and Drug Administration to be conducted postmarket after approval (208 unique studies). Among the 64 unique premarket real-world evidence studies, the most common study designs were non-interventional (observational) studies (35, 54.7%) and externally controlled trials (17, 26.6%); 38 (59.4%) studies utilized electronic health or medical records, and 47 (73.4%) provided evidence on effectiveness. Among the 208 unique postmarket real-world evidence studies, the most common study design was non-interventional (observational) studies (159, 76.4%); 61 (29.3%) studies identified registries as the proposed data source, and 197 (94.7%) were designed to provide evidence on safety alone. The proportion of therapeutics approved with at least one postmarket real-world evidence study increased over time from 2 of 20 (10.0%) in 2016 to 23 of 47 (48.9%) in 2024; however, only 7 (3.4%) of these studies were classified by the U.S. Food and Drug Administration as fulfilled or submitted as of May 2025.</p><p><strong>Conclusions: </strong>Real-world evidence studies are infrequently used to support the U.S. Food and Drug Administration's premarket determinations of effectiveness and/or safety but have been increasingly required or re
{"title":"Premarket and postmarket real-world evidence studies supporting U.S. Food and Drug Administration regulatory decision-making, 2016-2024.","authors":"Louis Y Li, Reshma Ramachandran, Joseph S Ross, Joshua D Wallach","doi":"10.1177/17407745251415190","DOIUrl":"10.1177/17407745251415190","url":null,"abstract":"<p><strong>Background/aims: </strong>There is growing interest in leveraging real-world data, such as electronic health records, administrative claims data, and patient registries, to generate real-world evidence studies that support the U.S. Food and Drug Administration's premarket and postmarket regulatory determinations of effectiveness and/or safety for novel therapeutics. We examined the frequency and characteristics of real-world evidence studies used by the U.S. Food and Drug Administration to support premarket determinations of effectiveness and/or safety, as well as those required or requested by the U.S. Food and Drug Administration to be conducted postmarket after approval.</p><p><strong>Methods: </strong>We identified all novel therapeutics approved by the U.S. Food and Drug Administration between 2016 and 2024, using action packages from the Drugs@FDA database. Product labels, approval letters, and review documents were used to identify real-world evidence studies supporting premarket determinations of effectiveness and/or safety, as well as all postmarketing requirements or commitments outlined at the time of approval. Outcomes included the number of novel therapeutics approved with premarket and/or postmarket real-world evidence studies and characteristics of these studies, including study design, data source, and primary objectives.</p><p><strong>Results: </strong>From 2016 to 2024, the U.S. Food and Drug Administration approved 400 novel therapeutics for 543 indications, of which 43 (10.8%) had at least one real-world evidence study that supported premarket determinations of effectiveness and/or safety (64 unique studies), and 138 (34.5%) had at least one real-world evidence study required or requested by the U.S. Food and Drug Administration to be conducted postmarket after approval (208 unique studies). Among the 64 unique premarket real-world evidence studies, the most common study designs were non-interventional (observational) studies (35, 54.7%) and externally controlled trials (17, 26.6%); 38 (59.4%) studies utilized electronic health or medical records, and 47 (73.4%) provided evidence on effectiveness. Among the 208 unique postmarket real-world evidence studies, the most common study design was non-interventional (observational) studies (159, 76.4%); 61 (29.3%) studies identified registries as the proposed data source, and 197 (94.7%) were designed to provide evidence on safety alone. The proportion of therapeutics approved with at least one postmarket real-world evidence study increased over time from 2 of 20 (10.0%) in 2016 to 23 of 47 (48.9%) in 2024; however, only 7 (3.4%) of these studies were classified by the U.S. Food and Drug Administration as fulfilled or submitted as of May 2025.</p><p><strong>Conclusions: </strong>Real-world evidence studies are infrequently used to support the U.S. 
Food and Drug Administration's premarket determinations of effectiveness and/or safety but have been increasingly required or re","PeriodicalId":10685,"journal":{"name":"Clinical Trials","volume":" ","pages":"17407745251415190"},"PeriodicalIF":2.2,"publicationDate":"2026-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12900038/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146124124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Practical inference for a complier average causal effect in cluster randomised trials with a binary outcome.
Pub Date: 2026-02-01 (Epub 2025-10-16) | DOI: 10.1177/17407745251378407
Tansy Edwards, Jennifer Thompson, Charles Opondo, Elizabeth Allen
Background: Individual non-compliance with an intervention can occur in cluster randomised trials, and estimating an intervention effect according to intention-to-treat (ITT) ignores non-compliance and underestimates efficacy. The effect of the intervention among compliers (the complier average causal effect, CACE) provides an unbiased estimate of efficacy, but inference can be complex in cluster randomised trials.
Methods: We evaluated the performance of a pragmatic bootstrapping approach accounting for clustering to obtain a 95% confidence interval (CI) for a CACE for cluster randomised trials with monotonicity and one-sided non-compliance. We investigated a variety of scenarios for correlated cluster-level prevalence of a binary outcome and non-compliance (5%, 10%, 20%, 30%, 40%). Cluster randomised trials were simulated with the minimum number of clusters to provide at least 80% and at least 90% power, to detect an ITT odds ratio (OR) of 0.5 with 100 individuals per cluster.
Results: Under all non-compliance scenarios (5%-40%), there was negligible bias for the CACE. In the worst case of bias, a true OR of 0.18 was estimated as 0.15 for the rarest outcome (5%) and highest non-compliance (40%). There was no under-coverage of bootstrap CIs. CIs were the correct width for an outcome prevalence of 20%-40% but too wide for less common outcomes. Loss of power for a CACE bootstrap analysis versus an ITT regression analysis increased as the prevalence of the outcome decreased across all non-compliance scenarios, particularly for an outcome prevalence of less than 20%.
Conclusions: Our bootstrapping approach provides an accessible and computationally simple method to evaluate efficacy in support of ITT analyses in cluster randomised trials.
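Under one-sided non-compliance and monotonicity, the CACE on the risk-difference scale reduces to the ITT effect divided by the compliance proportion in the intervention arm, so one pragmatic clustering-respecting interval resamples whole clusters and takes percentiles. The sketch below follows that logic; it is an assumed reading of the approach, not necessarily the authors' exact estimator.

```python
import random

def cace_ci(intervention, control, n_boot=2000, seed=1):
    """`intervention` and `control` are lists of clusters; each cluster is
    a list of (outcome, complied) tuples, with `complied` always 0 in the
    control arm (one-sided non-compliance). Returns the CACE point
    estimate on the risk-difference scale and a 95% percentile interval
    from resampling whole clusters (not individuals) within each arm.
    Assumes every resample contains at least one complier."""
    rng = random.Random(seed)

    def estimate(int_clusters, ctl_clusters):
        int_ind = [x for cl in int_clusters for x in cl]
        ctl_ind = [x for cl in ctl_clusters for x in cl]
        itt = (sum(y for y, _ in int_ind) / len(int_ind)
               - sum(y for y, _ in ctl_ind) / len(ctl_ind))
        compliance = sum(c for _, c in int_ind) / len(int_ind)
        return itt / compliance          # Wald/IV estimator of the CACE

    point = estimate(intervention, control)
    boots = sorted(
        estimate(rng.choices(intervention, k=len(intervention)),
                 rng.choices(control, k=len(control)))
        for _ in range(n_boot)
    )
    return point, (boots[int(0.025 * n_boot)], boots[int(0.975 * n_boot)])
```

Resampling clusters rather than individuals is what preserves the within-cluster correlation that a naive bootstrap would destroy.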
{"title":"Practical inference for a complier average causal effect in cluster randomised trials with a binary outcome.","authors":"Tansy Edwards, Jennifer Thompson, Charles Opondo, Elizabeth Allen","doi":"10.1177/17407745251378407","DOIUrl":"10.1177/17407745251378407","url":null,"abstract":"<p><strong>Background: </strong>Individual non-compliance with an intervention in cluster randomised trials can occur and estimating an intervention effect according to intention-to-treat ignores non-compliance and underestimates efficacy. The effect of the intervention among compliers (the complier average causal effect) provides an unbiased estimate of efficacy but inference can be complex in cluster randomised trials.</p><p><strong>Methods: </strong>We evaluated the performance of a pragmatic bootstrapping approach accounting for clustering to obtain a 95% confidence interval (CI) for a CACE for cluster randomised trials with monotonicity and one-sided non-compliance. We investigated a variety of scenarios for correlated cluster-level prevalence of a binary outcome and non-compliance (5%, 10%, 20%, 30%, 40%). Cluster randomised trials were simulated with the minimum number of clusters to provide at least 80% and at least 90% power, to detect an ITT odds ratio (OR) of 0.5 with 100 individuals per cluster.</p><p><strong>Results: </strong>Under all non-compliance scenarios (5%-40%), there was negligible bias for the CACE. In the worst-case of bias, a true OR of 0.18 was estimated as 0.15 for the rarest outcome (5%) and highest non-compliance (40%). There was no under-coverage of bootstrap CIs. CIs were the correct width for an outcome prevalence of 20%-40% but too wide for a less common outcome. Loss of power for a CACE bootstrap analysis versus ITT regression analysis increased as the prevalence of the outcome decreased across all non-compliance scenarios, particularly for an outcome prevalence of less than 20%.</p><p><strong>Conclusions: </strong>Our bootstrapping approach provides an accessible and computationally simple method to evaluate efficacy in support of ITT analyses in cluster randomised trials.</p>","PeriodicalId":10685,"journal":{"name":"Clinical Trials","volume":" ","pages":"33-42"},"PeriodicalIF":2.2,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12909608/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145298963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Charting the content of data monitoring committee charters for clinical trials.
Pub Date: 2026-02-01 (Epub 2025-12-10) | DOI: 10.1177/17407745251389185
Lisa Eckstein, Akram Ibrahim, Olivia Orr, Annette Rid, Seema K Shah
Background: Data monitoring committees play a critical role in ensuring the ethical conduct of clinical trials. Data monitoring committee charters set out the role and processes for data monitoring committees in monitoring clinical trials; however, little is known about the information charters contain.
Methods: We conducted a summative content analysis of a convenience sample of data monitoring committee charters based on the criteria set out for charters by the DAMOCLES Study Group in 2005. Thirteen charters from public and commercially sponsored clinical trials were obtained for review.
Results: Although the data monitoring committee charters we analyzed broadly satisfied the criteria set out by the DAMOCLES Study Group, some issues warrant further attention. These included variability in the availability of unmasked data for review, communication across data monitoring committees for related trials, post-trial data monitoring committee responsibilities, and a need for more explicit decision-making processes and conflict-resolution procedures. Moreover, few of the charters we were able to analyze included legal protection for members.
Conclusion: Despite the limitations imposed by the difficulty of obtaining data monitoring committee charters, the convenience sample we reviewed suggests variability, including in the implementation of some best-practice recommendations. These issues warrant further exploration in a larger sample, which would be aided by requiring or incentivizing public access to data monitoring committee charters.
{"title":"Charting the content of data monitoring committee charters for clinical trials.","authors":"Lisa Eckstein, Akram Ibrahim, Olivia Orr, Annette Rid, Seema K Shah","doi":"10.1177/17407745251389185","DOIUrl":"10.1177/17407745251389185","url":null,"abstract":"<p><strong>Background: </strong>Data monitoring committees play a critical role in ensuring the ethical conduct of clinical trials. Data monitoring committee charters set out the role and processes for data monitoring committees in monitoring clinical trials; however, little is known about the information charters contain.</p><p><strong>Methods: </strong>We conducted a summative content analysis of a convenience sample of data monitoring committee charters based on the criteria set out for charters by the DAMOCLES Study Group in 2005. Thirteen charters from public and commercially sponsored clinical trials were obtained for review.</p><p><strong>Results: </strong>Although the data monitoring committee charters we analyzed broadly satisfied the criteria set out by the DAMOCLES Study Group, some issues warrant further attention. These included variability in the availability of unmasked data for review, communication across data monitoring committees for related trials, post-trial DMC responsibilities, and a need for more explicit decision-making processes and conflict resolution procedures. Moreover, few of the data monitoring committee charters we were able to analyze included legal protection for members.</p><p><strong>Conclusion: </strong>Despite limitations due to the difficulties in obtaining data monitoring committee charters, the convenience sample reviewed suggests variability, including in terms of implementation of some best-practice recommendations. There is a need for further exploration of these issues in a larger sample size. Undertaking such research would be assisted by requiring or incentivizing public access to data monitoring committee charters.</p>","PeriodicalId":10685,"journal":{"name":"Clinical Trials","volume":" ","pages":"121-126"},"PeriodicalIF":2.2,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12841360/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145713342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}