Background/Aims: Clinical trials require numerous documents to be written: protocols, consent forms, clinical study reports, and many others. Large language models offer the potential to rapidly generate first-draft versions of these documents; however, there are concerns about the quality of their output. Here, we report an evaluation of how well large language models generate sections of one such document, the clinical trial protocol.
Methods: Using an off-the-shelf large language model, we generated protocol sections for a broad range of diseases and clinical trial phases. We assessed each of these document sections across four dimensions: clinical thinking and logic; transparency and references; medical and clinical terminology; and content relevance and suitability. To improve performance, we used retrieval-augmented generation to enhance the large language model with accurate, up-to-date information, including regulatory guidance documents and data from ClinicalTrials.gov. Using this retrieval-augmented generation large language model, we regenerated the same protocol sections and assessed them across the same four dimensions.
Results: We find that the off-the-shelf large language model delivers reasonable results, especially for content relevance and the correct use of medical and clinical terminology, with scores of over 80%. However, it shows limited performance in clinical thinking and logic and in transparency and references, with assessment scores of ≈40% or less. The use of retrieval-augmented generation substantially improves the writing quality of the large language model, with clinical thinking and logic and transparency and references scores increasing to ≈80%. The retrieval-augmented generation method thus greatly improves the practical usability of large language models for clinical trial-related writing.
Discussion: Our results suggest that hybrid large language model architectures, such as the retrieval-augmented generation method we utilized, offer strong potential for clinical trial-related writing across a wide variety of documents. This is potentially transformative, since it addresses several major bottlenecks of drug development.
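The retrieval-augmented generation pattern described above can be sketched in a few lines: retrieve the most relevant reference documents for a writing task, then prepend them to the model prompt so the draft is grounded in them. The toy corpus, overlap-based scorer, and prompt template below are illustrative assumptions, not the authors' actual pipeline; a real system would use an embedding model over regulatory guidance and ClinicalTrials.gov records, and would pass the prompt to a large language model.

```python
# Minimal sketch of retrieval-augmented generation (RAG). All corpus entries,
# scoring, and the prompt template are hypothetical stand-ins.

def tokenize(text: str) -> set[str]:
    """Lowercase bag-of-words; stands in for a proper embedding."""
    return set(text.lower().split())

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query and keep the top k."""
    scored = sorted(corpus,
                    key=lambda doc: len(tokenize(doc) & tokenize(query)),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Prepend retrieved context so the model grounds its draft in it."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus))
    return f"Context:\n{context}\n\nTask: {query}"

corpus = [
    "ICH E6(R2) describes good clinical practice requirements for protocols.",
    "NCT00000000 is a phase 3 trial of drug X in type 2 diabetes.",  # hypothetical record
    "Consent forms must describe risks in lay language.",
]
prompt = build_prompt(
    "Draft the eligibility criteria section for a phase 3 diabetes trial",
    corpus,
)
```

In a full pipeline the returned `prompt` would be sent to the language model; the key design point is that the model's first draft is conditioned on retrieved, current source material rather than on its training data alone.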
Background: Participant dependence, if present, must be accounted for in the analysis of randomized trials. This dependence, also referred to as "clustering," can occur in one or more trial arms and may predate randomization or arise after it. We examine three trial designs: one "fully clustered" (where all participants are dependent within clusters or groups) and two "partially clustered" (where some participants are dependent within clusters and some are completely independent of all others).
Methods: For these three designs, we (1) use causal models to non-parametrically describe the data generating process and formalize the dependence in the observed data distribution; (2) develop a novel implementation of targeted minimum loss-based estimation for analysis; (3) evaluate the finite-sample performance of targeted minimum loss-based estimation and common alternatives via a simulation study; and (4) apply the methods to real data from the SEARCH-IPT trial.
Results: We show that the two randomization schemes resulting in partially clustered trials have the same dependence structure, enabling use of the same statistical methods for estimation and inference of causal effects. Our novel targeted minimum loss-based estimation approach leverages covariate adjustment and machine learning to improve precision and facilitates estimation of a large set of causal effects. In simulations, we demonstrate that targeted minimum loss-based estimation achieves comparable or markedly higher statistical power than common alternatives for these partially clustered designs. Finally, application of targeted minimum loss-based estimation to real data from the SEARCH-IPT trial resulted in 20%-57% efficiency gains, demonstrating the real-world consequences of our proposed approach.
Conclusions: Partially clustered trial analysis can be made more efficient by implementing targeted minimum loss-based estimation, provided care is taken to account for the dependent nature of the observed data.
Background/aims: Sample size determination for cluster randomised trials is challenging because it requires robust estimation of the intra-cluster correlation coefficient. Typically, the sample size is chosen to provide a certain level of power to reject the null hypothesis in a two-sample hypothesis test. This relies on the minimal clinically important difference and estimates for the overall standard deviation, the intra-cluster correlation coefficient and, if cluster sizes are assumed to be unequal, the coefficient of variation of the cluster size. Varying any of these parameters can have a strong effect on the required sample size. In particular, it is very sensitive to small differences in the intra-cluster correlation coefficient. A relevant intra-cluster correlation coefficient estimate is often not available, or the available estimate is imprecise due to being based on studies with low numbers of clusters. If the intra-cluster correlation coefficient value used in the power calculation is far from the unknown true value, this could lead to trials which are substantially over- or under-powered.
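The sensitivity described above can be made concrete: the standard cluster-trial sample size inflates the individually randomised sample size by a design effect that depends on the intra-cluster correlation coefficient and, for unequal cluster sizes, on the coefficient of variation of cluster size. The sketch below uses the textbook formula with generic parameter values; none of the numbers are taken from this article.

```python
import math

def normal_quantile(p, lo=-10.0, hi=10.0):
    """Standard normal quantile by bisection on the CDF (via math.erf)."""
    cdf = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2

def cluster_trial_n_per_arm(delta, sd, icc, mean_cluster_size, cv=0.0,
                            alpha=0.05, power=0.80):
    """Per-arm sample size for a two-arm cluster randomised trial.

    Inflates the usual two-sample normal-approximation sample size by the
    design effect 1 + ((cv**2 + 1) * m - 1) * icc; cv = 0 recovers the
    equal-cluster-size case 1 + (m - 1) * icc.
    """
    z_a = normal_quantile(1 - alpha / 2)
    z_b = normal_quantile(power)
    n_individual = 2 * (sd * (z_a + z_b) / delta) ** 2  # per arm, independent data
    design_effect = 1 + ((cv ** 2 + 1) * mean_cluster_size - 1) * icc
    return math.ceil(n_individual * design_effect)

# A small shift in the assumed ICC moves the required sample size substantially:
n_low = cluster_trial_n_per_arm(delta=0.3, sd=1.0, icc=0.01, mean_cluster_size=20)
n_high = cluster_trial_n_per_arm(delta=0.3, sd=1.0, icc=0.05, mean_cluster_size=20)
```

With clusters of 20, moving the assumed intra-cluster correlation coefficient from 0.01 to 0.05 increases the design effect from 1.19 to 1.95, inflating the per-arm sample size by roughly two thirds, which illustrates why an imprecise ICC estimate can leave a trial badly over- or under-powered.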
Methods: In this article, we propose a hybrid approach using Bayesian assurance to determine the sample size for a cluster randomised trial in combination with a frequentist analysis. Assurance is an alternative to traditional power, which incorporates the uncertainty on key parameters through a prior distribution. We suggest specifying prior distributions for the overall standard deviation, intra-cluster correlation coefficient and coefficient of variation of the cluster size, while still utilising the minimal clinically important difference. We illustrate the approach through the design of a cluster randomised trial in post-stroke incontinence and compare the results to those obtained from a standard power calculation.
Results: We show that assurance can be used to calculate a sample size based on an elicited prior distribution for the intra-cluster correlation coefficient, whereas a power calculation discards all of the information in the prior except for a single point estimate. Results show that this approach can avoid misspecifying sample sizes when the prior medians for the intra-cluster correlation coefficient are very similar, but the underlying prior distributions exhibit quite different behaviour. Incorporating uncertainty on all three of the nuisance parameters, rather than only on the intra-cluster correlation coefficient, does not notably increase the required sample size.
Conclusion: Assurance provides a better understanding of the probability of success of a trial given a particular minimal clinically important difference and can be used instead of power to produce sample sizes that are more robust to parameter uncertainty. This is especially useful when there is difficulty obtaining reliable parameter estimates.
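The assurance idea can be sketched numerically: rather than plugging a single intra-cluster correlation coefficient point estimate into the power formula, draw ICC values from a prior distribution and average the resulting power. The Beta prior, effect size, and design parameters below are illustrative assumptions, not the elicited prior from the post-stroke incontinence example.

```python
import math
import random

def power_cluster(n_per_arm, delta, sd, icc, m, alpha_z=1.959964):
    """Normal-approximation power of a two-arm cluster randomised trial
    with equal cluster sizes (design effect 1 + (m - 1) * icc)."""
    design_effect = 1 + (m - 1) * icc
    n_eff = n_per_arm / design_effect             # effective sample size per arm
    z = abs(delta) / sd * math.sqrt(n_eff / 2) - alpha_z
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # Phi(z)

def assurance(n_per_arm, delta, sd, m, icc_prior, draws=20000, seed=1):
    """Bayesian assurance: power averaged over draws from the ICC prior."""
    rng = random.Random(seed)
    total = sum(power_cluster(n_per_arm, delta, sd, icc_prior(rng), m)
                for _ in range(draws))
    return total / draws

# Illustrative Beta(1.5, 30) prior on the ICC (mass concentrated below 0.15):
icc_prior = lambda rng: rng.betavariate(1.5, 30.0)

point_power = power_cluster(300, delta=0.3, sd=1.0, icc=0.04, m=20)
avg_assurance = assurance(300, delta=0.3, sd=1.0, m=20, icc_prior=icc_prior)
```

The point calculation uses only the single value 0.04, while the assurance calculation integrates over the whole prior; when the prior places appreciable weight on larger ICC values, assurance falls below the point-estimate power, flagging a risk of under-powering that the standard calculation hides.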
Evidence-based medicine relies heavily on well-conducted clinical trials. Australia lacks a discipline-specific education pathway to provide the specialist skills necessary to conduct clinical trials to the highest standards. Unlike allied health professionals, clinical trialists who currently possess the specialist skills to conduct clinical trials do not receive professional recognition. The National Health and Medical Research Council defines 'clinical trialist' to include site staff as well as investigators. In this perspective piece, we explore the importance of discipline-specific education in creating a job-ready workforce of clinical trialists and the need to recognise clinical trialists as an allied health profession, in concert with their existing medical, nursing and other professional qualifications, and we outline a proposed specialist education and accreditation strategy.
Background: In 2022, SWOG S1801 was the first trial to demonstrate that single-agent anti-PD-1 checkpoint inhibition used as neoadjuvant-adjuvant therapy leads to significantly improved outcomes compared to adjuvant-only therapy. Endpoints in trials comparing neoadjuvant-adjuvant to adjuvant strategies need special consideration to ensure that event measurement timing is appropriately accounted for in analyses, to avoid biased comparisons artificially favoring one arm over another.
Methods: The S1801 trial is used as a case study to evaluate the issues involved in selecting endpoints for trials comparing neoadjuvant-adjuvant versus adjuvant-only strategies.
Results: Definitions and timing of measurement of events are provided, along with trial scenarios indicating when recurrence-free versus event-free survival should be used.
Conclusions: In randomized trials comparing neoadjuvant-adjuvant to adjuvant-only strategies, event-free survival endpoints measured from randomization are required for unbiased comparison of the arms. The time at which events can be measured on each arm needs to be carefully considered. If measurement of events occurs at different times on the randomized arms, modified definitions of event-free survival must be used to avoid bias.
Best practices for the design, conduct, analysis, and interpretation of randomized controlled trials should adhere to rigorous statistical principles. The reliable detection of small treatment effects should be based on results reported from the primary pre-specified endpoints of large-scale randomized trials designed a priori to test relevant hypotheses. Inference about treatment should not rely unduly on individual small trials, meta-analyses of small trials, subgroups, or post hoc analyses. Failure to follow these principles can lead to conclusions inconsistent with the totality of evidence and to inappropriate recommendations by guideline committees. The American Heart Association/American College of Cardiology Task Force published guidelines restricting aspirin for primary prevention of cardiovascular disease to patients below 70 years of age, and the United States Preventive Services Task Force restricted it to those below 60 years. Both guidelines were unduly influenced by the Aspirin in Reducing Events in the Elderly trial, the results of which were uninformative; they did not provide evidence that aspirin showed no benefit in these age groups. We present several major methodological pitfalls in interpreting the results of the Aspirin in Reducing Events in the Elderly trial of aspirin in the primary prevention of cardiovascular disease. We believe that undue reliance on this uninformative trial has led to misinformed guidelines. Furthermore, given the totality of evidence, we believe that general guidelines for aspirin in the primary prevention of cardiovascular disease are unwarranted. Prescription should be based on an assessment of an individual's benefit-to-risk balance; age should be only one component of that assessment.

