Aspirin in primary prevention: Undue reliance on an uninformative trial led to misinformed clinical guidelines.
Pub Date: 2025-04-01 DOI: 10.1177/17407745251324866
Janet Wittes, David L DeMets, KyungMann Kim, Dennis G Maki, Marc A Pfeffer, J Michael Gaziano, Panagiota Kitsantas, Charles H Hennekens, Sarah K Wood
Best practices for the design, conduct, analysis, and interpretation of randomized controlled trials should adhere to rigorous statistical principles. The reliable detection of small treatment effects should be based on results reported from the primary pre-specified endpoints of large-scale randomized trials designed a priori to test relevant hypotheses. Inference about treatment should not rely unduly on individual small trials, meta-analyses of small trials, subgroups, or post hoc analyses. Failure to follow these principles can lead to conclusions inconsistent with the totality of evidence and to inappropriate recommendations by guideline committees. The American Heart Association/American College of Cardiology Task Force published guidelines restricting aspirin for primary prevention of cardiovascular disease to patients below 70 years of age, and the United States Preventive Services Task Force restricted it to those below 60 years. Both guidelines were unduly influenced by the Aspirin in Reducing Events in the Elderly trial, whose results were uninformative: they did not provide evidence that aspirin has no benefit in these age groups. We present several major methodological pitfalls in interpreting the results of the Aspirin in Reducing Events in the Elderly trial of aspirin in the primary prevention of cardiovascular disease. We believe that undue reliance on this uninformative trial has led to misinformed guidelines. Furthermore, given the totality of evidence, we believe that general guidelines for aspirin in the primary prevention of cardiovascular disease are unwarranted. Prescription should be based on an assessment of an individual's benefit-to-risk profile; age should be only one component of that assessment.
{"title":"Aspirin in primary prevention: Undue reliance on an uninformative trial led to misinformed clinical guidelines.","authors":"Janet Wittes, David L DeMets, KyungMann Kim, Dennis G Maki, Marc A Pfeffer, J Michael Gaziano, Panagiota Kitsantas, Charles H Hennekens, Sarah K Wood","doi":"10.1177/17407745251324866","DOIUrl":"https://doi.org/10.1177/17407745251324866","url":null,"abstract":"<p><p>Best practices for design, conduct, analysis, and interpretation of randomized controlled trials should adhere to rigorous statistical principles. The reliable detection of small effects of treatment should be based on results reported from the primary pre-specified endpoints of large-scale randomized trials designed a priori to test relevant hypotheses. Inference about treatment should not be based on undue reliance on individual small trials, meta-analyses of small trials, subgroups, or post hoc analyses. Failure to follow these principles can lead to conclusions inconsistent with the totality of evidence and to inappropriate recommendations made by guideline committees. The American Heart Association/American College of Cardiology Task Force published guidelines to restrict aspirin for primary prevention of cardiovascular disease to patients below 70 years of age, and the United States Preventive Services Task Force to below 60 years. These guidelines were both unduly influenced by the Aspirin in Reducing Events in the Elderly trial, the results of which were uninformative; they did not provide evidence that aspirin showed no benefit in these age groups. We present several major methodological pitfalls in interpreting the results from the Aspirin in Reducing Events in the Elderly trial of aspirin in the primary prevention of cardiovascular disease. We believe that undue reliance on this uninformative trial has led to misinformed guidelines. Furthermore, given the totality of evidence, we believe that general guidelines for aspirin in the primary prevention of cardiovascular disease are unwarranted. Prescription should be based on an assessment of an individual's benefit to risk; age should be only one component of that assessment.</p>","PeriodicalId":10685,"journal":{"name":"Clinical Trials","volume":" ","pages":"17407745251324866"},"PeriodicalIF":2.2,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143751495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Response to Cleland and Anzar.
Pub Date: 2025-03-31 DOI: 10.1177/17407745251324843
Janet Wittes, David L DeMets, KyungMann Kim, Dennis G Maki, Marc A Pfeffer, J Michael Gaziano, Panagiota Kitsantas, Charles H Hennekens, Sarah K Wood
{"title":"Response to Cleland and Anzar.","authors":"Janet Wittes, David L DeMets, KyungMann Kim, Dennis G Maki, Marc A Pfeffer, J Michael Gaziano, Panagiota Kitsantas, Charles H Hennekens, Sarah K Wood","doi":"10.1177/17407745251324843","DOIUrl":"https://doi.org/10.1177/17407745251324843","url":null,"abstract":"","PeriodicalId":10685,"journal":{"name":"Clinical Trials","volume":" ","pages":"17407745251324843"},"PeriodicalIF":2.2,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143751501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quality management of a multi-center randomized controlled feeding trial: A prospective observational study.
Pub Date: 2025-03-30 DOI: 10.1177/17407745251324653
Xiayan Chen, Huijuan Li, Lin Feng, Xi Lan, Shuyi Li, Yanfang Zhao, Guo Zeng, Huilian Zhu, Jianqin Sun, Yanfang Wang, Yangfeng Wu
Background: Nutrition and dietary trials are often prone to bias, leading to inaccurate or questionable estimates of intervention efficacy. However, reports on quality management practices of well-controlled dietary trials are scarce. This study introduces the quality management system of the Diet, ExerCIse and CarDiovascular hEalth-Diet Study and reports its performance in ensuring study quality.
Methods: The quality management system consisted of a study coordinating center, trial governance, and quality control measures covering study design, conduct, and data analysis and reporting. Metrics for evaluating the performance of the system were collected throughout trial development and conduct, from September 2016 to June 2021, covering major activities at the coordinating center, study sites, and central laboratories, with a focus on protocol amendments, protocol deviations (eligibility, fidelity, management of confounders, loss to follow-up and outside-of-window visits, and blinding success), and measurement accuracy.
Results: Three amendments to the study protocol enhanced feasibility. All 265 participants met the eligibility criteria. Among them, only 3% were lost to the primary outcome follow-up measurement. More than 95% of participants completed the study; they consumed more than 96% of the study meals, and more than 94% of participants consumed more than 18 study meals per week, with no between-group differences. Online monitoring of nutrient targets for the intervention diet showed that all targets were achieved except fiber intake, which fell short by 4.3 g on average. Only 3% of participants experienced a body weight change greater than 2.0 kg, and 3% had medication changes not allowed by the study. James' blinding index at the end of the study was 0.68. The end digits of both systolic and diastolic blood pressure readings were distributed equally. For laboratory measures, 100% of standard samples, 97% of blood-split samples, and 87% of urine-split samples had test results within the acceptable range. Only 1.4% of data items required queries, of which only 30% needed corrections.
Discussion: The Diet, ExerCIse and CarDiovascular hEalth-Diet Study quality management system provides a framework for conducting a high-quality dietary intervention clinical trial.
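The end-digit check reported above (terminal digits of blood pressure readings being equally distributed) is a common data-quality screen in trials with manual measurements. The sketch below shows one way such a check could be run as a chi-square test of digit uniformity; the readings and function name are illustrative, not data or code from the study.

```python
# Sketch of an end-digit preference check for blood pressure readings, as used in
# data-quality monitoring. Readings and names are illustrative, not study data.
from collections import Counter
from scipy.stats import chisquare

def end_digit_test(readings):
    """Chi-square test that terminal digits 0-9 occur with equal frequency."""
    digits = [int(str(int(r))[-1]) for r in readings]
    counts = Counter(digits)
    observed = [counts.get(d, 0) for d in range(10)]
    expected = [len(digits) / 10] * 10
    return chisquare(observed, f_exp=expected)

# Hypothetical systolic readings; a uniform digit distribution yields a large p-value.
systolic = [128, 134, 121, 140, 137, 119, 122, 133, 145, 126,
            131, 124, 138, 127, 142, 129, 135, 120, 136, 143]
stat, p = end_digit_test(systolic)
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```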
{"title":"Quality management of a multi-center randomized controlled feeding trial: A prospective observational study.","authors":"Xiayan Chen, Huijuan Li, Lin Feng, Xi Lan, Shuyi Li, Yanfang Zhao, Guo Zeng, Huilian Zhu, Jianqin Sun, Yanfang Wang, Yangfeng Wu","doi":"10.1177/17407745251324653","DOIUrl":"https://doi.org/10.1177/17407745251324653","url":null,"abstract":"<p><p>BackgroundNutrition and dietary trials are often prone to bias, leading to inaccurate or questionable estimates of intervention efficacy. However, reports on quality management practices of well-controlled dietary trials are scarce. This study aims to introduce the quality management system of the Diet, ExerCIse and CarDiovascular hEalth-Diet Study and report its performance in ensuring study quality.MethodsThe quality management system consisted of a study coordinating center, trial governance, and quality control measures covering study design, conduct, and data analysis and reporting. Metrics for evaluating the performance of the system were collected throughout the whole trial development and conducted from September 2016 to June 2021, covering major activities at the coordinating center, study sites, and central laboratories, with a focus on the protocol amendments, protocol deviations (eligibility, fidelity, confounders management, loss to follow-up and outside-of-window visits, and blindness success), and measurement accuracy.ResultsThree amendments to the study protocol enhanced feasibility. All participants (265) met the eligibility criteria. Among them, only 3% were lost to the primary outcome follow-up measurement. More than 95% of participants completed the study, they consumed more than 96% of the study meals, and more than 94% of participants consumed more than 18 meals per week, with no between-group differences. Online monitoring of nutrient targets for the intervention diet showed that all targets were achieved except for the fiber intake, which was 4.3 g less on average. Only 3% experienced a body weight change greater than 2.0 kg, and 3% had medication changes which were not allowed by the study. James' blinding index at the end of the study was 0.68. The end digits of both systolic and diastolic blood pressure readings were distributed equally. For laboratory measures, 100% of standard samples, 97% of blood-split samples, and 87% of urine-split samples had test results within the acceptable range. Only 1.4% of data items required queries, for which only 30% needed corrections.DiscussionThe Diet, ExerCIse and CarDiovascular hEalth-Diet Study quality management system provides a framework for conducting a high-quality dietary intervention clinical trial.</p>","PeriodicalId":10685,"journal":{"name":"Clinical Trials","volume":" ","pages":"17407745251324653"},"PeriodicalIF":2.2,"publicationDate":"2025-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143751500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Building a professionally recognised clinical trial workforce: Is it time for an education and accreditation strategy?
Pub Date: 2025-03-27 DOI: 10.1177/17407745251328287
Simone Spark, Prudence Perry, Thobekile Mthethwa-Pitt, Dragan Ilic, Anne Woollett, Sophia Zoungas, Marina Skiba
Evidence-based medicine relies heavily on well-conducted clinical trials. Australia lacks a discipline-specific education pathway to provide the specialist skills necessary to conduct clinical trials to the highest standards. Unlike allied health professionals, clinical trialists who currently possess these specialist skills do not receive professional recognition. The National Health and Medical Research Council defines 'clinical trialist' to include site staff as well as investigators. In this perspective piece, we explore the importance of discipline-specific education in creating a job-ready workforce of clinical trialists; argue for recognition of clinical trialists as an allied health profession, in concert with their existing medical, nursing and other professional qualifications; and outline a proposed specialist education and accreditation strategy.
{"title":"Building a professionally recognised clinical trial workforce: Is it time for an education and accreditation strategy?","authors":"Simone Spark, Prudence Perry, Thobekile Mthethwa-Pitt, Dragan Ilic, Anne Woollett, Sophia Zoungas, Marina Skiba","doi":"10.1177/17407745251328287","DOIUrl":"https://doi.org/10.1177/17407745251328287","url":null,"abstract":"<p><p>Evidence-based medicine relies heavily on well-conducted clinical trials. Australia lacks a discipline-specific education pathway to provide the specialist skills necessary to conduct clinical trials to the highest standards. Unlike allied health professionals, clinical trialists who currently possess the specialist skills to conduct clinical trials do not receive professional recognition. The National Health and Medical Research Council defines 'clinical trialist' to include site staff as well as investigators. In this perspective piece, we explore the importance of discipline-specific education in creating a job-ready workforce of clinical trialists; the need for recognition of clinical trialists as an allied health profession in concert with their existing medical, nursing and other professional qualifications and outline a proposed specialist education and accreditation strategy.</p>","PeriodicalId":10685,"journal":{"name":"Clinical Trials","volume":" ","pages":"17407745251328287"},"PeriodicalIF":2.2,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143718174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design considerations for randomized comparisons of neoadjuvant-adjuvant versus adjuvant-only cancer immunotherapy when tumor measurement schedules do not align (SWOG S1801).
Pub Date: 2025-03-18 DOI: 10.1177/17407745251321371
Megan Othus, Elad Sharon, Michael C Wu, Vernon K Sondak, Antoni Ribas, Sapna P Patel
Background: In 2022, SWOG S1801 was the first trial to demonstrate that single-agent anti-PD-1 checkpoint inhibition used as neoadjuvant-adjuvant therapy leads to significantly improved outcomes compared to adjuvant-only therapy. Endpoints in trials comparing neoadjuvant-adjuvant to adjuvant-only strategies need special consideration to ensure that the timing of event measurement is appropriately accounted for in analyses, to avoid biased comparisons artificially favoring one arm over another.
Methods: The S1801 trial is used as a case study to evaluate the issues involved in selecting endpoints for trials comparing neoadjuvant-adjuvant versus adjuvant-only strategies.
Results: Definitions and timing of measurement of events are provided, along with trial scenarios in which recurrence-free versus event-free survival should be used.
Conclusions: In randomized trials comparing neoadjuvant-adjuvant to adjuvant-only strategies, event-free survival endpoints measured from randomization are required for an unbiased comparison of the arms. The time at which events can be measured on each arm needs to be carefully considered. If measurement of events occurs at different times on the randomized arms, modified definitions of event-free survival must be used to avoid bias.
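To illustrate the point about measuring events from randomization, the sketch below computes event-free survival time starting at the randomization date on both arms, so that events occurring before surgery on the neoadjuvant-adjuvant arm are counted on the same clock as events on the adjuvant-only arm. The records, dates, and field names are hypothetical and are not taken from S1801.

```python
# Illustrative computation of event-free survival (EFS) measured from randomization.
# The clock starts at randomization on both arms, regardless of when surgery or
# adjuvant therapy begins. Records below are hypothetical.
from datetime import date
from typing import Optional, Tuple

def efs_days(randomization: date, event: Optional[date], last_followup: date) -> Tuple[int, int]:
    """Return (time in days from randomization, event indicator: 1=event, 0=censored)."""
    if event is not None:
        return (event - randomization).days, 1
    return (last_followup - randomization).days, 0

participants = [
    # arm, randomization date, first qualifying event or None, last follow-up date
    ("neoadjuvant-adjuvant", date(2021, 1, 4), date(2021, 2, 10), date(2022, 1, 4)),
    ("adjuvant-only",        date(2021, 1, 6), None,              date(2022, 6, 30)),
]

for arm, rand, event, last in participants:
    t, d = efs_days(rand, event, last)
    print(f"{arm}: time={t} days, event={d}")
```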
{"title":"Design considerations for randomized comparisons of neoadjuvant-adjuvant versus adjuvant-only cancer immunotherapy when tumor measurement schedules do not align (SWOG S1801).","authors":"Megan Othus, Elad Sharon, Michael C Wu, Vernon K Sondak, Antoni Ribas, Sapna P Patel","doi":"10.1177/17407745251321371","DOIUrl":"https://doi.org/10.1177/17407745251321371","url":null,"abstract":"<p><p>BackgroundIn 2022, SWOG S1801 was the first trial to demonstrate that single-agent anti-PD-1 checkpoint inhibition used as neoadjuvant-adjuvant therapy leads to significantly improved outcomes compared to adjuvant-only therapy. Endpoints in trials comparing neoadjuvant-adjuvant to adjuvant strategies need special consideration to ensure that event measurement timing is appropriately accounted for in analyses to avoid biased comparisons artificially favoring one arm over another.MethodsThe S1801 trial is used a case study to evaluate the issues involved in selecting endpoints for trials comparing neoadjuvant-adjuvant versus adjuvant-only strategies.ResultsDefinitions and timing of measurement of events is provided. Trial scenarios when recurrence-free versus event-free survival should be used are provided.ConclusionsIn randomized trials comparing neoadjuvant-adjuvant to adjuvant-only strategies, event-free survival endpoints measured from randomization are required for unbiased comparison of the arms. The time at which events can be measured on each arm needs to be carefully considered. If measurement of events occurs at different times on the randomized arms, modified definitions of event-free survival must be used to avoid bias.</p>","PeriodicalId":10685,"journal":{"name":"Clinical Trials","volume":" ","pages":"17407745251321371"},"PeriodicalIF":2.2,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143656289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating the use of text-message reminders and personalised text-message reminders on the return of participant questionnaires in trials, a systematic review and meta-analysis.
Pub Date: 2025-03-12 DOI: 10.1177/17407745251320888
Laura Doherty, Catherine Arundel, Elizabeth Coleman, Ailish Byrne, Katherine Jones
Background: Randomised controlled trials are widely accepted as the gold standard research methodology for the evaluation of interventions. However, they often display poor participant retention. To address this, various participant interventions have been identified and evaluated through studies within a trial. Two such interventions are participant short message service (text-message) reminders and personalised participant short message service reminders, designed to encourage a participant to return a study questionnaire. While previous studies within a trial have evaluated the effectiveness of these two retention strategies, trialists continue to spend both time and money on them while the evidence remains inconclusive.
Methods: This systematic review and meta-analysis compared the use of short message service reminders with no short message service reminder, and personalised short message service reminders with non-personalised short message service reminders, on participant retention. Eligible studies were identified through advanced searches of electronic databases (MEDLINE, EMBASE and Cochrane Library) and hand-searching of alternative information sources. The review's primary outcome was the proportion of study questionnaires returned at each study within a trial's primary analysis time point.
Results: Nine eligible studies within a trial were identified, of which four compared short message service versus no short message service and five compared personalised versus non-personalised short message service. Of those comparing personalised versus non-personalised short message service, only three were deemed appropriate for meta-analysis. For short message service versus no short message service, short message service led to a statistically non-significant increase in the odds of study questionnaire return of 9% (odds ratio = 1.09, 95% confidence interval = 0.92 to 1.30). Similarly, personalised short message service produced a statistically non-significant increase in odds of 22% (odds ratio = 1.22, 95% confidence interval = 0.95 to 1.59) compared with non-personalised short message service.
Conclusion: The effectiveness of both short message service and personalised short message service as retention tools remains inconclusive, and further study within a trial evaluations are required. However, as short message services are low in cost, easy to use and generally well accepted by participants, it is suggested that trialists adopt a pragmatic approach and utilise these reminders until further research is conducted. Given both the minimal additional cost for studies already utilising short message service reminders and some evidence of effect, personalisation should also be considered.
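As a hedged illustration of the pooling step such a meta-analysis involves, the sketch below combines study-level odds ratios with a fixed-effect, inverse-variance method on the log scale; the event counts are invented for illustration and are not the review's data, and the original analysis may have used a different pooling model.

```python
# Minimal inverse-variance (fixed-effect) pooling of odds ratios on the log scale.
# The per-study counts below are hypothetical and are NOT the review's data.
import math

# Each tuple: (returns_intervention, total_intervention, returns_control, total_control)
studies = [
    (150, 300, 140, 300),
    (80, 160, 75, 160),
    (60, 120, 50, 120),
]

weights_sum = 0.0
weighted_log_or_sum = 0.0
for a, n1, c, n2 in studies:
    b, d = n1 - a, n2 - c                    # non-returns in each arm
    log_or = math.log((a * d) / (b * c))     # study log odds ratio
    var = 1 / a + 1 / b + 1 / c + 1 / d      # Woolf variance of the log OR
    w = 1 / var                              # inverse-variance weight
    weights_sum += w
    weighted_log_or_sum += w * log_or

pooled_log_or = weighted_log_or_sum / weights_sum
se = math.sqrt(1 / weights_sum)
or_pooled = math.exp(pooled_log_or)
ci = (math.exp(pooled_log_or - 1.96 * se), math.exp(pooled_log_or + 1.96 * se))
print(f"pooled OR = {or_pooled:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```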
{"title":"Evaluating the use of text-message reminders and personalised text-message reminders on the return of participant questionnaires in trials, a systematic review and meta-analysis.","authors":"Laura Doherty, Catherine Arundel, Elizabeth Coleman, Ailish Byrne, Katherine Jones","doi":"10.1177/17407745251320888","DOIUrl":"https://doi.org/10.1177/17407745251320888","url":null,"abstract":"<p><strong>Background: </strong>Randomised controlled trials are widely accepted as the gold standard research methodology for the evaluation of interventions. However, they often display poor participant retention. To prevent this, various participant interventions have been identified and evaluated through the use of studies within a trial. Two such interventions are participant short message service reminders (also known as text-messages) and personalised participant short message service reminders, designed to encourage a participant to return a study questionnaire. While previous studies within a trial have evaluated the effectiveness of these two retention strategies, trialists continue to spend both time and money on these strategies while the evidence remains inconclusive.</p><p><strong>Methods: </strong>This systematic review and meta-analysis compared the use of short message service reminders with no short message service reminder and personalised short message service reminders with non-personalised short message service reminders, on participant retention. Eligible studies were identified through advanced searches of electronic databases (MEDLINE, EMBASE and Cochrane Library) and hand-searching of alternative information sources. The review primary outcome was the proportion of study questionnaires returned for the individual study within a trial primary analysis time points.</p><p><strong>Results: </strong>Nine eligible studies within a trial were identified, of which four compared short message service versus no short message service and five compared personalised short message service versus non-personalised short message service. For those that compared personalised short message service versus non-personalised short message service, only three were deemed appropriate for meta-analysis. The primary outcome results for short message service versus no short message service concluded that short message service led to a statistically non-significant increase in the odds of study questionnaire return by 9% (odds ratio = 1.09, 95% confidence interval = 0.92 to 1.30). Similarly, comparison of personalised short message service versus non-personalised short message service concluded that personalised short message service caused a statistically non-significant increase in odds by 22% (odds ratio = 1.22, 95% confidence interval = 0.95 to 1.59).</p><p><strong>Conclusion: </strong>The effectiveness of both short message service and personalised short message service as retention tools remains inconclusive and further study within a trial evaluations are required. However, as short message services are low in cost, easy to use and generally well accepted by participants, it is suggested that trialists adopt a pragmatic approach and utilise these reminders until further research is conducted. 
Given both the minimal addition in cost for studies already utilising short message service reminders and some evidence of effect, personalisation shoul","PeriodicalId":10685,"journal":{"name":"Clinical Trials","volume":" ","pages":"17407745251320888"},"PeriodicalIF":2.2,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143604142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Impact of differences between interim and post-interim analysis populations on outcomes of a group sequential trial: Example of the MOVe-OUT study.
Pub Date: 2025-03-02 DOI: 10.1177/17407745251313925
Yoseph Caraco, Matthew G Johnson, Joseph A Chiarappa, Brian M Maas, Julie A Stone, Matthew L Rizk, Mary Vesnesky, Julie M Strizki, Angela Williams-Diaz, Michelle L Brown, Patricia Carmelitano, Hong Wan, Alison Pedley, Akshita Chawla, Dominik J Wolf, Jay A Grobler, Amanda Paschke, Carisa De Anda
Background: Pre-specified interim analyses allow for more timely evaluation of efficacy or futility, potentially accelerating decision-making on an investigational intervention. In such an analysis, the randomized, double-blind MOVe-OUT trial demonstrated superiority of molnupiravir over placebo for outpatient treatment of COVID-19 in high-risk patients. In the full analysis population, however, the point estimate of the treatment difference in the primary endpoint was notably lower than at the interim analysis. We conducted a comprehensive assessment to investigate this unexpected difference in treatment effect size, with the goal of informing future clinical research evaluating treatments for rapidly evolving infectious diseases.
Methods: The modified intention-to-treat population of the MOVe-OUT trial was divided into an interim analysis cohort (all participants included in the interim analysis; prospectively defined) and a post-interim analysis cohort (all remaining participants; retrospectively defined). Baseline characteristics (including many well-established prognostic factors for disease progression), clinical outcomes, and virologic outcomes were retrospectively evaluated. The impact of changes in baseline characteristics over time was explored using logistic regression modeling and simulations.
Results: Baseline characteristics were well balanced between arms overall. However, between- and within-arm differences in known prognostic baseline factors (e.g. comorbidities, SARS-CoV-2 viral load, and anti-SARS-CoV-2 antibody status) were observed between the interim and post-interim analysis cohorts. For the individual factors, these differences were generally minor and not otherwise notable; as the trial progressed, however, these shifts in combination increasingly favored the placebo arm across most of the evaluated factors in the post-interim cohort. Model-based simulations confirmed that the reduction in effect size could be accounted for by these longitudinal trends toward a lower-risk study population among placebo participants. Infectivity and viral load data confirmed that molnupiravir's antiviral activity was consistent across both cohorts, which were heavily dominated by different viral clades (reflecting the rapid evolution of SARS-CoV-2).
Discussion: The cumulative effect of randomly occurring minor differences in prognostic baseline characteristics within and between arms over time, rather than virologic factors such as reduced activity of molnupiravir against evolving variants, likely explains the observed outcomes. Our results have broader implications for group sequential trials seeking to evaluate treatments for rapidly emerging pathogens. During dynamic epidemic or pandemic conditions, adaptive trials should be designed and interpreted especially carefully, considering that they will likely rapidly enroll a large post-interim overrun population.
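The following toy simulation illustrates the mechanism the authors explored with logistic regression modeling and simulations: under a fixed outcome model, a shift toward a lower-risk covariate mix shrinks the absolute treatment effect even though the conditional (odds-scale) effect is unchanged. The coefficients, risk-factor prevalences, and seed are hypothetical, not MOVe-OUT estimates.

```python
# Toy simulation of how a shift toward lower-risk baseline covariates shrinks the
# observed absolute treatment effect under a fixed logistic outcome model.
# Coefficients and covariate mixes are hypothetical, not MOVe-OUT estimates.
import numpy as np

rng = np.random.default_rng(1)

def simulate_event_rate(treated, p_high_risk, n=100_000):
    """Event probability from a logistic model with one binary baseline risk factor."""
    high_risk = rng.random(n) < p_high_risk
    logit = -2.0 + 1.2 * high_risk - 0.8 * treated   # intercept, risk factor, treatment
    p = 1 / (1 + np.exp(-logit))
    return (rng.random(n) < p).mean()

for label, p_high in [("interim-like (higher-risk mix)", 0.6),
                      ("post-interim-like (lower-risk mix)", 0.3)]:
    placebo = simulate_event_rate(treated=0, p_high_risk=p_high)
    active = simulate_event_rate(treated=1, p_high_risk=p_high)
    print(f"{label}: placebo={placebo:.3f}, active={active:.3f}, "
          f"risk difference={placebo - active:.3f}")
```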
{"title":"Impact of differences between interim and post-interim analysis populations on outcomes of a group sequential trial: Example of the MOVe-OUT study.","authors":"Yoseph Caraco, Matthew G Johnson, Joseph A Chiarappa, Brian M Maas, Julie A Stone, Matthew L Rizk, Mary Vesnesky, Julie M Strizki, Angela Williams-Diaz, Michelle L Brown, Patricia Carmelitano, Hong Wan, Alison Pedley, Akshita Chawla, Dominik J Wolf, Jay A Grobler, Amanda Paschke, Carisa De Anda","doi":"10.1177/17407745251313925","DOIUrl":"https://doi.org/10.1177/17407745251313925","url":null,"abstract":"<p><strong>Background: </strong>Pre-specified interim analyses allow for more timely evaluation of efficacy or futility, potentially accelerating decision-making on an investigational intervention. In such an analysis, the randomized, double-blind MOVe-OUT trial demonstrated superiority of molnupiravir over placebo for outpatient treatment of COVID-19 in high-risk patients. In the full analysis population, the point estimate of the treatment difference in the primary endpoint was notably lower than at the interim analysis. We conducted a comprehensive assessment to investigate this unexpected difference in treatment effect size, with the goal of informing future clinical research evaluating treatments for rapidly evolving infectious diseases.</p><p><strong>Methods: </strong>The modified intention-to-treat population of the MOVe-OUT trial was divided into an interim analysis cohort (i.e. all participants included in the interim analysis; prospectively defined) and a post-interim analysis cohort (i.e. all remaining participants; retrospectively defined). Baseline characteristics (including many well-established prognostic factors for disease progression), clinical outcomes, and virologic outcomes were retrospectively evaluated. The impact of changes in baseline characteristics over time was explored using logistic regression modeling and simulations.</p><p><strong>Results: </strong>Baseline characteristics were well-balanced between arms overall. However, between- and within-arm differences in known prognostic baseline factors (e.g. comorbidities, SARS-CoV-2 viral load, and anti-SARS-CoV-2 antibody status) were observed for the interim and post-interim analysis cohorts. For the individual factors, these differences were generally minor and otherwise not notable; as the trial progressed, however, these shifts in combination increasingly favored the placebo arm across most of the evaluated factors in the post-interim cohort. Model-based simulations confirmed that the reduction in effect size could be accounted for by these longitudinal trends toward a lower-risk study population among placebo participants. Infectivity and viral load data confirmed that molnupiravir's antiviral activity was consistent across both cohorts, which were heavily dominated by different viral clades (reflecting the rapid SARS-CoV-2 evolution).</p><p><strong>Discussion: </strong>The cumulative effect of randomly occurring minor differences in prognostic baseline characteristics within and between arms over time, rather than virologic factors such as reduced activity of molnupiravir against evolving variants, likely impacted the observed outcomes. Our results have broader implications for group sequential trials seeking to evaluate treatments for rapidly emerging pathogens. 
During dynamic epidemic or pandemic conditions, adaptive trials should be designed and interpreted especially carefully, considering that they will likely rapidly enroll a large post-interim overrun populat","PeriodicalId":10685,"journal":{"name":"Clinical Trials","volume":" ","pages":"17407745251313925"},"PeriodicalIF":2.2,"publicationDate":"2025-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143536668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
From RAGs to riches: Utilizing large language models to write documents for clinical trials.
Pub Date: 2025-02-27 DOI: 10.1177/17407745251320806
Nigel Markey, Ilyass El-Mansouri, Gaetan Rensonnet, Casper van Langen, Christoph Meier
Background/aims: Clinical trials require numerous documents to be written: protocols, consent forms, clinical study reports, and many others. Large language models offer the potential to rapidly generate first-draft versions of these documents; however, there are concerns about the quality of their output. Here, we report an evaluation of how well large language models generate sections of one such document type: the clinical trial protocol.
Methods: Using an off-the-shelf large language model, we generated protocol sections for a broad range of diseases and clinical trial phases. We assessed each of these document sections across four dimensions: clinical thinking and logic; transparency and references; medical and clinical terminology; and content relevance and suitability. To improve performance, we used the retrieval-augmented generation method to enhance the large language model with accurate, up-to-date information, including regulatory guidance documents and data from ClinicalTrials.gov. Using this retrieval-augmented generation large language model, we regenerated the same protocol sections and assessed them across the same four dimensions.
Results: We find that the off-the-shelf large language model delivers reasonable results, especially when assessing content relevance and the correct use of medical and clinical terminology, with scores of over 80%. However, the off-the-shelf large language model shows limited performance in clinical thinking and logic and transparency and references, with assessment scores of ≈40% or less. The use of retrieval-augmented generation substantially improves the writing quality of the large language model, with clinical thinking and logic and transparency and references scores increasing to ≈80%. The retrieval-augmented generation method thus greatly improves the practical usability of large language models for clinical trial-related writing.
Discussion: Our results suggest that hybrid large language model architectures, such as the retrieval-augmented generation method we utilized, offer strong potential for clinical trial-related writing, including a wide variety of documents. This is potentially transformative, since it addresses several major bottlenecks of drug development.
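A minimal sketch of the kind of retrieval-augmented generation pipeline described above is given below, assuming a toy hashing embedding and a stubbed chat-completion call; the corpus snippets, function names, and prompt are illustrative stand-ins rather than the authors' implementation.

```python
# Minimal sketch of a retrieval-augmented generation (RAG) loop for protocol drafting.
# The corpus snippets, embedding scheme, and call_llm stub are illustrative stand-ins;
# a real system would use a production embedding model and LLM API.
import numpy as np

corpus = [
    "ICH E6(R2) guidance: the protocol should describe trial objectives, design, ...",
    "ClinicalTrials.gov record: phase 3, randomized, placebo-controlled, parallel-group ...",
    "FDA guidance on adaptive designs: pre-specify interim analyses and alpha spending ...",
]

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy hashing bag-of-words embedding (placeholder for a real embedding model)."""
    v = np.zeros(dim)
    for token in text.lower().split():
        v[hash(token) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def retrieve(query: str, k: int = 2) -> list:
    """Return the k corpus snippets most similar to the query."""
    q = embed(query)
    scores = [float(q @ embed(doc)) for doc in corpus]
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]

def call_llm(prompt: str) -> str:
    """Stub for a chat-completion API call; replace with a real client."""
    return f"[draft generated from prompt of {len(prompt)} characters]"

query = "Draft the statistical analysis section for a phase 3 heart failure trial."
context = "\n".join(retrieve(query))
prompt = f"Use only the context below to draft the section.\n\nContext:\n{context}\n\nTask: {query}"
print(call_llm(prompt))
```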
{"title":"From RAGs to riches: Utilizing large language models to write documents for clinical trials.","authors":"Nigel Markey, Ilyass El-Mansouri, Gaetan Rensonnet, Casper van Langen, Christoph Meier","doi":"10.1177/17407745251320806","DOIUrl":"https://doi.org/10.1177/17407745251320806","url":null,"abstract":"<p><strong>Background/aims: </strong>Clinical trials require numerous documents to be written: Protocols, consent forms, clinical study reports, and many others. Large language models offer the potential to rapidly generate first-draft versions of these documents; however, there are concerns about the quality of their output. Here, we report an evaluation of how good large language models are at generating sections of one such document, clinical trial protocols.</p><p><strong>Methods: </strong>Using an off-the-shelf large language model, we generated protocol sections for a broad range of diseases and clinical trial phases. Each of these document sections we assessed across four dimensions: <i>Clinical thinking and logic; Transparency and references; Medical and clinical terminology</i>; and <i>Content relevance and suitability</i>. To improve performance, we used the retrieval-augmented generation method to enhance the large language model with accurate up-to-date information, including regulatory guidance documents and data from ClinicalTrials.gov. Using this retrieval-augmented generation large language model, we regenerated the same protocol sections and assessed them across the same four dimensions.</p><p><strong>Results: </strong>We find that the off-the-shelf large language model delivers reasonable results, especially when assessing <i>content relevance</i> and the <i>correct use of medical and clinical terminology</i>, with scores of over 80%. However, the off-the-shelf large language model shows limited performance in <i>clinical thinking and logic</i> and <i>transparency and references</i>, with assessment scores of ≈40% or less. The use of retrieval-augmented generation substantially improves the writing quality of the large language model, with <i>clinical thinking and logic</i> and <i>transparency and references</i> scores increasing to ≈80%. The retrieval-augmented generation method thus greatly improves the practical usability of large language models for clinical trial-related writing.</p><p><strong>Discussion: </strong>Our results suggest that hybrid large language model architectures, such as the retrieval-augmented generation method we utilized, offer strong potential for clinical trial-related writing, including a wide variety of documents. This is potentially transformative, since it addresses several major bottlenecks of drug development.</p>","PeriodicalId":10685,"journal":{"name":"Clinical Trials","volume":" ","pages":"17407745251320806"},"PeriodicalIF":2.2,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143514403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hybrid sample size calculations for cluster randomised trials using assurance.
Pub Date: 2025-02-11 DOI: 10.1177/17407745241312635
S Faye Williamson, Svetlana V Tishkovskaya, Kevin J Wilson
Background/aims: Sample size determination for cluster randomised trials is challenging because it requires robust estimation of the intra-cluster correlation coefficient. Typically, the sample size is chosen to provide a certain level of power to reject the null hypothesis in a two-sample hypothesis test. This relies on the minimal clinically important difference and estimates for the overall standard deviation, the intra-cluster correlation coefficient and, if cluster sizes are assumed to be unequal, the coefficient of variation of the cluster size. Varying any of these parameters can have a strong effect on the required sample size. In particular, it is very sensitive to small differences in the intra-cluster correlation coefficient. A relevant intra-cluster correlation coefficient estimate is often not available, or the available estimate is imprecise due to being based on studies with low numbers of clusters. If the intra-cluster correlation coefficient value used in the power calculation is far from the unknown true value, this could lead to trials which are substantially over- or under-powered.
Methods: In this article, we propose a hybrid approach using Bayesian assurance to determine the sample size for a cluster randomised trial in combination with a frequentist analysis. Assurance is an alternative to traditional power, which incorporates the uncertainty on key parameters through a prior distribution. We suggest specifying prior distributions for the overall standard deviation, intra-cluster correlation coefficient and coefficient of variation of the cluster size, while still utilising the minimal clinically important difference. We illustrate the approach through the design of a cluster randomised trial in post-stroke incontinence and compare the results to those obtained from a standard power calculation.
Results: We show that assurance can be used to calculate a sample size based on an elicited prior distribution for the intra-cluster correlation coefficient, whereas a power calculation discards all of the information in the prior except for a single point estimate. Results show that this approach can avoid misspecifying sample sizes when the prior medians for the intra-cluster correlation coefficient are very similar, but the underlying prior distributions exhibit quite different behaviour. Incorporating uncertainty on all three of the nuisance parameters, rather than only on the intra-cluster correlation coefficient, does not notably increase the required sample size.
Conclusion: Assurance provides a better understanding of the probability of success of a trial given a particular minimal clinically important difference and can be used instead of power to produce sample sizes that are more robust to parameter uncertainty. This is especially useful when there is difficulty obtaining reliable parameter estimates.
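A Monte Carlo sketch of the assurance idea for a cluster randomised trial is given below: the nuisance parameters (overall standard deviation, intra-cluster correlation coefficient, and coefficient of variation of cluster size) are drawn from priors, power at a fixed minimal clinically important difference is computed for each draw using a design-effect approximation, and the average over draws is the assurance. The priors, minimal clinically important difference, and cluster sizes are illustrative choices, not the values elicited for the post-stroke incontinence trial.

```python
# Monte Carlo sketch of assurance for a cluster randomised trial: average the power
# over prior draws of the nuisance parameters (SD, ICC, CV of cluster size).
# Priors, MCID, and cluster sizes below are illustrative, not the paper's elicited values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def power_cluster(n_clusters_per_arm, mean_cluster_size, mcid, sd, icc, cv, alpha=0.05):
    """Approximate power of a two-arm cluster RCT using the design effect for unequal cluster sizes."""
    deff = 1 + ((cv**2 + 1) * mean_cluster_size - 1) * icc
    n_eff = n_clusters_per_arm * mean_cluster_size / deff      # effective sample size per arm
    se = sd * np.sqrt(2 / n_eff)
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    return stats.norm.cdf(mcid / se - z_alpha)

def assurance(n_clusters_per_arm, mean_cluster_size, mcid, n_draws=10_000):
    sd = rng.gamma(shape=25, scale=0.4, size=n_draws)           # prior for SD (mean ~10)
    icc = rng.beta(2, 48, size=n_draws)                         # prior for ICC (mean ~0.04)
    cv = rng.uniform(0.3, 0.9, size=n_draws)                    # prior for CV of cluster size
    powers = power_cluster(n_clusters_per_arm, mean_cluster_size, mcid, sd, icc, cv)
    return powers.mean()                                        # assurance = expected power

for k in (10, 15, 20, 25):
    print(f"{k} clusters/arm: assurance = {assurance(k, mean_cluster_size=20, mcid=2.0):.2f}")
```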
{"title":"Hybrid sample size calculations for cluster randomised trials using assurance.","authors":"S Faye Williamson, Svetlana V Tishkovskaya, Kevin J Wilson","doi":"10.1177/17407745241312635","DOIUrl":"https://doi.org/10.1177/17407745241312635","url":null,"abstract":"<p><strong>Background/aims: </strong>Sample size determination for cluster randomised trials is challenging because it requires robust estimation of the intra-cluster correlation coefficient. Typically, the sample size is chosen to provide a certain level of power to reject the null hypothesis in a two-sample hypothesis test. This relies on the minimal clinically important difference and estimates for the overall standard deviation, the intra-cluster correlation coefficient and, if cluster sizes are assumed to be unequal, the coefficient of variation of the cluster size. Varying any of these parameters can have a strong effect on the required sample size. In particular, it is very sensitive to small differences in the intra-cluster correlation coefficient. A relevant intra-cluster correlation coefficient estimate is often not available, or the available estimate is imprecise due to being based on studies with low numbers of clusters. If the intra-cluster correlation coefficient value used in the power calculation is far from the unknown true value, this could lead to trials which are substantially over- or under-powered.</p><p><strong>Methods: </strong>In this article, we propose a hybrid approach using Bayesian assurance to determine the sample size for a cluster randomised trial in combination with a frequentist analysis. Assurance is an alternative to traditional power, which incorporates the uncertainty on key parameters through a prior distribution. We suggest specifying prior distributions for the overall standard deviation, intra-cluster correlation coefficient and coefficient of variation of the cluster size, while still utilising the minimal clinically important difference. We illustrate the approach through the design of a cluster randomised trial in post-stroke incontinence and compare the results to those obtained from a standard power calculation.</p><p><strong>Results: </strong>We show that assurance can be used to calculate a sample size based on an elicited prior distribution for the intra-cluster correlation coefficient, whereas a power calculation discards all of the information in the prior except for a single point estimate. Results show that this approach can avoid misspecifying sample sizes when the prior medians for the intra-cluster correlation coefficient are very similar, but the underlying prior distributions exhibit quite different behaviour. Incorporating uncertainty on all three of the nuisance parameters, rather than only on the intra-cluster correlation coefficient, does not notably increase the required sample size.</p><p><strong>Conclusion: </strong>Assurance provides a better understanding of the probability of success of a trial given a particular minimal clinically important difference and can be used instead of power to produce sample sizes that are more robust to parameter uncertainty. 
This is especially useful when there is difficulty obtaining reliable parameter estimates.</p>","PeriodicalId":10685,"journal":{"name":"Clinical Trials","volume":" ","pages":"17407745241312635"},"PeriodicalIF":2.2,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143398464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}