Statistical noise in PD-(L)1 inhibitor trials: Unravelling the durable-responder effect.
Michael Coory, Susan J Jordan. Journal of Clinical Epidemiology, article 111589 (published 2024-11-04). https://doi.org/10.1016/j.jclinepi.2024.111589
Background: Programmed-death-1/ligand-1 inhibitors (PD-1/L1i's) have emerged as pivotal treatments for many cancers. A notable feature of this class of medicines is a dichotomous response pattern: a small but clinically relevant percentage of patients (5% to 20%) achieve deep and durable responses resembling functional cures (durable responders), while most patients experience only a modest or negligible response. Accurately predicting who will be a durable responder remains elusive because no reliable biomarker exists. Another notable feature of these medicines is that different PD-1/L1i's have obtained statistically significant results, leading to marketing approval, for some cancer indications but not for others, with no discernible pattern. These puzzling inconsistencies have generated extensive discussion among oncologists. Proposed, but not entirely convincing, explanations include true underlying differences in efficacy between cancer types, or subtle differences in trial design.
Objective: To investigate a less-explored hypothesis (the durable-responder effect): an initially unidentified group of durable responders generates more statistical noise than anticipated, leading to low-powered randomised controlled trials (RCTs) that report randomly variable results.
Study design: Using simulation, this investigation divided participants in PD-(L)1i RCTs into two groups: durable responders and patients with a more modest response. Drawing on published data for melanoma, lung and urothelial cancers, multiple pre-specified scenarios were each replicated 50,000 times, systematically varying the durable-responder percentage from 5% to 20% and the modest-response hazard ratio for overall survival [HR(OS)] from 0.8 to 1.0. This allowed evaluation of the effect of durable responders on power, on point estimates of the treatment effect for OS, and on the probability of a misleading signal for harm.
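The two-group structure described above can be sketched as a small Monte Carlo routine. This is a minimal illustration under assumptions that are ours, not the authors': exponential survival in the control arm (time in hypothetical months), durable responders treated as cured (no death during follow-up), modest responders with a constant hazard ratio `hr_modest`, administrative censoring at a fixed follow-up time, and an exponential-model rate ratio standing in for the Cox hazard ratio that trials usually report. The function names and default values (`n_per_arm`, `control_median`, `follow_up`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class TrialEstimate:
    hr: float         # estimated hazard ratio (treatment / control)
    se_log_hr: float  # standard error of log(hr)


def simulate_arm(n, rate, p_durable, hr_modest, follow_up, rng):
    """Return (deaths, person_time) for one arm.

    Hypothetical model: each patient is a durable responder with
    probability p_durable (no death before follow_up); otherwise the
    time to death is exponential with hazard rate * hr_modest.
    """
    deaths = 0
    person_time = 0.0
    for _ in range(n):
        if rng.random() < p_durable:
            t = math.inf  # durable responder: survives past follow-up
        else:
            t = rng.expovariate(rate * hr_modest)
        if t <= follow_up:
            deaths += 1
            person_time += t
        else:
            person_time += follow_up  # administratively censored
    return deaths, person_time


def simulate_trial(n_per_arm=300, p_durable=0.05, hr_modest=1.0,
                   control_median=12.0, follow_up=36.0, rng=None):
    """Simulate one two-arm RCT and return an exponential-model HR estimate."""
    if rng is None:
        rng = random.Random()
    rate = math.log(2) / control_median  # control hazard implied by its median OS
    d_c, t_c = simulate_arm(n_per_arm, rate, 0.0, 1.0, follow_up, rng)
    d_t, t_t = simulate_arm(n_per_arm, rate, p_durable, hr_modest, follow_up, rng)
    if d_t == 0 or d_c == 0:
        raise ValueError("an arm has no deaths; the hazard ratio is undefined")
    hr = (d_t / t_t) / (d_c / t_c)  # ratio of deaths per unit person-time
    se = math.sqrt(1 / d_t + 1 / d_c)  # large-sample SE of the log rate ratio
    return TrialEstimate(hr=hr, se_log_hr=se)
```

For example, `simulate_trial(p_durable=0.10, hr_modest=0.9, rng=random.Random(1))` returns one trial's estimate; repeating the call tens of thousands of times per scenario is the analogue of the 50,000 replicates described above. Because a cured fraction makes the hazards non-proportional, the single summary HR depends on follow-up length; a Cox proportional-hazards fit would be closer to how the trials themselves are analysed.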
Results: When the treatment effect for the modest responders is similar to that in the comparator arm, statistical power remains below 80%, limiting the ability to reliably detect the benefit conferred by durable responders. At the same time, there is a material probability of obtaining a statistically significant result that exaggerates the treatment effect by chance. For instance, with an average HR(OS) of 0.93 (corresponding to 5% durable responders), only 7.2% of trials are statistically significant, and those trials show an average HR(OS) of 0.77. Additionally, when 5% are durable responders, there is a 20% probability that the estimated HR(OS) will exceed 1.0, suggesting potential harm when none exists.
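The three quantities reported above (power, the average HR among statistically significant trials, and the probability that the estimated HR exceeds 1.0) can be tallied from any collection of simulated trial estimates. The sketch below assumes each trial is summarised by a hazard ratio and the standard error of its logarithm, treats "statistically significant" as a two-sided Wald test at the 5% level in the direction of benefit, and averages HRs arithmetically; all three are our assumptions. For a self-contained demonstration it draws log-HR estimates from a normal distribution around a true HR of 0.93 with a hypothetical standard error of 0.10, a simplification of, not a substitute for, the paper's patient-level simulation.

```python
import math
import random
import statistics


def summarise(estimates, z_crit=1.959964):
    """Summarise (hr, se_log_hr) pairs from simulated trials.

    Returns power (share of trials significant for benefit), the mean HR
    among those significant trials, and the share of trials with HR > 1.
    """
    significant = [hr for hr, se in estimates if math.log(hr) / se < -z_crit]
    power = len(significant) / len(estimates)
    mean_hr_sig = statistics.fmean(significant) if significant else float("nan")
    p_harm_signal = sum(hr > 1.0 for hr, _ in estimates) / len(estimates)
    return power, mean_hr_sig, p_harm_signal


def normal_approx_estimates(true_hr, se, n_sims, rng):
    """Draw log-HR estimates ~ Normal(log(true_hr), se): a simplifying assumption."""
    return [(math.exp(rng.gauss(math.log(true_hr), se)), se) for _ in range(n_sims)]


rng = random.Random(2024)
sims = normal_approx_estimates(true_hr=0.93, se=0.10, n_sims=50_000, rng=rng)
power, mean_hr_sig, p_harm = summarise(sims)
print(f"power={power:.3f}  mean HR | significant={mean_hr_sig:.2f}  P(HR>1)={p_harm:.3f}")
```

Even in this simplified setting the same pattern appears: power is low, the trials that do reach significance overstate the benefit, and a sizeable share of estimates point toward harm. The numbers depend on the assumed standard error and are not expected to match the paper's figures.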
Conclusion: This paper adds to the possible explanations for the puzzlingly inconsistent results from PD-(L)1i RCTs. Initially unidentified durable responders introduce features typical of imprecise, low-powered studies: a propensity for false-negative results; estimates of benefit that might not replicate; and misleading signals for harm.
About the journal:
The Journal of Clinical Epidemiology strives to enhance the quality of clinical and patient-oriented healthcare research by advancing and applying innovative methods in conducting, presenting, synthesizing, disseminating, and translating research results into optimal clinical practice. Special emphasis is placed on training new generations of scientists and clinical practice leaders.