Statistical noise in PD-(L)1 inhibitor trials: unraveling the durable-responder effect

Journal of Clinical Epidemiology (IF 5.2; Zone 2, Medicine; JCR Q1, Health Care Sciences & Services) | Pub Date: 2025-01-01 | Epub Date: 2024-11-05 | DOI: 10.1016/j.jclinepi.2024.111589

Michael Coory, Susan J. Jordan

Journal of Clinical Epidemiology, Volume 177, Article 111589. Publication type: Journal Article. Open access: no. Source: https://www.sciencedirect.com/science/article/pii/S0895435624003457

Citation count: 0



Statistical noise in PD-(L)1 inhibitor trials: unraveling the durable-responder effect

Background and Objectives

Programmed-death-1/ligand-1 inhibitors (PD-1/L1is) have emerged as pivotal treatments for many cancers. A notable feature of this class of medicines is the dichotomous response pattern: a small (but clinically relevant) percentage of patients (5%–20%) benefit from deep and durable responses resembling functional cures (durable responders), while most patients experience only a modest or negligible response. Accurately predicting which patients will be durable responders remains elusive because no reliable biomarker exists. Another notable feature of these medicines is that different PD-1/L1is have obtained statistically significant results, leading to marketing approval, for some cancer indications but not for others, with no discernible pattern. These puzzling inconsistencies have generated extensive discussion among oncologists. Proposed (but not entirely convincing) explanations include true underlying differences in efficacy across cancer types and subtle differences in trial design. The objective of this study is to investigate a less-explored hypothesis, the durable-responder effect: an initially unidentified group of durable responders generates more statistical noise than anticipated, leading to low-powered randomized controlled trials (RCTs) that report randomly variable results.

Study Design

Using simulation, this investigation divides participants in PD-(L)1i RCTs into two groups: durable responders and patients with a more modest response. Drawing on published data for melanoma, lung cancer, and urothelial cancer, multiple prespecified scenarios are replicated 50,000 times, systematically varying the durable-responder percentage from 5% to 20% and the modest-response hazard ratio for overall survival [HR(OS)] from 0.8 to 1.0. This allows evaluation of the effect of durable responders on power, on point estimates of the treatment effect for OS, and on the probability of a misleading signal for harm.
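The two-group design described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulation code: the exponential survival model, the 12-month comparator median, the durable-responder hazard ratio of 0.1, the 36-month administrative censoring, and all function names (`simulate_trial`, `logrank`, `estimate_power`) are hypothetical choices made for the example. The hazard ratio is approximated with the Peto one-step estimator derived from the log-rank statistic rather than with a Cox model.

```python
import math
import random


def simulate_trial(n_per_arm, durable_frac, modest_hr, rng,
                   control_hazard=math.log(2) / 12.0,  # assumed: median OS of 12 months
                   durable_hr=0.1,                      # assumed: hazard ratio for durable responders
                   follow_up=36.0):                     # assumed: administrative censoring (months)
    """Return (time, event, arm) tuples; arm 1 = PD-(L)1i, arm 0 = comparator."""
    rows = []
    for arm in (0, 1):
        for _ in range(n_per_arm):
            hazard = control_hazard
            if arm == 1:
                # Treated patients are a mixture of durable and modest responders.
                is_durable = rng.random() < durable_frac
                hazard *= durable_hr if is_durable else modest_hr
            t = rng.expovariate(hazard)
            rows.append((min(t, follow_up), t <= follow_up, arm))
    return rows


def logrank(rows):
    """Return (z, hr): log-rank z statistic and Peto one-step HR (treatment vs comparator)."""
    rows = sorted(rows, key=lambda r: r[0])
    at_risk = [sum(1 for r in rows if r[2] == 0),
               sum(1 for r in rows if r[2] == 1)]
    o_minus_e = 0.0  # observed minus expected events in the treatment arm
    var = 0.0        # hypergeometric variance of that difference
    i = 0
    while i < len(rows):
        t = rows[i][0]
        deaths = [0, 0]   # events at time t, per arm
        leaving = [0, 0]  # events plus censorings at time t, per arm
        j = i
        while j < len(rows) and rows[j][0] == t:
            _, event, arm = rows[j]
            leaving[arm] += 1
            if event:
                deaths[arm] += 1
            j += 1
        n = at_risk[0] + at_risk[1]
        d = deaths[0] + deaths[1]
        if d > 0 and n > 1:
            o_minus_e += deaths[1] - d * at_risk[1] / n
            var += d * (at_risk[0] / n) * (at_risk[1] / n) * (n - d) / (n - 1)
        at_risk[0] -= leaving[0]
        at_risk[1] -= leaving[1]
        i = j
    if var == 0.0:  # no informative events: report a null result
        return 0.0, 1.0
    return o_minus_e / math.sqrt(var), math.exp(o_minus_e / var)


def estimate_power(n_trials, n_per_arm, durable_frac, modest_hr, seed=0):
    """Share of simulated trials showing a significant benefit (two-sided alpha = 0.05)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        z, _hr = logrank(simulate_trial(n_per_arm, durable_frac, modest_hr, rng))
        if z < -1.96:
            hits += 1
    return hits / n_trials
```

For example, `estimate_power(n_trials=2000, n_per_arm=300, durable_frac=0.05, modest_hr=1.0)` estimates the share of simulated trials that reach a significant benefit when 5% of treated patients are durable responders and the rest gain nothing. The value obtained depends entirely on the assumed parameters and should not be read as a reproduction of the published results.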

Results

When outcomes for the modest responders are similar to those in the comparator arm, statistical power remains below 80%, limiting the ability to reliably detect the effect of durable responders. Conversely, there is a material probability of obtaining a statistically significant result that exaggerates the treatment effect by chance. For instance, when the average HR(OS) is 0.93 (corresponding to 5% durable responders), the 7.2% of trials that reach statistical significance show an average HR(OS) of 0.77. Additionally, when 5% are durable responders, there is a 20% probability that the estimated HR(OS) will exceed 1.0, suggesting potential harm when none exists.
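The two chance effects reported here, exaggerated estimates among significant trials and spurious harm signals, can be illustrated with a simple normal approximation. The sketch below assumes that the estimated log hazard ratio is normally distributed around the true value with standard error 2/sqrt(number of events), the usual approximation for 1:1 allocation under proportional hazards. A durable-responder mixture does not satisfy proportional hazards, so this is not the paper's method and is not expected to reproduce its figures. The function name `significance_filter` and the event count of 500 are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def significance_filter(true_hr, n_events, alpha=0.05):
    """Return (power, mean HR among significant trials, P(estimated HR > 1))."""
    se = 2.0 / math.sqrt(n_events)        # assumed standard error of the log HR
    mu = math.log(true_hr)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    cutoff = -z_crit * se                 # log HR below this counts as a significant benefit
    a = (cutoff - mu) / se
    power = STD_NORMAL.cdf(a)
    # Mean of a normal variable truncated above at `cutoff`:
    # E[X | X < cutoff] = mu - se * pdf(a) / cdf(a)
    mean_log_hr_sig = mu - se * STD_NORMAL.pdf(a) / power
    # Probability that the point estimate lands on the harmful side of 1.0.
    p_harm_signal = 1 - STD_NORMAL.cdf((0.0 - mu) / se)
    # exp() of the mean log HR gives the geometric mean HR among significant trials.
    return power, math.exp(mean_log_hr_sig), p_harm_signal


power, hr_sig, p_harm = significance_filter(true_hr=0.93, n_events=500)
print(f"power={power:.3f}, HR among significant trials={hr_sig:.2f}, P(HR>1)={p_harm:.3f}")
```

Under these assumptions, a true HR close to 1.0 gives low power, and the trials that do cross the significance threshold necessarily report an HR well below the true value, because only estimates beyond the cutoff are counted as significant.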

Conclusion

This article adds to the possible explanations for the puzzlingly inconsistent results from PD-(L)1i RCTs. Initially unidentified durable responders introduce features typical of imprecise, low-powered studies: a propensity for false-negative results; estimates of benefit that might not replicate; and misleading signals for harm.

Plain Language Summary

Programmed-death-1/ligand-1 (PD-(L)1) inhibitors are crucial cancer treatments, with global spending expected to surpass $75 billion by 2026. Multiple versions of these medicines are available, all designed to boost the immune system to fight cancer. We would expect them all to work similarly, but clinical trials show mixed results: some seem effective for certain cancers but not others, without a clear pattern. This article uses simulations (virtual trials) to suggest that these inconsistent results may be due to chance, caused by a small group of patients who respond very well to the treatment. Larger trials or specific analysis methods could help reduce the chance effects and provide more robust data for clinician and patient decision-making.


Source journal

Journal of Clinical Epidemiology (Medicine: Public, Environmental & Occupational Health)

CiteScore: 12.00
Self-citation rate: 6.90%
Publication volume: 320
Review time: 44 days

About the journal: The Journal of Clinical Epidemiology strives to enhance the quality of clinical and patient-oriented healthcare research by advancing and applying innovative methods in conducting, presenting, synthesizing, disseminating, and translating research results into optimal clinical practice. Special emphasis is placed on training new generations of scientists and clinical practice leaders.