Clinical Advancement Forecasting

medRxiv - Genetic and Genomic Medicine, Pub Date: 2024-08-05, DOI: 10.1101/2024.08.02.24311422

Eric Czech, Rafal Wojdyla, Daniel Himmelstein, Daniel Frank, Nick Miller, Jack Milwid, Adam Kolom, Jeff Hammerbacher



Abstract

Choosing which drug targets to pursue for a given disease is one of the most impactful decisions made in the global development of new medicines. This study examines the extent to which the outcomes of clinical trials can be predicted from a small set of longitudinal (temporally labeled) evidence and properties of drug targets and diseases. We demonstrate a novel statistical learning framework for identifying the top 2% of target-disease pairs that are as much as 4-5x more likely to advance beyond phase 2 trials. This framework is 1.5-2x more effective than an Open Targets composite score based on the same set of evidence. It is also 2x more effective than a common measure of genetic support that has been observed previously, as well as in this study, to confer a 2x higher likelihood of success. Utilizing a subset of our biomedical evidence base, non-negative linear models resulting from this framework can produce simple weighting schemes across various types of human, animal, and cell model genomic, transcriptomic, proteomic, and clinical evidence to identify previously undeveloped target-disease pairs poised for clinical success. In this study we further explore: i) how longitudinal treatment of evidence relates to leakage and reverse causality in biomedical research, and how temporalized evidence can mitigate common forms of potential bias and inflation; ii) the relative impact of different types of features on our predictions; and iii) the space of currently undeveloped, tractable targets predicted with these methods to have the highest likelihood of clinical success. To ease reproduction and deployment, no data outside of Open Targets is used, the described methods require no expert knowledge, and they can support expansion of lines of evidence to further improve performance.
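The "x times more likely to advance" figures quoted in the abstract can be read as a ratio of advancement rates. The abstract does not state the exact baseline, so the sketch below is an illustration under an assumed definition, not the authors' implementation: it divides the advancement rate among the top-scored fraction of target-disease pairs by the rate across all pairs. The function name `relative_success` and its parameters are hypothetical.

```python
import math


def relative_success(scores, advanced, top_fraction=0.02):
    """Ratio of the advancement rate in the top-scored slice to the overall rate.

    Assumed definition for illustration only; the abstract does not specify
    the baseline used for its "4-5x" figure.
    scores:   one model score per target-disease pair
    advanced: 1 if the pair advanced beyond phase 2, else 0
    """
    if len(scores) != len(advanced) or not scores:
        raise ValueError("scores and advanced must be non-empty and of equal length")
    # Size of the top slice, never smaller than one pair.
    k = max(1, math.ceil(top_fraction * len(scores)))
    # Rank pair indices by descending score; ties keep their input order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_rate = sum(advanced[i] for i in ranked[:k]) / k
    base_rate = sum(advanced) / len(advanced)
    if base_rate == 0:
        raise ValueError("no pair advanced; the ratio is undefined")
    return top_rate / base_rate


if __name__ == "__main__":
    # Toy data (fabricated for the example): 100 pairs, 10 of which advanced,
    # including both pairs in the top 2% by score.
    toy_scores = list(range(100))
    toy_advanced = [0] * 100
    for i in (98, 99, 0, 1, 2, 3, 4, 5, 6, 7):
        toy_advanced[i] = 1
    print(relative_success(toy_scores, toy_advanced))
```

To respect the temporal labeling the abstract emphasizes, the scores passed in would need to be computed only from evidence dated before the trial outcome being predicted; that filtering step is outside this sketch.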


