AI In Action: Redefining Drug Discovery and Development
Anshul Kanakia, Mark Sale, Liang Zhao, Zhu Zhou. Clinical and Translational Science, 18(2), 2025. Published 2025-02-06. DOI: 10.1111/cts.70149
The 2024 Nobel Prize in Chemistry was awarded to David Baker, Demis Hassabis, and John Jumper for their groundbreaking work in using AI to predict protein structures and design functional proteins. The development of the AlphaFold model has solved a long-standing challenge in biology by accurately predicting the complex structures of proteins, which are crucial for understanding their function. AlphaFold enhances our ability to design new proteins with specific functions and accelerates drug discovery and development by providing detailed insights into protein behavior and interactions. The recognition of this work underscores the transformative potential of AI in the life sciences and its critical role in future drug research and development (R&D).
AI has revolutionized the drug discovery space in recent years, with applications ranging from highly accurate protein structure prediction [1] to the design and optimization of both small and large molecules [2]. Several large foundation models have been developed to encode functional information about proteins in support of the drug development pipeline [3, 4]. Figure 1 highlights the areas of the pipeline where AI now plays a significant role and is poised to disrupt traditional experimental techniques. The culmination of AI-driven discovery would be de novo design, in which the entire preclinical pipeline is performed in silico. This could save billions of dollars in R&D costs, which in turn could lower medication costs and raise clinical success rates through the optimization of safer, more developable molecules with strong efficacy against well-selected targets.
Although de novo design remains unproven, the 21 AI-developed drugs that had completed Phase I trials as of December 2023 achieved a success rate of 80%–90%, substantially higher than the ~40% typical of traditional methods [5]. The number of AI-developed candidate drugs entering clinical stages also continues to rise, and the growth has been roughly exponential: from 3 in 2016 to 17 in 2020 and 67 in 2023 [5].
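To make the growth claim concrete, the sketch below computes compound annual growth rates from the three counts quoted above [5]. Roughly constant rates across intervals are what an exponential trend implies, although three data points are a thin basis for that label. It also shows what 80%–90% means for 21 programs and how wide a 95% Wilson score interval is at that sample size. The count of 18 successes is an illustrative assumption inside the quoted range, not a figure from the source.

```python
import math

# Counts of AI-derived candidates entering clinical stages, as quoted in the text [5].
counts = {2016: 3, 2020: 17, 2023: 67}

def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two counts."""
    return (end / start) ** (1 / years) - 1

# Similar per-interval rates are consistent with (not proof of) exponential growth.
rate_2016_2020 = cagr(counts[2016], counts[2020], 2020 - 2016)
rate_2020_2023 = cagr(counts[2020], counts[2023], 2023 - 2020)
rate_overall = cagr(counts[2016], counts[2023], 2023 - 2016)

# Implied number of Phase I successes among 21 completed programs at 80%-90%.
n_completed = 21
implied_successes = (0.80 * n_completed, 0.90 * n_completed)

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

print(f"{rate_2016_2020:.0%}, {rate_2020_2023:.0%}, {rate_overall:.0%}")
print(implied_successes)
# 18/21 (about 86%) is a hypothetical count chosen inside the quoted 80%-90% range.
print(wilson_interval(18, n_completed))
```

Run as-is, it reports growth of about 54%, 58%, and 56% per year, and a Wilson interval of roughly 0.65 to 0.95 for 18 of 21, a reminder that the small sample leaves considerable uncertainty around the headline success rate.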
The convergence of access to high-quality data across life-science modalities, such as imaging, multi-omics, DMRs, and very large protein repertoires, with recent advances in the scale and architecture of deep learning models has led to an explosion of AI applications in healthcare. While some of these data are publicly available, much of them are proprietary and controlled by large pharmaceutical companies, partly because of regulatory and privacy concerns. Innovation in AI for drug discovery, by contrast, is being led by academic and industry research laboratories, often giving rise to well-funded spin-off ventures such as Recursion, Absci, and, more recently, EvolutionaryScale. Such AI-first life-sciences companies have found success in synergistic partnerships with large pharmaceutical companies, which give them access to large proprietary datasets to which they can apply their AI expertise. Large pharmaceutical companies also have a long history of internalizing external innovation through acquisition; Roche's 2009 purchase of the remaining stake in Genentech, a biotechnology pioneer founded in 1976, for approximately $46.8 billion illustrates the value such deals can carry.
The use of AI is poised to span the full life cycle of a drug product, including discovery, development, and application assessment in a regulatory setting. Recent research at the US Food and Drug Administration (FDA) provides two distinct case studies. The first illustrates conventional machine learning (ML) through a project aimed at decoding kinase–adverse event associations for small molecule kinase inhibitors (SMKIs). Using a multi-domain dataset built from 4638 patients in registrational trials of 16 FDA-approved SMKIs, the investigators applied ML models such as Random Survival Forests (RSF), Artificial Neural Networks (ANNs), and DeepHit to identify potential associations between 442 kinases and 2145 adverse events. The results were made publicly accessible through an interactive web application, “Identification of Kinase-Specific Signal” (https://gongj.shinyapps.io/ml4ki). The platform helps experimentalists identify and verify kinase–adverse event pairs and serves as a precision-medicine tool that can mitigate individual patient safety risks by forecasting clinical safety signals [6]. More generally, how credibly an AI model extrapolates and generalizes depends heavily on the diversity and comprehensiveness of its training data. Future studies that integrate richer datasets with detailed genomic, phenotypic, and demographic information could further improve the precision of such associations and help tailor these models to specific patient subgroups. Multi-input neural networks, although not used in this study, are a promising architecture for integrating heterogeneous data, such as kinase activity, demographics, and clinical outcomes, into a unified predictive framework. Hybrid approaches that combine neural networks with Markov chains could also be explored to capture sequential dependencies in disease progression and to make predictions more robust across diverse patient cohorts.
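As a minimal sketch of the Markov-chain component proposed here, the code below propagates a patient cohort through three hypothetical toxicity states over successive treatment cycles. The state names, cycle length, and transition probabilities are invented for illustration and do not come from the FDA study [6]. In a hybrid model, a neural network would presumably predict patient-specific transition probabilities instead of using one fixed matrix.

```python
# Hypothetical toxicity states for a patient on an SMKI (illustrative labels only).
STATES = ["no_AE", "grade_1_2", "grade_3_plus"]

# Row i holds P(next state | current state i) per treatment cycle.
# These numbers are made up for the sketch; each row sums to 1.
TRANSITION = [
    [0.85, 0.12, 0.03],
    [0.30, 0.60, 0.10],
    [0.10, 0.30, 0.60],
]

def step(dist, matrix):
    """Advance a state-probability vector one cycle: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def distribution_after(cycles, start, matrix):
    """Apply `step` repeatedly; the next state depends only on the current one."""
    dist = list(start)
    for _ in range(cycles):
        dist = step(dist, matrix)
    return dist

start = [1.0, 0.0, 0.0]  # the whole cohort starts without an adverse event
after_one = distribution_after(1, start, TRANSITION)
after_six = distribution_after(6, start, TRANSITION)
for name, p in zip(STATES, after_six):
    print(f"{name}: {p:.3f}")
```

Because every row of the matrix sums to one, the state probabilities stay normalized after each step; replacing the fixed matrix with per-patient matrices is where the neural-network half of a hybrid model would plug in.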
The second case study illustrates language-model-based AI through the development of PharmBERT, a domain-specific large language model (LLM) for drug labels [7]. Built on the BERT architecture, PharmBERT was pre-trained on text extracted from 138,924 raw drug labels sourced from DailyMed. This domain-specific pre-training significantly improved the model's ability to extract pharmacokinetic information from drug labeling. PharmBERT outperformed other domain-adapted models such as ClinicalBERT and BioBERT on tasks including adverse drug reaction (ADR) detection and ADME (absorption, distribution, metabolism, and excretion) classification. This advance underscores the potential of LLMs to make text-heavy regulatory work more efficient and to improve the extraction of critical information from complex drug labels.
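BERT-family models are typically pre-trained, and domain-adapted, with a masked-language-model (MLM) objective. Assuming PharmBERT followed that standard recipe (the details are in [7], not here), the sketch below shows the token-corruption step: about 15% of tokens are selected, and of those, 80% become [MASK], 10% become a random token, and 10% are left unchanged. The example sentence is hypothetical, and whitespace splitting stands in for BERT's WordPiece tokenizer.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """BERT-style masked-LM corruption.

    Each token is selected with probability mask_prob. A selected token is
    replaced by [MASK] 80% of the time, by a random vocabulary token 10% of
    the time, and kept unchanged 10% of the time. Returns (inputs, labels):
    labels hold the original token at selected positions and None elsewhere.
    """
    rng = rng or random.Random()
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must predict the original token here
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))
            else:
                inputs.append(tok)
        else:
            labels.append(None)  # position is not scored by the MLM loss
            inputs.append(tok)
    return inputs, labels

# Hypothetical drug-label sentence; a real vocabulary would be far larger.
sentence = "peak plasma concentration is reached within 2 hours after oral administration"
tokens = sentence.split()
vocab = sorted(set(tokens))
inputs, labels = mask_tokens(tokens, vocab, rng=random.Random(0))
print(inputs)
print(labels)
```

The model is trained to recover the original tokens only where `labels` is not None; randomizing or keeping some selected tokens discourages the model from relying on the [MASK] symbol, which never appears during fine-tuning.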
Together, these case studies illustrate the transformative impact of AI on drug development and regulatory science. Conventional ML methods provide robust frameworks for specific, structured data analyses, while LLM-based methods extend AI to unstructured data such as free-text labeling and offer more general-purpose capabilities. Both approaches are crucial for advancing personalized medicine and optimizing drug development processes.
Figure 2 summarizes the responses to two survey questions posed during the “When AI Meets Drug Development” session at the 2024 American Society for Clinical Pharmacology and Therapeutics Annual Meeting. The first question assessed whether participants see AI as a significant change in drug R&D. Notably, 80% of participants recognized AI's significant impact, while 12% were unconvinced. No participants reported being unaware of AI's application in drug R&D, suggesting a high level of awareness within the clinical pharmacology community. A small minority (6%) were uncertain about AI's current capabilities, and 2% selected an unspecified option. Regarding AI's impact over the next 5–10 years, 45% of respondents favored molecule design and optimization, followed by clinical trials and development (28%), target discovery and validation (20%), and preclinical testing and screening (7%). The results capture the clinical pharmacology community's current familiarity with and perceptions of AI, and indicate strong interest in, and optimism about, AI's role in the future of drug development.
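The reported shares can be sanity-checked with a few lines of arithmetic: each question's options sum to 100%. The session's respondent count is not reported here, so the margin-of-error calculation below uses hypothetical sample sizes and a normal approximation (crude at small n) to show how much precision the 80% figure could carry.

```python
import math

# Response shares (%) as reported for the two survey questions (Figure 2).
q1 = {
    "significant impact": 80,
    "unconvinced": 12,
    "unaware of AI in drug R&D": 0,
    "uncertain about current capabilities": 6,
    "unspecified option": 2,
}
q2 = {
    "molecule design and optimization": 45,
    "clinical trials and development": 28,
    "target discovery and validation": 20,
    "preclinical testing and screening": 7,
}

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation 95% margin of error for a proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# Each question's options should account for all respondents.
for q in (q1, q2):
    assert sum(q.values()) == 100

# The respondent count is not reported; these n values are hypothetical.
for n in (25, 50, 100):
    print(n, round(100 * margin_of_error(0.80, n), 1))
```

For n = 25, 50, and 100, the 95% margin on 80% is roughly ±16, ±11, and ±8 percentage points, so differences of a few points between answer options should be read cautiously.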
Looking ahead, the integration of AI in drug R&D is poised to accelerate, driven by advancements from leading tech companies. NVIDIA's powerful GPUs and AI frameworks are enabling faster and more efficient generative drug discovery processes. Google Health is leveraging its expertise in data analytics and ML to enhance predictive modeling and patient data analysis. Apple Health is contributing through its health data ecosystem, facilitating personalized medicine and real-time health monitoring. OpenAI's cutting-edge language models are revolutionizing the way researchers generate hypotheses and analyze scientific literature. These innovations collectively promise to streamline the drug development pipeline, reduce costs, and improve clinical outcomes, heralding a new era of precision medicine.
As global investment in AI for drug discovery accelerates, so do expectations of improved outcomes for drug programs. As of 2024, no medication developed with an AI-first pipeline has reached the market. To sustain this momentum, AI in healthcare will need to demonstrate both disruption of existing business processes and tangible financial gains. That could come from the launch of the first AI-developed medication or from AI-driven improvements to clinical pipelines that significantly shorten the time from first patient in to regulatory approval.
M.S. is an employee of Certara. A.K. is an employee of AstraZeneca. All other authors declared no competing interests for this work.