Systematic generation and analysis of counterfactuals for compound activity predictions using multi-task models

MedChemComm, pp. 1547-1555. Published 2024-04-08. DOI: 10.1039/D4MD00128A. Journal metrics: IF 3.597, JCR Q2 (Pharmacology, Toxicology and Pharmaceutics).

Alec Lamens and Jürgen Bajorath

Abstract

Most machine learning (ML) methods produce predictions that are hard or impossible to understand. The black-box nature of predictive models obscures potential learning bias and makes it difficult to recognize and trace problems. Moreover, the inability to rationalize model decisions makes researchers reluctant to use predictions for experimental design. Limited trust in predictions is a substantial problem for ML and continues to limit its impact on interdisciplinary research, including early-phase drug discovery. As a remedy, approaches from explainable artificial intelligence (XAI) are increasingly applied to shed light on the ML black box and help rationalize predictions. Among these is the concept of counterfactuals (CFs), which are best understood as test cases carrying small modifications that yield opposing prediction outcomes (such as different class labels in object classification). For ML applications in medicinal chemistry such as compound activity prediction, CFs are particularly intuitive because these hypothetical molecules can be compared directly with actual test compounds; such comparisons do not require ML expertise and are accessible to practicing chemists. They often reveal structural moieties that determine the predictions for compounds and that can then be investigated further.

Herein, we adapt and extend a recently introduced concept for the systematic generation of molecular CFs to multi-task predictions of different classes of protein kinase inhibitors, analyze CFs in detail, rationalize the origins of CF formation in multi-task modeling, and present exemplary explanations of predictions.
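
To make the notion of a counterfactual in a multi-task setting concrete, the following is a minimal, hypothetical Python sketch. It is not the authors' method, which generates actual molecular CFs; it is a swapped-in exhaustive search under stated assumptions. It assumes that a compound is represented as a binary feature vector (for example, fingerprint bits), that the multi-task model is a set of per-task linear scorers with invented weights, and that a "small modification" means toggling at most `max_changes` bits. Bit-toggled vectors need not correspond to real molecules. The names `predict`, `counterfactuals`, `EXAMPLE_MODEL`, `EXAMPLE_X` and the task labels `kinase_A`/`kinase_B` are illustrative assumptions, not names from the paper.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical multi-task model: task name -> (per-feature weights, bias).
# A compound is assumed to be a binary feature vector (e.g. fingerprint bits).
Model = Dict[str, Tuple[Sequence[float], float]]

# Illustrative weights only; not taken from any trained model.
EXAMPLE_MODEL: Model = {
    "kinase_A": ([2.0, -2.0, 0.5, 0.0], -1.0),
    "kinase_B": ([0.0, 1.5, 0.5, -2.0], -1.0),
}
EXAMPLE_X = [1, 0, 1, 0]


def predict(model: Model, x: Sequence[int]) -> Dict[str, int]:
    """Return a 0/1 label per task: 1 if the linear score is positive."""
    labels = {}
    for task, (weights, bias) in model.items():
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        labels[task] = int(score > 0)
    return labels


def counterfactuals(
    model: Model, x: Sequence[int], task: str, max_changes: int = 1
) -> List[Tuple[Tuple[int, ...], List[str]]]:
    """Enumerate variants of x that differ in at most `max_changes` bits and
    reverse the label for `task`. For each hit, return the toggled positions
    and every task whose predicted label changed (task-specific vs. shared)."""
    original = predict(model, x)
    found = []
    for k in range(1, max_changes + 1):
        for positions in combinations(range(len(x)), k):
            variant = list(x)
            for p in positions:
                variant[p] = 1 - variant[p]  # toggle feature presence/absence
            pred = predict(model, variant)
            if pred[task] != original[task]:
                changed = sorted(t for t in pred if pred[t] != original[t])
                found.append((positions, changed))
    return found


if __name__ == "__main__":
    print(predict(EXAMPLE_MODEL, EXAMPLE_X))  # {'kinase_A': 1, 'kinase_B': 0}
    print(counterfactuals(EXAMPLE_MODEL, EXAMPLE_X, "kinase_A"))
    # [((0,), ['kinase_A']), ((1,), ['kinase_A', 'kinase_B'])]
```

In this toy example, toggling feature 0 yields a counterfactual for `kinase_A` alone, whereas toggling feature 1 reverses the predictions for both tasks, so a CF obtained from a multi-task model can be task-specific or affect several tasks at once. Exhaustive enumeration grows combinatorially with `max_changes`, so this brute-force search is only practical for very small edits.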
