{"title":"图神经网络中基于相互作用的归纳偏差:从三维结构增强蛋白质配体结合亲和力预测。","authors":"Ziduo Yang, Weihe Zhong, Qiujie Lv, Tiejun Dong, Guanxing Chen, Calvin Yu-Chian Chen","doi":"10.1109/TPAMI.2024.3400515","DOIUrl":null,"url":null,"abstract":"<p><p>Inductive bias in machine learning (ML) is the set of assumptions describing how a model makes predictions. Different ML-based methods for protein-ligand binding affinity (PLA) prediction have different inductive biases, leading to different levels of generalization capability and interpretability. Intuitively, the inductive bias of an ML-based model for PLA prediction should fit in with biological mechanisms relevant for binding to achieve good predictions with meaningful reasons. To this end, we propose an interaction-based inductive bias to restrict neural networks to functions relevant for binding with two assumptions: 1) A protein-ligand complex can be naturally expressed as a heterogeneous graph with covalent and non-covalent interactions; 2) The predicted PLA is the sum of pairwise atom-atom affinities determined by non-covalent interactions. The interaction-based inductive bias is embodied by an explainable heterogeneous interaction graph neural network (EHIGN) for explicitly modeling pairwise atom-atom interactions to predict PLA from 3D structures. Extensive experiments demonstrate that EHIGN achieves better generalization capability than other state-of-the-art ML-based baselines in PLA prediction and structure-based virtual screening. More importantly, comprehensive analyses of distance-affinity, pose-affinity, and substructure-affinity relations suggest that the interaction-based inductive bias can guide the model to learn atomic interactions that are consistent with physical reality. As a case study to demonstrate practical usefulness, our method is tested for predicting the efficacy of Nirmatrelvir against SARS-CoV-2 variants. EHIGN successfully recognizes the changes in the efficacy of Nirmatrelvir for different SARS-CoV-2 variants with meaningful reasons.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Interaction-Based Inductive Bias in Graph Neural Networks: Enhancing Protein-Ligand Binding Affinity Predictions From 3D Structures.\",\"authors\":\"Ziduo Yang, Weihe Zhong, Qiujie Lv, Tiejun Dong, Guanxing Chen, Calvin Yu-Chian Chen\",\"doi\":\"10.1109/TPAMI.2024.3400515\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Inductive bias in machine learning (ML) is the set of assumptions describing how a model makes predictions. Different ML-based methods for protein-ligand binding affinity (PLA) prediction have different inductive biases, leading to different levels of generalization capability and interpretability. Intuitively, the inductive bias of an ML-based model for PLA prediction should fit in with biological mechanisms relevant for binding to achieve good predictions with meaningful reasons. To this end, we propose an interaction-based inductive bias to restrict neural networks to functions relevant for binding with two assumptions: 1) A protein-ligand complex can be naturally expressed as a heterogeneous graph with covalent and non-covalent interactions; 2) The predicted PLA is the sum of pairwise atom-atom affinities determined by non-covalent interactions. The interaction-based inductive bias is embodied by an explainable heterogeneous interaction graph neural network (EHIGN) for explicitly modeling pairwise atom-atom interactions to predict PLA from 3D structures. Extensive experiments demonstrate that EHIGN achieves better generalization capability than other state-of-the-art ML-based baselines in PLA prediction and structure-based virtual screening. More importantly, comprehensive analyses of distance-affinity, pose-affinity, and substructure-affinity relations suggest that the interaction-based inductive bias can guide the model to learn atomic interactions that are consistent with physical reality. As a case study to demonstrate practical usefulness, our method is tested for predicting the efficacy of Nirmatrelvir against SARS-CoV-2 variants. EHIGN successfully recognizes the changes in the efficacy of Nirmatrelvir for different SARS-CoV-2 variants with meaningful reasons.</p>\",\"PeriodicalId\":94034,\"journal\":{\"name\":\"IEEE transactions on pattern analysis and machine intelligence\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on pattern analysis and machine intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TPAMI.2024.3400515\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/11/6 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TPAMI.2024.3400515","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/11/6 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
摘要
机器学习(ML)中的归纳偏差是描述模型如何进行预测的一组假设。不同的基于 ML 的蛋白质配体结合亲和力(PLA)预测方法具有不同的归纳偏差,从而导致不同程度的泛化能力和可解释性。从直观上讲,基于 ML 的蛋白质配体结合力预测模型的归纳偏差应符合与结合力相关的生物机制,从而获得有意义的良好预测。为此,我们提出了一种基于相互作用的归纳偏倚,将神经网络限制在与结合相关的功能上,并有两个假设:(1) 蛋白质配体复合物可以自然地表达为具有共价和非共价相互作用的异质图;(2) 预测的 PLA 是由非共价相互作用决定的成对原子-原子亲和力之和。可解释异质相互作用图神经网络(EHIGN)体现了基于相互作用的归纳偏差,该网络可明确模拟成对原子-原子相互作用,从而从三维结构中预测聚乳酸。广泛的实验证明,在聚乳酸预测和基于结构的虚拟筛选方面,EHIGN 比其他最先进的基于 ML 的基线方法具有更好的泛化能力。更重要的是,对距离-亲和力、姿势-亲和力和亚结构-亲和力关系的综合分析表明,基于相互作用的归纳偏差可以引导模型学习与物理现实相一致的原子相互作用。为证明该方法的实用性,我们以案例研究的形式测试了 Nirmatrelvir 对 SARS-CoV-2 变种的疗效预测。EHIGN成功地识别了Nirmatrelvir对不同SARS-CoV-2变体的疗效变化,并给出了有意义的理由。
Interaction-Based Inductive Bias in Graph Neural Networks: Enhancing Protein-Ligand Binding Affinity Predictions From 3D Structures.
Inductive bias in machine learning (ML) is the set of assumptions describing how a model makes predictions. Different ML-based methods for protein-ligand binding affinity (PLA) prediction have different inductive biases, leading to different levels of generalization capability and interpretability. Intuitively, the inductive bias of an ML-based model for PLA prediction should fit in with biological mechanisms relevant for binding to achieve good predictions with meaningful reasons. To this end, we propose an interaction-based inductive bias to restrict neural networks to functions relevant for binding with two assumptions: 1) A protein-ligand complex can be naturally expressed as a heterogeneous graph with covalent and non-covalent interactions; 2) The predicted PLA is the sum of pairwise atom-atom affinities determined by non-covalent interactions. The interaction-based inductive bias is embodied by an explainable heterogeneous interaction graph neural network (EHIGN) for explicitly modeling pairwise atom-atom interactions to predict PLA from 3D structures. Extensive experiments demonstrate that EHIGN achieves better generalization capability than other state-of-the-art ML-based baselines in PLA prediction and structure-based virtual screening. More importantly, comprehensive analyses of distance-affinity, pose-affinity, and substructure-affinity relations suggest that the interaction-based inductive bias can guide the model to learn atomic interactions that are consistent with physical reality. As a case study to demonstrate practical usefulness, our method is tested for predicting the efficacy of Nirmatrelvir against SARS-CoV-2 variants. EHIGN successfully recognizes the changes in the efficacy of Nirmatrelvir for different SARS-CoV-2 variants with meaningful reasons.