CaXML: Chemistry-informed machine learning explains mutual changes between protein conformations and calcium ions in calcium-binding proteins using structural and topological features.

IF 5.2 | Zone 3 (Biology) | JCR Q1, Biochemistry & Molecular Biology | Protein Science | Pub Date: 2025-02-01 | DOI: 10.1002/pro.70023
Pengzhi Zhang, Jules Nde, Yossi Eliaz, Nathaniel Jennings, Piotr Cieplak, Margaret S Cheung
Protein Science, vol. 34, no. 2, e70023 (2025-02-01). Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11761698/pdf/
Citation count: 0

Abstract

Proteins' flexibility is a feature in communicating changes in cell signaling instigated by binding to secondary messengers, such as calcium ions, which are associated with the coordination of muscle contraction, neurotransmitter release, and gene expression. When binding to the disordered parts of a protein, calcium ions must balance their charge states with the shape of calcium-binding proteins and their versatile pool of partners, depending on the circumstances they transmit. Accurately determining the ionic charges of those ions is essential for understanding their role in such processes. However, it is unclear whether the limited experimental data available can be used effectively to train models that accurately predict the charges of calcium-binding protein variants. Here, we developed a chemistry-informed machine-learning algorithm that implements a game-theoretic approach to explain the output of a machine-learning model, without the prerequisite of an excessively large database for high-performance prediction of atomic charges. We used ab initio electronic structure data representing calcium ions and the structures of the disordered segments of calcium-binding peptides with surrounding water molecules to train several explainable models. Network theory was used to extract the topological features of atomic interactions in the structurally complex data dictated by the coordination chemistry of a calcium ion, a potent indicator of its charge state in proteins. Our design created a computational tool, CaXML, which provides a framework of explainable machine-learning models to annotate the ionic charges of calcium ions in calcium-binding proteins in response to chemical changes in their environment. Our framework will provide new insights into protein design for engineering functionality based on the limited size of scientific data in a genome space.
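
The two ideas in the abstract, topological features drawn from a calcium ion's coordination network and a game-theoretic explanation of a model's output, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy coordinates, the distance cutoffs, the feature names, and the linear `toy_charge_model` are hypothetical, and exact Shapley values are used as one common game-theoretic attribution rather than as the authors' confirmed procedure.

```python
from itertools import combinations
from math import dist, factorial


def coordination_features(ca, atoms, cutoff=3.0, ligand_cutoff=3.5):
    """Simple graph features of the atoms coordinating a Ca ion.

    ca: (x, y, z) of the ion; atoms: list of (element, (x, y, z)).
    Both cutoffs (in angstroms) are assumed values, not from the paper.
    """
    # Nodes: atoms within `cutoff` of the ion (first coordination shell).
    shell = [xyz for _, xyz in atoms if dist(ca, xyz) <= cutoff]
    # Edges: pairs of shell atoms closer than `ligand_cutoff` to each other.
    ligand_edges = sum(
        1 for a, b in combinations(shell, 2) if dist(a, b) <= ligand_cutoff
    )
    mean_distance = sum(dist(ca, p) for p in shell) / len(shell) if shell else 0.0
    return {
        "coordination_number": len(shell),
        "ligand_edges": ligand_edges,
        "mean_distance": mean_distance,
    }


def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature.

    Features absent from a coalition take their baseline value.
    Cost grows as 2^n, so this is only practical for a few features.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                present = set(subset)
                without = {m: x[m] if m in present else baseline[m] for m in names}
                with_feature = dict(without)
                with_feature[name] = x[name]
                total += weight * (model(with_feature) - model(without))
        phi[name] = total
    return phi


def toy_charge_model(f):
    """Hypothetical linear stand-in for a trained charge predictor."""
    return (2.0 - 0.05 * f["coordination_number"]
            - 0.01 * f["ligand_edges"] + 0.1 * f["mean_distance"])


# Toy geometry: four oxygens at 2.4 angstroms from Ca, one distant carbon.
ca_position = (0.0, 0.0, 0.0)
neighbors = [
    ("O", (2.4, 0.0, 0.0)), ("O", (0.0, 2.4, 0.0)),
    ("O", (0.0, 0.0, 2.4)), ("O", (-2.4, 0.0, 0.0)),
    ("C", (5.0, 0.0, 0.0)),
]
feats = coordination_features(ca_position, neighbors)
baseline = {name: 0.0 for name in feats}
phi = shapley_values(toy_charge_model, feats, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the model output for the observed features and for the baseline, which is the efficiency property that makes Shapley values attractive for explaining a prediction feature by feature. In practice a trained model and a sampling-based approximation would replace the toy model and the exact enumeration.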

Source journal
Protein Science (Biology: Biochemistry and Molecular Biology)
CiteScore: 12.40
Self-citation rate: 1.20%
Articles published: 246
Review time: 1 month
Journal description: Protein Science, the flagship journal of The Protein Society, focuses on advancing fundamental knowledge of protein molecules. The journal welcomes original reports and review articles that contribute to our understanding of protein function, structure, folding, design, and evolution. Protein Science also encourages papers that explore the applications of protein science in areas such as therapeutics, protein-based biomaterials, bionanotechnology, synthetic biology, and bioelectronics. Manuscripts may be submitted for review in any suitable format; conversion to journal style is required only upon acceptance for publication. Protein Science is indexed and abstracted in numerous databases, including the Agricultural & Environmental Science Database (ProQuest), Biological Science Database (ProQuest), CAS: Chemical Abstracts Service (ACS), Embase (Elsevier), Health & Medical Collection (ProQuest), Health Research Premium Collection (ProQuest), Materials Science & Engineering Database (ProQuest), MEDLINE/PubMed (NLM), Natural Science Collection (ProQuest), and SciTech Premium Collection (ProQuest).