CaXML: Chemistry-informed machine learning explains mutual changes between protein conformations and calcium ions in calcium-binding proteins using structural and topological features.
Pengzhi Zhang, Jules Nde, Yossi Eliaz, Nathaniel Jennings, Piotr Cieplak, Margaret S Cheung
{"title":"Ca<sup>X</sup>ML: Chemistry-informed machine learning explains mutual changes between protein conformations and calcium ions in calcium-binding proteins using structural and topological features.","authors":"Pengzhi Zhang, Jules Nde, Yossi Eliaz, Nathaniel Jennings, Piotr Cieplak, Margaret S Cheung","doi":"10.1002/pro.70023","DOIUrl":null,"url":null,"abstract":"<p><p>Proteins' flexibility is a feature in communicating changes in cell signaling instigated by binding with secondary messengers, such as calcium ions, associated with the coordination of muscle contraction, neurotransmitter release, and gene expression. When binding with the disordered parts of a protein, calcium ions must balance their charge states with the shape of calcium-binding proteins and their versatile pool of partners depending on the circumstances they transmit. Accurately determining the ionic charges of those ions is essential for understanding their role in such processes. However, it is unclear whether the limited experimental data available can be effectively used to train models to accurately predict the charges of calcium-binding protein variants. Here, we developed a chemistry-informed, machine-learning algorithm that implements a game theoretic approach to explain the output of a machine-learning model without the prerequisite of an excessively large database for high-performance prediction of atomic charges. We used the ab initio electronic structure data representing calcium ions and the structures of the disordered segments of calcium-binding peptides with surrounding water molecules to train several explainable models. Network theory was used to extract the topological features of atomic interactions in the structurally complex data dictated by the coordination chemistry of a calcium ion, a potent indicator of its charge state in protein. Our design created a computational tool of Ca<sup>X</sup>ML, which provided a framework of explainable machine learning model to annotate ionic charges of calcium ions in calcium-binding proteins in response to the chemical changes in an environment. Our framework will provide new insights into protein design for engineering functionality based on the limited size of scientific data in a genome space.</p>","PeriodicalId":20761,"journal":{"name":"Protein Science","volume":"34 2","pages":"e70023"},"PeriodicalIF":4.5000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11761698/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Protein Science","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1002/pro.70023","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Proteins' flexibility is a feature in communicating changes in cell signaling instigated by binding with secondary messengers, such as calcium ions, associated with the coordination of muscle contraction, neurotransmitter release, and gene expression. When binding with the disordered parts of a protein, calcium ions must balance their charge states with the shape of calcium-binding proteins and their versatile pool of partners depending on the circumstances they transmit. Accurately determining the ionic charges of those ions is essential for understanding their role in such processes. However, it is unclear whether the limited experimental data available can be effectively used to train models to accurately predict the charges of calcium-binding protein variants. Here, we developed a chemistry-informed, machine-learning algorithm that implements a game theoretic approach to explain the output of a machine-learning model without the prerequisite of an excessively large database for high-performance prediction of atomic charges. We used the ab initio electronic structure data representing calcium ions and the structures of the disordered segments of calcium-binding peptides with surrounding water molecules to train several explainable models. Network theory was used to extract the topological features of atomic interactions in the structurally complex data dictated by the coordination chemistry of a calcium ion, a potent indicator of its charge state in protein. Our design created a computational tool of CaXML, which provided a framework of explainable machine learning model to annotate ionic charges of calcium ions in calcium-binding proteins in response to the chemical changes in an environment. Our framework will provide new insights into protein design for engineering functionality based on the limited size of scientific data in a genome space.
期刊介绍:
Protein Science, the flagship journal of The Protein Society, is a publication that focuses on advancing fundamental knowledge in the field of protein molecules. The journal welcomes original reports and review articles that contribute to our understanding of protein function, structure, folding, design, and evolution.
Additionally, Protein Science encourages papers that explore the applications of protein science in various areas such as therapeutics, protein-based biomaterials, bionanotechnology, synthetic biology, and bioelectronics.
The journal accepts manuscript submissions in any suitable format for review, with the requirement of converting the manuscript to journal-style format only upon acceptance for publication.
Protein Science is indexed and abstracted in numerous databases, including the Agricultural & Environmental Science Database (ProQuest), Biological Science Database (ProQuest), CAS: Chemical Abstracts Service (ACS), Embase (Elsevier), Health & Medical Collection (ProQuest), Health Research Premium Collection (ProQuest), Materials Science & Engineering Database (ProQuest), MEDLINE/PubMed (NLM), Natural Science Collection (ProQuest), and SciTech Premium Collection (ProQuest).