Pub Date: 2024-10-25
DOI: 10.1038/s43588-024-00716-2
Yunxin Xu, Di Liu, Haipeng Gong
Accurate prediction of protein mutation effects is of great importance in protein engineering and design. Here we propose GeoStab-suite, a suite of three geometric learning-based models (GeoFitness, GeoDDG and GeoDTm) for the prediction of fitness score, ΔΔG and ΔTm of a protein upon mutations, respectively. GeoFitness engages a specialized loss function to allow supervised training of a unified model using the large amount of multi-labeled fitness data in the deep mutational scanning database. To further improve the downstream tasks of ΔΔG and ΔTm prediction, the encoder of GeoFitness is reutilized as a pre-trained module in GeoDDG and GeoDTm to overcome the challenge of lacking sufficient labeled data. This pre-training strategy, in combination with data expansion, markedly improves model performance and generalizability. In the benchmark test, GeoDDG and GeoDTm outperform the other state-of-the-art methods by at least 30% and 70%, respectively, in terms of the Spearman correlation coefficient.
In this study, the authors propose a strategy to train a unified model to learn the general mutational effects based on multi-labeled deep mutational scanning (DMS) data, and then reutilize this pre-trained model to improve the downstream protein stability prediction tasks.
Title: Improving the prediction of protein stability changes upon mutations by geometric learning and a pre-training strategy
Journal: Nature Computational Science, vol. 4, issue 11, pp. 840-850
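The benchmark in the abstract above is reported in terms of the Spearman correlation coefficient. As a minimal sketch of how that metric is computed (not the authors' evaluation code), the snippet below ranks two lists and takes the Pearson correlation of the ranks. The function names `average_ranks` and `spearman` and the ΔΔG values are hypothetical, chosen only for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.

    Assumes both inputs have the same length and are not constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up experimental and predicted ddG values, for illustration only.
experimental = [-1.2, 0.3, -0.5, 2.1]
predicted = [-0.8, 0.1, -0.9, 1.5]
rho = spearman(experimental, predicted)
```

Because the metric depends only on ranks, a predictor that orders mutations correctly scores 1 even if its absolute values are off, which is why it is a common choice for comparing stability predictors.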
Pub Date: 2024-10-23
DOI: 10.1038/s43588-024-00723-3
He Li, Zun Wang, Nianlong Zou, Meng Ye, Runzhang Xu, Xiaoxun Gong, Wenhui Duan, Yong Xu
Title: Author Correction: Deep-learning density functional theory Hamiltonian for efficient ab initio electronic-structure calculation
Journal: Nature Computational Science, vol. 4, issue 11, p. 876
Open-access PDF: https://www.nature.com/articles/s43588-024-00723-3.pdf
Pub Date: 2024-10-23
DOI: 10.1038/s43588-024-00704-6
Zachary Fralish, Daniel Reker
Active machine learning is employed in academia and industry to support drug discovery. A recent study unravels the factors that influence a deep learning model's ability to guide iterative discovery.
Title: Taking a deep dive with active learning for drug discovery
Journal: Nature Computational Science, vol. 4, issue 10, pp. 727-728
Pub Date: 2024-10-22
DOI: 10.1038/s43588-024-00708-2
Christina L. Vizcarra, Ryan F. Trainor, Ashley Ringer McDonald, Chris T. Richardson, Davit Potoyan, Jessica A. Nash, Britt Lundgren, Tyler Luchko, Glen M. Hocky, Jonathan J. Foley IV, Timothy J. Atherton, Grace Y. Stokes
Title: An interdisciplinary effort to integrate coding into science courses
Journal: Nature Computational Science, vol. 4, issue 11, pp. 803-804
Pub Date: 2024-10-15
DOI: 10.1038/s43588-024-00699-0
Guy Durant, Fergus Boyles, Kristian Birchall, Charlotte M. Deane
Many studies have prophesied that the integration of machine learning techniques into small-molecule therapeutics development will help to deliver a true leap forward in drug discovery. However, increasingly advanced algorithms and novel architectures have not always yielded substantial improvements in results. In this Perspective, we propose that a greater focus on the data for training and benchmarking these models is more likely to drive future improvement, and explore avenues for future research and strategies to address these data challenges.
The application of machine learning techniques to small-molecule drug discovery has not yet yielded a true leap forward in the field. This Perspective discusses how a renewed focus on data and validation could help unlock machine learning’s potential.
Title: The future of machine learning for small-molecule drug discovery will be driven by data
Journal: Nature Computational Science, vol. 4, issue 10, pp. 735-743
Pub Date: 2024-10-14
DOI: 10.1038/s43588-024-00706-4
Stefan Peidli
A recent study proposes a strategy for the prediction of genetic perturbation outcomes by breaking it down into three subtasks: identifying differentially expressed genes, determining expression change directions, and estimating gene expression magnitudes.
Title: The decomposition of perturbation modeling
Journal: Nature Computational Science, vol. 4, issue 10, pp. 725-726
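The three subtasks named in the summary above (which genes change, in which direction, and by how much) can be illustrated with a small sketch. This is not the study's method: it only shows how an expression change can be split into the three targets and put back together. The function names, the `threshold` parameter and the expression values are hypothetical.

```python
def decompose(control, perturbed, threshold=0.5):
    """Split per-gene expression changes into three targets.

    Returns (is_de, direction, magnitude), one entry per gene:
      is_de     -- True if the absolute change reaches the threshold
      direction -- +1 (up), -1 (down) or 0 (not differentially expressed)
      magnitude -- size of the change, 0.0 for non-DE genes
    """
    is_de, direction, magnitude = [], [], []
    for c, p in zip(control, perturbed):
        delta = p - c
        changed = abs(delta) >= threshold
        is_de.append(changed)
        direction.append((1 if delta > 0 else -1) if changed else 0)
        magnitude.append(abs(delta) if changed else 0.0)
    return is_de, direction, magnitude


def recompose(control, direction, magnitude):
    """Rebuild perturbed expression from the control and the three targets."""
    return [c + d * m for c, d, m in zip(control, direction, magnitude)]


# Made-up expression values for three genes, for illustration only.
control = [1.0, 2.0, 3.0]
perturbed = [1.1, 3.0, 1.5]
is_de, direction, magnitude = decompose(control, perturbed)
rebuilt = recompose(control, direction, magnitude)
```

In this toy example the first gene falls below the threshold, so it is treated as unchanged and `rebuilt` keeps its control value; the other two genes are recovered exactly.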
Pub Date: 2024-10-11
DOI: 10.1038/s43588-024-00705-5
Adrian Pacheco-Pozo, Diego Krapf
A deep learning algorithm is presented to classify single-particle tracking trajectories into theoretical models of anomalous diffusion and detect if the trajectory is related to a model not originally found within the training dataset.
Title: Effectively detecting anomalous diffusion via deep learning
Journal: Nature Computational Science, vol. 4, issue 10, pp. 731-732
Pub Date: 2024-10-11
DOI: 10.1038/s43588-024-00703-7
Xiaochen Feng, Hao Sha, Yongbing Zhang, Yaoquan Su, Shuai Liu, Yuan Jiang, Shangguo Hou, Sanyang Han, Xiangyang Ji
Anomalous diffusion plays a crucial role in understanding molecular-level dynamics by offering valuable insights into molecular interactions, mobility states and the physical properties of systems across both biological and materials sciences. Deep-learning techniques have recently outperformed conventional statistical methods in anomalous diffusion recognition. However, deep-learning networks are typically trained on data with a limited distribution, so they inevitably fail to recognize unknown diffusion models and misinterpret dynamics when confronted with out-of-distribution (OOD) scenarios. In this work, we present a general framework for evaluating deep-learning-based OOD dynamics-detection methods. We further develop a baseline approach that achieves robust OOD dynamics detection as well as accurate recognition of in-distribution anomalous diffusion. We demonstrate that this method enables a reliable characterization of complex behaviors across a wide range of experimentally diverse systems, including nicotinic acetylcholine receptors in membranes, fluorescent beads in dextran solutions and silver nanoparticles undergoing active endocytosis.
This work introduces a framework that enhances deep learning for anomalous diffusion, enabling reliable detection of out-of-distribution dynamics and characterization of complex behaviors across diverse systems.
Title: Reliable deep learning in anomalous diffusion against out-of-distribution dynamics
Journal: Nature Computational Science, vol. 4, issue 10, pp. 761-772
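For context on the conventional statistical methods that the abstract above contrasts with deep learning, one standard estimate is the anomalous exponent α from the scaling MSD(τ) ∝ τ^α, where α = 1 is normal diffusion. The sketch below is not part of the paper's framework: it fits α on a log-log scale for a one-dimensional trajectory. The function names, the `max_lag` parameter and the test trajectory are hypothetical.

```python
import math


def time_averaged_msd(xs, lag):
    """Time-averaged mean squared displacement of a 1D trajectory at one lag."""
    n = len(xs) - lag
    return sum((xs[i + lag] - xs[i]) ** 2 for i in range(n)) / n


def anomalous_exponent(xs, max_lag=10):
    """Least-squares slope of log(MSD) against log(lag).

    Assumes the trajectory is longer than max_lag and that the MSD is
    non-zero at every lag (log of zero is undefined).
    """
    lags = range(1, max_lag + 1)
    log_t = [math.log(lag) for lag in lags]
    log_m = [math.log(time_averaged_msd(xs, lag)) for lag in lags]
    mean_t = sum(log_t) / len(log_t)
    mean_m = sum(log_m) / len(log_m)
    num = sum((a - mean_t) * (b - mean_m) for a, b in zip(log_t, log_m))
    den = sum((a - mean_t) ** 2 for a in log_t)
    return num / den


# Ballistic motion at constant speed: MSD grows as lag squared.
ballistic = [0.5 * t for t in range(50)]
alpha = anomalous_exponent(ballistic)
```

The constant-velocity trajectory is used because its exponent is known analytically, which makes the fit easy to check. Real single-particle tracks are noisy and short, which is part of why fits like this are less reliable than the learned classifiers discussed above.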