Pub Date : 2025-12-09 | DOI: 10.1038/s43588-025-00906-6
Erzhuo Shao, Yifang Wang, Yifan Qian, Zhenyu Pan, Han Liu, Dashun Wang
We introduce SciSciGPT, an open-source, prototype artificial intelligence (AI) collaborator that uses the domain of science of science as a testbed to explore the potential of large language model-powered research tools. SciSciGPT automates complex workflows, supports diverse analytical approaches, accelerates research prototyping and iteration and facilitates reproducibility. Through case studies, we demonstrate its ability to streamline a wide range of empirical and analytical research tasks while highlighting its broader potential to advance research. We further propose a large language model agent capability maturity model for human-AI collaboration, envisioning a roadmap to further improve and expand upon frameworks such as SciSciGPT. As AI capabilities continue to evolve, frameworks such as SciSciGPT may play increasingly pivotal roles in scientific research and discovery. At the same time, these new advances also raise critical challenges, from ensuring transparency and ethical use to balancing human and AI contributions. Addressing these issues may shape the future of scientific inquiry and inform how we train the next generation of scientists to thrive in an increasingly AI-integrated research ecosystem.
Title: SciSciGPT: advancing human-AI collaboration in the science of science. Journal: Nature Computational Science.
Pub Date : 2025-12-08 | DOI: 10.1038/s43588-025-00917-3
Eric Sivonxay, Lucas Attia, Evan Walter Clark Spotte-Smith, Benjamin Sanchez-Lengeling, Xiaojing Xia, Daniel Barter, Emory M. Chan, Samuel M. Blau
Applications of deep learning (DL) to design nanomaterials are hampered by a lack of suitable data representations and training data. Here we report efforts to overcome these limitations and leverage DL to optimize the nonlinear optical properties of core–shell upconverting nanoparticles (UCNPs). UCNPs, which have applications in fields such as biosensing, super-resolution microscopy and three-dimensional printing, can emit visible and ultraviolet light from near-infrared excitations. We report a large-scale dataset of UCNP emission spectra based on accurate but expensive kinetic Monte Carlo simulations (N > 6,000) and use these data to train a heterogeneous graph neural network using a physically motivated representation of UCNP nanostructure. Applying gradient-based optimization on the trained graph neural network, we identify structures with 6.5× higher predicted emission under 800-nm illumination than any UCNP in our training set. Our work reveals design principles for UCNP heterostructures and presents a roadmap for DL-based inverse design of nanomaterials. Graph neural networks built on physically motivated representations enable gradient-based optimization of complex upconverting nanoparticle heterostructures, revealing photophysical design rules and a roadmap for deep learning in nanoscience.
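The inverse-design step described above (gradient-based optimization on a trained surrogate) can be sketched in miniature. Everything below is a hypothetical stand-in: a toy quadratic plays the role of the paper's trained heterogeneous graph neural network, and the three parameters loosely represent design variables such as shell thicknesses. This illustrates the optimization scheme, not the published model.

```python
import numpy as np

# Toy stand-in for a trained differentiable surrogate. In the paper this is
# a heterogeneous graph neural network; the quadratic below is illustrative.
TARGET = np.array([4.0, 2.5, 1.0])  # fictitious optimal design parameters

def predicted_emission(x):
    """Surrogate prediction of emission intensity for design parameters x."""
    return -np.sum((x - TARGET) ** 2)

def emission_grad(x):
    """Analytic gradient of the toy surrogate with respect to the design."""
    return -2.0 * (x - TARGET)

def optimize_design(x0, lr=0.1, steps=200):
    """Plain gradient ascent on the surrogate, the core of DL inverse design."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + lr * emission_grad(x)
    return x

best = optimize_design([0.5, 0.5, 0.5])
```

With a real differentiable surrogate, the same loop (gradients of predicted emission with respect to structure parameters) drives the search toward high-emission heterostructures.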
Title: Gradient-based optimization of complex nanoparticle heterostructures enabled by deep learning on heterogeneous graphs. Journal: Nature Computational Science 6(1), 83-95.
Pub Date : 2025-12-05 | DOI: 10.1038/s43588-025-00912-8
Ouyang Zhu, Jun Li
Gene perturbation experiments followed by transcriptomic profiling are vital for uncovering causal gene effects. However, their limited throughput leaves many perturbations of interest unexplored. Computational methods are therefore needed to predict genome-wide transcriptional responses to gene perturbations that were not experimentally assayed within a given dataset. Existing approaches often rely on Gene Ontology graphs to encode prior knowledge, but their predictive power and applicability are constrained by the graphs’ sparsity and incomplete gene coverage. Here we present Scouter, a computational method that uses gene embeddings generated by large language models and a lightweight compressor–generator neural network. Scouter accurately predicts transcriptional responses to both single- and two-gene perturbations, reducing errors from state-of-the-art Gene Ontology-term-based methods (GEARS and biolord) by half or more. Unlike recent approaches based on fine-tuning gene expression foundation models, Scouter offers substantially better accuracy and greater accessibility; it requires no pretraining and runs efficiently on standard hardware. A lightweight AI method called Scouter that predicts genome-wide transcriptional responses to single- and two-gene perturbations using large language model embeddings is presented and achieves substantially higher accuracy than leading approaches.
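The compressor-generator idea can be shown schematically. Every piece of this sketch is a placeholder: random vectors stand in for the LLM-derived gene embeddings, random matrices stand in for trained weights, and the gene names are arbitrary examples. Only the data flow (embedding, compress, generate a genome-wide response) reflects the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: in Scouter these would be embeddings derived from
# a large language model; here they are just random 64-dimensional vectors.
gene_embeddings = {g: rng.normal(size=64) for g in ["TP53", "MYC", "KRAS"]}

# Untrained placeholder weights for a minimal compressor-generator network.
W_compress = rng.normal(size=(64, 8)) / 8.0     # compress to a small latent
W_generate = rng.normal(size=(8, 2000)) / 8.0   # generate 2,000 stand-in genes

def predict_response(perturbed_genes):
    """Map a perturbation (one or two genes) to a genome-wide response."""
    e = sum(gene_embeddings[g] for g in perturbed_genes)  # combine perturbations
    latent = np.tanh(e @ W_compress)                       # compressor
    return latent @ W_generate                             # generator output

delta = predict_response(["TP53", "MYC"])  # a two-gene perturbation
```

Because the embeddings come from pretrained language models rather than a Gene Ontology graph, such a network can, in principle, handle any gene with a textual description.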
Title: Scouter predicts transcriptional responses to genetic perturbations with large language model embeddings. Journal: Nature Computational Science 6(1), 21-28 (open access).
Pub Date : 2025-12-05 | DOI: 10.1038/s43588-025-00920-8
Li Cheng, Pan-Lin Shao, Jiahui Lv, Hongjun Xiao, Yanping Sun, Jingkai Yang, Ziyi Xu, Mingkun Lv, Guanghui Wang, Shaokang Zhao, Jiaxin Li, Ziqi Jin, Xuan Tan, Guichuan Xing, Bo Zhang
The asymmetric hydrogenation of olefins is one of the most important asymmetric transformations in molecular synthesis. While other machine learning models have successfully predicted stereoselectivity for reactions with a single prochiral site, existing models face limitations including narrow substrate–catalyst applicability, an inability to simultaneously predict stereoselectivity and absolute configurations in asymmetric hydrogenation of olefins with two prochiral sites, and a reliance on predefined descriptors. Here, to overcome these challenges, we introduce the Chemistry-Informed Asymmetric Hydrogenation Network (ChemAHNet), a deep learning model based on the reaction mechanism of olefin asymmetric hydrogenation. By leveraging three structure-aware modules, ChemAHNet accurately predicts the absolute configuration of major enantiomers across diverse catalysts and substrates. It also defines the $\Delta\Delta G^{\ddagger}$ of asymmetric hydrogenation via catalyst–olefin interactions, enabling concurrent prediction of stereoselectivity and absolute configuration. Notably, ChemAHNet extends to other asymmetric catalytic reactions. By operating solely on simplified molecular-input line-entry system (SMILES) inputs, it captures atomic-level spatial and electronic interactions, offering a robust tool for target-directed molecular engineering. This study introduces ChemAHNet, a deep learning model that predicts stereoselectivity and absolute configurations in the asymmetric hydrogenation of olefins with two prochiral centers, providing a broadly applicable tool for catalyst and substrate design.
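The quantity the model predicts, ΔΔG‡, connects directly to stereoselectivity through standard transition-state theory: the ratio of rates toward the two enantiomers is the Boltzmann factor of the free-energy gap, and enantiomeric excess follows from that ratio. The conversion below is the textbook relation, not anything specific to ChemAHNet.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1

def ee_from_ddg(ddg_kj_mol, temp_k=298.15):
    """Enantiomeric excess implied by a free-energy difference (in kJ/mol)
    between the two diastereomeric transition states."""
    ratio = math.exp(ddg_kj_mol * 1000 / (R * temp_k))  # k_major / k_minor
    return (ratio - 1) / (ratio + 1)

# At room temperature, about 7.3 kJ/mol corresponds to roughly 90% ee.
ee = ee_from_ddg(7.3)
```

This is why small errors in predicted ΔΔG‡ matter: the mapping to ee is exponential in the energy gap.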
Title: Chemistry-informed deep learning model for predicting stereoselectivity and absolute configuration in asymmetric hydrogenation. Journal: Nature Computational Science 6(2), 145-155.
Pub Date : 2025-12-04 | DOI: 10.1038/s43588-025-00911-9
Zixia Zhou, Junyan Liu, Wei Emma Wu, Ruogu Fang, Sheng Liu, Qingyue Wei, Rui Yan, Yi Guo, Qian Tao, Yuanyuan Wang, Md Tauhidul Islam, Lei Xing
Dynamic brain data are becoming increasingly accessible, providing a gateway to understanding the inner workings of the brain in living participants. However, the size and complexity of the data pose a challenge in extracting meaningful information across various data sources. Here we introduce a generalizable unsupervised deep manifold learning method for the exploration of neurocognitive and behavioral patterns. Unlike existing methods that extract patterns directly from the input data, the proposed brain-dynamic convolutional-network-based embedding (BCNE) captures brain-state trajectories by analyzing temporospatial correlations within the data and applying manifold learning. The results demonstrate that BCNE effectively delineates scene transitions, underscores the involvement of different brain regions in memory and narrative processing, distinguishes dynamic learning processes and identifies differences between active and passive behaviors. BCNE provides an effective tool for exploring general neuroscience inquiries or individual-specific patterns. BCNE, an unsupervised deep-learning method, reveals clear trajectories of brain activity and effectively distinguishes cognitive events, learning stages and active versus passive movement, outperforming traditional data visualization methods.
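The core pipeline (temporospatial correlations followed by manifold learning) can be sketched with plain linear algebra. This is a minimal sketch of the idea only: PCA stands in for BCNE's learned convolutional embedding, and the random array stands in for real recordings of shape (time, regions).

```python
import numpy as np

def state_trajectory(data, window=20, n_components=2):
    """Summarize each time window by its region-by-region correlation
    pattern, then project those patterns to a low-dimensional trajectory.
    PCA here is a stand-in for the learned manifold embedding in BCNE."""
    n_time, n_regions = data.shape
    feats = []
    for t in range(0, n_time - window + 1, window):
        c = np.corrcoef(data[t:t + window].T)        # temporospatial correlation
        feats.append(c[np.triu_indices(n_regions, k=1)])  # upper triangle only
    X = np.array(feats)
    X = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)  # PCA via SVD
    return X @ vt[:n_components].T

rng = np.random.default_rng(1)
traj = state_trajectory(rng.normal(size=(200, 10)))   # 200 timepoints, 10 regions
```

On real data, points along such a trajectory that cluster or jump would correspond to the brain-state transitions (for example, scene changes) that the abstract describes.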
Title: Revealing neurocognitive and behavioral patterns through unsupervised manifold learning of dynamic brain data. Journal: Nature Computational Science 5(12), 1238-1252.
Pub Date : 2025-12-01 | DOI: 10.1038/s43588-025-00923-5
Mustafa Guler, Benjamin Krummenacher, Thomas Hall, Meghana Tandon, Joshua Abrams, Sanjana Ravi, Peng Chen, Matthew Lauber, Bahar Behsaz, Hosein Mohimani
Mass spectrometry is a widely used method for the identification of molecules in complex samples. Current tools for database search of experimental spectra against libraries of molecules are not scalable. Moreover, these tools are often limited to known molecules and only perform an exact search. Here, to address this, we introduce Variable Interpretation of Spectrum–Molecule Couples, or VInSMoC, a mass spectral database search algorithm for the identification of variants of molecules. VInSMoC removes some false identifications by estimating the statistical significance of matches between spectra and molecular structures. Benchmarking VInSMoC in a search of 483 million spectra from GNPS against 87 million molecules from PubChem and COCONUT revealed 43,000 known molecules and 85,000 variants that were previously unreported. VInSMoC further facilitates identifying putative microbial biosynthesis pathways of promothiocin B and depsidomycin in Streptomyces bellus and Streptomyces sp. F-2747, respectively. The authors present a scalable mass spectral search tool that identifies both known molecules and structural variants by estimating match significance. The method revealed biosynthetic pathways in Streptomyces, expanding the scope of metabolite discovery.
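The significance-estimation idea (scoring how surprising a spectrum-molecule match is) can be illustrated with a decoy-based empirical p-value. VInSMoC's actual statistics differ; the masses, tolerance and decoy model below are all invented for illustration.

```python
import random

def match_count(spectrum, predicted, tol=0.01):
    """Number of predicted fragment masses matched by an observed peak."""
    return sum(any(abs(p - m) <= tol for m in spectrum) for p in predicted)

def match_p_value(spectrum, predicted, mass_range=(50.0, 500.0),
                  trials=2000, seed=0):
    """Empirical significance: how often do random decoy fragment masses
    match the spectrum at least as well as the candidate molecule?"""
    rng = random.Random(seed)
    observed = match_count(spectrum, predicted)
    hits = 0
    for _ in range(trials):
        decoy = [rng.uniform(*mass_range) for _ in predicted]
        if match_count(spectrum, decoy) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)   # add-one smoothing avoids p = 0

spectrum = [101.02, 149.08, 203.11, 277.15]   # hypothetical observed peaks
predicted = [101.02, 149.08, 203.11]          # hypothetical fragment masses
p = match_p_value(spectrum, predicted)
```

A low p-value means the match is unlikely to arise by chance, which is exactly the filter that removes false identifications at the scale of hundreds of millions of spectra.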
Title: Identifying variants of molecules through database search of mass spectra. Journal: Nature Computational Science 5(12), 1227-1237.
Pub Date : 2025-11-25 | DOI: 10.1038/s43588-025-00907-5
Xiu-Hao Deng, Yuan Xu
Quantum computers are inching closer to practical deployment, but shielding fragile quantum information from errors is still very challenging. Now, a machine-learning-based decoder offers a strategy for rectifying errors in logical quantum circuits, hastening the advent of reliable and fault-tolerant quantum systems.
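Decoding is, at heart, a classification problem: map a measured error syndrome to the most likely correction. The exactly solvable toy case is the 3-bit repetition code, shown below as a lookup table; a learned decoder generalizes this mapping to codes far too large to tabulate. This sketch is the textbook example, not the decoder covered in the article.

```python
# 3-bit repetition code: two parity checks locate any single bit-flip error.

def syndrome(bits):
    """Parity checks on adjacent pairs, as measured by ancilla qubits."""
    return (bits[0] ^ bits[1], bits[1] ^ bits[2])

CORRECTION = {        # syndrome -> index of the bit to flip (None = no error)
    (0, 0): None,
    (1, 0): 0,
    (1, 1): 1,
    (0, 1): 2,
}

def decode(bits):
    """Apply the correction implied by the measured syndrome."""
    bits = list(bits)
    flip = CORRECTION[syndrome(bits)]
    if flip is not None:
        bits[flip] ^= 1
    return bits

recovered = decode([0, 1, 0])   # single bit-flip error on the middle qubit
```

For surface codes and real noise, the syndrome-to-correction map is no longer a small table, which is where machine-learned decoders come in.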
Title: Efficiently decoding quantum errors with machine learning. Journal: Nature Computational Science 5(12), 1100-1101.
Pub Date : 2025-11-24 | DOI: 10.1038/s43588-025-00937-z
Eva Portelance, Masoud Jasbi
Title: Publisher Correction: On the compatibility of generative AI and generative linguistics. Journal: Nature Computational Science 6(1), 109 (open access).
Pub Date : 2025-11-21 | DOI: 10.1038/s43588-025-00913-7
Alex Murphy
Research now suggests that large language models (LLMs) are viable in silico models of human language processing. By examining high-quality brain responses from multiple participants, researchers were able to break new ground in validating this proposal, which could dramatically reduce the barrier to studying how language is processed in the human brain.
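The standard way to test an LLM as a model of brain activity is an encoding model: regress measured responses onto the model's internal activations. The sketch below uses ridge regression on synthetic data; the shapes (word-by-word activations, a handful of recording channels) are hypothetical stand-ins for the kind of data such studies use.

```python
import numpy as np

def fit_encoding_model(features, responses, alpha=1.0):
    """Ridge regression from model activations to brain responses,
    solved in closed form: (X'X + aI)^-1 X'Y."""
    n_feat = features.shape[1]
    return np.linalg.solve(
        features.T @ features + alpha * np.eye(n_feat),
        features.T @ responses,
    )

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 32))        # stand-in word-by-word LLM activations
true_w = rng.normal(size=(32, 4))     # 4 stand-in brain recording channels
Y = X @ true_w + 0.01 * rng.normal(size=(500, 4))  # synthetic responses
w_hat = fit_encoding_model(X, Y, alpha=0.1)
```

How well such a fitted model predicts held-out responses is the usual measure of how "brain-like" the LLM's representations are.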
Title: Viability of using LLMs as models of human language processing. Journal: Nature Computational Science 6(2), 119-120.
Pub Date : 2025-11-21 | DOI: 10.1038/s43588-025-00904-8
Kai Ruan, Yilong Xu, Ze-Feng Gao, Yang Liu, Yike Guo, Ji-Rong Wen, Hao Sun
Symbolic regression has a crucial role in modern scientific research owing to its capability of discovering concise and interpretable mathematical expressions from data. A key challenge lies in the search for parsimonious and generalizable mathematical formulas, in an infinite search space, while still fitting the training data. For over a decade, existing algorithms have faced a critical bottleneck in accuracy and efficiency when handling complex problems, which hinders the application of symbolic regression to scientific exploration across interdisciplinary domains. To this end, we introduce parallel symbolic enumeration (PSE) to efficiently distill generic mathematical expressions from limited data. Experiments show that PSE achieves higher accuracy and faster computation compared with the state-of-the-art baseline algorithms across over 200 synthetic and experimental problem sets (for example, improving the recovery accuracy by up to 99% and reducing runtime by an order of magnitude). PSE represents an advance in accurate and efficient data-driven discovery of symbolic, interpretable models (for example, underlying physical laws), and improves the scalability of symbolic learning. In this work, the authors introduce parallel symbolic enumeration (PSE), a model that discovers physical laws from data with improved accuracy and speed.
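The enumerate-and-score scheme behind symbolic regression can be shown at toy scale: build every expression in a tiny grammar, evaluate each on the whole dataset at once, and keep the best fit. PSE enumerates vastly larger spaces with parallel, reused computation; the grammar below is an invented miniature, not the paper's search space.

```python
import itertools
import numpy as np

def enumerate_expressions():
    """Tiny expression space over x: a binary op joining two unary terms."""
    unary = {"id": lambda v: v, "sin": np.sin, "square": np.square}
    binary = {"+": np.add, "*": np.multiply}
    terms = {"x": lambda x: x, "1": lambda x: np.ones_like(x)}
    for (ua, fa), (ub, fb) in itertools.product(unary.items(), repeat=2):
        for (bname, b), (ta, fta), (tb, ftb) in itertools.product(
            binary.items(), terms.items(), terms.items()
        ):
            name = f"{ua}({ta}) {bname} {ub}({tb})"
            # Bind loop variables via defaults so each closure is independent.
            yield name, (lambda x, fa=fa, fb=fb, b=b, fta=fta, ftb=ftb:
                         b(fa(fta(x)), fb(ftb(x))))

def best_fit(x, y):
    """Score every candidate on the full dataset (vectorized over x),
    keeping the expression with the lowest mean-squared error."""
    return min(enumerate_expressions(),
               key=lambda nf: float(np.mean((nf[1](x) - y) ** 2)))

x = np.linspace(0, 2, 100)
name, f = best_fit(x, x ** 2 + np.sin(x))   # ground truth is in the space
```

Because every candidate is evaluated on the whole array at once, the scoring step is embarrassingly parallel, which is the property PSE exploits at scale.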
Title: Discovering physical laws with parallel symbolic enumeration. Journal: Nature Computational Science 6(1), 53-66 (open access).