Nature Computational Science
Predicting physics efficiently with hybrid hardware
Pub Date: 2025-12-10 | DOI: 10.1038/s43588-025-00922-6
Luca Manneschi, Matthew O A Ellis
A scalable tool for fast and flexible variant identification in mass spectrometry
Pub Date: 2025-12-09 | DOI: 10.1038/s43588-025-00933-3
Bart Ghesquiere
AUTOENCODIX: a generalized and versatile framework to train and evaluate autoencoders for biological representation learning and beyond
Pub Date: 2025-12-09 | DOI: 10.1038/s43588-025-00916-4
Maximilian Josef Joas, Neringa Jurenaite, Dušan Praščević, Nico Scherf, Jan Ewald
In recent years, autoencoders, a family of deep learning-based methods for representation learning, have been advancing data-driven research owing to their variability and nonlinear power for multimodal data integration. Despite their success, current implementations lack standardization, versatility, comparability and generalizability. Here we present AUTOENCODIX, an open-source framework designed as a standardized and flexible pipeline for preprocessing, training and evaluation of autoencoder architectures. These architectures, such as ontology-based and cross-modal autoencoders, provide key advantages over traditional methods by offering explainability of embeddings or the ability to translate across data modalities. We apply the method to datasets from pan-cancer studies (The Cancer Genome Atlas) and single-cell sequencing as well as in combination with imaging. Our studies provide important user-centric insights and recommendations for navigating architectures, hyperparameters and important tradeoffs in representation learning. These include the reconstruction capability of input data, the quality of embeddings for downstream machine learning models and the reliability of ontology-based embeddings for explainability.
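One tradeoff the entry names is how well an autoencoder reconstructs its input. The sketch below illustrates that reconstruction objective only. It is not the AUTOENCODIX API. It assumes a purely linear encoder and decoder trained by full-batch gradient descent on the mean squared reconstruction error. The function `train_linear_autoencoder` and its parameters are hypothetical names chosen for illustration.

```python
import numpy as np


def train_linear_autoencoder(X, latent_dim, lr=0.05, epochs=500, seed=0):
    """Hypothetical minimal autoencoder: fit linear encoder/decoder weights by
    full-batch gradient descent on the per-sample squared reconstruction error.
    Returns (W_enc, W_dec, loss_history); loss_history[0] is the initial loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, latent_dim))  # input -> embedding
    W_dec = rng.normal(scale=0.1, size=(latent_dim, d))  # embedding -> reconstruction
    losses = []
    for _ in range(epochs):
        Z = X @ W_enc                      # low-dimensional embedding
        err = Z @ W_dec - X                # reconstruction minus input
        losses.append(float(np.sum(err ** 2) / n))
        grad_out = 2.0 * err / n           # d(loss)/d(reconstruction)
        grad_dec = Z.T @ grad_out          # chain rule through the decoder
        grad_enc = X.T @ (grad_out @ W_dec.T)  # chain rule through the encoder
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec, losses
```

In a usage example with rank-2 data embedded in six dimensions, a two-dimensional latent space can reconstruct the input almost perfectly. A smaller `latent_dim` would show the reconstruction-versus-compression tradeoff. Real frameworks use nonlinear layers, minibatches and modality-specific preprocessing, none of which is modelled here.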
SciSciGPT: advancing human-AI collaboration in the science of science
Pub Date: 2025-12-09 | DOI: 10.1038/s43588-025-00906-6
Erzhuo Shao, Yifang Wang, Yifan Qian, Zhenyu Pan, Han Liu, Dashun Wang
We introduce SciSciGPT, an open-source, prototype artificial intelligence (AI) collaborator that uses the domain of science of science as a testbed to explore the potential of large language model-powered research tools. SciSciGPT automates complex workflows, supports diverse analytical approaches, accelerates research prototyping and iteration and facilitates reproducibility. Through case studies, we demonstrate its ability to streamline a wide range of empirical and analytical research tasks while highlighting its broader potential to advance research. We further propose a large language model agent capability maturity model for human-AI collaboration, envisioning a roadmap to further improve and expand upon frameworks such as SciSciGPT. As AI capabilities continue to evolve, frameworks such as SciSciGPT may play increasingly pivotal roles in scientific research and discovery. At the same time, these new advances also raise critical challenges, from ensuring transparency and ethical use to balancing human and AI contributions. Addressing these issues may shape the future of scientific inquiry and inform how we train the next generation of scientists to thrive in an increasingly AI-integrated research ecosystem.
Gradient-based optimization of complex nanoparticle heterostructures enabled by deep learning on heterogeneous graphs
Pub Date: 2025-12-08 | DOI: 10.1038/s43588-025-00917-3
Eric Sivonxay, Lucas Attia, Evan Walter Clark Spotte-Smith, Benjamin Sanchez-Lengeling, Xiaojing Xia, Daniel Barter, Emory M Chan, Samuel M Blau
Applications of deep learning (DL) to design nanomaterials are hampered by a lack of suitable data representations and training data. Here we report efforts to overcome these limitations and leverage DL to optimize the nonlinear optical properties of core-shell upconverting nanoparticles (UCNPs). UCNPs, which have applications in fields such as biosensing, super-resolution microscopy and three-dimensional printing, can emit visible and ultraviolet light from near-infrared excitations. We report a large-scale dataset of UCNP emission spectra based on accurate but expensive kinetic Monte Carlo simulations (N > 6,000) and use these data to train a heterogeneous graph neural network using a physically motivated representation of UCNP nanostructure. Applying gradient-based optimization on the trained graph neural network, we identify structures with 6.5× higher predicted emission under 800-nm illumination than any UCNP in our training set. Our work reveals design principles for UCNP heterostructures and presents a roadmap for DL-based inverse design of nanomaterials.
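The UCNP entry describes inverse design by running gradient-based optimization through a trained, differentiable model. The sketch below shows only that generic idea: projected gradient ascent on a surrogate. Several pieces are hypothetical placeholders, not the authors' graph neural network or design variables:

* the quadratic `surrogate_emission`
* its analytic gradient
* the `OPTIMUM` values
* the [0, 1] bounds

A real model would supply gradients by automatic differentiation.

```python
# Hypothetical optimum of the toy surrogate (normalized design parameters in [0, 1]).
OPTIMUM = [0.3, 0.7, 0.5]


def surrogate_emission(x: list[float]) -> float:
    """Toy stand-in for a trained, differentiable emission predictor:
    a concave quadratic that peaks at OPTIMUM."""
    return -sum((xi - oi) ** 2 for xi, oi in zip(x, OPTIMUM))


def surrogate_gradient(x: list[float]) -> list[float]:
    """Analytic gradient of surrogate_emission; a real model would use autodiff."""
    return [-2.0 * (xi - oi) for xi, oi in zip(x, OPTIMUM)]


def optimize_design(x0, lr=0.1, steps=200, lower=0.0, upper=1.0):
    """Projected gradient ascent: step uphill on the surrogate, then clip
    every parameter back into the feasible box [lower, upper]."""
    x = list(x0)
    for _ in range(steps):
        grad = surrogate_gradient(x)
        x = [min(upper, max(lower, xi + lr * gi)) for xi, gi in zip(x, grad)]
    return x
```

Starting from `[0.0, 0.0, 0.0]`, `optimize_design` converges to `OPTIMUM`. Lowering `upper` shows how a bound (for example, a hypothetical maximum layer thickness) caps the design. Clipping after every step keeps each intermediate design inside the admissible region. The optimum found is only as good as the surrogate's predictions, which is why the entry reports *predicted* emission.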
Scouter predicts transcriptional responses to genetic perturbations with large language model embeddings
Pub Date: 2025-12-05 | DOI: 10.1038/s43588-025-00912-8
Ouyang Zhu, Jun Li
Gene perturbation experiments followed by transcriptomic profiling are vital for uncovering causal gene effects. However, their limited throughput leaves many perturbations of interest unexplored. Computational methods are therefore needed to predict genome-wide transcriptional responses to gene perturbations that were not experimentally assayed within a given dataset. Existing approaches often rely on Gene Ontology graphs to encode prior knowledge, but their predictive power and applicability are constrained by the graphs' sparsity and incomplete gene coverage. Here we present Scouter, a computational method that uses gene embeddings generated by large language models and a lightweight compressor-generator neural network. Scouter accurately predicts transcriptional responses to both single- and two-gene perturbations, reducing errors relative to state-of-the-art Gene Ontology-term-based methods (GEARS and biolord) by half or more. Unlike recent approaches based on fine-tuning gene expression foundation models, Scouter offers substantially better accuracy and greater accessibility; it requires no pretraining and runs efficiently on standard hardware.
Chemistry-informed deep learning model for predicting stereoselectivity and absolute configuration in asymmetric hydrogenation
Pub Date: 2025-12-05 | DOI: 10.1038/s43588-025-00920-8
Li Cheng, Pan-Lin Shao, Jiahui Lv, Hongjun Xiao, Yanping Sun, Jingkai Yang, Ziyi Xu, Mingkun Lv, Guanghui Wang, Shaokang Zhao, Jiaxin Li, Ziqi Jin, Xuan Tan, Guichuan Xing, Bo Zhang
The asymmetric hydrogenation of olefins is one of the most important asymmetric transformations in molecular synthesis. While other machine learning models have successfully predicted stereoselectivity for reactions with a single prochiral site, existing models face several limitations: narrow substrate-catalyst applicability, an inability to simultaneously predict stereoselectivity and absolute configurations in asymmetric hydrogenation of olefins with two prochiral sites, and a reliance on predefined descriptors. Here, to overcome these challenges, we introduce Chemistry-Informed Asymmetric Hydrogenation Network (ChemAHNet), a deep learning model based on the reaction mechanism of olefin asymmetric hydrogenation. By leveraging three structure-aware modules, ChemAHNet accurately predicts the absolute configuration of major enantiomers across diverse catalysts and substrates. It also defines the ΔΔG‡ of asymmetric hydrogenation via catalyst-olefin interactions, enabling concurrent prediction of stereoselectivity and absolute configuration. Notably, ChemAHNet extends to other asymmetric catalytic reactions. By operating solely on simplified molecular-input line-entry system (SMILES) inputs, it captures atomic-level spatial and electronic interactions, offering a robust tool for target-directed molecular engineering.
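The ChemAHNet entry links ΔΔG‡ to stereoselectivity. The link itself is the textbook transition-state-theory relation, not something the entry spells out. Under kinetic control, with two competing pathways to the enantiomers, the enantiomeric ratio is er = exp(ΔΔG‡/RT). The enantiomeric excess is then ee = (er - 1)/(er + 1) = tanh(ΔΔG‡/2RT). The sketch below assumes ΔΔG‡ in kcal/mol, positive when the major enantiomer is favoured. The function name `selectivity_from_ddg` is hypothetical and not part of ChemAHNet.

```python
import math

# Gas constant in kcal/(mol*K): 8.314462618 J/(mol*K) divided by 4184 J/kcal.
R_KCAL_PER_MOL_K = 8.314462618 / 4184.0


def selectivity_from_ddg(ddg_kcal_per_mol: float,
                         temperature_k: float = 298.15) -> tuple[float, float]:
    """Convert ΔΔG‡ (kcal/mol; positive favours the major enantiomer) into
    (enantiomeric ratio, enantiomeric excess as a fraction in [-1, 1])."""
    rt = R_KCAL_PER_MOL_K * temperature_k
    er = math.exp(ddg_kcal_per_mol / rt)   # k_major / k_minor
    ee = (er - 1.0) / (er + 1.0)           # equivalent to tanh(ΔΔG‡ / 2RT)
    return er, ee
```

For example, `selectivity_from_ddg(2.0)` gives an er of roughly 29:1 and an ee of about 0.93 at 25 °C. This illustrates how a modest free-energy gap already yields high enantioselectivity. A negative ΔΔG‡ returns a negative ee, meaning the other enantiomer dominates.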
Revealing neurocognitive and behavioral patterns through unsupervised manifold learning of dynamic brain data
Pub Date: 2025-12-04 | DOI: 10.1038/s43588-025-00911-9
Zixia Zhou, Junyan Liu, Wei Emma Wu, Ruogu Fang, Sheng Liu, Qingyue Wei, Rui Yan, Yi Guo, Qian Tao, Yuanyuan Wang, Md Tauhidul Islam, Lei Xing
Dynamic brain data are becoming increasingly accessible, providing a gateway to understanding the inner workings of the brain in living participants. However, the size and complexity of the data pose a challenge in extracting meaningful information across various data sources. Here we introduce a generalizable unsupervised deep manifold learning method for exploring neurocognitive and behavioral patterns. Unlike existing methods that extract patterns directly from the input data, the proposed brain-dynamic convolutional-network-based embedding (BCNE) captures brain-state trajectories by analyzing temporospatial correlations within the data and applying manifold learning. The results demonstrate that BCNE effectively delineates scene transitions, underscores the involvement of different brain regions in memory and narrative processing, distinguishes dynamic learning processes and identifies differences between active and passive behaviors. BCNE provides an effective tool for exploring general neuroscience inquiries or individual-specific patterns.
Identifying variants of molecules through database search of mass spectra
Pub Date: 2025-12-01 | DOI: 10.1038/s43588-025-00923-5
Mustafa Guler, Benjamin Krummenacher, Thomas Hall, Meghana Tandon, Joshua Abrams, Sanjana Ravi, Peng Chen, Matthew Lauber, Bahar Behsaz, Hosein Mohimani
Mass spectrometry is a widely used method for the identification of molecules in complex samples. Current tools for database search of experimental spectra against libraries of molecules are not scalable. Moreover, these tools are often limited to known molecules and only perform an exact search. Here, to address this, we introduce Variable Interpretation of Spectrum-Molecule Couples, or VInSMoC, a mass spectral database search algorithm for the identification of variants of molecules. VInSMoC removes some false identifications by estimating the statistical significance of matches between spectra and molecular structures. Benchmarking VInSMoC in a search of 483 million spectra from GNPS against 87 million molecules from PubChem and COCONUT revealed 43,000 known molecules and 85,000 variants that were previously unreported. VInSMoC further facilitates identifying putative microbial biosynthesis pathways of promothiocin B and depsidomycin in Streptomyces bellus and Streptomyces sp. F-2747, respectively.
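The VInSMoC entry says the tool estimates the statistical significance of spectrum-molecule matches, but not how. The sketch below shows one generic, widely used approach: an empirical p-value computed against a null distribution of scores, such as scores against decoy structures. This is a swapped-in illustrative technique and may differ from what VInSMoC actually does. The function name `empirical_p_value` and the score values are hypothetical.

```python
from bisect import bisect_left


def empirical_p_value(observed: float, null_scores: list[float]) -> float:
    """Hypothetical significance estimate for one match score: the fraction of
    null (e.g. decoy) scores at least as high as the observed score, with +1
    smoothing so the estimate is never exactly zero."""
    ranked = sorted(null_scores)
    # Number of null scores >= observed (ties count against the match).
    n_at_least = len(ranked) - bisect_left(ranked, observed)
    return (1 + n_at_least) / (1 + len(ranked))
```

With null scores `[1.0, 2.0, 3.0, 4.0]`, an observed score of 3.5 gives (1 + 1)/(1 + 4) = 0.4. Matches would then be kept only below a chosen threshold. The +1 smoothing is a conservative design choice: with N null scores, no p-value can fall below 1/(N + 1). So very large searches need correspondingly large null samples or an analytical score model.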