
Nature Machine Intelligence: Latest Articles

Accelerating protein engineering with fitness landscape modelling and reinforcement learning
IF 23.9 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-08 · DOI: 10.1038/s42256-025-01103-w
Haoran Sun, Liang He, Pan Deng, Guoqing Liu, Zhiyu Zhao, Yuliang Jiang, Chuan Cao, Fusong Ju, Lijun Wu, Haiguang Liu, Tao Qin, Tie-Yan Liu
Protein engineering holds substantial promise for designing proteins with customized functions, yet the vast landscape of potential mutations, set against limited laboratory capacity, constrains the discovery of optimal sequences. To address this, we present the μProtein framework, which accelerates protein engineering by combining μFormer, a deep learning model for accurate mutational effect prediction, with μSearch, a reinforcement learning algorithm designed to efficiently navigate the protein fitness landscape using μFormer as an oracle. By modelling epistatic interactions and using a multi-step search strategy, μProtein leverages single-mutation data to predict optimal sequences carrying complex, multi-amino-acid mutations. In addition to strong performance on benchmark datasets, μProtein, trained solely on single-mutation data, identified high-gain-of-function multi-point mutants of the enzyme β-lactamase in wet-laboratory experiments, surpassing one of the highest-known activity levels. These results demonstrate μProtein's capability to discover impactful mutations across the vast protein sequence space, offering a robust, efficient approach for protein optimization. μProtein, which combines deep learning and reinforcement learning, was developed to design high-function proteins. Trained only on single-mutation data, the framework discovers multi-site β-lactamase mutants with up to 2,000× growth rates.
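The abstract describes searching for multi-point mutants by repeatedly querying a learned fitness model as an oracle. The sketch below is **not** μSearch, which is a reinforcement learning algorithm. It is a plain greedy multi-step search that adds one point mutation per step. The oracle is a hypothetical toy model with additive single-mutation effects plus pairwise epistasis terms, standing in for μFormer. All names, sequences and effect sizes are illustrative assumptions.

```python
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_toy_oracle(single_effects, pair_effects):
    """Hypothetical stand-in for a learned fitness model (not μFormer).

    A mutation is a (position, amino_acid) tuple. Predicted fitness is the sum
    of single-mutation effects plus pairwise (epistatic) corrections.
    """
    def oracle(mutations):
        score = sum(single_effects.get(m, 0.0) for m in mutations)
        for m1, m2 in combinations(sorted(mutations), 2):
            score += pair_effects.get(frozenset((m1, m2)), 0.0)
        return score
    return oracle


def greedy_multistep_search(wild_type, oracle, max_steps=3):
    """Greedily add the single best point mutation per step until no gain."""
    current = frozenset()
    best = oracle(current)
    for _ in range(max_steps):
        used = {pos for pos, _ in current}
        candidates = [
            current | {(pos, aa)}
            for pos, wt in enumerate(wild_type)
            if pos not in used
            for aa in AMINO_ACIDS
            if aa != wt
        ]
        if not candidates:
            break
        step_best = max(candidates, key=oracle)
        step_score = oracle(step_best)
        if step_score <= best:
            break  # no single added mutation improves predicted fitness
        current, best = step_best, step_score
    return sorted(current), best


oracle = make_toy_oracle(
    single_effects={(0, "A"): 1.0, (2, "W"): 0.5},
    pair_effects={frozenset({(0, "A"), (2, "W")}): 2.0},
)
mutations, score = greedy_multistep_search("MKV", oracle)
print(mutations, score)  # [(0, 'A'), (2, 'W')] 3.5
```

A greedy search like this can miss combinations whose members look neutral or harmful on their own but are beneficial together. Handling such epistatic cases is one motivation for the multi-step search strategy described in the abstract.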
Nature Machine Intelligence, vol. 7, no. 9, pp. 1446-1460
Citations: 0
Applying genomic AI to combat antibiotic resistance in low-income countries
IF 23.9 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-04 · DOI: 10.1038/s42256-025-01108-5
Dickson Aruhomukama
Nature Machine Intelligence, vol. 7, no. 9, pp. 1369-1370
Citations: 0
Author Correction: A framework to evaluate machine learning crystal stability predictions
IF 23.9 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-02 · DOI: 10.1038/s42256-025-01117-4
Janosh Riebesell, Rhys E. A. Goodall, Philipp Benner, Yuan Chiang, Bowen Deng, Gerbrand Ceder, Mark Asta, Alpha A. Lee, Anubhav Jain, Kristin A. Persson
Nature Machine Intelligence, vol. 7, no. 9, p. 1586. Open access PDF: https://www.nature.com/articles/s42256-025-01117-4.pdf
Citations: 0
Brain–computer interface control with artificial intelligence copilots
IF 23.9 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-01 · DOI: 10.1038/s42256-025-01090-y
Johannes Y. Lee, Sangjoon Lee, Abhishek Mishra, Xu Yan, Brandon McMahan, Brent Gaisford, Charles Kobashigawa, Mike Qu, Chang Xie, Jonathan C. Kao
Motor brain–computer interfaces (BCIs) decode neural signals to help people with paralysis move and communicate. Despite important advances over the past two decades, BCIs face a key obstacle to clinical viability: BCI performance should strongly outweigh the costs and risks. To substantially increase BCI performance, we use shared autonomy, in which artificial intelligence (AI) copilots collaborate with BCI users to achieve task goals. We demonstrate this AI-BCI in a non-invasive BCI system that decodes electroencephalography (EEG) signals. We first contribute a hybrid adaptive decoding approach that uses a convolutional neural network and a ReFIT-like Kalman filter, enabling healthy users and a participant with paralysis to control computer cursors and robotic arms via decoded EEG signals. We then design two AI copilots to aid BCI users in a cursor control task and a robotic arm pick-and-place task. These AI-BCIs enable a participant with paralysis to achieve a 3.9-fold higher target hit rate during cursor control. They also enable the participant to control a robotic arm to sequentially move random blocks to random locations, a task the participant could not complete without an AI copilot. As AI copilots improve, BCIs designed with shared autonomy may achieve higher performance. AI copilots are integrated into brain–computer interfaces, enabling a paralysed participant to achieve improved control of computer cursors and robotic arms. This shared autonomy approach offers a promising path to increase BCI performance and clinical viability.
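In shared autonomy, the executed action combines the user's decoded intent with an AI copilot's action. A common, simple formulation is linear policy blending. The sketch below is a hypothetical 2D cursor example, not the paper's copilots or its CNN/Kalman-filter decoder. The copilot infers the goal as the target best aligned with the user's decoded velocity and moves toward it. `alpha` weights user control against copilot control. The targets, the biased toy "decoder" output and the hit radius are made-up assumptions.

```python
import math


def copilot_velocity(cursor, targets, user_vel, max_speed=1.0):
    """Hypothetical copilot: pick the target best aligned with the user's
    decoded velocity and return a velocity toward it (capped at max_speed)."""
    def alignment(target):
        dx, dy = target[0] - cursor[0], target[1] - cursor[1]
        dist = math.hypot(dx, dy) or 1e-9
        return (dx * user_vel[0] + dy * user_vel[1]) / dist

    goal = max(targets, key=alignment)
    dx, dy = goal[0] - cursor[0], goal[1] - cursor[1]
    dist = math.hypot(dx, dy)
    if dist < 1e-9:
        return (0.0, 0.0)
    speed = min(max_speed, dist)
    return (speed * dx / dist, speed * dy / dist)


def blend(user_vel, ai_vel, alpha):
    """Linear policy blending: alpha=1 is pure user control, alpha=0 pure copilot."""
    return tuple(alpha * u + (1 - alpha) * a for u, a in zip(user_vel, ai_vel))


def run_trial(alpha, targets, goal, hit_radius=1.0, max_steps=20):
    """Simulate a cursor driven by a biased, zig-zagging decoded velocity.
    Returns the step count at which the goal is hit, or None on a miss."""
    cursor = (0.0, 0.0)
    for step in range(max_steps):
        # Toy decoder output: drifts upward on average, away from the goal.
        user_vel = (1.0, 0.8 if step % 2 == 0 else -0.4)
        ai_vel = copilot_velocity(cursor, targets, user_vel)
        vel = blend(user_vel, ai_vel, alpha)
        cursor = (cursor[0] + vel[0], cursor[1] + vel[1])
        if math.dist(cursor, goal) <= hit_radius:
            return step + 1
    return None


targets = [(10.0, 0.0), (0.0, 10.0)]
print("user only:", run_trial(1.0, targets, targets[0]))
print("blended:  ", run_trial(0.5, targets, targets[0]))
```

In this toy setting, the user-only cursor drifts past the target. The blended cursor is steered back onto it. Real systems typically adapt `alpha` or the copilot's confidence rather than fixing them.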
Nature Machine Intelligence, vol. 7, no. 9, pp. 1510-1523
Citations: 0
Rigorous integration of single-cell ATAC-seq data using regularized barycentric mapping
IF 23.9 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-08-26 · DOI: 10.1038/s42256-025-01099-3
Shuchen Zhu, Heyang Hua, Shengquan Chen
Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) deciphers genome-wide chromatin accessibility, providing profound insights into gene regulation mechanisms. With the rapid advance of sequencing technologies, scATAC-seq datasets typically encompass numerous samples from various conditions. This results in complex batch effects and makes reliable integration tools necessary. Although numerous batch integration tools exist for single-cell RNA sequencing data, inherent differences in data characteristics limit their effectiveness on scATAC-seq data. Existing integration methods for scATAC-seq data suffer from several fundamental limitations, such as disrupting biological heterogeneity and focusing solely on low-dimensional correction, which may distort the data and hinder downstream analysis. Here we propose Fountain, a deep learning framework for scATAC-seq data integration via rigorous barycentric mapping. Barycentric mapping transforms one data distribution into another in a principled and effective manner through optimal transport. By regularizing barycentric mapping with geometric information from the data, Fountain achieves accurate batch alignment while preserving biological heterogeneity. Comprehensive experiments across diverse real-world datasets demonstrate the advantages of Fountain over existing methods in batch correction and conservation of biological variation. In addition, the trained Fountain model can integrate data from new batches alongside already integrated data without retraining, enabling continuous online data integration. Moreover, Fountain's reconstruction strategy generates batch-corrected ATAC profiles. These profiles improve the capture of cellular heterogeneity and reveal cell-type-specific implications through analyses such as expression enrichment and partitioned heritability. Zhu, Hua and Chen propose Fountain, a deep learning framework for batch integration of scATAC-seq data that utilizes regularized barycentric mapping.
It preserves biological heterogeneity and enables both online integration and integration in the original feature dimensionality.
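Barycentric mapping sends each source point to the average of the target points, weighted by an optimal transport plan between the two distributions. The sketch below uses entropically regularized optimal transport solved with Sinkhorn iterations, a standard choice. It is **not** Fountain's geometry-based regularization. The 1D toy "cells" and the value of `eps` are assumptions for illustration.

```python
import math


def sinkhorn(cost, a, b, eps=1.0, n_iter=500):
    """Entropic OT plan P = diag(u) K diag(v) with K = exp(-cost / eps)."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale so row sums match a and column sums match b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def barycentric_map(source, target, eps=1.0):
    """Map each source point to the plan-weighted mean of the target points."""
    n, m, dim = len(source), len(target), len(target[0])
    # Squared Euclidean cost between every source/target pair.
    cost = [[sum((s - t) ** 2 for s, t in zip(x, y)) for y in target] for x in source]
    plan = sinkhorn(cost, [1.0 / n] * n, [1.0 / m] * m, eps)
    mapped = []
    for row in plan:
        mass = sum(row)
        mapped.append([sum(row[j] * target[j][d] for j in range(m)) / mass for d in range(dim)])
    return mapped


batch_a = [[0.0], [1.0], [2.0]]  # toy "cells" from one batch (1D embedding)
batch_b = [[0.2], [1.2], [2.2]]  # same structure, shifted by a batch effect
print(barycentric_map(batch_a, batch_b))
```

With uniform weights and a converged plan, the mean of the mapped points equals the target mean, and the ordering of the cells is preserved. The entropic blur in this sketch also pulls the mapped points slightly toward the centre. This shrinkage is one reason a method might prefer a different regularizer.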
Nature Machine Intelligence, vol. 7, no. 9, pp. 1461-1477
Citations: 0
LLMs as all-in-one tools to easily generate publication-ready citation diversity reports
IF 23.9 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-08-25 · DOI: 10.1038/s42256-025-01101-y
Melissa S. Cantú, Michael R. King
Nature Machine Intelligence, vol. 7, no. 9, pp. 1371-1372
Citations: 0
Reusability report: Exploring the transferability of self-supervised learning models from single-cell to spatial transcriptomics
IF 23.9 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-08-21 · DOI: 10.1038/s42256-025-01097-5
Chuangyi Han, Senlin Lin, Zhikang Wang, Yan Cui, Qi Zou, Zhiyuan Yuan
Self-supervised learning (SSL) has emerged as a powerful approach for learning meaningful representations from large-scale unlabelled datasets in single-cell genomics. Richter et al. evaluated SSL pretext tasks for modelling single-cell RNA sequencing (scRNA-seq) data, demonstrating the effective use of SSL models. However, the transferability of these pretrained SSL models to the spatial transcriptomics domain remains unexplored. Here we assess three SSL models pretrained on scRNA-seq data (random mask, gene programme mask and Barlow Twins), evaluating their performance on spatial transcriptomics datasets with a focus on cell-type prediction and spatial clustering. Our experiments show that the SSL model with the random mask strategy achieves the best overall performance among the evaluated SSL models. Moreover, models trained from scratch on spatial transcriptomics data outperform the fine-tuned SSL models on cell-type prediction. This highlights a domain gap between scRNA-seq and spatial transcriptomics data whose underlying causes remain an open question. Through expanded analyses of multiple imputation methods and data degradation scenarios, we show that gene imputation degrades SSL model performance on cell-type prediction, an effect that is exacerbated by increasing data sparsity. Finally, integrating zero-shot random mask embeddings into selected spatial clustering methods significantly enhanced their accuracy. Overall, our findings provide valuable insights into the limitations and potential of transferring SSL models to spatial transcriptomics. They also offer practical guidance for researchers leveraging pretrained models for spatial transcriptomics data analysis. Self-supervised learning models for single-cell RNA sequencing data exhibit poor transferability to spatial transcriptomics for cell-type prediction, although their learned features may enhance spatial analysis.
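In a random-mask pretext task, part of each cell's expression profile is hidden, and the model is trained to reconstruct the hidden values. The minimal sketch below shows only the masking and the masked-only loss. The masking rate, the zero fill value and the trivial mean predictor are illustrative assumptions, not the configuration of the models evaluated in the report.

```python
import random


def random_mask(expression, mask_rate=0.5, seed=0):
    """Randomly hide a fraction of genes for a masked-reconstruction pretext task.

    Returns the corrupted profile (masked genes set to 0.0) and the masked indices.
    """
    rng = random.Random(seed)
    n_mask = round(mask_rate * len(expression))
    masked_idx = sorted(rng.sample(range(len(expression)), n_mask))
    corrupted = list(expression)
    for i in masked_idx:
        corrupted[i] = 0.0
    return corrupted, masked_idx


def masked_mse(prediction, target, masked_idx):
    """Reconstruction loss computed only on the genes that were masked."""
    if not masked_idx:
        return 0.0
    return sum((prediction[i] - target[i]) ** 2 for i in masked_idx) / len(masked_idx)


# Toy normalized expression profile for one cell (8 genes); values are made up.
cell = [2.1, 0.0, 3.4, 1.2, 0.0, 5.6, 0.7, 1.9]
corrupted, idx = random_mask(cell, mask_rate=0.5)
# Trivial stand-in "model": predict the mean of the visible genes everywhere.
visible = [cell[i] for i in range(len(cell)) if i not in idx]
prediction = [sum(visible) / len(visible)] * len(cell)
print(idx, masked_mse(prediction, cell, idx))
```

Restricting the loss to masked positions is what makes the task non-trivial. Otherwise, a model could simply copy its visible input.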
Nature Machine Intelligence, vol. 7, no. 9, pp. 1414-1428
Citations: 0
Towards responsible geospatial foundation models
IF 23.9 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-08-20 · DOI: 10.1038/s42256-025-01106-7
Recent years have seen a surge in geospatial artificial intelligence models, with promising applications in ecological and environmental monitoring tasks. Further work should also focus on the sustainable development of such models.
Nature Machine Intelligence, vol. 7, no. 8, p. 1189. Open access PDF: https://www.nature.com/articles/s42256-025-01106-7.pdf
Citations: 0
Electron-density-informed effective and reliable de novo molecular design and optimization with ED2Mol
IF 23.9 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-08-20 · DOI: 10.1038/s42256-025-01095-7
Mingyu Li, Kun Song, Jixiao He, Mingzhu Zhao, Gengshu You, Jie Zhong, Mengxi Zhao, Arong Li, Yu Chen, Guobin Li, Ying Kong, Jiacheng Wei, Zhaofu Wang, Jiamin Zhou, Hongbing Yang, Shichao Ma, Hailong Zhang, Irakoze Loïca Mélita, Weidong Lin, Yuhang Lu, Zhengtian Yu, Xun Lu, Yujun Zhao, Jian Zhang
Generative drug design opens avenues for discovering novel compounds within the vast chemical space, rather than relying on conventional screening of limited libraries. However, the practical utility of generated molecules is frequently constrained. Many designs prioritize a narrow range of pharmacological properties and neglect physical reliability, which lowers the success rate of subsequent wet-laboratory evaluations. To address this, we propose ED2Mol, a deep learning-based approach that leverages fundamental electron density information to improve de novo molecular generation and optimization. Extensive evaluations across multiple benchmarks demonstrate that ED2Mol surpasses existing methods in generation success rate and achieves >97% physical reliability. It also enables automated hit optimization, which is not fully implemented by other methods that use fragment-based strategies. Furthermore, ED2Mol generalizes to more challenging, unseen allosteric pocket benchmarks with consistent performance. More importantly, ED2Mol has been applied to various essential real-world targets, successfully identifying wet-laboratory-validated bioactive compounds. These include FGFR3 orthosteric inhibitors, CDC42 allosteric inhibitors, and GCK and GPRC5A allosteric activators. The binding modes generated directly for these compounds closely match molecular docking predictions and were further validated by X-ray co-crystal structure. All these results highlight ED2Mol's potential as a useful tool in drug design with enhanced effectiveness, physical reliability and practical applicability. A deep generative model is developed for de novo molecular design and optimization by leveraging electron density. Wet-laboratory assays validated its reliability in generating diverse bioactive molecules: orthosteric and allosteric, inhibitors and activators.
Nature Machine Intelligence, vol. 7, no. 8, pp. 1355-1368
Citations: 0
Training data composition determines machine learning generalization and biological rule discovery
IF 23.9 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-08-20 · DOI: 10.1038/s42256-025-01089-5
Eugen Ursu, Aygul Minnegalieva, Puneet Rawat, Maria Chernigovskaya, Robi Tacutu, Geir Kjetil Sandve, Philippe A. Robert, Victor Greiff
Supervised machine learning models depend on training datasets containing positive and negative examples: dataset composition directly impacts model performance and bias. Given the importance of machine learning for immunotherapeutic design, we examined how different negative class definitions affect model generalization and rule discovery for antibody–antigen binding. Using synthetic-structure-based binding data, we evaluated models trained with various definitions of negative sets. Our findings reveal that high out-of-distribution performance can be achieved when the negative dataset contains samples more similar to the positive dataset, despite lower in-distribution performance. Furthermore, by leveraging ground-truth information, we show that the binding rules associated with positive data change depending on the negative data used. Validation on experimental data supported the simulation-based observations. This work underscores the role of dataset composition in creating robust, generalizable and biology-aware sequence-based ML models. Negative data composition critically shapes machine learning robustness in sequence-based biological tasks. The composition of training data and its implications for biological rule discovery are investigated.
Nature Machine Intelligence, vol. 7, no. 8, pp. 1206-1219 (published 2025-08-20).
Citations: 0