
Latest articles from Nature Machine Intelligence

A flaw in using pretrained protein language models in protein–protein interaction inference models
IF 23.9 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-13 · DOI: 10.1038/s42256-025-01176-7
Joseph Szymborski, Amin Emad
With the growing pervasiveness of pretrained protein language models (pLMs), pLM-based methods are increasingly being put forward for the protein–protein interaction (PPI) inference task. Here we identify and confirm that existing pretrained pLMs are a source of data leakage for the downstream PPI task. We characterize the extent of the data leakage problem by training and comparing small and efficient pLMs on a dataset that controls for data leakage (strict) with one that does not (non-strict). Although data leakage from pretrained pLMs causes a measurable inflation of testing scores, we find that this does not necessarily extend to other, non-paired biological tasks such as protein keyword annotation. Further, we find no connection between the context lengths of pLMs and the performance of pLM-based PPI inference methods on proteins whose sequence lengths exceed those context lengths. Furthermore, we show that pLM-based and non-pLM-based models fail to generalize in tasks such as prediction of human–SARS-CoV-2 PPIs or the effect of point mutations on binding affinities. This study demonstrates the importance of extending existing protocols for the evaluation of pLM-based models applied to paired biological datasets and identifies areas of weakness in current pLMs. The usage of pretrained protein language models (pLMs) is rapidly growing. However, Szymborski and Emad find that pretrained pLMs can be a source of data leakage in the task of protein–protein interaction inference, showing inflated performance scores.
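The strict versus non-strict distinction can be illustrated with a small sketch (the protein IDs and the `split_pairs` helper are invented for illustration; the paper's actual dataset construction is more involved):

```python
def split_pairs(pairs, pretrain_ids):
    """Partition candidate PPI pairs: a pair qualifies for the 'strict'
    test set only if neither protein appeared in the pLM pretraining
    corpus, so pretrained embeddings cannot leak test-set information."""
    train, strict_test = [], []
    for a, b in pairs:
        if a in pretrain_ids or b in pretrain_ids:
            train.append((a, b))        # overlaps pretraining: unusable for strict testing
        else:
            strict_test.append((a, b))  # fully unseen pair: leakage-controlled
    return train, strict_test

# Toy example with hypothetical protein IDs.
pairs = [("P1", "P2"), ("P3", "P4"), ("P1", "P5")]
train, strict_test = split_pairs(pairs, pretrain_ids={"P1", "P2"})
```

Under a non-strict protocol this filter is skipped, so test pairs may involve proteins the pLM has already memorized during pretraining.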
Nature Machine Intelligence, volume 8, issue 2, pages 197–208.
Citations: 0
Reusability Report: Evaluating the performance of a meta-learning foundation model on predicting the antibacterial activity of natural products
IF 23.9 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-12 · DOI: 10.1038/s42256-026-01187-y
Caitlin M. Butt, Allison S. Walker
Deep learning foundation models are becoming increasingly popular for use in bioactivity prediction. Recently, Feng et al. developed ActFound, a bioactivity foundation model that jointly uses pairwise learning and meta-learning. These techniques allow the model to be fine-tuned to a more specific bioactivity task with only a small amount of new data. Here, to investigate the generalizability of the model, we fine-tuned the foundation model on an antibacterial natural products (NPs) dataset. Large, labelled NPs datasets, which are needed to train traditional deep learning methods, are scarce. Therefore, the bioactivity prediction of NPs is an ideal task for foundation models. We studied the performance of ActFound on the NPs dataset using a range of few-shot settings. Additionally, we compared ActFound’s performance with those of other state-of-the-art models in the field. We found that ActFound was unable to reach the same level of accuracy on the antibacterial NPs dataset as it did on other cross-domain tasks reported in the original publication. However, ActFound performed comparably to or better than the other models studied, especially at the low-shot settings. Our results establish ActFound as a useful foundation model for bioactivity prediction in tasks with limited data, particularly for datasets that contain the bioactivities of similar compounds. This Reusability Report tests the ability of a foundation model, ActFound, to predict the antibacterial activity of plant natural products. We found that although all models performed poorly on this task, ActFound performed better than similar models.
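A few-shot evaluation of this kind hinges on how the k labelled examples are sampled. A minimal sketch of one such split, assuming placeholder compound IDs and an invented `few_shot_split` helper (the report's actual protocol is not reproduced here):

```python
import random

def few_shot_split(dataset, shots, seed=0):
    """Sample `shots` labelled compounds as the fine-tuning (support) set;
    everything else becomes the held-out (query) set."""
    rng = random.Random(seed)            # fixed seed for reproducible splits
    support = rng.sample(dataset, shots)
    chosen = set(support)
    query = [x for x in dataset if x not in chosen]  # preserve original order
    return support, query

compounds = [f"NP{i}" for i in range(10)]  # placeholder natural-product IDs
support, query = few_shot_split(compounds, shots=3)
```

Repeating this over several seeds and shot counts (e.g. 1-, 5-, 10-shot) gives the range of low-shot settings the report compares models under.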
Nature Machine Intelligence, volume 8, issue 2, pages 270–275. Open-access PDF: https://www.nature.com/articles/s42256-026-01187-y.pdf
Citations: 0
What matters in building vision–language–action models for generalist robots
IF 23.9 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-11 · DOI: 10.1038/s42256-025-01168-7
Xinghang Li, Peiyan Li, Long Qian, Minghuan Liu, Dong Wang, Jirong Liu, Bingyi Kang, Xiao Ma, Xinlong Wang, Di Guo, Tao Kong, Hanbo Zhang, Huaping Liu
To utilize foundation vision–language models (VLMs) for robotic tasks and motion planning, the community has proposed different methods for injecting action components into VLMs and building vision–language–action models (VLAs). Here we disclose the key factors that significantly influence the performance of VLAs on robot manipulation problems and focus on answering three essential design choices: which backbone to select, how to formulate the VLA architecture and when to add cross-embodiment data. The results firmly justify our preference for VLAs and lead us to develop a new family of VLAs, RoboVLMs, which require very few manual designs and achieve new state-of-the-art performance in three simulation tasks and real-world experiments. Through our extensive experiments, which cover 8 VLM backbones, 4 policy architectures and over 600 distinctly designed experiments, we provide a detailed guidebook for the future design of VLAs. In addition to the study, the highly flexible RoboVLMs framework, which supports easy integration of new VLMs and free combinations of various design choices, is made public to facilitate future research. We open-source all details, including codes, models, datasets and toolkits, along with detailed training and evaluation recipes at robovlms.github.io. Vision–language–action models recently emerged as a tool for robotics. Here Li and colleagues compare vision–language–action models and highlight what makes a model useful.
Nature Machine Intelligence, volume 8, issue 2, pages 158–172.
Citations: 0
When large language models are reliable for judging empathic communication
IF 23.9 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-11 · DOI: 10.1038/s42256-025-01169-6
Aakriti Kumar, Nalin Poungpeth, Diyi Yang, Erina Farrell, Bruce L. Lambert, Matthew Groh
Large language models (LLMs) excel at generating empathic responses in text-based conversations. But how reliably do they judge the nuances of empathic communication? Here we investigate this question by comparing how experts, crowdworkers and LLMs annotate empathic communication across four evaluative frameworks drawn from psychology, natural language processing and communications, applied to 200 real-world conversations where one speaker shares a personal problem and the other offers support. Drawing on 3,150 expert annotations, 2,844 crowd annotations and 3,150 LLM annotations, we assess interrater reliability between these three annotator groups. We find that expert agreement is high but varies across the frameworks’ subcomponents depending on their clarity, complexity and subjectivity. We show that expert agreement offers a more informative benchmark for contextualizing LLM performance than standard classification metrics. Across all four frameworks, LLMs consistently approach this expert-level benchmark and exceed the reliability of crowdworkers. These results demonstrate how LLMs, when validated on specific tasks with appropriate benchmarks, can support transparency and oversight in emotionally sensitive applications, including their use as conversational companions. Kumar et al. show that large language models (LLMs) nearly match expert reliability and outperform laypeople when assessing empathic communication across multiple frameworks. The performance of both LLMs and experts depends on clear and specific evaluation criteria.
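Interrater reliability between two annotator groups is commonly quantified with a chance-corrected agreement statistic; the abstract does not name the exact measure used, so as one illustrative choice, a minimal Cohen's kappa for two raters over categorical labels:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement between two raters, corrected
    for the agreement expected by chance from their label frequencies."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters independently pick label c.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance; comparing LLM-vs-expert kappa against expert-vs-expert kappa is one way to contextualize LLM reliability, as the study advocates.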
Nature Machine Intelligence, volume 8, issue 2, pages 173–185. Open-access PDF: https://www.nature.com/articles/s42256-025-01169-6.pdf
Citations: 0
A federated graph learning method to realize multi-party collaboration for molecular discovery
IF 23.9 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-10 · DOI: 10.1038/s42256-026-01184-1
Liang Zhang, Juan Zhang, Rui Huang, Yiwen Wang, Linjing Liu, Yanyong Zhang, Kong Chen, Jun Jiang, Yuen Wu
Optimizing molecular resource utilization for molecular discovery requires collaborative efforts across research institutions and organizations to accelerate progress. However, given the high research value of both successful and unsuccessful molecules produced by each institution (or organization), these findings are typically kept highly private and confidential until formal publication or commercialization, with even failed molecules rarely disclosed. This confidentiality requirement presents a great challenge for most existing methods when collaboratively handling molecular data with heterogeneous distributions under stringent privacy constraints. Here we propose FedLG (federated learning Lanczos graph), a federated graph learning method that leverages the Lanczos algorithm to facilitate collaborative model training across multiple parties, achieving reliable prediction performance under strict privacy protection conditions. Compared with various existing federated learning methods, FedLG exhibits excellent model performance on 18 benchmark datasets in a simulated federated learning environment. Under different privacy-preserving mechanism settings, FedLG demonstrates robust performance and resistance to noise. Leave-one-client-out experiments and comparison tests across each simulated institution show that FedLG achieves improved heterogeneous data aggregation capabilities and more promising outcomes than localized training. In addition, we incorporate Bayesian optimization into FedLG to show its scalability and further stabilize model performance. Overall, FedLG can be considered an effective method to realize multi-party collaboration while ensuring that sensitive molecular information is protected from potential leakage. Zhang et al. introduce FedLG, a federated graph learning framework that leverages Lanczos-based projection to effectively aggregate heterogeneous molecular data.
Extensive benchmarks demonstrate its robustness across diverse molecular discovery tasks.
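For orientation, the generic skeleton that federated methods like this build on is a server-side aggregation round; a minimal sketch of size-weighted federated averaging (FedLG's Lanczos-based projection of updates is deliberately omitted, so this is not the paper's method, only the common baseline it extends):

```python
def fedavg(client_params, client_sizes):
    """One aggregation round: dataset-size-weighted average of each
    client's parameter vector. Clients never share raw molecular data,
    only parameters."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(p[i] * s for p, s in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]
```

A leave-one-client-out experiment, as in the paper, reruns such rounds with one institution's parameters withheld and compares the resulting model's performance on that institution's data.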
Nature Machine Intelligence, volume 8, issue 2, pages 246–256.
Citations: 0
Attributing and situating knowledge cannot be left to language models
IF 23.8 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-06 · DOI: 10.1038/s42256-026-01193-0
Roxana Radu, Luc Rocher
Citations: 0
Authorization of prognostic AI medical devices
IF 23.9 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-06 · DOI: 10.1038/s42256-025-01171-y
Urs J. Muehlematter, Kerstin Noelle Vokinger
Fewer than 2% of artificial intelligence devices authorized by the US Food and Drug Administration are prognostic, with prediction horizons ranging from minutes to several years. As the number of prognostic AI devices increases, it is important to address the accompanying regulatory and ethical challenges.
Nature Machine Intelligence, volume 8, issue 2, pages 138–143.
Citations: 0
Visual language models show widespread visual deficits on neuropsychological tests
IF 23.9 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-06 · DOI: 10.1038/s42256-026-01179-y
Gene Tangtartharakul, Katherine R. Storrs
Visual language models (VLMs) show remarkable performance in visual reasoning tasks, successfully tackling college-level challenges that require a high-level understanding of images. However, some recent reports of VLMs struggling to reason about elemental visual concepts such as orientation, position, continuity and occlusion suggest a potential gulf between human and VLM vision. Currently, few assessments enable a direct comparison between human and VLM performance, which limits our ability to measure alignment between the two systems. Here we use the toolkit of neuropsychology to systematically evaluate the capabilities of three state-of-the-art VLMs across low-, mid- and high-level visual domains. Using 51 tests drawn from 6 clinical and experimental psychology batteries, we characterize the visual abilities of leading VLMs relative to normative performance in healthy adults. While the models excel in straightforward object recognition tasks, we find widespread deficits in low- and mid-level visual abilities that would be considered clinically significant in humans. These selective deficits, profiled through validated test batteries, suggest that an artificial system can achieve complex object recognition without developing the foundational visual concepts that in humans require no explicit training. Tangtartharakul and Storrs use standardized neuropsychological tests to compare human visual abilities with those of visual language models (VLMs). They report that while VLMs excel in high-level object recognition, they show deficits in low- and mid-level visual abilities.
Nature Machine Intelligence, volume 8, issue 2, pages 209–219.
引用次数: 0
Identifying spatial single-cell-level interactions with graph transformer
IF 23.9 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-02-06 DOI: 10.1038/s42256-026-01191-2
Xiangzheng Cheng, Suoqin Jin
Identifying cell–cell interactions from imaging-based spatial transcriptomics suffers from limited gene panels. A new self-supervised graph transformer-based method can resolve spatial single-cell-level interactions without requiring known ligand–receptor pairs.
{"title":"Identifying spatial single-cell-level interactions with graph transformer","authors":"Xiangzheng Cheng, Suoqin Jin","doi":"10.1038/s42256-026-01191-2","DOIUrl":"10.1038/s42256-026-01191-2","url":null,"abstract":"Identifying cell–cell interactions from imaging-based spatial transcriptomics suffers from limited gene panels. A new self-supervised graph transformer-based method can resolve spatial single-cell-level interactions without requiring known ligand–receptor pairs.","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"8 2","pages":"146-147"},"PeriodicalIF":23.9,"publicationDate":"2026-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146135555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
On the troubling rise of generative AI suspicion in academic publishing
IF 23.9 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-01-30 DOI: 10.1038/s42256-026-01178-z
Raffaele Ciriello
{"title":"On the troubling rise of generative AI suspicion in academic publishing","authors":"Raffaele Ciriello","doi":"10.1038/s42256-026-01178-z","DOIUrl":"10.1038/s42256-026-01178-z","url":null,"abstract":"","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"8 2","pages":"136-137"},"PeriodicalIF":23.9,"publicationDate":"2026-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146089497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0