A unified evolution-driven deep learning framework for virus variation driver prediction
Pub Date: 2025-01-17 | DOI: 10.1038/s42256-024-00966-9
Zhiwei Nie, Xudong Liu, Jie Chen, Zhennan Wang, Yutian Liu, Haorui Si, Tianyi Dong, Fan Xu, Guoli Song, Yu Wang, Peng Zhou, Wen Gao, Yonghong Tian
The increasing frequency of emerging viral infections necessitates a rapid human response, highlighting the cost-effectiveness of computational methods. However, existing computational approaches are limited by their input forms or incomplete functionalities, preventing a unified prediction of diverse virus variation drivers and hindering in-depth applications. To address this issue, we propose a unified evolution-driven framework for predicting virus variation drivers, named Evolution-driven Virus Variation Driver prediction (E2VD), which is guided by virus evolutionary traits. With evolution-inspired design, E2VD comprehensively and significantly outperforms state-of-the-art methods across various virus mutational driver prediction tasks. Moreover, E2VD effectively captures the fundamental patterns of virus evolution. It not only distinguishes different types of mutations but also accurately identifies rare beneficial mutations that are critical for viruses to survive, while maintaining generalization capabilities across different lineages of SARS-CoV-2 and different types of viruses. Importantly, with predicted biological drivers, E2VD perceives virus evolutionary trends in which potential high-risk mutation sites are accurately recommended. Overall, E2VD represents a unified, structure-free and interpretable approach for analysing and predicting viral evolutionary fitness, providing an ideal alternative to costly wet-lab measurements to accelerate responses to emerging viral infections. A unified evolution-driven deep learning framework is presented, which outperforms state-of-the-art methods across various virus mutational driver predictions, and which captures fundamental patterns of virus evolution.
Nature Machine Intelligence 7(1), 131–144.
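The abstract describes recommending potential high-risk mutation sites from predicted biological drivers. A minimal sketch of that final ranking step, assuming per-site driver scores are already predicted; the driver names, weights, and scoring rule below are illustrative placeholders, not E2VD's actual outputs or aggregation method:

```python
# Rank candidate mutation sites by a weighted combination of predicted
# variation-driver scores. Higher combined score = higher assumed risk.
def rank_high_risk_sites(predictions, weights):
    """predictions: {site: {driver_name: score}}; returns sites sorted by risk."""
    combined = {
        site: sum(weights[d] * s for d, s in drivers.items())
        for site, drivers in predictions.items()
    }
    return sorted(combined, key=combined.get, reverse=True)

# Hypothetical spike-protein sites with made-up driver scores.
preds = {
    484: {"binding_affinity": 0.90, "antibody_escape": 0.80},
    501: {"binding_affinity": 0.95, "antibody_escape": 0.40},
    614: {"binding_affinity": 0.30, "antibody_escape": 0.20},
}
w = {"binding_affinity": 0.5, "antibody_escape": 0.5}
ranking = rank_high_risk_sites(preds, w)  # -> [484, 501, 614]
```

The real framework predicts the driver scores themselves from sequence; this only shows how predicted drivers could be fused into a site ranking.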
A quantitative analysis of knowledge-learning preferences in large language models in molecular science
Pub Date: 2025-01-17 | DOI: 10.1038/s42256-024-00977-6
Pengfei Liu, Jun Tao, Zhixiang Ren
Deep learning has significantly advanced molecular modelling and design, enabling an efficient understanding and discovery of novel molecules. In particular, large language models introduce a fresh research paradigm to tackle scientific problems from a natural language processing perspective. Large language models significantly enhance our understanding and generation of molecules, often surpassing existing methods with their capabilities to decode and synthesize complex molecular patterns. However, two key issues remain: how to quantify the match between model and data modalities and how to identify the knowledge-learning preferences of models. To address these challenges, we propose a multimodal benchmark, named ChEBI-20-MM, and perform 1,263 experiments to assess the model’s compatibility with data modalities and knowledge acquisition. Through the modal transition probability matrix, we provide insights into the most suitable modalities for tasks. Furthermore, we introduce a statistically interpretable approach to discover context-specific knowledge mapping by localized feature filtering. Our analysis offers an exploration of the learning mechanism and paves the way for advancing large language models in molecular science. Large language models promise substantial advances in molecular modelling and design. A multimodal benchmark is proposed to analyse performance, and 1,263 experiments are conducted to examine the compatibility of a large language model with data modalities and knowledge acquisition.
Nature Machine Intelligence 7(2), 315–327.
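The "modal transition probability matrix" can be pictured as a row-normalized table of how often each input modality succeeds at producing each output modality. A minimal sketch with made-up counts (the modality names and numbers are assumptions, not ChEBI-20-MM's results):

```python
import numpy as np

# Rows: input modalities; columns: output modalities. Each entry counts
# experiments where that input->output pairing succeeded on a task.
modalities = ["SMILES", "graph", "text"]
success_counts = np.array([
    [30, 10, 20],   # SMILES -> {SMILES, graph, text}
    [ 5, 25, 10],   # graph  -> ...
    [15,  5, 40],   # text   -> ...
], dtype=float)

# Normalize each row to sum to 1, giving transition probabilities.
transition_probs = success_counts / success_counts.sum(axis=1, keepdims=True)

# The most suitable output modality for each input modality.
best_output = {m: modalities[j]
               for m, j in zip(modalities, transition_probs.argmax(axis=1))}
```

Reading off the row-wise argmax is how such a matrix suggests "the most suitable modalities for tasks".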
Learning from models beyond fine-tuning
Pub Date: 2025-01-16 | DOI: 10.1038/s42256-024-00961-0
Hongling Zheng, Li Shen, Anke Tang, Yong Luo, Han Hu, Bo Du, Yonggang Wen, Dacheng Tao
Foundation models have demonstrated remarkable performance across various tasks, primarily due to their abilities to comprehend instructions and access extensive, high-quality data. These capabilities showcase the effectiveness of current foundation models and suggest a promising trajectory. Owing to multiple constraints, such as the extreme scarcity or inaccessibility of raw data used to train foundation models and the high cost of training large-scale foundation models from scratch, the use of pre-existing foundation models or application programming interfaces for downstream tasks has become a new research trend, which we call Learn from Model (LFM). LFM involves extracting and leveraging prior knowledge from foundation models through fine-tuning, editing and fusion methods and applying it to downstream tasks. We emphasize that maximizing the use of parametric knowledge in data-scarce scenarios is critical to LFM. Analysing the LFM paradigm can guide the selection of the most appropriate technology in a given scenario to minimize parameter storage and computational costs while improving the performance of foundation models on new tasks. This Review provides a comprehensive overview of current methods based on foundation models from the perspective of LFM. Large general-purpose models are becoming more prevalent and useful, but also harder to train and find suitable training data for. Zheng et al. discuss how models can be used to train other models.
Nature Machine Intelligence 7(1), 6–17.
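One of the LFM techniques the review names is model fusion. A minimal sketch of the simplest variant, weighted parameter averaging of models with identical architectures (as in "model soups"); the dict-of-arrays representation is a stand-in for real checkpoints:

```python
import numpy as np

# Fuse models by averaging their parameters with convex coefficients.
# Each model is a dict mapping parameter names to numpy arrays; all
# models must share the same architecture (same names and shapes).
def fuse(models, coeffs):
    assert abs(sum(coeffs) - 1.0) < 1e-9, "coefficients must sum to 1"
    return {name: sum(c * m[name] for c, m in zip(coeffs, models))
            for name in models[0]}

m1 = {"w": np.array([1.0, 2.0]), "b": np.array([0.0])}
m2 = {"w": np.array([3.0, 4.0]), "b": np.array([1.0])}
fused = fuse([m1, m2], [0.5, 0.5])
# fused["w"] -> [2.0, 3.0]; fused["b"] -> [0.5]
```

Editing and distillation-style LFM methods are more involved, but this illustrates why fusion needs no raw training data, only the pre-existing models' parameters.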
A machine learning approach to leveraging electronic health records for enhanced omics analysis
Pub Date: 2025-01-16 | DOI: 10.1038/s42256-024-00974-9
Samson J. Mataraso, Camilo A. Espinosa, David Seong, S. Momsen Reincke, Eloise Berson, Jonathan D. Reiss, Yeasul Kim, Marc Ghanem, Chi-Hung Shu, Tomin James, Yuqi Tan, Sayane Shome, Ina A. Stelzer, Dorien Feyaerts, Ronald J. Wong, Gary M. Shaw, Martin S. Angst, Brice Gaudilliere, David K. Stevenson, Nima Aghaeepour
Omics studies produce a large number of measurements, enabling the development, validation and interpretation of systems-level biological models. Large cohorts are required to power these complex models; yet, the cohort size remains limited due to clinical and budgetary constraints. We introduce clinical and omics multimodal analysis enhanced with transfer learning (COMET), a machine learning framework that incorporates large, observational electronic health record databases and transfer learning to improve the analysis of small datasets from omics studies. By pretraining on electronic health record data and adaptively blending both early and late fusion strategies, COMET overcomes the limitations of existing multimodal machine learning methods. Using two independent datasets, we showed that COMET improved the predictive modelling performance and biological discovery compared with the analysis of omics data with traditional methods. By incorporating electronic health record data into omics analyses, COMET enables more precise patient classifications, beyond the simplistic binary reduction to cases and controls. This framework can be broadly applied to the analysis of multimodal omics studies and reveals more powerful biological insights from limited cohort sizes. COMET, an artificial intelligence method that improves the analysis of small medical studies using large clinical databases, has been created. COMET can help develop better artificial intelligence tools and identify key biomarkers across many diseases, potentially changing medical research.
Nature Machine Intelligence 7(2), 293–306. Open access: https://www.nature.com/articles/s42256-024-00974-9.pdf
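The abstract mentions adaptively blending early and late fusion strategies. A minimal sketch of one way such blending can work, assuming per-sample probabilities from an early-fusion model and a late-fusion model are already available; the convex-combination rule tuned on validation data is an illustrative assumption, not COMET's actual mechanism:

```python
import numpy as np

def blend_weight(p_early, p_late, y_val):
    """Pick alpha in [0, 1] minimizing validation log loss of the blend
    alpha * p_early + (1 - alpha) * p_late."""
    def log_loss(p, y):
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    alphas = np.linspace(0.0, 1.0, 101)
    losses = [log_loss(a * p_early + (1 - a) * p_late, y_val) for a in alphas]
    return alphas[int(np.argmin(losses))]

# Toy validation set where early fusion is clearly better calibrated.
y = np.array([1, 0, 1, 1, 0])
p_early = np.array([0.9, 0.2, 0.8, 0.7, 0.1])
p_late  = np.array([0.6, 0.5, 0.5, 0.6, 0.5])
alpha = blend_weight(p_early, p_late, y)  # -> 1.0 (all weight on early fusion)
```

In a real pipeline the two branches would be models over EHR-pretrained embeddings and omics features; here the probabilities are hard-coded to keep the blending step isolated.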
Battery lifetime prediction across diverse ageing conditions with inter-cell deep learning
Han Zhang, Yuqi Li, Shun Zheng, Ziheng Lu, Xiaofan Gui, Wei Xu, Jiang Bian
Pub Date: 2025-01-15 | DOI: 10.1038/s42256-024-00972-x
Accurately predicting battery lifetime in early cycles holds tremendous value in real-world applications. However, this task poses significant challenges due to diverse factors influencing complex battery capacity degradation, such as cycling protocols, ambient temperatures and electrode materials. Moreover, cycling under specific conditions is both resource-intensive and time-consuming. Existing predictive models, primarily developed and validated within a restricted set of ageing conditions, thus raise doubts regarding their extensive applicability. Here we introduce BatLiNet, a deep learning framework tailored to predict battery lifetime reliably across a variety of ageing conditions. The distinctive design is integrating an inter-cell learning mechanism to predict the lifetime differences between two battery cells. This mechanism, when combined with conventional single-cell learning, enhances the stability of lifetime predictions for a target cell under varied ageing conditions. Our experimental results, derived from a broad spectrum of ageing conditions, demonstrate BatLiNet’s superior accuracy and robustness compared to existing models. BatLiNet also exhibits transferring capabilities across different battery chemistries, benefitting scenarios with limited resources. We expect this study could promote exploration of cross-cell insights and facilitate battery research across comprehensive ageing factors. Zhang and colleagues introduce an inter-cell learning mechanism to predict battery lifetime in the presence of diverse ageing conditions.
Nature Machine Intelligence 7(2), 270–277. Open access: https://www.nature.com/articles/s42256-024-00972-x.pdf
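The inter-cell idea can be sketched in a few lines: instead of regressing a target cell's lifetime directly, regress the *difference* in lifetime between the target and reference cells whose lifetimes are known, then average the implied estimates. The linear "difference model" and toy features below are stand-ins for BatLiNet's neural network, assumed for illustration only:

```python
import numpy as np

def intercell_predict(target_feat, ref_feats, ref_lifetimes, diff_model):
    """Estimate the target's lifetime as (reference lifetime + predicted
    difference), averaged over all reference cells."""
    estimates = [life + diff_model(target_feat - ref)
                 for ref, life in zip(ref_feats, ref_lifetimes)]
    return float(np.mean(estimates))

# Toy setup: true lifetime is linear in the features, so the exact
# difference model is that same linear map applied to feature differences.
w_true = np.array([100.0, -50.0])
diff_model = lambda d: float(w_true @ d)

refs = np.array([[1.0, 0.5], [0.8, 0.2]])
ref_lives = np.array([float(w_true @ r) for r in refs])  # known lifetimes
target = np.array([1.2, 0.4])
pred = intercell_predict(target, refs, ref_lives, diff_model)
# With an exact difference model, every reference yields the true value: 100.0
```

Averaging over many reference cells is what stabilizes the prediction when the target's ageing conditions differ from any single reference.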
Visual cognition in multimodal large language models
Pub Date: 2025-01-15 | DOI: 10.1038/s42256-024-00963-y
Luca M. Schulze Buschoff, Elif Akata, Matthias Bethge, Eric Schulz
A chief goal of artificial intelligence is to build machines that think like people. Yet it has been argued that deep neural network architectures fail to accomplish this. Researchers have asserted these models’ limitations in the domains of causal reasoning, intuitive physics and intuitive psychology. Yet recent advancements, namely the rise of large language models, particularly those designed for visual processing, have rekindled interest in the potential to emulate human-like cognitive abilities. This paper evaluates the current state of vision-based large language models in the domains of intuitive physics, causal reasoning and intuitive psychology. Through a series of controlled experiments, we investigate the extent to which these modern models grasp complex physical interactions, causal relationships and intuitive understanding of others’ preferences. Our findings reveal that, while some of these models demonstrate a notable proficiency in processing and interpreting visual data, they still fall short of human capabilities in these areas. Our results emphasize the need for integrating more robust mechanisms for understanding causality, physical dynamics and social cognition into modern-day, vision-based language models, and point out the importance of cognitively inspired benchmarks. Modern vision-based language models face challenges with complex physical interactions, causal reasoning and intuitive psychology. Schulze Buschoff and colleagues demonstrate that while some models exhibit proficient visual data processing capabilities, they fall short of human performance in these cognitive domains.
Nature Machine Intelligence 7(1), 96–106. Open access: https://www.nature.com/articles/s42256-024-00963-y.pdf
The design space of E(3)-equivariant atom-centred interatomic potentials
Pub Date: 2025-01-15 | DOI: 10.1038/s42256-024-00956-x
Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi
Molecular dynamics simulation is an important tool in computational materials science and chemistry, and in the past decade it has been revolutionized by machine learning. This rapid progress in machine learning interatomic potentials has produced a number of new architectures in just the past few years. Particularly notable among these are the atomic cluster expansion, which unified many of the earlier ideas around atom-density-based descriptors, and Neural Equivariant Interatomic Potentials (NequIP), a message-passing neural network with equivariant features that exhibited state-of-the-art accuracy at the time. Here we construct a mathematical framework that unifies these models: atomic cluster expansion is extended and recast as one layer of a multi-layer architecture, while the linearized version of NequIP is understood as a particular sparsification of a much larger polynomial model. Our framework also provides a practical tool for systematically probing different choices in this unified design space. An ablation study of NequIP, via a set of experiments looking at in- and out-of-domain accuracy and smooth extrapolation very far from the training data, sheds some light on which design choices are critical to achieving high accuracy. A much-simplified version of NequIP, which we call BOTnet (for body-ordered tensor network), has an interpretable architecture and maintains its accuracy on benchmark datasets. Batatia and colleagues introduce a computational framework that combines message-passing networks with the atomic cluster expansion architecture and incorporates a many-body description of the geometry of molecular structures. The resulting models are interpretable and accurate.
Nature Machine Intelligence 7(1), 56–67. Open access: https://www.nature.com/articles/s42256-024-00956-x.pdf
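The lowest-order ingredient shared by these atom-density models is an atom-centred, rotation-invariant two-body descriptor: a radial basis summed over neighbours within a cutoff. A minimal sketch (real ACE and NequIP models add spherical-harmonic angular channels and higher body orders on top; the Gaussian basis and parameter values here are illustrative assumptions):

```python
import numpy as np

def two_body_features(positions, centre_idx, basis_centres,
                      width=0.5, r_cut=4.0):
    """Sum a Gaussian radial basis over neighbours of one atom.
    Depends only on interatomic distances, hence rotation-invariant."""
    ri = positions[centre_idx]
    feats = np.zeros(len(basis_centres))
    for j, rj in enumerate(positions):
        if j == centre_idx:
            continue
        d = np.linalg.norm(rj - ri)
        if d < r_cut:
            feats += np.exp(-((d - basis_centres) ** 2) / (2 * width ** 2))
    return feats

pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
basis_centres = np.linspace(0.5, 3.5, 4)
f = two_body_features(pos, 0, basis_centres)

# Invariance check: a global rotation leaves the features unchanged.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
f_rot = two_body_features(pos @ R.T, 0, basis_centres)
assert np.allclose(f, f_rot)
```

The paper's unification concerns what is layered above such density projections: tensor products for higher body order (ACE) versus equivariant message passing (NequIP).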
Causal chambers as a real-world physical testbed for AI methodology
Pub Date: 2025-01-15 | DOI: 10.1038/s42256-024-00964-x
Juan L. Gamella, Jonas Peters, Peter Bühlmann
In some fields of artificial intelligence, machine learning and statistics, the validation of new methods and algorithms is often hindered by the scarcity of suitable real-world datasets. Researchers must often turn to simulated data, which yields limited information about the applicability of the proposed methods to real problems. As a step forward, we have constructed two devices that allow us to quickly and inexpensively produce large datasets from non-trivial but well-understood physical systems. The devices, which we call causal chambers, are computer-controlled laboratories that allow us to manipulate and measure an array of variables from these physical systems, providing a rich testbed for algorithms from a variety of fields. We illustrate potential applications through a series of case studies in fields such as causal discovery, out-of-distribution generalization, change point detection, independent component analysis and symbolic regression. For applications to causal inference, the chambers allow us to carefully perform interventions. We also provide and empirically validate a causal model of each chamber, which can be used as ground truth for different tasks. The hardware and software are made open source, and the datasets are publicly available at causalchamber.org or through the Python package causalchamber. Two devices are constructed to manipulate and collect data from non-trivial but well-understood physical systems.
"Causal chambers as a real-world physical testbed for AI methodology". Juan L. Gamella, Jonas Peters, Peter Bühlmann. Nature Machine Intelligence 7(1), 107–118. DOI: 10.1038/s42256-024-00964-x. Open-access PDF: https://www.nature.com/articles/s42256-024-00964-x.pdf
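One of the case-study tasks listed above, change point detection, can be illustrated with a minimal sketch. The signal below is a synthetic stand-in for a chamber sensor stream (the real datasets are available from causalchamber.org), and the least-squares split-scan estimator is a generic textbook method, not a specific algorithm benchmarked in the paper.

```python
import numpy as np

def detect_single_change_point(x):
    """Locate the most likely single mean-shift in a 1-D signal.

    Scans every split point and picks the one minimizing the total
    squared deviation of the two segments from their own means --
    the classic least-squares single change point estimator.
    """
    best_cost, best_t = np.inf, None
    for t in range(1, len(x)):
        left, right = x[:t], x[t:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_cost, best_t = cost, t
    return best_t

# Synthetic stand-in for a chamber measurement: the mean shifts at t = 120.
rng = np.random.default_rng(0)
signal = np.concatenate([rng.normal(0.0, 0.5, 120), rng.normal(2.0, 0.5, 80)])
print(detect_single_change_point(signal))
```

The value of a physical testbed here is that, unlike this synthetic stream, a chamber recording has a known ground-truth intervention time against which such estimators can be scored.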
Pub Date: 2025-01-13. DOI: 10.1038/s42256-024-00970-z
M. J. Crockett
As powerful institutions increasingly promote AI systems, efforts to align those systems with human morality have grown. An open-source AI system aims to predict human moral judgments across a broad spectrum of everyday situations expressed in natural language. Identifying the limitations of such systems offers important insights for future work.
"Modern maxims for an AI oracle". M. J. Crockett. Nature Machine Intelligence 7(1), 4–5.
Pub Date: 2025-01-13. DOI: 10.1038/s42256-024-00965-w
Fernando Pérez-García, Harshita Sharma, Sam Bond-Taylor, Kenza Bouzid, Valentina Salvatelli, Maximilian Ilse, Shruthi Bannur, Daniel C. Castro, Anton Schwaighofer, Matthew P. Lungren, Maria Teodora Wetscherek, Noel Codella, Stephanie L. Hyland, Javier Alvarez-Valle, Ozan Oktay
Language-supervised pretraining has proven to be a valuable method for extracting semantically meaningful features from images, serving as a foundational element in multimodal systems within the computer vision and medical imaging domains. However, the computed features are limited by the information contained in the text, which is particularly problematic in medical imaging, in which the findings described by radiologists focus on specific observations. This challenge is compounded by the scarcity of paired imaging–text data due to concerns over the leakage of personal health information. In this work, we fundamentally challenge the prevailing reliance on language supervision for learning general-purpose biomedical imaging encoders. We introduce RAD-DINO, a biomedical image encoder pretrained solely on unimodal biomedical imaging data that obtains similar or greater performance than state-of-the-art biomedical-language-supervised models on a diverse range of benchmarks. Specifically, the quality of learned representations is evaluated on standard imaging tasks (classification and semantic segmentation), and a vision–language alignment task (text report generation from images). To further demonstrate the drawback of language supervision, we show that features from RAD-DINO correlate with other medical records (for example, sex or age) better than language-supervised models, which are generally not mentioned in radiology reports. Finally, we conduct a series of ablations determining the factors in RAD-DINO’s performance. In particular, we observe that RAD-DINO’s downstream performance scales well with the quantity and diversity of training data, demonstrating that image-only supervision is a scalable approach for training a foundational biomedical image encoder. Reliance on text supervision for biomedical image encoders is investigated. 
The proposed RAD-DINO, pretrained solely on unimodal data, achieves similar or greater performance than state-of-the-art multimodal models on various benchmarks.
"Exploring scalable medical image encoders beyond text supervision". Fernando Pérez-García, Harshita Sharma, Sam Bond-Taylor, Kenza Bouzid, Valentina Salvatelli, Maximilian Ilse, Shruthi Bannur, Daniel C. Castro, Anton Schwaighofer, Matthew P. Lungren, Maria Teodora Wetscherek, Noel Codella, Stephanie L. Hyland, Javier Alvarez-Valle, Ozan Oktay. Nature Machine Intelligence 7(1), 119–130.
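The abstract's claim that learned image features correlate with medical-record attributes such as age can be sketched generically. Everything below is a hypothetical illustration: the embeddings are random stand-ins (not RAD-DINO's actual features), and one dimension has an age signal planted by construction so that the correlation scan has something to find.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical cohort: 200 images, 64-dim embeddings from a frozen image
# encoder, plus per-patient age taken from the medical record.
age = rng.uniform(20, 90, size=200)
features = rng.normal(size=(200, 64))
features[:, 3] += 0.05 * age  # planted signal: dimension 3 partially encodes age

def strongest_correlation(feats, target):
    """Pearson correlation of each feature dimension with the target;
    returns the index and value of the largest absolute correlation."""
    f = (feats - feats.mean(axis=0)) / feats.std(axis=0)
    t = (target - target.mean()) / target.std()
    corrs = f.T @ t / len(t)
    i = int(np.abs(corrs).argmax())
    return i, corrs[i]

dim, r = strongest_correlation(features, age)
print(dim, round(float(r), 2))
```

In the paper's setting, running this kind of scan on features from an image-only encoder versus a language-supervised one is what reveals that attributes rarely mentioned in radiology reports are better preserved by image-only pretraining.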