Discovering fully semantic representations via centroid- and orientation-aware feature learning
Pub Date: 2025-02-06 | DOI: 10.1038/s42256-024-00978-5 | Nature Machine Intelligence 7(2): 307–314
Jaehoon Cha, Jinhae Park, Samuel Pinilla, Kyle L. Morris, Christopher S. Allen, Mark I. Wilkinson, Jeyan Thiyagalingam
Learning meaningful representations of images in scientific domains that are robust to variations in centroids and orientations remains an important challenge. Here we introduce the centroid- and orientation-aware disentangling autoencoder (CODAE), an encoder–decoder-based neural network that learns the meaningful content of objects in a latent space. Specifically, a combination of a translation- and rotation-equivariant encoder, Euler encoding and an image moment loss enables CODAE to extract features that are invariant to the positions and orientations of objects of interest from randomly translated and rotated images. We evaluate this approach on several publicly available scientific datasets, including protein images from the life sciences, four-dimensional scanning transmission electron microscopy data from materials science and galaxy images from astronomy. The evaluation shows that CODAE learns centroids, orientations and their invariant features and outputs, as well as aligned reconstructions and exact-view reconstructions of the input images, with high quality.

Cha and colleagues present a translation- and rotation-equivariant autoencoder-based method for robust image recognition, which they demonstrate on diverse tasks from bioinformatics, materials science and astronomy.
{"title":"Discovering fully semantic representations via centroid- and orientation-aware feature learning","authors":"Jaehoon Cha, Jinhae Park, Samuel Pinilla, Kyle L. Morris, Christopher S. Allen, Mark I. Wilkinson, Jeyan Thiyagalingam","doi":"10.1038/s42256-024-00978-5","DOIUrl":"10.1038/s42256-024-00978-5","url":null,"abstract":"Learning meaningful representations of images in scientific domains that are robust to variations in centroids and orientations remains an important challenge. Here we introduce centroid- and orientation-aware disentangling autoencoder (CODAE), an encoder–decoder-based neural network that learns meaningful content of objects in a latent space. Specifically, a combination of a translation- and rotation-equivariant encoder, Euler encoding and an image moment loss enables CODAE to extract features invariant to positions and orientations of objects of interest from randomly translated and rotated images. We evaluate this approach on several publicly available scientific datasets, including protein images from life sciences, four-dimensional scanning transmission electron microscopy data from material science and galaxy images from astronomy. The evaluation shows that CODAE learns centroids, orientations and their invariant features and outputs, as well as aligned reconstructions and the exact view reconstructions of the input images with high quality. Cha and colleagues present a translation- and rotation-equivariant autoencoder-based method for robust image recognition, which they demonstrate on diverse tasks from bioinformatics, material science and astronomy.","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"7 2","pages":"307-314"},"PeriodicalIF":18.8,"publicationDate":"2025-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s42256-024-00978-5.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143191831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Preserving and combining knowledge in robotic lifelong reinforcement learning
Pub Date: 2025-02-05 | DOI: 10.1038/s42256-025-00983-2 | Nature Machine Intelligence 7(2): 256–269
Yuan Meng, Zhenshan Bing, Xiangtong Yao, Kejia Chen, Kai Huang, Yang Gao, Fuchun Sun, Alois Knoll
Humans can continually accumulate knowledge and develop increasingly complex behaviours and skills throughout their lives, a capability known as 'lifelong learning'. Although this lifelong learning capability is considered an essential mechanism of general intelligence, recent advances in artificial intelligence predominantly excel in narrow, specialized domains and generally lack it. Here we introduce a robotic lifelong reinforcement learning framework that addresses this gap by developing a knowledge space inspired by Bayesian non-parametric models. In addition, we enhance the agent's semantic understanding of tasks by integrating language embeddings into the framework. Our proposed embodied agent can consistently accumulate knowledge from a continuous stream of tasks, each fed to it only once. Furthermore, the agent can tackle challenging real-world long-horizon tasks by combining and reapplying the knowledge it acquired from the original task stream. The proposed framework advances our understanding of the robotic lifelong learning process and may inspire the development of more broadly applicable intelligence.

Humans continuously acquire knowledge and develop complex behaviours. Meng, Bing, Yao and colleagues present a robotic lifelong learning framework using a Bayesian non-parametric knowledge space, enabling agents to dynamically preserve and integrate knowledge from sequential tasks and enhancing adaptability.
{"title":"Preserving and combining knowledge in robotic lifelong reinforcement learning","authors":"Yuan Meng, Zhenshan Bing, Xiangtong Yao, Kejia Chen, Kai Huang, Yang Gao, Fuchun Sun, Alois Knoll","doi":"10.1038/s42256-025-00983-2","DOIUrl":"10.1038/s42256-025-00983-2","url":null,"abstract":"Humans can continually accumulate knowledge and develop increasingly complex behaviours and skills throughout their lives, which is a capability known as ‘lifelong learning’. Although this lifelong learning capability is considered an essential mechanism that makes up general intelligence, recent advancements in artificial intelligence predominantly excel in narrow, specialized domains and generally lack this lifelong learning capability. Here we introduce a robotic lifelong reinforcement learning framework that addresses this gap by developing a knowledge space inspired by the Bayesian non-parametric domain. In addition, we enhance the agent’s semantic understanding of tasks by integrating language embeddings into the framework. Our proposed embodied agent can consistently accumulate knowledge from a continuous stream of one-time feeding tasks. Furthermore, our agent can tackle challenging real-world long-horizon tasks by combining and reapplying its acquired knowledge from the original tasks stream. The proposed framework advances our understanding of the robotic lifelong learning process and may inspire the development of more broadly applicable intelligence. Humans continuously acquire knowledge and develop complex behaviours. Meng, Bing, Yao and colleagues present a robotic lifelong learning framework using a Bayesian non-parametric knowledge space, enabling agents to dynamically preserve and integrate knowledge from sequential tasks, enhancing adaptability.","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"7 2","pages":"256-269"},"PeriodicalIF":18.8,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s42256-025-00983-2.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143125412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Why the carbon footprint of generative large language models alone will not help us assess their sustainability
Pub Date: 2025-02-03 | DOI: 10.1038/s42256-025-00979-y | Nature Machine Intelligence 7(2): 164–165
Leonie N. Bossert, Wulf Loh
There is a growing awareness of the substantial environmental costs of large language models (LLMs), but discussing the sustainability of LLMs only in terms of CO2 emissions is not enough. This Comment emphasizes the need to take into account the social and ecological costs and benefits of LLMs as well.
{"title":"Why the carbon footprint of generative large language models alone will not help us assess their sustainability","authors":"Leonie N. Bossert, Wulf Loh","doi":"10.1038/s42256-025-00979-y","DOIUrl":"10.1038/s42256-025-00979-y","url":null,"abstract":"There is a growing awareness of the substantial environmental costs of large language models (LLMs), but discussing the sustainability of LLMs only in terms of CO2 emissions is not enough. This Comment emphasizes the need to take into account the social and ecological costs and benefits of LLMs as well.","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"7 2","pages":"164-165"},"PeriodicalIF":18.8,"publicationDate":"2025-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143077564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep learning enhances the prediction of HLA class I-presented CD8+ T cell epitopes in foreign pathogens
Pub Date: 2025-01-28 | DOI: 10.1038/s42256-024-00971-y | Nature Machine Intelligence 7(2): 232–243
Jeremy Wohlwend, Anusha Nathan, Nitan Shalon, Charles R. Crain, Rhoda Tano-Menka, Benjamin Goldberg, Emma Richards, Gaurav D. Gaiha, Regina Barzilay
Accurate in silico determination of CD8+ T cell epitopes would greatly enhance T cell-based vaccine development, but current prediction models are not reliably successful. Here, motivated by recent successes in applying machine learning to complex biology, we curated a dataset of 651,237 unique human leukocyte antigen class I (HLA-I) ligands and developed MUNIS, a deep learning model that identifies peptides presented by HLA-I alleles. MUNIS shows improved performance compared with existing models in predicting peptide presentation and CD8+ T cell epitope immunodominance hierarchies. Moreover, application of MUNIS to proteins from Epstein–Barr virus led to the successful identification of both established and novel HLA-I epitopes, which were experimentally validated by in vitro HLA-I–peptide stability and T cell immunogenicity assays. MUNIS performs comparably to an experimental stability assay in terms of immunogenicity prediction, suggesting that deep learning can reduce the experimental burden and accelerate the identification of CD8+ T cell epitopes for rapid T cell vaccine development.

Accurate prediction of immunogenic CD8+ T cell epitopes would greatly accelerate T cell vaccine development. A new deep learning model, MUNIS, can rapidly identify HLA-binding, immunogenic and immunodominant peptides in foreign pathogens.
{"title":"Deep learning enhances the prediction of HLA class I-presented CD8+ T cell epitopes in foreign pathogens","authors":"Jeremy Wohlwend, Anusha Nathan, Nitan Shalon, Charles R. Crain, Rhoda Tano-Menka, Benjamin Goldberg, Emma Richards, Gaurav D. Gaiha, Regina Barzilay","doi":"10.1038/s42256-024-00971-y","DOIUrl":"10.1038/s42256-024-00971-y","url":null,"abstract":"Accurate in silico determination of CD8+ T cell epitopes would greatly enhance T cell-based vaccine development, but current prediction models are not reliably successful. Here, motivated by recent successes applying machine learning to complex biology, we curated a dataset of 651,237 unique human leukocyte antigen class I (HLA-I) ligands and developed MUNIS, a deep learning model that identifies peptides presented by HLA-I alleles. MUNIS shows improved performance compared with existing models in predicting peptide presentation and CD8+ T cell epitope immunodominance hierarchies. Moreover, application of MUNIS to proteins from Epstein–Barr virus led to successful identification of both established and novel HLA-I epitopes which were experimentally validated by in vitro HLA-I-peptide stability and T cell immunogenicity assays. MUNIS performs comparably to an experimental stability assay in terms of immunogenicity prediction, suggesting that deep learning can reduce experimental burden and accelerate identification of CD8+ T cell epitopes for rapid T cell vaccine development. Accurate prediction of immunogenic CD8+ T cell epitopes would greatly accelerate T cell vaccine development. A new deep learning model, MUNIS, can rapidly identify HLA-binding, immunogenic and immunodominant peptides in foreign pathogens.","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"7 2","pages":"232-243"},"PeriodicalIF":18.8,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s42256-024-00971-y.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143050094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A unified cross-attention model for predicting antigen binding specificity to both HLA and TCR molecules
Pub Date: 2025-01-28 | DOI: 10.1038/s42256-024-00973-w | Nature Machine Intelligence 7(2): 278–292
Chenpeng Yu, Xing Fang, Shiye Tian, Hui Liu
Immune checkpoint inhibitors have demonstrated promising clinical efficacy across various tumour types, yet the percentage of patients who benefit from them remains low. The binding of tumour antigens to human leukocyte antigen class I (HLA-I) and T cell receptor (TCR) molecules determines antigen presentation and T cell activation, and therefore plays an important role in the immunotherapy response. In this paper, we propose UnifyImmun, a unified cross-attention transformer model designed to simultaneously predict the binding of peptides to both receptors, providing a more comprehensive evaluation of antigen immunogenicity. We devise a two-phase strategy using virtual adversarial training that enables these two tasks to reinforce each other mutually by compelling the encoders to extract more expressive features. Our method demonstrates superior performance in predicting both peptide–HLA and peptide–TCR binding on multiple independent and external test sets. Notably, on a large-scale COVID-19 peptide–TCR binding test set containing no peptide seen during training, our method outperforms the current state-of-the-art methods by more than 10%. The predicted binding scores correlate significantly with the immunotherapy response and clinical outcomes in two clinical cohorts. Furthermore, the cross-attention scores and integrated gradients reveal the amino acid sites critical for peptide binding to the receptors. In essence, our approach marks an essential step towards a comprehensive evaluation of antigen immunogenicity.

This work proposes a deep learning model based on the cross-attention mechanism to simultaneously predict peptide–HLA and peptide–TCR binding. Experiments verify that its performance on both prediction tasks across multiple test sets compares favourably with previous methods.
{"title":"A unified cross-attention model for predicting antigen binding specificity to both HLA and TCR molecules","authors":"Chenpeng Yu, Xing Fang, Shiye Tian, Hui Liu","doi":"10.1038/s42256-024-00973-w","DOIUrl":"10.1038/s42256-024-00973-w","url":null,"abstract":"The immune checkpoint inhibitors have demonstrated promising clinical efficacy across various tumour types, yet the percentage of patients who benefit from them remains low. The bindings between tumour antigens and human leukocyte antigen class I/T cell receptor molecules determine the antigen presentation and T cell activation, thereby playing an important role in the immunotherapy response. In this paper, we propose UnifyImmun, a unified cross-attention transformer model designed to simultaneously predict the bindings of peptides to both receptors, providing more comprehensive evaluation of antigen immunogenicity. We devise a two-phase strategy using virtual adversarial training that enables these two tasks to reinforce each other mutually, by compelling the encoders to extract more expressive features. Our method demonstrates superior performance in predicting both peptide-HLA and peptide-TCR binding on multiple independent and external test sets. Notably, on a large-scale COVID-19 peptide-TCR binding test set without any seen peptide in the training set, our method outperforms the current state-of-the-art methods by more than 10%. The predicted binding scores significantly correlate with the immunotherapy response and clinical outcomes on two clinical cohorts. Furthermore, the cross-attention scores and integrated gradients reveal the amino acid sites critical for peptide binding to receptors. In essence, our approach marks an essential step towards comprehensive evaluation of antigen immunogenicity. This work proposes a deep learning model based on the cross-attention mechanism to simultaneously predict peptide–HLA and peptide–TCR bindings. Experiments verify that its performance for both prediction tasks on multiple test sets compares favourably with previous methods.","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"7 2","pages":"278-292"},"PeriodicalIF":18.8,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143050075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine learning solutions looking for PDE problems
Pub Date: 2025-01-27 | DOI: 10.1038/s42256-025-00989-w | Nature Machine Intelligence 7(1): 1
Machine learning models are promising approaches for tackling partial differential equations, which are foundational descriptions of many scientific and engineering problems. However, conversations with several experts about progress in the area raise questions over what realistic advantages machine learning models offer and how their performance should be evaluated.
{"title":"Machine learning solutions looking for PDE problems","authors":"","doi":"10.1038/s42256-025-00989-w","DOIUrl":"10.1038/s42256-025-00989-w","url":null,"abstract":"Machine learning models are promising approaches to tackle partial differential equations, which are foundational descriptions of many scientific and engineering problems. However, in speaking with several experts about progress in the area, questions are emerging over what realistic advantages machine learning models have and how their performance should be evaluated.","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"7 1","pages":"1-1"},"PeriodicalIF":18.8,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s42256-025-00989-w.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143044076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evolutionary optimization of model merging recipes
Pub Date: 2025-01-27 | DOI: 10.1038/s42256-024-00975-8 | Nature Machine Intelligence 7(2): 195–204
Takuya Akiba, Makoto Shing, Yujin Tang, Qi Sun, David Ha
Large language models (LLMs) have become increasingly capable, but their development often requires substantial computational resources. Although model merging has emerged as a promising, cost-effective approach for creating new models by combining existing ones, it currently relies on human intuition and domain knowledge, limiting its potential. Here we propose an evolutionary approach that overcomes this limitation by automatically discovering effective combinations of diverse open-source models, harnessing their collective intelligence without requiring extensive additional training data or compute. Our approach operates in both parameter space and data flow space, allowing optimization beyond just the weights of the individual models. This approach even facilitates cross-domain merging, generating models such as a Japanese LLM with math reasoning capabilities. Surprisingly, our Japanese math LLM achieved state-of-the-art performance on a variety of established Japanese LLM benchmarks, even surpassing models with substantially more parameters, despite not being explicitly trained for such tasks. Furthermore, a culturally aware Japanese vision–language model generated through our approach demonstrates its effectiveness in describing Japanese culture-specific content, outperforming previous Japanese vision–language models. This work not only contributes new state-of-the-art models back to the open-source community but also introduces a new paradigm for automated model composition, paving the way for exploring alternative, efficient approaches to foundation model development.

Akiba et al. developed an evolutionary approach to automatically merge artificial intelligence models, creating powerful hybrid models without extensive training. The method produces models with enhanced mathematical and visual capabilities that outperform larger models.
{"title":"Evolutionary optimization of model merging recipes","authors":"Takuya Akiba, Makoto Shing, Yujin Tang, Qi Sun, David Ha","doi":"10.1038/s42256-024-00975-8","DOIUrl":"10.1038/s42256-024-00975-8","url":null,"abstract":"Large language models (LLMs) have become increasingly capable, but their development often requires substantial computational resources. Although model merging has emerged as a cost-effective promising approach for creating new models by combining existing ones, it currently relies on human intuition and domain knowledge, limiting its potential. Here we propose an evolutionary approach that overcomes this limitation by automatically discovering effective combinations of diverse open-source models, harnessing their collective intelligence without requiring extensive additional training data or compute. Our approach operates in both parameter space and data flow space, allowing optimization beyond just the weights of the individual models. This approach even facilitates cross-domain merging, generating models such as a Japanese LLM with math reasoning capabilities. Surprisingly, our Japanese math LLM achieved state-of-the-art performance on a variety of established Japanese LLM benchmarks, even surpassing models with substantially more parameters, despite not being explicitly trained for such tasks. Furthermore, a culturally aware Japanese vision–language model generated through our approach demonstrates its effectiveness in describing Japanese culture-specific content, outperforming previous Japanese vision–language models. This work not only contributes new state-of-the-art models back to the open-source community but also introduces a new paradigm for automated model composition, paving the way for exploring alternative, efficient approaches to foundation model development. Akiba et al. developed an evolutionary approach to automatically merge artificial intelligence models, creating powerful hybrid models without extensive training. The method produces models with enhanced mathematical and visual capabilities that outperform larger models.","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"7 2","pages":"195-204"},"PeriodicalIF":18.8,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s42256-024-00975-8.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143044077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Moving towards genome-wide data integration for patient stratification with Integrate Any Omics
Pub Date: 2025-01-23 | DOI: 10.1038/s42256-024-00942-3 | Nature Machine Intelligence 7(1): 29–42
Shihao Ma, Andy G. X. Zeng, Benjamin Haibe-Kains, Anna Goldenberg, John E. Dick, Bo Wang
Advances in high-throughput omics profiling have greatly enhanced cancer patient stratification. However, incomplete data in multi-omics integration present a substantial challenge, as traditional methods such as sample exclusion or imputation often compromise biological diversity and dependencies. Furthermore, the critical task of accurately classifying new patients with partial omics data into existing subtypes is commonly overlooked. To address these issues, we introduce Integrate Any Omics (IntegrAO), an unsupervised framework for integrating incomplete multi-omics data and classifying new samples. IntegrAO first combines partially overlapping patient graphs from diverse omics sources and utilizes graph neural networks to produce unified patient embeddings. Our systematic evaluation across five cancer cohorts involving six omics modalities demonstrates IntegrAO's robustness to missing data and its accuracy in classifying new samples with partial profiles. An acute myeloid leukaemia case study further validates its capability to uncover biological and clinical heterogeneities in incomplete datasets. IntegrAO's ability to handle heterogeneous and incomplete data makes it an essential tool for precision oncology, offering a holistic approach to patient characterization.

Integrating incomplete multi-omics data remains a key challenge in precision oncology. IntegrAO, an unsupervised framework that integrates diverse omics, enables accurate patient classification even with incomplete datasets.
{"title":"Moving towards genome-wide data integration for patient stratification with Integrate Any Omics","authors":"Shihao Ma, Andy G. X. Zeng, Benjamin Haibe-Kains, Anna Goldenberg, John E. Dick, Bo Wang","doi":"10.1038/s42256-024-00942-3","DOIUrl":"10.1038/s42256-024-00942-3","url":null,"abstract":"High-throughput omics profiling advancements have greatly enhanced cancer patient stratification. However, incomplete data in multi-omics integration present a substantial challenge, as traditional methods like sample exclusion or imputation often compromise biological diversity and dependencies. Furthermore, the critical task of accurately classifying new patients with partial omics data into existing subtypes is commonly overlooked. To address these issues, we introduce Integrate Any Omics (IntegrAO), an unsupervised framework for integrating incomplete multi-omics data and classifying new samples. IntegrAO first combines partially overlapping patient graphs from diverse omics sources and utilizes graph neural networks to produce unified patient embeddings. Our systematic evaluation across five cancer cohorts involving six omics modalities demonstrates IntegrAO’s robustness to missing data and its accuracy in classifying new samples with partial profiles. An acute myeloid leukaemia case study further validates its capability to uncover biological and clinical heterogeneities in incomplete datasets. IntegrAO’s ability to handle heterogeneous and incomplete data makes it an essential tool for precision oncology, offering a holistic approach to patient characterization. Integrating incomplete multi-omics data remains a key challenge in precision oncology. IntegrAO, an unsupervised framework that integrates diverse omics, enables accurate patient classification even with incomplete datasets.","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"7 1","pages":"29-42"},"PeriodicalIF":18.8,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143020700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
What large language models know and what people think they know
Pub Date: 2025-01-21 | DOI: 10.1038/s42256-024-00976-7 | Nature Machine Intelligence 7(2): 221–231
Mark Steyvers, Heliodoro Tejeda, Aakriti Kumar, Catarina Belem, Sheer Karny, Xinyue Hu, Lukas W. Mayer, Padhraic Smyth
As artificial intelligence systems, particularly large language models (LLMs), become increasingly integrated into decision-making processes, the ability to trust their outputs is crucial. To earn human trust, LLMs must be well calibrated such that they can accurately assess and communicate the likelihood of their predictions being correct. Whereas recent work has focused on LLMs' internal confidence, less is understood about how effectively they convey uncertainty to users. Here we explore the calibration gap, which refers to the difference between human confidence in LLM-generated answers and the models' actual confidence, and the discrimination gap, which reflects how well humans and models can distinguish between correct and incorrect answers. Our experiments with multiple-choice and short-answer questions reveal that users tend to overestimate the accuracy of LLM responses when provided with default explanations. Moreover, longer explanations increased user confidence, even when the extra length did not improve answer accuracy. By adjusting LLM explanations to better reflect the models' internal confidence, both the calibration gap and the discrimination gap narrowed, significantly improving user perception of LLM accuracy. These findings underscore the importance of accurate uncertainty communication and highlight the effect of explanation length in influencing user trust in artificial-intelligence-assisted decision-making environments.

Understanding how people perceive and interpret uncertainty from large language models (LLMs) is crucial, as users often overestimate LLM accuracy, especially with default explanations. Steyvers et al. show that aligning LLM explanations with their internal confidence improves user perception.
{"title":"What large language models know and what people think they know","authors":"Mark Steyvers, Heliodoro Tejeda, Aakriti Kumar, Catarina Belem, Sheer Karny, Xinyue Hu, Lukas W. Mayer, Padhraic Smyth","doi":"10.1038/s42256-024-00976-7","DOIUrl":"10.1038/s42256-024-00976-7","url":null,"abstract":"As artificial intelligence systems, particularly large language models (LLMs), become increasingly integrated into decision-making processes, the ability to trust their outputs is crucial. To earn human trust, LLMs must be well calibrated such that they can accurately assess and communicate the likelihood of their predictions being correct. Whereas recent work has focused on LLMs’ internal confidence, less is understood about how effectively they convey uncertainty to users. Here we explore the calibration gap, which refers to the difference between human confidence in LLM-generated answers and the models’ actual confidence, and the discrimination gap, which reflects how well humans and models can distinguish between correct and incorrect answers. Our experiments with multiple-choice and short-answer questions reveal that users tend to overestimate the accuracy of LLM responses when provided with default explanations. Moreover, longer explanations increased user confidence, even when the extra length did not improve answer accuracy. By adjusting LLM explanations to better reflect the models’ internal confidence, both the calibration gap and the discrimination gap narrowed, significantly improving user perception of LLM accuracy. These findings underscore the importance of accurate uncertainty communication and highlight the effect of explanation length in influencing user trust in artificial-intelligence-assisted decision-making environments. Understanding how people perceive and interpret uncertainty from large language models (LLMs) is crucial, as users often overestimate LLM accuracy, especially with default explanations. Steyvers et al. show that aligning LLM explanations with their internal confidence improves user perception.","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"7 2","pages":"221-231"},"PeriodicalIF":18.8,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s42256-024-00976-7.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142991513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}