Pub Date: 2026-02-27 | DOI: 10.1038/s42256-026-01199-8
Dhruv Ahlawat, Vaibhav Mishra, Somaditya Singh, Mohd Zaki, Vaibhav Bihani, Hargun Singh Grover, Biswajit Mishra, Santiago Miret, Mausam, N. M. Anoop Krishnan
Materials discovery and development are critical for addressing global challenges in renewable energy, sustainability and advanced technology. Large language models (LLMs) offer unprecedented opportunities to accelerate materials research, yet their effective deployment requires domain-specific adaptation. Here we present large language models for materials (LLaMat), a family of foundational models for materials science, developed through continued pretraining of LLaMA models on 30 billion tokens derived from approximately 4 million materials science publications and crystallographic data. To develop a materials copilot, the models were adapted by instruction and task fine-tuning on 175,000 materials science question-answering pairs. Through evaluation across 42 tasks covering the entire spectrum of materials research, spanning natural language processing, structured information extraction and crystal generation, we demonstrate that LLaMat consistently outperforms state-of-the-art commercial LLMs (Claude, GPT and Gemini) while maintaining general linguistic capabilities. Beyond demonstrating the effectiveness of domain adaptation for practically deployable materials research copilots, our findings also reveal fundamental insights about LLM adaptation that may influence the development of specialized scientific artificial intelligence systems across domains. For instance, extensively pretrained LLMs such as LLaMA-3 prove markedly harder to adapt to the materials domain. This consistent pattern across our experiments points to a previously unidentified 'adaptation rigidity', in which overtrained LLMs become increasingly resistant to domain adaptation.
Title: A family of large language models for materials research with insights into model adaptability in continued pretraining
Nature Machine Intelligence.
Pub Date: 2026-02-24 | DOI: 10.1038/s42256-026-01203-1
Almost 10 years ago, AlphaGo defeated one of the world’s best professional players in the complex, ancient game of Go. It was a pivotal moment that spawned new research directions and marked the beginning of a busy decade in AI development.
Title: AI and the long game
Nature Machine Intelligence 8(2), 135.
Pub Date: 2026-02-24 | DOI: 10.1038/s42256-026-01180-5
Xiao Gu, Wei Tang, Jinpei Han, Veer Sangha, Fenglin Liu, Shreyank N. Gowda, Antonio H. Ribeiro, Patrick Schwab, Kim Branson, Lei Clifton, Antonio Luiz P. Ribeiro, Zhangdaihong Liu, David A. Clifton
Cardiovascular diseases remain a major contributor to the global burden of healthcare, highlighting the importance of accurate and scalable methods for cardiac monitoring. Cardiac biosignals, most notably electrocardiograms (ECGs) and photoplethysmograms, are essential for diagnosing, preventing and managing cardiovascular conditions across clinical and home settings. However, their acquisition varies substantially across scenarios and devices, whereas existing analytical models often rely on homogeneous datasets and static bespoke models, limiting their robustness and generalizability in diverse real-world contexts. Here we present a cardiac sensing foundation model (CSFM) that leverages transformer architectures and a generative masked pretraining strategy to learn unified representations from heterogeneous health records. CSFM is pretrained on a multimodal integration of data from various large-scale datasets, comprising cardiac signals from approximately 1.7 million individuals and their corresponding clinical or machine-generated text reports. The embeddings derived from CSFM act as effective, transferable features across diverse cardiac sensing scenarios, supporting seamless adaptation to varied input configurations and sensor modalities. Extensive evaluations across diagnostic tasks, demographic recognition, vital sign measurement, clinical outcome prediction and ECG question answering demonstrate that CSFM consistently outperforms traditional one-modal-one-task approaches. Notably, CSFM maintains favourable performance across both 12-lead and single-lead ECGs, as well as in scenarios involving ECG only, photoplethysmogram only or a combination of both. This highlights its potential as a versatile and scalable foundation for comprehensive cardiac monitoring.

Gu et al. introduce a cardiac foundation model that learns from millions of heart signals and textual interpretations, enabling it to handle heart data collected either in hospitals or at home. It offers clear and reliable insights across different devices and settings.
Title: Cardiac health assessment across scenarios and devices using a multimodal foundation model pretrained on data from 1.7 million individuals
Nature Machine Intelligence 8(2), 220-233.
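The generative masked pretraining objective described above can be illustrated with a minimal sketch: hide random spans of a one-dimensional signal and score a reconstruction only on the hidden positions. Everything here is a stand-in assumption (a toy sine signal, a linear-interpolation "model", arbitrary span lengths); CSFM's actual transformer and multimodal data are far richer.

```python
import math
import random

def mask_spans(signal, span_len=4, n_spans=3, rng=None):
    """Replace random spans of the signal with None; return the masked
    copy plus the indices to be reconstructed (the training targets)."""
    rng = rng or random.Random(0)
    masked = list(signal)
    targets = set()
    for _ in range(n_spans):
        start = rng.randrange(0, len(signal) - span_len)
        for i in range(start, start + span_len):
            masked[i] = None
            targets.add(i)
    return masked, targets

def reconstruct(masked):
    """Stand-in 'model': fill each masked run by linear interpolation
    between its nearest unmasked neighbours."""
    out = list(masked)
    i = 0
    while i < len(out):
        if out[i] is None:
            j = i
            while j < len(out) and out[j] is None:
                j += 1                      # j = first unmasked index after the run
            left = out[i - 1] if i > 0 else out[j]
            right = out[j] if j < len(out) else out[i - 1]
            run = j - i
            for k in range(run):
                frac = (k + 1) / (run + 1)
                out[i + k] = left + frac * (right - left)
            i = j
        else:
            i += 1
    return out

def masked_mse(pred, signal, targets):
    """Loss is computed only on the masked positions, as in
    generative masked pretraining."""
    return sum((pred[i] - signal[i]) ** 2 for i in targets) / len(targets)

signal = [math.sin(2 * math.pi * t / 32) for t in range(64)]
masked, targets = mask_spans(signal)
loss = masked_mse(reconstruct(masked), signal, targets)
```

A real model would be trained to drive this masked-position loss down across many signals; the interpolation baseline simply makes the objective concrete.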
The clinical translation of miniature medical devices (MMDs) for minimally invasive surgery promises transformative advances in biomedical engineering, offering enhanced precision, reduced patient trauma and faster recovery times. However, their effective deployment in complex anatomies under real-time X-ray guidance, a widely used surgical imaging modality, presents challenges such as low imaging quality and difficulties of spatial MMD control. Manual identification and operation are labour intensive and error prone. Meanwhile, deep learning-based automation is limited by the scarcity of annotated X-ray datasets of MMDs owing to costly data collection, laborious annotation and privacy constraints. Here we introduce MicroSyn-X, a framework for training computer vision models to enable robotic teleoperation of MMDs using synthesized high-fidelity, pixel-accurate, auto-labelled and domain-randomized X-ray images, eliminating manual data curation. Integrating MicroSyn-X into a teleoperated robotic system enables real-time localization and navigation of magnetic soft and magnetic liquid MMDs within both ex vivo and dynamic in vivo environments, demonstrating robustness under challenging imaging conditions of low contrast, high noise and occlusion. We also open source the X-ray MMD dataset to enable benchmarking. Addressing data scarcity and enabling real-time robotic navigation, this work advances MMD-assisted minimally invasive surgery towards next-generation precision interventions.

Wang et al. introduce MicroSyn-X, a synthetic X-ray data generation framework that overcomes data scarcity in miniature medical devices, enabling robust deep learning-based tracking and real-time robotic navigation in challenging surgical settings.
Pub Date: 2026-02-23 | DOI: 10.1038/s42256-026-01190-3
Title: Synthetic X-ray-driven tracking and control of miniature medical devices
Nature Machine Intelligence 8(2), 276-291.
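The idea of synthesized, auto-labelled, domain-randomized training frames can be sketched in a few lines. The 3x3 blob "device", the frame size and the randomization ranges below are illustrative assumptions, not the paper's rendering pipeline:

```python
import random

def synth_xray(h=32, w=32, rng=None):
    """One synthetic 'X-ray' frame: uniform background plus a bright
    3x3 'device' blob at a random position. The pixel-accurate label
    (the blob centre) comes for free, with no manual annotation."""
    rng = rng or random.Random(42)
    img = [[0.2] * w for _ in range(h)]
    cy, cx = rng.randrange(1, h - 1), rng.randrange(1, w - 1)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            img[cy + dy][cx + dx] = 1.0
    return img, (cy, cx)

def domain_randomize(img, rng=None):
    """Randomize contrast and add Gaussian noise so a detector trained
    on synthetic frames tolerates low-contrast, noisy real imagery."""
    rng = rng or random.Random(7)
    gain = rng.uniform(0.3, 1.0)        # random contrast reduction
    sigma = rng.uniform(0.01, 0.1)      # random noise level
    return [[v * gain + rng.gauss(0.0, sigma) for v in row] for row in img]

img, (cy, cx) = synth_xray()
frame = domain_randomize(img)           # one auto-labelled training sample
```

Sampling many such (frame, label) pairs with different random gains and noise levels is the essence of domain randomization: the detector never sees two frames with identical imaging conditions.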
Pub Date: 2026-02-23 | DOI: 10.1038/s42256-026-01188-x
Nitya Thakkar, Mert Yuksekgonul, Jake Silberg, Animesh Garg, Nanyun Peng, Fei Sha, Rose Yu, Carl Vondrick, James Zou
Title: A large-scale randomized study of large language model feedback in peer review
Nature Machine Intelligence.
Pub Date: 2026-02-20 | DOI: 10.1038/s42256-026-01182-3
Shenglong Zhou, Ouya Wang, Ziyan Luo, Yongxu Zhu, Geoffrey Ye Li
Deep learning models are usually trained with stochastic gradient descent-based algorithms, but these optimizers face inherent limitations, such as slow convergence and stringent convergence assumptions. In particular, data heterogeneity arising from distributed settings poses significant challenges to their theoretical and numerical performance. Here we develop an algorithm called PISA (preconditioned inexact stochastic alternating direction method of multipliers). Grounded in rigorous theoretical guarantees, the algorithm converges under the sole assumption of Lipschitz continuity of the gradient on a bounded region, thereby removing the need for other conditions commonly imposed by stochastic methods. This capability enables the proposed algorithm to tackle the challenge of data heterogeneity effectively. Moreover, the algorithmic architecture enables scalable parallel computing and supports various preconditions, such as second-order information, the second moment and orthogonalized momentum via Newton–Schulz iterations. Incorporating the last two preconditions in PISA yields two computationally efficient variants: SISA and NSISA. Comprehensive experimental evaluations for training or fine-tuning diverse deep models, including vision models, large language models, reinforcement learning models, generative adversarial networks and recurrent neural networks, demonstrate superior numerical performance of SISA and NSISA compared with various state-of-the-art optimizers.

Zhou et al. develop PISA, an optimizer for deep learning models that supports heterogeneous data and various preconditions. It converges under minimal assumptions while outperforming established methods for diverse tasks.
Title: Preconditioned inexact stochastic ADMM for deep models
Nature Machine Intelligence 8(2), 234-245.
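PISA itself (stochastic, inexact, preconditioned) is beyond a short sketch, but the classical consensus ADMM scheme it builds on is easy to show on a toy problem: each "worker" minimizes its own objective while a consensus variable and dual variables pull the local copies together, the analogue of reconciling heterogeneous data. The quadratic objectives and the penalty rho = 1 below are illustrative choices, not the paper's setting.

```python
def consensus_admm(a, rho=1.0, iters=200):
    """Classical consensus ADMM for min_x sum_i 0.5 * (x - a_i)^2.
    Worker i keeps a local copy x_i and a dual y_i; z is the shared
    consensus variable. The optimum is the mean of a."""
    n = len(a)
    x = [0.0] * n
    y = [0.0] * n
    z = 0.0
    for _ in range(iters):
        # local step: exact minimizer of the augmented Lagrangian in x_i
        x = [(a[i] - y[i] + rho * z) / (1.0 + rho) for i in range(n)]
        # consensus step: average of local variables plus scaled duals
        z = sum(x[i] + y[i] / rho for i in range(n)) / n
        # dual ascent on the consensus constraint x_i = z
        y = [y[i] + rho * (x[i] - z) for i in range(n)]
    return z

a = [1.0, 4.0, 10.0]        # 'heterogeneous' worker objectives
z_star = consensus_admm(a)  # converges to mean(a) = 5.0
```

The alternating structure is what makes the method parallel: the x-updates are independent per worker, and only the cheap consensus and dual steps require communication.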
Pub Date: 2026-02-19 | DOI: 10.1038/s42256-025-01153-0
Sören Arlt, Haonan Duan, Felix Li, Sang Michael Xie, Yuhuai Wu, Mario Krenn
Artificial intelligence can solve complex scientific problems beyond human capabilities, but the resulting solutions offer little insight into the underlying physical principles. One prominent example is quantum physics, where computers can discover experiments for the generation of specific quantum states, but it is unclear how finding general design concepts can be automated. Here we address this challenge by training a transformer-based language model to create human-readable Python code that generates entire families of experiments. The model is trained on millions of synthetic examples of quantum states and their corresponding experimental blueprints, enabling it to infer general construction rules rather than isolated solutions. This strategy, which we call meta-design, enables scientists to gain a deeper understanding and to extrapolate to larger experiments without additional optimization. We demonstrate that the approach can rediscover known design principles and uncover previously unknown generalizations of important quantum states, such as those from condensed-matter physics. Beyond quantum optics, the methodology provides a blueprint for applying language models to interpretable, generalizable scientific discovery across disciplines such as materials science and engineering.

Language models can write human-readable code that captures general design rules, generating whole families of quantum experiments at once. The design strategy described here makes results interpretable and scalable, and accelerates discovery.
Title: Meta-designing quantum experiments with language models
Nature Machine Intelligence 8(2), 148-157.
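The core idea, code that captures a construction rule for a whole family rather than one optimized instance, can be illustrated with the well-known GHZ family: a single short function produces the state for any size n and extrapolates to larger experiments without re-optimization. The dictionary state representation below is an illustrative assumption, not the paper's actual model output:

```python
from math import sqrt

def ghz_state(n):
    """A 'meta-designed' program for a family of states rather than one
    instance: the general rule behind GHZ states of any size n,
    (|00...0> + |11...1>) / sqrt(2), written as readable code. States are
    dicts mapping computational-basis bitstrings to amplitudes."""
    amp = 1.0 / sqrt(2.0)
    return {'0' * n: amp, '1' * n: amp}

def norm(state):
    """Sum of squared amplitude magnitudes; 1.0 for a normalized state."""
    return sum(abs(a) ** 2 for a in state.values())

# one function, an entire family of target states
family = {n: ghz_state(n) for n in range(2, 8)}
```

Because the rule is explicit code, a scientist can read off the design principle directly and evaluate it at sizes never seen during training, which is the interpretability and extrapolation benefit the abstract describes.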
Pub Date: 2026-02-17 | DOI: 10.1038/s42256-026-01185-0
Menoua Keshishian, Gavin Mischler, Samuel Thomas, Brian Kingsbury, Stephan Bickel, Ashesh D. Mehta, Nima Mesgarani
Transforming continuous acoustic speech signals into discrete linguistic meaning is a remarkable computational feat accomplished by both the human brain and modern artificial intelligence. A key scientific question is whether these biological and artificial systems, despite their different architectures, converge on similar strategies to solve this challenge. Although automatic speech recognition systems now achieve human-level performance, research on their parallels with the brain has been limited by biologically implausible, non-causal models and comparisons that stop at predicting brain activity without detailing the alignment of the underlying representations. Furthermore, studies using text-based models overlook the crucial acoustic stages of speech processing. Here we bridge these gaps by uncovering a striking correspondence between the brain's processing hierarchy and the model's internal representations using high-resolution intracranial recordings and a causal, recurrent automatic speech recognition model. Specifically, we demonstrate a deep alignment in their algorithmic approach: neural activity in distinct cortical regions maps topographically to corresponding model layers, and critically, the representational content at each stage follows a parallel progression from acoustic to phonetic, lexical and semantic information. This work thus moves beyond demonstrating simple model–brain alignment to specifying the shared underlying representations at each stage of processing, providing direct evidence that both systems converge on a similar computational strategy for transforming sound into meaning.

Keshishian, Mischler et al. report that a recurrent automatic speech recognition system aligns closely with brain organization: model layers map to distinct cortical regions and naturally learn to encode a parallel progression from acoustic to phonetic, lexical and semantic content.
Title: Parallel hierarchical encoding of linguistic representations in the human auditory cortex and recurrent automatic speech recognition systems
Nature Machine Intelligence 8(2), 257-269.