Predicting drug responses of unseen cell types through transfer learning with foundation models
Yixuan Wang, Xinyuan Liu, Yimin Fan, Binghui Xie, James Cheng, Kam Chung Wong, Peter Cheung, Irwin King, Yu Li
Pub Date: 2025-10-03 | DOI: 10.1038/s43588-025-00887-6 | Nature Computational Science 6(1): 39-52
Drug repurposing through single-cell perturbation response prediction provides a cost-effective approach for drug development, but accurately predicting responses in unseen cell types that emerge during disease progression remains challenging. Existing methods struggle to achieve generalizable cell-type-specific predictions. To address these limitations, we introduce the cell-type-specific drug perturbatIon responses predictor (CRISP), a framework for predicting perturbation responses in previously unseen cell types at single-cell resolution. CRISP leverages foundation models and cell-type-specific learning strategies to enable effective transfer of information from control to perturbed states even with limited empirical data. Through systematic evaluation across increasingly challenging scenarios, from unseen cell types to cross-platform predictions, CRISP shows generalizability and performance improvements. We demonstrate CRISP’s drug repurposing potential through zero-shot prediction from solid tumor data to sorafenib’s therapeutic effects in chronic myeloid leukemia. The predicted anti-tumor mechanisms, including CXCR4 pathway inhibition, are supported by independent studies as an effective therapeutic strategy in chronic myeloid leukemia, aligning with past studies and clinical trials. This work develops CRISP, a framework using foundation models to predict drug responses in previously unseen cell types at single-cell resolution, advancing drug repurposing and drug screening capabilities.
Proteoform search from protein database with top-down mass spectra
Kunyi Li, Baozhen Shan, Lei Xin, Ming Li, Lusheng Wang
Pub Date: 2025-10-03 | DOI: 10.1038/s43588-025-00880-z | Nature Computational Science 5(11): 998-1009
Here we propose a search algorithm for proteoform identification that computes the largest-size error-correction alignments between a protein mass graph and a spectrum mass graph. Our combined method uses a filtering algorithm to identify candidates and then applies a search algorithm to report the final results. Our exact search method is 3.9 to 9.0 times faster than popular methods such as TopMG and TopPIC. Our combined method can further speed up the running time of sTopMG without affecting the search accuracy. We develop a pipeline for generating simulated top-down spectra on the basis of input protein sequences with modifications. Experiments on simulated datasets show that our combined method has 95% accuracy, exceeding that of existing methods. Experiments on real annotated datasets show that our method has ≥97.1% accuracy using the deconvolution method FLASHDeconv. An algorithm for proteoform identification with top-down mass spectra is proposed, and a pipeline is developed for generating simulated top-down spectra on the basis of input protein sequences with modifications.
Boosting power for time-to-event GWAS analysis affected by case ascertainment
Pub Date: 2025-10-02 | DOI: 10.1038/s43588-025-00892-9 | Nature Computational Science 5(11): 996-997
We propose a computationally efficient genome-wide association study (GWAS) method, WtCoxG, for time-to-event (TTE) traits in the presence of case ascertainment, a form of oversampling bias. WtCoxG addresses case ascertainment bias by applying a weighted Cox proportional hazards model, and outperforms existing approaches when incorporating information on external allele frequencies.
Self-driving labs for biotechnology
Evan Collins, Robert Langer, Daniel G. Anderson
Pub Date: 2025-10-01 | DOI: 10.1038/s43588-025-00885-8 | Nature Computational Science 5(11): 976-979
Self-driving laboratories that integrate robotic production with artificial intelligence have the potential to accelerate innovation in biotechnology. Because self-driving labs can be complex and not universally applicable, it is useful to consider their suitable use cases for successful integration into discovery workflows. Here, we review strategies for assessing the suitability of self-driving labs for biochemical design problems.
Predicting the regulatory impacts of noncoding variants on gene expression through epigenomic integration across tissues and single-cell landscapes
Zhe Liu, Yihang Bao, An Gu, Weichen Song, Guan Ning Lin
Pub Date: 2025-09-26 | DOI: 10.1038/s43588-025-00878-7 | Nature Computational Science 5(10): 927-939
Noncoding mutations play a critical role in regulating gene expression, yet predicting their effects across diverse tissues and cell types remains a challenge. Here we present EMO, a transformer-based model that integrates DNA sequence with chromatin accessibility data (assay for transposase-accessible chromatin with sequencing) to predict the regulatory impact of noncoding single nucleotide polymorphisms on gene expression. A key component of EMO is its ability to incorporate personalized functional genomic profiles, enabling individual-level and disease-contextual predictions and addressing critical limitations of current approaches. EMO generalizes across tissues and cell types by modeling both short- and long-range regulatory interactions and capturing dynamic gene expression changes associated with disease progression. In benchmark evaluations, the pretraining-based EMO framework outperformed existing models, with fine-tuning on small-sample tissues enhancing the model’s ability to fit target tissues. In single-cell contexts, EMO accurately identified cell-type-specific regulatory patterns and successfully captured the effects of disease-associated single nucleotide polymorphisms in disease conditions, linking genetic variation to disease-relevant pathways. EMO integrates DNA sequence and chromatin accessibility data to predict how noncoding variants regulate gene expression across tissues and single cells, enabling context-aware personalized insights into genetic effects for precision medicine.
Rhythm-based hierarchical predictive computations support acoustic-semantic transformation in speech processing
Olesia Dogonasheva, Keith B. Doelling, Denis Zakharov, Anne-Lise Giraud, Boris Gutkin
Pub Date: 2025-09-26 | DOI: 10.1038/s43588-025-00876-9 | Nature Computational Science 5(10): 915-926
Unraveling how humans understand speech despite distortions has long intrigued researchers. A prominent hypothesis highlights the role of multiple endogenous brain rhythms in forming the computational context to predict speech structure and content. Yet how neural processes may implement rhythm-based context formation remains unclear. Here we propose the brain rhythm-based inference model (BRyBI) as a possible neural implementation of speech processing in the auditory cortex based on the interaction of endogenous brain rhythms in a predictive coding framework. BRyBI encodes key rhythmic processes for parsing spectro-temporal representations of the speech signal into phoneme sequences and for governing the formation of the phrasal context. BRyBI matches patterns of human performance in speech recognition tasks and explains contradictory experimental observations of rhythms during speech listening and their dependence on the informational aspect of speech (uncertainty and surprise). This work highlights the computational role of multiscale brain rhythms in predictive speech processing. This study presents a brain rhythm-based inference model (BRyBI) for speech processing in the auditory cortex. BRyBI shows how rhythmic neural activity enables robust speech processing by dynamically predicting context and elucidates mechanistic principles that allow robust speech parsing in the brain.
The rise of large language models
Pub Date: 2025-09-24 | DOI: 10.1038/s43588-025-00890-x | Nature Computational Science 5(9): 689-690
This issue of Nature Computational Science features a Focus that highlights both the promises and perils of large language models, their emerging applications across diverse scientific domains, and the opportunities to overcome the challenges that lie ahead.
Neuromorphic principles in self-attention hardware for efficient transformers
Nathan Leroux, Jan Finkbeiner, Emre Neftci
Pub Date: 2025-09-16 | DOI: 10.1038/s43588-025-00868-9 | Nature Computational Science 5(9): 708-710
Strong barriers remain between neuromorphic engineering and machine learning, especially with regard to recent large language models (LLMs) and transformers. This Comment makes the case that neuromorphic engineering may hold the keys to more efficient inference with transformer-like models.
On the compatibility of generative AI and generative linguistics
Eva Portelance, Masoud Jasbi
Pub Date: 2025-09-16 | DOI: 10.1038/s43588-025-00861-2 | Nature Computational Science 5(9): 745-753
Chomsky’s generative linguistics has made substantial contributions to cognitive science and symbolic artificial intelligence. With the rise of neural language models, however, the compatibility between generative artificial intelligence and generative linguistics has come under debate. Here we outline three ways in which generative artificial intelligence aligns with and supports the core ideas of generative linguistics. In turn, generative linguistics can provide criteria to evaluate and improve neural language models as models of human language and cognition. This Perspective argues that generative AI aligns with generative linguistics, showing that neural language models (NLMs) are formal generative models. Furthermore, generative linguistics offers a framework for evaluating and improving NLMs.
Increasing alignment of large language models with language processing in the human brain
Changjiang Gao, Zhengwu Ma, Jiajun Chen, Ping Li, Shujian Huang, Jixing Li
Pub Date: 2025-09-16 | DOI: 10.1038/s43588-025-00863-0 | Nature Computational Science 5(11): 1080-1090
Transformer-based large language models (LLMs) have considerably advanced our understanding of how meaning is represented in the human brain; however, the validity of increasingly large LLMs is being questioned due to their extensive training data and their ability to access context thousands of words long. In this study we investigated whether instruction tuning—another core technique in recent LLMs that goes beyond mere scaling—can enhance models’ ability to capture linguistic information in the human brain. We compared base and instruction-tuned LLMs of varying sizes against human behavioral and brain activity measured with eye-tracking and functional magnetic resonance imaging during naturalistic reading. We show that simply making LLMs larger leads to a closer match with the human brain than fine-tuning them with instructions. These findings have substantial implications for understanding the cognitive plausibility of LLMs and their role in studying naturalistic language comprehension. Larger LLMs’ self-attention more accurately predicts readers’ regressive saccades and fMRI responses in language regions, whereas instruction tuning adds no benefit.