Pub Date: 2024-06-06 | DOI: 10.1038/s42256-024-00845-3
Kevin Max, Laura Kriener, Garibaldi Pineda García, Thomas Nowotny, Ismael Jaras, Walter Senn, Mihai A. Petrovici
Models of sensory processing and learning in the cortex need to efficiently assign credit to synapses in all areas. In deep learning, a known solution is error backpropagation, which requires biologically implausible weight transport from feedforward to feedback paths. We introduce phaseless alignment learning, a bio-plausible method to learn efficient feedback weights in layered cortical hierarchies. This is achieved by exploiting the noise naturally found in biophysical systems as an additional carrier of information. In our dynamical system, all weights are learned simultaneously with always-on plasticity and using only information locally available to the synapses. Our method is completely phase-free (no forward and backward passes or phased learning) and allows for efficient error propagation across multi-layer cortical hierarchies, while maintaining biologically plausible signal transport and learning. Our method is applicable to a wide class of models and improves on previously known biologically plausible ways of credit assignment: compared to random synaptic feedback, it can solve complex tasks with fewer neurons and learn more useful latent representations. We demonstrate this on various classification tasks using a cortical microcircuit model with prospective coding.
The credit assignment problem involves assigning credit to synapses in a neural network so that weights are updated appropriately and the circuit learns. Max et al. developed an efficient solution to the weight transport problem in networks of biophysical neurons. The method exploits noise as an information carrier and enables networks to learn to solve a task efficiently.
Title: Learning efficient backprojections across cortical hierarchies in real time | Journal: Nature Machine Intelligence 6(6), pp. 619-630
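The core idea is to use noise as an extra information carrier so that feedback weights can come to mirror the transposed feedforward weights without weight transport. A much simpler rate-based toy can illustrate it. This sketch is not the paper's phaseless alignment dynamics, which run in continuous time in a cortical microcircuit. It is a hypothetical "weight-mirror"-style rule built on three assumptions:
- noise ξ is injected into a lower layer;
- the upper layer responds with Wξ;
- the feedback matrix B follows a local Hebbian rule with decay.

Because E[ξξᵀ] = σ²I, the fixed point of ΔB ∝ ξ(Wξ)ᵀ − λB is B = (σ²/λ)Wᵀ. The function names and parameters (`learn_feedback`, `sigma`, `lam`, `eta`) are illustrative.

```python
import numpy as np

def learn_feedback(W, sigma=0.5, lam=0.25, eta=0.01, steps=20000, seed=0):
    """Toy noise-driven alignment of feedback weights B (n_lower x n_upper) to W.T.

    W: forward weights, shape (n_upper, n_lower). B only ever sees the noise
    injected at the lower layer and the upper-layer response it causes.
    """
    rng = np.random.default_rng(seed)
    n_upper, n_lower = W.shape
    B = rng.normal(scale=0.1, size=(n_lower, n_upper))  # random initial feedback
    for _ in range(steps):
        xi = rng.normal(scale=sigma, size=n_lower)  # noise in lower-layer neurons
        r = W @ xi                                  # induced upper-layer response
        # Local rule: postsynaptic (lower) noise times presynaptic (upper) rate, plus decay
        B += eta * (np.outer(xi, r) - lam * B)
    return B

def cosine(a, b):
    """Cosine similarity between two arrays, flattened."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.normal(size=(4, 6))
    B = learn_feedback(W)
    print("cos(B, W.T) =", round(cosine(B, W.T), 3))
```

With σ² = λ = 0.25, B should settle near Wᵀ itself, up to fluctuations set by the learning rate `eta`.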
Developing robust methods for evaluating protein–ligand interactions has been a long-standing problem. Data-driven methods may memorize ligand and protein training data rather than learning protein–ligand interactions. Here we show a scoring approach called EquiScore, which utilizes a heterogeneous graph neural network to integrate physical prior knowledge and characterize protein–ligand interactions in equivariant geometric space. EquiScore is trained based on a new dataset constructed with multiple data augmentation strategies and a stringent redundancy-removal scheme. On two large external test sets, EquiScore consistently achieved top-ranking performance compared to 21 other methods. When EquiScore is used alongside different docking methods, it can effectively enhance the screening ability of these docking methods. EquiScore also showed good performance on the activity-ranking task of a series of structural analogues, indicating its potential to guide lead compound optimization. Finally, we investigated different levels of interpretability of EquiScore, which may provide more insights into structure-based drug design.
Machine learning can improve scoring methods to evaluate protein–ligand interactions, but achieving good generalization is an outstanding challenge. Cao et al. introduce EquiScore, which is based on a graph neural network that integrates physical knowledge and is shown to have robust capabilities when applied to unseen protein targets.
Title: Generic protein–ligand interaction scoring by integrating physical prior knowledge and data augmentation modelling | Authors: Duanhua Cao, Geng Chen, Jiaxin Jiang, Jie Yu, Runze Zhang, Mingan Chen, Wei Zhang, Lifan Chen, Feisheng Zhong, Yingying Zhang, Chenghao Lu, Xutong Li, Xiaomin Luo, Sulin Zhang, Mingyue Zheng | Pub Date: 2024-06-06 | DOI: 10.1038/s42256-024-00849-z | Journal: Nature Machine Intelligence 6(6), pp. 688-700
Pub Date: 2024-05-30 | DOI: 10.1038/s42256-024-00833-7
Nasimeh Heydaribeni, Xinrui Zhan, Ruisi Zhang, Tina Eliassi-Rad, Farinaz Koushanfar
Scalable addressing of high-dimensional constrained combinatorial optimization problems is a challenge that arises in several science and engineering disciplines. Recent work introduced novel applications of graph neural networks for solving quadratic-cost combinatorial optimization problems. However, effective utilization of models such as graph neural networks to address general problems with higher-order constraints is an unresolved challenge. This paper presents a framework, HypOp, that advances the state of the art for solving combinatorial optimization problems in several aspects: (1) it generalizes the prior results to higher-order constrained problems with arbitrary cost functions by leveraging hypergraph neural networks; (2) it enables scalability to larger problems by introducing a new distributed and parallel training architecture; (3) it demonstrates generalizability across different problem formulations by transferring knowledge within the same hypergraph; (4) it substantially boosts the solution accuracy compared with the prior art by suggesting a fine-tuning step using simulated annealing; and (5) it shows remarkable progress on numerous benchmark examples, including hypergraph MaxCut, satisfiability and resource allocation problems, with notable run-time improvements using a combination of fine-tuning and distributed training techniques. We showcase the application of HypOp in scientific discovery by solving a hypergraph MaxCut problem on a National Drug Code drug-substance hypergraph. Through extensive experimentation on various optimization problems, HypOp demonstrates superiority over existing unsupervised-learning-based solvers and generic optimization methods.
Bolstering the broad and deep applicability of graph neural networks, Heydaribeni et al. introduce HypOp, a framework that uses hypergraph neural networks to solve general constrained combinatorial optimization problems. The presented method scales and generalizes well, improves accuracy and outperforms existing solvers on various benchmarking examples.
Title: Distributed constrained combinatorial optimization leveraging hypergraph neural networks | Journal: Nature Machine Intelligence 6(6), pp. 664-672
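The hypergraph MaxCut objective and the simulated-annealing fine-tuning step mentioned above can be sketched in a few lines of standard-library Python. This is a toy under stated assumptions:
- a hyperedge counts as cut when its nodes do not all fall on the same side;
- the starting assignment stands in for the rounded output of HypOp's hypergraph neural network (here, hypothetically, all zeros);
- the cooling schedule and step count are arbitrary choices, not the authors' settings.

```python
import math
import random

def cut_value(hyperedges, assign):
    """Number of hyperedges whose nodes are not all on the same side."""
    return sum(1 for e in hyperedges if len({assign[v] for v in e}) > 1)

def anneal(hyperedges, assign, steps=5000, t0=1.0, t_min=1e-3, seed=0):
    """Refine a 0/1 node assignment by single-node flips under simulated annealing.
    Returns the best cut value seen and the corresponding assignment."""
    rng = random.Random(seed)
    assign = list(assign)
    n = len(assign)
    current = cut_value(hyperedges, assign)
    best, best_assign = current, list(assign)
    for k in range(steps):
        t = t0 * (t_min / t0) ** (k / max(steps - 1, 1))  # geometric cooling schedule
        v = rng.randrange(n)
        assign[v] ^= 1                                    # propose flipping node v
        new = cut_value(hyperedges, assign)
        delta = new - current                             # we maximize the cut
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current = new
            if current > best:
                best, best_assign = current, list(assign)
        else:
            assign[v] ^= 1                                # reject: undo the flip
    return best, best_assign

if __name__ == "__main__":
    edges = [(0, 1, 2), (1, 3), (2, 3, 4), (0, 4), (1, 2, 4)]
    start = [0, 0, 0, 0, 0]  # everything on one side: nothing is cut
    print(anneal(edges, start))
```

Recomputing the full cut on every flip keeps the code short. A practical solver would update only the hyperedges incident to the flipped node.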
Pub Date: 2024-05-24 | DOI: 10.1038/s42256-024-00850-6
Personalized LLMs built with the capacity for emulating empathy are right around the corner. The effects on individual users need careful consideration.
Title: Empathic AI can’t get under the skin | Journal: Nature Machine Intelligence 6(5), p. 495 | Open access PDF: https://www.nature.com/articles/s42256-024-00850-6.pdf
Pub Date: 2024-05-23 | DOI: 10.1038/s42256-024-00838-2
Milong Ren, Chungong Yu, Dongbo Bu, Haicang Zhang
Protein sequence design is critically important for protein engineering. Despite recent advancements in deep learning-based methods, achieving accurate and robust sequence design remains a challenge. Here we present CarbonDesign, an approach that draws inspiration from successful ingredients of AlphaFold and which has been developed specifically for protein sequence design. At its core, CarbonDesign introduces Inverseformer, which learns representations from backbone structures and an amortized Markov random fields model for sequence decoding. Moreover, we incorporate other essential AlphaFold concepts into CarbonDesign: an end-to-end network recycling technique to leverage evolutionary constraints from protein language models and a multitask learning technique for generating side-chain structures alongside designed sequences. CarbonDesign outperforms other methods on independent test sets including the 15th Critical Assessment of protein Structure Prediction (CASP15) dataset, the Continuous Automated Model Evaluation (CAMEO) dataset and de novo proteins from RFDiffusion. Furthermore, it supports zero-shot prediction of the functional effects of sequence variants, making it a promising tool for applications in bioengineering.
Deep learning has led to great advances in predicting protein structure from sequences. Ren and colleagues present here a method for the inverse problem of finding a sequence that results in a desired protein structure, which is inspired by various components of AlphaFold combined with Markov random fields to decode sequences more efficiently.
Title: Accurate and robust protein sequence design with CarbonDesign | Journal: Nature Machine Intelligence 6(5), pp. 536-547
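Here is a concrete picture of decoding a sequence from a Markov random field. Suppose a network has already produced per-position scores h[i, a] and symmetric pairwise couplings J[i, j, a, b] over the 20 amino acids. The sketch below decodes a sequence from them with iterated conditional modes (ICM), a simple greedy coordinate-ascent method chosen here only for illustration. It is not claimed to be CarbonDesign's decoder, and the random potentials are hypothetical stand-ins for the network's predictions.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mrf_score(seq_idx, h, J):
    """Total score: sum_i h[i, a_i] + sum_{i<j} J[i, j, a_i, a_j]."""
    L = len(seq_idx)
    s = sum(h[i, seq_idx[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            s += J[i, j, seq_idx[i], seq_idx[j]]
    return float(s)

def icm_decode(h, J, sweeps=100):
    """Iterated conditional modes. Start from the argmax of the unary scores, then
    repeatedly set each position to the residue that maximizes its conditional score.
    Assumes J[i, j, a, b] == J[j, i, b, a] and J[i, i] == 0."""
    L, _ = h.shape
    seq = h.argmax(axis=1)
    for _ in range(sweeps):
        changed = False
        for i in range(L):
            # Conditional score of every residue at position i, given all other positions
            cond = h[i].copy()
            for j in range(L):
                if j != i:
                    cond += J[i, j, :, seq[j]]
            a = int(cond.argmax())
            if cond[a] > cond[seq[i]]:  # only move on a strict improvement
                seq[i], changed = a, True
        if not changed:
            break
    return seq

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, A = 8, len(AMINO_ACIDS)
    h = rng.normal(size=(L, A))
    J = 0.1 * rng.normal(size=(L, L, A, A))
    J = (J + J.transpose(1, 0, 3, 2)) / 2   # enforce J[i, j, a, b] == J[j, i, b, a]
    J[np.arange(L), np.arange(L)] = 0.0     # no self-couplings
    print("".join(AMINO_ACIDS[a] for a in icm_decode(h, J)))
```

ICM never decreases the total score, but it only reaches a local optimum. That is one reason learned (amortized) decoders are attractive.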
Pub Date: 2024-05-20 | DOI: 10.1038/s42256-024-00831-9
Florian Fürrutter, Gorka Muñoz-Gil, Hans J. Briegel
Quantum computing has recently emerged as a transformative technology. Yet, its promised advantages rely on efficiently translating quantum operations into viable physical realizations. Here we use generative machine learning models, specifically denoising diffusion models (DMs), to facilitate this transformation. Leveraging text conditioning, we steer the model to produce desired quantum operations within gate-based quantum circuits. Notably, DMs allow us to sidestep, during training, the exponential overhead inherent in the classical simulation of quantum dynamics, a consistent bottleneck in preceding machine learning techniques. We demonstrate the model’s capabilities across two tasks: entanglement generation and unitary compilation. The model excels at generating new circuits and supports typical DM extensions such as masking and editing to, for instance, align the circuit generation to the constraints of the targeted quantum device. Given their flexibility and generalization abilities, we envision DMs as pivotal in quantum circuit synthesis, both enhancing practical applications and providing insights into theoretical quantum computation.
Achieving the promised advantages of quantum computing relies on translating quantum operations into physical realizations. Fürrutter and colleagues use diffusion models to create quantum circuits that are based on user specifications and tailored to experimental constraints.
Title: Quantum circuit synthesis with diffusion models | Journal: Nature Machine Intelligence 6(5), pp. 515-524
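Denoising diffusion models are trained by corrupting data with Gaussian noise and learning to reverse the corruption. The abstract does not spell out how circuits are encoded, so the sketch below assumes a hypothetical continuous embedding tensor x0 of shape (qubits, time steps, embedding dim). It shows two things:
- the standard DDPM forward process x_t = √ᾱ_t·x0 + √(1 − ᾱ_t)·ε with a linear β schedule;
- a generic inpainting-style mask that keeps part of a known circuit fixed.

Text conditioning and the denoising network are omitted, and the paper's masking details may differ.

```python
import numpy as np

def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ N(sqrt(ab_t) x0, (1 - ab_t) I). Also return the noise eps,
    which is the usual regression target for the denoiser."""
    eps = rng.standard_normal(x0.shape)
    ab = alpha_bar[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps, eps

def apply_mask(x_t, x0_known, mask, t, alpha_bar, rng):
    """Inpainting-style masking: where mask is True, overwrite x_t with a correctly
    noised copy of the known circuit, so that only the unmasked part is generated."""
    known_t, _ = q_sample(x0_known, t, alpha_bar, rng)
    return np.where(mask, known_t, x_t)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ab = linear_alpha_bar()
    x0 = rng.standard_normal((3, 8, 4))  # hypothetical: 3 qubits, 8 time steps, dim 4
    xt, eps = q_sample(x0, 500, ab, rng)
    mask = np.zeros(x0.shape, dtype=bool)
    mask[0] = True                       # e.g. keep qubit 0's gates fixed
    print(apply_mask(xt, x0, mask, 500, ab, rng).shape)
```

In a full sampler, `apply_mask` would be called after every reverse (denoising) step. This mirrors the masking and editing extensions the abstract mentions.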
Pub Date: 2024-05-17 | DOI: 10.1038/s42256-024-00839-1
A. Diaw, M. McKerns, I. Sagert, L. G. Stanton, M. S. Murillo
Machine learning methods are increasingly deployed to construct surrogate models for complex physical systems at a reduced computational cost. However, the predictive capability of these surrogates degrades in the presence of noisy, sparse or dynamic data. We introduce an online learning method empowered by optimizer-driven sampling that has two advantages over current approaches: it ensures that all local extrema (including endpoints) of the model response surface are included in the training data, and it employs a continuous validation and update process in which surrogates undergo retraining when their performance falls below a validity threshold. We find, using benchmark functions, that optimizer-directed sampling generally outperforms traditional sampling methods in terms of accuracy around local extrema even when the scoring metric is biased towards assessing overall accuracy. Finally, the application to dense nuclear matter demonstrates that highly accurate surrogates for a nuclear equation-of-state model can be reliably autogenerated from expensive calculations using few model evaluations.
Machine learning-based surrogate models are important to model complex systems at a reduced computational cost; however, they must often be re-evaluated and adapted for validity on future data. Diaw and colleagues propose an online training method leveraging optimizer-directed sampling to produce surrogate models that can be applied to any future data and demonstrate the approach on a dense nuclear-matter equation of state containing a phase transition.
Title: Efficient learning of accurate surrogates for simulations of complex systems | Journal: Nature Machine Intelligence 6(5), pp. 568-577
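The method has two ingredients:
- training points chosen by an optimizer, including endpoints and local extrema;
- a validate-then-retrain loop gated by a threshold.

Both can be illustrated in one dimension. Everything here is a hypothetical stand-in:
- `expensive_model` plays the costly simulation;
- a hand-written golden-section search plays the optimizer;
- a piecewise-linear interpolant (`np.interp`) plays the surrogate.

The paper's actual optimizer, surrogate family and thresholds are not reproduced.

```python
import math
import numpy as np

def expensive_model(x):
    """Hypothetical stand-in for a costly simulation."""
    return math.sin(3.0 * x) + 0.5 * x

def golden_min(f, a, b, tol=1e-6):
    """Golden-section search for a minimizer of f on [a, b] (assumes f is unimodal there)."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0

def optimizer_samples(model, lo, hi, n_brackets=8):
    """Endpoints plus the local minimum and maximum candidates found in each sub-bracket."""
    edges = np.linspace(lo, hi, n_brackets + 1)
    xs = {float(lo), float(hi)}
    for a, b in zip(edges[:-1], edges[1:]):
        xs.add(float(golden_min(model, a, b)))                # local minimum candidate
        xs.add(float(golden_min(lambda x: -model(x), a, b)))  # local maximum candidate
    return np.array(sorted(xs))

class OnlineSurrogate:
    """Piecewise-linear surrogate that is retrained when validation fails."""

    def __init__(self, model, lo, hi, threshold):
        self.model, self.threshold = model, threshold
        self.x = optimizer_samples(model, lo, hi)
        self.y = np.array([model(v) for v in self.x])

    def __call__(self, x):
        return np.interp(x, self.x, self.y)

    def validate(self, x_new):
        """Compare with the true model on the array x_new. If any error exceeds the
        threshold, add those points to the training set and return False."""
        y_true = np.array([self.model(v) for v in x_new])
        err = np.abs(self(x_new) - y_true)
        if err.max() <= self.threshold:
            return True
        bad = err > self.threshold
        self.x = np.concatenate([self.x, x_new[bad]])
        self.y = np.concatenate([self.y, y_true[bad]])
        order = np.argsort(self.x)  # np.interp needs increasing sample locations
        self.x, self.y = self.x[order], self.y[order]
        return False

if __name__ == "__main__":
    s = OnlineSurrogate(expensive_model, 0.0, 4.0, threshold=0.05)
    x_test = np.linspace(0.0, 4.0, 101)
    rounds = 0
    while not s.validate(x_test):
        rounds += 1
    print(f"valid after {rounds} retraining round(s), {len(s.x)} samples")
```

Seeding the training set with the extrema targets exactly the regions where a coarse interpolant tends to be least accurate. The validation loop then patches whatever error remains on new data.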
Pub Date: 2024-05-17 | DOI: 10.1038/s42256-024-00842-6
Diego Marcondes, Adilson Simonis, Junior Barrera
Most research efforts in machine learning focus on performance and are detached from an explanation of the behaviour of the model. We call for going back to basics of machine learning methods, with more focus on the development of a basic understanding grounded in statistical theory.
Title: Back to basics to open the black box | Journal: Nature Machine Intelligence 6(5), pp. 498-501
Pub Date: 2024-05-15 | DOI: 10.1038/s42256-024-00841-7
Garriy Shteynberg, Jodi Halpern, Amir Sadovnik, Jon Garthoff, Anat Perry, Jessica Hay, Carlos Montemayor, Michael A. Olson, Tim L. Hulsey, Abrol Fairweather
Title: Does it matter if empathic AI has no empathy? | Journal: Nature Machine Intelligence 6(5), pp. 496-497
Pretrained language models have shown promise in analysing nucleotide sequences, yet a versatile model excelling across diverse tasks with a single pretrained weight set remains elusive. Here we introduce RNAErnie, an RNA-focused pretrained model built upon the transformer architecture, employing two simple yet effective strategies. First, RNAErnie enhances pretraining by incorporating RNA motifs as biological priors and introducing motif-level random masking in addition to masked language modelling at base/subsequence levels. It also tokenizes RNA types (for example, miRNA, lnRNA) as stop words, appending them to sequences during pretraining. Second, for out-of-distribution tasks involving RNA sequences not seen during pretraining, RNAErnie uses a type-guided fine-tuning strategy that first predicts possible RNA types from an RNA sequence and then appends the predicted type to the tail of the sequence to refine the feature embedding post hoc. Our extensive evaluation across seven datasets and five tasks demonstrates the superiority of RNAErnie in both supervised and unsupervised learning. It surpasses baselines with up to 1.8% higher accuracy in classification, 2.2% greater accuracy in interaction prediction and 3.3% improved F1 score in structure prediction, showcasing its robustness and adaptability with a unified pretrained foundation.
Despite the existence of various pretrained language models for nucleotide sequence analysis, achieving good performance on a broad range of downstream tasks using a single model is challenging. Wang and colleagues develop a pretrained language model specifically optimized for RNA sequence analysis and show that it can outperform state-of-the-art methods in a diverse set of downstream tasks.
Title: Multi-purpose RNA language modelling with motif-aware pretraining and type-guided fine-tuning | Authors: Ning Wang, Jiang Bian, Yuchen Li, Xuhong Li, Shahid Mumtaz, Linghe Kong, Haoyi Xiong | Pub Date: 2024-05-13 | DOI: 10.1038/s42256-024-00836-4 | Journal: Nature Machine Intelligence 6(5), pp. 548-557 | Open access PDF: https://www.nature.com/articles/s42256-024-00836-4.pdf
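Two of the pretraining ideas above can be made concrete on a toy character-level tokenization:
- motif-level masking, which hides every base of a selected motif occurrence rather than isolated bases;
- appending an RNA-type token to the end of the sequence.

This is a sketch under hypothetical choices: the `[MASK]` token, the bracketed type-token format, the motif list and the masking probability `p` are all assumptions. RNAErnie's real tokenizer and masking rates are not given here, and its motif masking is combined with base- and subsequence-level masking.

```python
import random

MASK = "[MASK]"  # hypothetical mask token

def motif_mask(seq, motifs, p=0.5, seed=0):
    """Tokenize seq into single bases and mask every base of each motif occurrence
    selected with probability p. Returns (tokens, labels), where labels holds the
    original base at masked positions and None elsewhere."""
    rng = random.Random(seed)
    tokens = list(seq)
    labels = [None] * len(seq)
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:  # visit every (possibly overlapping) occurrence
            if rng.random() < p:
                for i in range(start, start + len(motif)):
                    if labels[i] is None:  # do not mask the same base twice
                        labels[i] = tokens[i]
                        tokens[i] = MASK
            start = seq.find(motif, start + 1)
    return tokens, labels

def append_type(tokens, rna_type):
    """Append an RNA-type token (hypothetical format, e.g. '[miRNA]') to the sequence tail."""
    return tokens + [f"[{rna_type}]"]

if __name__ == "__main__":
    toks, labs = motif_mask("AUGGCUAUGG", ["AUGG"], p=1.0)
    print(append_type(toks, "miRNA"))
```

During type-guided fine-tuning, the abstract describes appending the *predicted* type in the same tail position, so the same helper would be fed a classifier's output instead of a known label.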