Pub Date: 2024-06-14 | DOI: 10.1038/s43588-024-00642-3
Mai Ha Vu, Philippe A. Robert, Rahmad Akbar, Bartlomiej Swiatczak, Geir Kjetil Sandve, Dag Trygve Truslew Haug, Victor Greiff
Apparent parallels between natural language and antibody sequences have led to a surge in deep language models applied to antibody sequences for predicting cognate antigen recognition. However, a formal linguistic definition of the antibody language does not exist, and how antibody language models capture antibody-specific binding features remains largely uninterpretable. Here we describe how a linguistic formalization of the antibody language, by characterizing its tokens and grammar, could address current challenges in antibody language model rule mining. The parallels between natural language and antibody sequences could serve as a stepping stone to using deep language models for analyzing antibody sequences. This Perspective discusses how issues in antibody language model rule mining could be addressed by linguistically formalizing the antibody language.
Title: Linguistics-based formalization of the antibody language as a basis for antibody language models | Journal: Nature Computational Science (Journal Article)
Pub Date: 2024-06-12 | DOI: 10.1038/s43588-024-00636-1
Chiheb Ben Mahmoud, John L. A. Gardner, Volker L. Deringer
As machine learning models are becoming mainstream tools for molecular and materials research, there is an urgent need to improve the nature, quality, and accessibility of atomistic data. In turn, there are opportunities for a new generation of generally applicable datasets and distillable models.
Title: Data as the next challenge in atomistic machine learning | Journal: Nature Computational Science (Journal Article)
Pub Date: 2024-06-07 | DOI: 10.1038/s43588-024-00646-z
Gokul Gowri, Kuanwei Sheng, Peng Yin
Orthogonal DNA barcode library design is an essential task in bioengineering. Here we present seqwalk, an efficient method for designing barcode libraries that satisfy a sequence symmetry minimization (SSM) heuristic for orthogonality, with theoretical guarantees of maximal or near-maximal library size under certain design constraints. Seqwalk encodes SSM constraints in a de Bruijn graph representation of sequence space, enabling the application of recent advances in discrete mathematics (ref. 1) to the problem of orthogonal sequence design. We demonstrate the scalability of seqwalk by designing a library of >10^6 SSM-satisfying barcode sequences in less than 20 s on a standard laptop. Seqwalk is a scalable method for designing orthogonal DNA barcode libraries, producing one million barcodes in 20 s on a standard laptop.
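The SSM constraint can be illustrated with a minimal sketch. It assumes the simplified definition that no length-k subsequence (k-mer) may occur more than once anywhere in the library, and it ignores reverse-complement and composition filters that a practical design would likely add. The helpers `de_bruijn`, `walk_library` and `satisfies_ssm` are hypothetical illustrations, not seqwalk's API or the authors' exact construction. The idea: a de Bruijn sequence contains every k-mer at most once when read linearly, so cutting it into barcodes of length L that overlap by k−1 bases yields barcodes that share no k-mer (an overlap of k−1 bases cannot hold a complete k-mer).

```python
def de_bruijn(alphabet: str, k: int) -> str:
    """Cyclic de Bruijn sequence over `alphabet` with word length k, built with
    the standard Lyndon-word (FKM) recursion. Read linearly, every k-mer
    appears at most once."""
    n = len(alphabet)
    a = [0] * (n * k)
    seq: list[int] = []

    def db(t: int, p: int) -> None:
        if t > k:
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in seq)


def walk_library(length: int, k: int, alphabet: str = "ACGT") -> list[str]:
    """Hypothetical helper: cut the de Bruijn sequence into barcodes of
    `length` bases overlapping by k-1 bases, so no k-mer start position is
    shared by two barcodes."""
    if length < k:
        raise ValueError("barcode length must be at least k")
    s = de_bruijn(alphabet, k)
    step = length - k + 1
    return [s[i:i + length] for i in range(0, len(s) - length + 1, step)]


def satisfies_ssm(library: list[str], k: int) -> bool:
    """True if no k-mer occurs more than once across the whole library."""
    seen: set[str] = set()
    for barcode in library:
        for i in range(len(barcode) - k + 1):
            kmer = barcode[i:i + k]
            if kmer in seen:
                return False
            seen.add(kmer)
    return True


if __name__ == "__main__":
    lib = walk_library(length=12, k=6)
    print(len(lib), satisfies_ssm(lib, k=6))
```

Because each barcode of length L contains L−k+1 k-mers and all k-mers must be distinct across the library, no SSM library over {A, C, G, T} can exceed 4^k/(L−k+1) barcodes. For k = 6 and L = 12 that bound is 585, and this sketch yields 584, which shows why a de Bruijn walk is a natural route to near-maximal libraries.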
Title: Scalable design of orthogonal DNA barcode libraries | Journal: Nature Computational Science (Journal Article) | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11208133/pdf/
Pub Date: 2024-06-07 | DOI: 10.1038/s43588-024-00647-y
Yifan Yang, Fan Xu
Morphing soft matter, which is capable of changing its shape and function in response to stimuli, has wide-ranging applications in robotics, medicine and biology. Recently, computational models have accelerated its development. Here, we highlight advances and challenges in developing computational techniques, and explore the potential applications enabled by such models.
Title: Computational morphology and morphogenesis for empowering soft-matter engineering | Journal: Nature Computational Science (Journal Article)
Pub Date: 2024-06-05 | DOI: 10.1038/s43588-024-00645-0
Peilin Kang, Enrico Trizio, Michele Parrinello
The study of the kinetic bottlenecks that hinder the rare transitions between long-lived metastable states is a major challenge in atomistic simulations. Here we propose a method to explore the transition state ensemble, which is the distribution of configurations that the system passes through as it translocates from one metastable basin to another. We base our method on the committor function and the variational principle that it obeys. We find its minimum through a self-consistent procedure that starts from information limited to the initial and final states. Right from the start, our procedure samples a large number of transition state configurations. With the help of the variational principle, we perform a detailed analysis of the transition state ensemble, quantitatively ranking the degrees of freedom most involved in the transition and enabling a systematic approach for the interpretation of simulation results and the construction of efficient physics-informed collective variables. A self-consistent iterative procedure is proposed to compute the committor function for rare events, via a variational principle, and extensively sample the transition state ensemble, allowing for the identification of the relevant variables in the process.
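The variational principle can be made concrete in one dimension, where its minimizer has a closed form. Assuming overdamped Langevin dynamics in a potential U at inverse temperature β, the committor q minimizes the functional ∫ |∇q|² e^(−βU) dx subject to q = 0 in the initial basin A and q = 1 in the final basin B. In 1D the Euler–Lagrange equation (e^(−βU) q′)′ = 0 gives q(x) = ∫_a^x e^(βU) dx′ / ∫_a^b e^(βU) dx′. The sketch below evaluates this for an assumed double-well potential U(x) = (x² − 1)²; it is a textbook illustration of the principle, not the authors' self-consistent, machine-learning-based procedure for high-dimensional systems, and the function names are hypothetical.

```python
import math
from typing import Callable


def double_well(x: float) -> float:
    """Assumed example potential with minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2


def committor_1d(potential: Callable[[float], float], a: float, b: float,
                 beta: float, n: int = 2001) -> tuple[list[float], list[float]]:
    """Closed-form 1D committor on [a, b] with q(a) = 0 and q(b) = 1.

    q(x) is the cumulative integral of exp(beta * U), normalized by its total,
    evaluated with the trapezoidal rule on a uniform grid of n points.
    """
    xs = [a + (b - a) * i / (n - 1) for i in range(n)]
    h = (b - a) / (n - 1)
    w = [math.exp(beta * potential(x)) for x in xs]
    cum = [0.0]
    for i in range(1, n):
        cum.append(cum[-1] + 0.5 * h * (w[i - 1] + w[i]))
    total = cum[-1]
    return xs, [c / total for c in cum]


def transition_state_estimate(xs: list[float], qs: list[float]) -> float:
    """Grid point whose committor value is closest to 1/2."""
    i = min(range(len(qs)), key=lambda j: abs(qs[j] - 0.5))
    return xs[i]


if __name__ == "__main__":
    xs, qs = committor_1d(double_well, a=-1.0, b=1.0, beta=5.0)
    print(transition_state_estimate(xs, qs))
```

Configurations with q ≈ 0.5 are the ones equally likely to commit to either basin; in this symmetric example that is the barrier top at x = 0, and increasing `beta` makes q rise more steeply there.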
Title: Computing the committor with the committor to study the transition state ensemble | Journal: Nature Computational Science (Journal Article)
Pub Date: 2024-06-05 | DOI: 10.1038/s43588-024-00652-1
Data about the transition states of rare transitions between long-lived states are needed to simulate physical and chemical processes; however, existing computational approaches often gather little information about these states. A machine-learning technique resolves this challenge by exploiting the century-old theory of committor functions.
Title: Systematic simulations and analysis of transition states using committor functions | Journal: Nature Computational Science (Journal Article)
Pub Date: 2024-06-03 | DOI: 10.1038/s43588-024-00649-w
Ananya Rastogi
Dr Kelly Ruggles, associate professor at New York University Langone Health, discusses with Nature Computational Science how she uses computational approaches to gain insights into cancer, inflammation and cardiovascular disease, as well as the importance of mentorship.
Title: Integrating computational and experimental worlds | Journal: Nature Computational Science (Journal Article) | Open-access PDF: https://www.nature.com/articles/s43588-024-00649-w.pdf
Pub Date: 2024-05-29 | DOI: 10.1038/s43588-024-00641-4
We recognize the importance of preprint posting in communicating research findings and encourage our authors to make use of this service.
Title: Accelerating scientific progress with preprints | Journal: Nature Computational Science (Journal Article) | Open-access PDF: https://www.nature.com/articles/s43588-024-00641-4.pdf
Pub Date: 2024-05-24 | DOI: 10.1038/s43588-024-00633-4
Martijn Meeter
A two-stage learning algorithm is proposed to directly uncover the symbolic representation of rules for skill acquisition from large-scale training log data.
Title: Outsourcing eureka moments to artificial intelligence | Journal: Nature Computational Science (Journal Article)